6.25M
Total tokens in codebase
↓
~15K
After semantic search (top chunks)
↓
~2K
After graph traversal (traced functions)
↓
<400
Final context sent to LLM
THE 4-LAYER FUNNEL
1
LAYER 1
Every function, class, and file gets a pre-computed plain-English summary stored in the DB.
These summaries are tiny (2–4 sentences each) but capture the intent — not the implementation.
Built in a pyramid: function summaries → file summaries → module summaries → system overview.
When a query comes in, the agent reads summaries first — never raw code. This lets it navigate 500k lines with the same ease as reading a table of contents.
SUMMARY PYRAMID
🏛 System: "A real-time 3D renderer with shader pipeline, framebuffer management, and asset loader"
📁 Module: "renderer/ handles framebuffer lifecycle, shader compilation and draw calls"
📄 File: "renderer.go — manages framebuffer creation, binding, and resize events"
⚡ Fn: "setFramebuffer(id int) — binds a framebuffer by ID, validates bounds, emits resize event"
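As a toy illustration of the pyramid above, here is a minimal Python sketch. The nested-dict layout and the `navigate` helper are hypothetical (the real summaries live in a DB); it only shows how an agent can drill from system overview to a single function by reading summaries, never raw code:

```python
# Hypothetical summary pyramid, mirroring the example entries above.
# Each node holds a short plain-English summary, not source code.
pyramid = {
    "system": "A real-time 3D renderer with shader pipeline, framebuffer management, and asset loader",
    "modules": {
        "renderer/": {
            "summary": "handles framebuffer lifecycle, shader compilation and draw calls",
            "files": {
                "renderer.go": {
                    "summary": "manages framebuffer creation, binding, and resize events",
                    "functions": {
                        "setFramebuffer": "binds a framebuffer by ID, validates bounds, emits resize event",
                    },
                },
            },
        },
    },
}

def navigate(keyword):
    """Walk the pyramid top-down, collecting the trail of summaries the
    agent reads before ever opening a source file."""
    trail = [("system", pyramid["system"])]
    for mod, mnode in pyramid["modules"].items():
        if keyword in mnode["summary"]:
            trail.append((mod, mnode["summary"]))
            for fname, fnode in mnode["files"].items():
                if keyword in fnode["summary"]:
                    trail.append((fname, fnode["summary"]))
                    for fn, fsum in fnode["functions"].items():
                        if keyword in fsum:
                            trail.append((fn, fsum))
    return trail

trail = navigate("framebuffer")  # system -> module -> file -> function
```

Four short summary reads replace scanning thousands of lines, which is the whole point of the table-of-contents analogy.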
2
LAYER 2
RAG finds similar code. But bugs don't care about similarity — they care about relationships.
Function A breaks because Function B changed, which is called by File C, which imports from Package D.
Tree-sitter parses every file and builds a graph: every function, class, and variable is a node. Every call, import, and inheritance is an edge. Stored in SQLite as an adjacency list. The agent traverses this graph to find what's truly connected to the issue — not just what sounds similar.
GRAPH TRAVERSAL EXAMPLE
Issue: "target is wrong"
→ Semantic hit: setFramebuffer() in renderer.go
→ Graph: setFramebuffer() ← called by drawFrame() ← called by RenderLoop.tick()
→ Graph: setFramebuffer() → writes to FramebufferRegistry (shared state!)
→ Agent knows: changing setFramebuffer affects ALL callers + the registry
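The traversal above can be sketched with Python's stdlib `sqlite3`. The schema and symbol names are hypothetical stand-ins for the real node/edge tables; a recursive CTE walks `calls` edges backwards to find every transitive caller:

```python
import sqlite3

# Hypothetical adjacency-list schema: symbols as nodes, relationships as edges.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
""")
names = ["setFramebuffer", "drawFrame", "RenderLoop.tick", "FramebufferRegistry"]
db.executemany("INSERT INTO nodes (id, name) VALUES (?, ?)", enumerate(names, 1))
# drawFrame calls setFramebuffer; RenderLoop.tick calls drawFrame;
# setFramebuffer writes to the shared FramebufferRegistry.
db.executemany("INSERT INTO edges VALUES (?, ?, ?)",
               [(2, 1, "calls"), (3, 2, "calls"), (1, 4, "writes")])

def callers_of(name):
    """Transitively walk 'calls' edges backwards: everything that could
    break if `name` changes."""
    rows = db.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM nodes WHERE name = ?
            UNION
            SELECT e.src FROM edges e JOIN up ON e.dst = up.id
             WHERE e.kind = 'calls'
        )
        SELECT n.name FROM nodes n JOIN up ON n.id = up.id
    """, (name,)).fetchall()
    return {r[0] for r in rows} - {name}
```

`callers_of("setFramebuffer")` returns both `drawFrame` and `RenderLoop.tick`, which no similarity search would surface, because neither mentions framebuffers by name.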
3
LAYER 3
Every function is chunked and embedded using nomic-embed-text (runs locally, ~270MB).
Chunks are stored in ChromaDB with metadata: file path, function name, language, last modified.
RAG is Layer 3 — not Layer 1 — because alone it's not enough. But combined with the graph and summaries, it becomes extremely precise. It finds semantically related code that the graph might miss (e.g. a comment in a different file that explains why the framebuffer ID scheme was designed this way).
WHAT RAG FINDS THAT GRAPH MISSES
Query embedding: "framebuffer target binding wrong"
→ Finds: a comment in ARCHITECTURE.md explaining the framebuffer ID convention
→ Finds: a similar bug fix from 6 months ago in a different file
→ Finds: the test that was written for this exact scenario
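To show the retrieval step without pulling in nomic-embed-text or ChromaDB, here is a self-contained sketch that swaps in a toy bag-of-words embedding and cosine ranking. The chunk texts and paths are invented; only the shape of the pipeline (embed, store with metadata, rank by similarity) mirrors the real one:

```python
import math

# Toy stand-in for nomic-embed-text: a tiny bag-of-words vector.
VOCAB = ["framebuffer", "target", "binding", "shader", "asset", "bug", "test"]

def embed(text):
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Chunks the dependency graph alone would never surface: docs, old fixes, tests.
chunks = [
    {"text": "framebuffer ID convention: target IDs are 1-based", "path": "ARCHITECTURE.md"},
    {"text": "fix: wrong framebuffer target binding off by one", "path": "old_fix.go"},
    {"text": "test that framebuffer target binding is correct", "path": "renderer_test.go"},
    {"text": "asset loader reads shader sources from disk", "path": "assets.go"},
]
for c in chunks:
    c["vec"] = embed(c["text"])

def semantic_search(query, k=3):
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, c["vec"]), reverse=True)
    return [c["path"] for c in ranked[:k]]

hits = semantic_search("framebuffer target binding wrong")
```

All three graph-invisible chunks rank above the unrelated asset-loader code, illustrating why RAG earns its place as a complementary layer rather than the foundation.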
4
LAYER 4
Layers 1–3 might return more context than the model can handle. Layer 4 is the gatekeeper.
It has a fixed token budget (e.g. 3,000 tokens for code context) and fills it intelligently —
highest-relevance chunks first, then graph-connected context, then summaries for anything that didn't fit.
If something important is too large to fit fully, it substitutes its pre-computed summary instead. The LLM always gets a complete, coherent picture — even if parts of it are compressed summaries. This is the key insight: summaries preserve what the model needs for reasoning; only the exact syntax is lost.
BUDGET ALLOCATION EXAMPLE (3000 token budget)
✓ setFramebuffer() full code — 180 tokens (high relevance, fits)
✓ drawFrame() full code — 240 tokens (caller, fits)
✓ FramebufferRegistry summary — 60 tokens (too large to fit fully, summary used)
✓ Related bug fix context from RAG — 150 tokens
✓ Web docs snippet — 200 tokens
✓ Issue + plan — 400 tokens
Total: 1,230 tokens. Well within budget. LLM has everything it needs.
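The gatekeeper logic reduces to a greedy fill with a summary fallback. This sketch uses made-up token counts and a deliberately small budget (700 tokens instead of 3,000) so the fallback actually triggers; item names echo the example above but the numbers are illustrative:

```python
# Hypothetical candidates from Layers 1-3, already ordered by relevance.
# Each carries the token cost of its full code and of its pre-computed summary.
items = [
    {"name": "setFramebuffer", "full": 180, "summary": 40},
    {"name": "drawFrame", "full": 240, "summary": 50},
    {"name": "FramebufferRegistry", "full": 900, "summary": 60},
    {"name": "related_bug_fix", "full": 150, "summary": 30},
]

def fill_budget(items, budget):
    """Greedy fill: take full code when it fits; otherwise substitute the
    summary so the picture stays complete, just compressed."""
    picked, used = [], 0
    for it in items:
        if used + it["full"] <= budget:
            picked.append((it["name"], "full", it["full"]))
            used += it["full"]
        elif used + it["summary"] <= budget:
            picked.append((it["name"], "summary", it["summary"]))
            used += it["summary"]
    return picked, used

picked, used = fill_budget(items, 700)
# FramebufferRegistry (900 tokens) can't fit, so its 60-token summary goes in;
# every candidate is represented and the total stays under budget.
```

The invariant is the one the section states: nothing relevant is silently dropped — oversized items are compressed, not omitted.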