GroundTruth

A tree-sitter code knowledge graph that cuts LLM token usage by retrieving only the code that matters.

Large language models waste tokens on code. To modify one function, you typically paste in whole files — sometimes the whole repo — so the model can figure out how everything connects. You pay for that context once to send it, and again while the model re-derives relationships it can't see.

GroundTruth takes a different approach. It parses your source into a knowledge graph of definitions and the relationships between them, then retrieves only the relevant slice for a given task: the target function, its callers and callees, and the types it depends on — packed under a token budget. The relationships are looked up, not re-inferred on every call.
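To illustrate what "looked up, not re-inferred" can mean in practice, here is a minimal sketch of a dependency slice over an in-memory graph. The edge list, the symbol names, and the `build_index` and `slice_for` helpers are all hypothetical and chosen for illustration; they are not GroundTruth's schema or API.

```python
from collections import defaultdict

# Hypothetical (source, target) edges: "source calls or depends on target".
# These names are illustrative only, not GroundTruth's actual data model.
EDGES = [
    ("api.handle", "svc.load_user"),
    ("cli.main", "svc.load_user"),
    ("svc.load_user", "db.fetch"),
    ("svc.load_user", "models.User"),
]


def build_index(edges):
    """Precompute forward and reverse adjacency so each query is a dict lookup."""
    callees, callers = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        callees[src].add(dst)
        callers[dst].add(src)
    return callees, callers


def slice_for(target, callees, callers):
    """Return the target with its direct callers and callees, sorted for determinism."""
    return {
        "target": target,
        "callers": sorted(callers.get(target, ())),
        "callees": sorted(callees.get(target, ())),
    }


callees, callers = build_index(EDGES)
print(slice_for("svc.load_user", callees, callers))
```

Because the adjacency maps are built once, asking for a symbol's neighbors costs a lookup rather than another pass over the source text.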

How it works

GroundTruth runs as a four-stage pipeline. First, it parses each file with tree-sitter into nodes (functions, methods, classes) and edges (calls, inheritance, definitions). Second, it writes those nodes and edges to a SQLite store that re-indexes incrementally: per-node content hashing means only changed code is touched. Third, a resolution pass links bare call names to the definitions they refer to, turning them into real graph edges. Finally, retrieval walks the dependency neighborhood of whatever you're working on, ranks it by relevance, and trims it to fit a token budget, degrading distant code to just its signature so the context pack stays compact.
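The incremental re-indexing idea can be sketched with nothing but the standard library. The example below assumes a hypothetical single-table layout (`nodes` with `id`, `hash`, `source`) and a hypothetical `upsert_nodes` helper; GroundTruth's real schema and function names may differ. It hashes each node's source and writes a row only when the hash has changed.

```python
import hashlib
import sqlite3


def content_hash(source: str) -> str:
    """Stable fingerprint of one node's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def upsert_nodes(conn, nodes):
    """Write only nodes whose content hash changed; return the ids that were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "id TEXT PRIMARY KEY, hash TEXT NOT NULL, source TEXT NOT NULL)"
    )
    changed = []
    for node_id, source in nodes.items():
        digest = content_hash(source)
        row = conn.execute(
            "SELECT hash FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # unchanged node: skip the write entirely
        conn.execute(
            "INSERT OR REPLACE INTO nodes (id, hash, source) VALUES (?, ?, ?)",
            (node_id, digest, source),
        )
        changed.append(node_id)
    conn.commit()
    return changed


conn = sqlite3.connect(":memory:")
first = upsert_nodes(conn, {
    "mod.py::f": "def f(): return 1",
    "mod.py::g": "def g(): return 2",
})
second = upsert_nodes(conn, {
    "mod.py::f": "def f(): return 1",   # identical, so skipped
    "mod.py::g": "def g(): return 3",   # edited, so rewritten
})
print(first)   # ['mod.py::f', 'mod.py::g']
print(second)  # ['mod.py::g']
```

Hashing per node rather than per file means an edit to one function does not force its siblings in the same file to be rewritten.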

The result is a deterministic, queryable map of your codebase that hands an LLM exactly what it needs and nothing it doesn't.
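The retrieval stage can be approximated in a few lines. This sketch makes several assumptions that are not taken from the project: the `GRAPH` layout, a whitespace word count standing in for a real tokenizer, hop distance as the only relevance signal, and a rule that falls back to the signature when a full body no longer fits the budget.

```python
from collections import deque

# Hypothetical graph: each symbol has a signature, a full body, and neighbors.
GRAPH = {
    "a": {"sig": "def a(x)", "body": "def a(x):\n    return b(x) + 1", "adj": ["b"]},
    "b": {"sig": "def b(x)", "body": "def b(x):\n    return c(x) * 2", "adj": ["c"]},
    "c": {"sig": "def c(x)", "body": "def c(x):\n    return x - 3", "adj": []},
}


def estimate_tokens(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def hop_distances(graph, start, hops):
    """Breadth-first search that stops expanding at `hops` edges from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nxt in graph[node]["adj"]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def pack(graph, start, max_tokens, hops=2):
    """Add nearest symbols first; use the signature when the full body will not fit."""
    dist = hop_distances(graph, start, hops)
    ordered = sorted(dist, key=lambda name: (dist[name], name))  # deterministic order
    parts, used = [], 0
    for name in ordered:
        for text in (graph[name]["body"], graph[name]["sig"]):
            cost = estimate_tokens(text)
            if used + cost <= max_tokens:
                parts.append(text)
                used += cost
                break
    return "\n\n".join(parts), used


text, used = pack(GRAPH, "a", max_tokens=14)
print(text)
print("tokens used:", used)
```

With a budget of 14, the two nearest symbols keep their full bodies and the most distant one is reduced to its signature, which mirrors the degradation behavior described above.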

Built with

Python · tree-sitter · SQLite. MIT licensed and open to contributions.

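from groundtruth import index_path, GraphStore, context_pack

# 1. Parse the package into nodes and edges.
result = index_path("your_package/")
# 2. Persist them in the SQLite-backed store.
store = GraphStore("graph.db")
store.upsert(result)
# 3. Resolve bare call names into graph edges.
store.resolve_calls()

# 4. Retrieve a token-budgeted context pack for one symbol.
pack = context_pack(store, "your_package/mod.py::ClassName.method",
                    max_tokens=2200, hops=2)
print(pack["prompt"])   # feed this to your model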
GitHub - vinodnarayanswamy/groundtruth: A tree-sitter code knowledge graph that retrieves minimal, relevant context for LLM code generation — cutting tokens by sending only a symbol’s ranked dependency neighborhood instead of whole files.