Three layers make up any serious LLM application in 2026, and two of them have names that stuck. The model generates tokens. The harness wraps the model with tools, an orchestration loop, prompt assembly, retries, and state, turning a useful conversational responder into something that can take action against real systems. The third layer, the structured context the harness feeds the model on every call, has no agreed-upon name yet. Most teams treat it as glue: a vector store somebody stood up in 2024, a Slack export, a half-finished Notion page, a system prompt that grew by accretion through six product reviews.
Treating the third layer as glue has consequences. In March, a16z partners Jason Cui and Jennifer Li argued that "data and analytics agents are essentially useless without the right context." MIT NANDA's 2025 study put a number on the failure mode: 95% of enterprise AI pilots returned no measurable ROI. The post-mortems converged on the same diagnosis. The models were competent. The harnesses were competent. The data was there. What was missing was the layer that turned that data into something an agent could actually use, consistently, on every call.
That layer needs a name. The honest one is the context constructor.
Part One
Before defining the third, it helps to be precise about the first two.
A model is the thing that produces tokens. A frontier LLM by itself does plenty of useful work. Hundreds of millions of people use ChatGPT and Claude every day with no harness more elaborate than a chat box, and the answers are genuinely useful. The model is not the bottleneck. What a model on its own cannot do is take action against your systems, read from your files, execute tools, write back, or maintain state across turns. To make it do those things you wrap it in a runtime.
That runtime is the harness. Mitchell Hashimoto coined the term in February 2026 with a pragmatic definition: anytime an agent makes a mistake, you engineer the environment so it does not make that mistake again. The mechanics are tool definitions, the system prompt, output parsing, error handling, retry policy, state, prompt caching, and the orchestration loop that ties them together. Anthropic calls a closely related practice context engineering, defined as the set of strategies for curating the tokens that enter the model on each step. The vocabulary is converging because the practice is converging.
Claude Code is a harness. Cursor is a harness. LangGraph is a harness. OpenAI's Agents SDK, Google's ADK, and Anthropic's own Agent SDK are harnesses. CrewAI, AutoGen, Letta, AWS AgentCore are harnesses. The differences matter at the margins. Picking one is no longer where you spend your engineering budget. The category has settled.
You have a model, wrapped by a harness. The harness manages tools, prompts, conversation state, and the execution loop. The obvious next question is what they read from on every call. That is the third layer. It does not have a settled name yet.
Part Two
The honest answer most teams give is that they wired it up. Somebody on the team picked a vector store, ran a folder of documents through an embedder, added a Slack connector, wrote a system prompt naming the tables that mattered, dropped in a glossary of business terms, and called it done. The harness pulls whatever cosine similarity returns. The agent is grounded in a soup of half-relevant chunks plus whatever the system prompt remembered to mention.
This works for the demo. In production it produces failure modes anyone who has shipped an agent will recognize. The agent uses the wrong definition of a business term because two departments define it differently and the embedded documents include both versions. It references stale schema because the embedder ran in March and nobody re-embedded after the migration in April. It fails at entity resolution because "Acme Corp" in Salesforce is the same legal entity as "Acme, Inc." in Stripe, and cosine similarity has no way to know that. It hallucinates a product feature that was deprecated six months ago because the deprecation lives in a Confluence page that ranked below the original spec on the relevant query. None of these are model failures. All of them are context failures.
Gorgias's engineering team described this clearly in their write-up on building an internal data agent. The model and the harness were the easy parts. The hard part was giving the agent enough context to answer questions correctly and consistently across the company. They ended up building what they call a context layer: structured metadata, when-to-use guidance, and how-to-use examples, all versioned alongside their dbt models.
OpenAI's in-house data agent team made the same observation and reached the same conclusion. Their solution stacks several layers of grounding: table usage metadata for schema understanding, human annotations from domain experts explaining what datasets mean, Codex-powered code enrichment that extracts semantic meaning from how the data is built, and a memory that captures user corrections so the agent improves over time. They built a context constructor before the term existed.
You can run a small AI product without a context constructor. You cannot run a serious one without it. The only real question is whether you build it yourself or buy a system that does the job for you.
Part Three
A context constructor is a system that ingests heterogeneous sources, extracts the entities, relationships, definitions, and provenance worth keeping, resolves the same entity across sources so it appears once, maintains that artifact as the sources change, and exposes the result through an interface any harness can read from on every call.
The load-bearing word in that sentence is "any." If the artifact is bolted tightly to one agent framework, what you have built is a feature inside that framework. A context constructor is interface-shaped. The harness should not need to know whether the artifact was built by cognee, by Atlan, by gbrain, or by a Python script some engineer wrote on a Sunday. The artifact is the contract.
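To make "the artifact is the contract" concrete, here is a minimal sketch of what such an artifact could look like, assuming a plain JSON serialization. The `Entity`, `Relation`, `Provenance`, and `ContextArtifact` names and fields are hypothetical and are not taken from cognee, Atlan, gbrain, or any other product. What matters is that the harness depends only on this shape, so the builder behind it can be swapped.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Provenance:
    source: str       # e.g. "salesforce", "slack" (illustrative labels)
    locator: str      # record ID, message permalink, file path, ...
    observed_at: str  # ISO-8601 timestamp of when the fact was extracted


@dataclass
class Entity:
    id: str
    kind: str         # "customer", "metric", "person", ...
    name: str
    aliases: list[str] = field(default_factory=list)
    provenance: list[Provenance] = field(default_factory=list)


@dataclass
class Relation:
    subject: str      # Entity.id
    predicate: str    # e.g. "has_contract", "defined_in"
    target: str       # Entity.id


@dataclass
class ContextArtifact:
    # Hypothetical schema: the only thing a harness is allowed to depend on
    version: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "ContextArtifact":
        data = json.loads(raw)
        entities = [
            Entity(**{**e, "provenance": [Provenance(**p) for p in e["provenance"]]})
            for e in data["entities"]
        ]
        relations = [Relation(**r) for r in data["relations"]]
        return cls(version=data["version"], entities=entities, relations=relations)


# Usage: whoever builds the artifact, the harness only sees this JSON shape.
acme = Entity(
    id="cust:acme",
    kind="customer",
    name="Acme Corp",
    aliases=["Acme, Inc."],
    provenance=[Provenance("salesforce", "001XYZ", "2026-04-02T00:00:00Z")],
)
artifact = ContextArtifact(
    version="2026-04-02",
    entities=[acme],
    relations=[Relation("cust:acme", "has_contract", "contract:42")],
)
payload = artifact.to_json()
```

Because provenance travels with every entity, a harness can cite where a fact came from without knowing which tool extracted it.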
A few things this is decidedly not.
A vector store is not a context constructor. Pinecone, Weaviate, and Chroma are containers. They hold vectors. They do not, on their own, know that ARR and annual recurring revenue are the same metric, that the finance team's revenue definition diverges from the GTM team's, or that the Acme Corp account in Salesforce is the same legal entity as the Acme, Inc. deal mentioned in Slack. A constructor sits on top of a vector store and gives that store a schema.
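A minimal sketch of the kind of schema a constructor layers on top of a vector store, assuming a hand-maintained glossary. `MetricGlossary`, the team names, and both definitions are hypothetical. A vector store would return chunks that mention "ARR" or "annual recurring revenue" as unrelated text; the glossary resolves both to one metric and, when finance and GTM disagree, makes the caller choose a definition instead of letting cosine similarity pick one silently.

```python
from dataclasses import dataclass, field
from typing import Optional


def normalize(term: str) -> str:
    # Case- and whitespace-insensitive key for alias lookup
    return " ".join(term.lower().split())


@dataclass
class Metric:
    canonical: str
    aliases: set[str]
    definitions: dict[str, str] = field(default_factory=dict)  # team -> definition


class MetricGlossary:
    """Hypothetical alias-to-metric index layered alongside a vector store."""

    def __init__(self) -> None:
        self._by_alias: dict[str, Metric] = {}

    def add(self, metric: Metric) -> None:
        for alias in metric.aliases | {metric.canonical}:
            key = normalize(alias)
            existing = self._by_alias.get(key)
            if existing is not None and existing is not metric:
                raise ValueError(f"{alias!r} already maps to {existing.canonical!r}")
            self._by_alias[key] = metric

    def resolve(self, term: str) -> Optional[Metric]:
        return self._by_alias.get(normalize(term))

    def definition(self, term: str, team: str) -> str:
        metric = self.resolve(term)
        if metric is None:
            raise KeyError(term)
        if team not in metric.definitions:
            # Refuse to guess: the teams disagree, so the caller must say whose definition
            raise LookupError(
                f"{metric.canonical!r} has no {team!r} definition; "
                f"known: {sorted(metric.definitions)}"
            )
        return metric.definitions[team]


# Usage with made-up definitions, for illustration only
glossary = MetricGlossary()
glossary.add(
    Metric(
        canonical="annual_recurring_revenue",
        aliases={"ARR", "annual recurring revenue"},
        definitions={
            "finance": "Contracted recurring revenue net of discounts",
            "gtm": "Annualized value of active subscriptions at list price",
        },
    )
)
```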
A RAG pipeline is not a context constructor. RAG is a retrieval strategy. A constructor decides what goes in the corpus to begin with, how the pieces relate, and what metadata travels with each chunk. "RAG is dead" became a fashionable claim through 2025 once enough teams discovered that top-k cosine similarity was insufficient on its own. RAG is not dead. It is subsumed. Retrieval is one operation a constructor exposes, alongside lookup, graph traversal, structured query, and increasingly, prompt-cached blocks served by an MCP server.
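To illustrate retrieval as one operation among several, here is a toy in-memory store, assuming precomputed embeddings. The `ContextStore` class and its method names are hypothetical and are not an MCP or vendor API. Top-k cosine similarity is there, but so are exact lookup, one-hop graph traversal, and a structured filter, which answer questions such as "which contracts belong to this customer" without relying on embedding proximity.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Record:
    id: str
    kind: str
    text: str
    embedding: tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ContextStore:
    """Hypothetical store exposing several read operations over one artifact."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}
        self._edges: dict[tuple[str, str], list[str]] = defaultdict(list)

    def put(self, record: Record) -> None:
        self._records[record.id] = record

    def link(self, subject: str, predicate: str, target: str) -> None:
        self._edges[(subject, predicate)].append(target)

    # Operation 1: exact lookup by ID
    def lookup(self, record_id: str) -> Record:
        return self._records[record_id]

    # Operation 2: one-hop graph traversal along a named relation
    def neighbors(self, record_id: str, predicate: str) -> list[Record]:
        return [self._records[t] for t in self._edges.get((record_id, predicate), [])]

    # Operation 3: structured query (here just a filter on kind)
    def query(self, kind: str) -> list[Record]:
        return [r for r in self._records.values() if r.kind == kind]

    # Operation 4: similarity retrieval, i.e. classic top-k RAG
    def retrieve(self, query_embedding: Sequence[float], k: int = 3) -> list[Record]:
        ranked = sorted(
            self._records.values(),
            key=lambda r: cosine(r.embedding, query_embedding),
            reverse=True,
        )
        return ranked[:k]


# Usage with two-dimensional toy embeddings
store = ContextStore()
store.put(Record("cust:acme", "customer", "Acme Corp", (1.0, 0.0)))
store.put(Record("contract:42", "contract", "Acme renewal, FY2026", (0.9, 0.1)))
store.put(Record("doc:spec", "doc", "Original feature spec", (0.0, 1.0)))
store.link("cust:acme", "has_contract", "contract:42")
```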
A connector library is not a context constructor. Pulling rows out of Salesforce or messages out of Slack is the easy half of the problem. Making those sources speak a common semantic language, resolving entities across them, and capturing the implicit rules a domain expert would carry in their head is the hard half. Connectors feed the constructor. They do not replace it.
What a working constructor does, concretely:
- Ingests from arbitrary sources: SQL warehouses, email, chat, files, code, tickets, calendar events.
- Combines deterministic parsing with LLM-assisted extraction. Deterministic parsers handle the parts that are actually structured. The model handles the parts that look structured to a human but are not, like a Slack thread that decides a pricing policy.
- Resolves entities across sources so the same customer, person, or document appears once, with the right cross-references attached.
- Maintains the artifact as sources change. Drift is the constructor's problem, not the harness's. If the embedder was last run in March and the schema changed in April, the harness should not be the layer that notices.
- Exposes the result through a stable interface. In practice that means an MCP server, an HTTP API, and a queryable graph, with prompt-cache-friendly chunking on the read side.
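A minimal sketch of two of the duties above, entity resolution and drift detection, assuming each source reports a last-modified timestamp. The suffix list, `company_key`, and the other names are hypothetical. Name normalization is deliberately the weakest workable heuristic; production systems would typically add stronger keys such as email domains or tax IDs, or model-assisted matching, and nothing here describes how any particular product does it.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical list of legal suffixes to ignore when matching company names
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd", "limited"}


@dataclass(frozen=True)
class SourceRecord:
    source: str           # "salesforce", "stripe", "slack", ...
    source_id: str
    name: str
    updated_at: datetime  # last modification time reported by the source


def company_key(name: str) -> str:
    # Lowercase, replace punctuation with spaces, drop trailing legal suffixes:
    # "Acme Corp" and "Acme, Inc." both become "acme"
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve_entities(records: list[SourceRecord]) -> dict[str, list[SourceRecord]]:
    # One entry per resolved entity, with every source record kept as a cross-reference
    groups: dict[str, list[SourceRecord]] = defaultdict(list)
    for rec in records:
        groups[company_key(rec.name)].append(rec)
    return dict(groups)


def stale_entities(
    groups: dict[str, list[SourceRecord]], built_at: dict[str, datetime]
) -> list[str]:
    # Stale if never built, or if any contributing source changed after the last build
    return sorted(
        key
        for key, recs in groups.items()
        if key not in built_at or any(r.updated_at > built_at[key] for r in recs)
    )


# Usage: the artifact was last built at the end of March; Stripe changed in April
records = [
    SourceRecord("salesforce", "001A", "Acme Corp", datetime(2026, 3, 1)),
    SourceRecord("stripe", "cus_9", "Acme, Inc.", datetime(2026, 4, 15)),
]
groups = resolve_entities(records)
stale = stale_entities(groups, {"acme": datetime(2026, 3, 31)})
```

The usage mirrors the failure mode described earlier: a source changed after the last build, and the constructor, not the harness, is the layer that flags it.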
The clearest reference implementation in the open right now is gbrain, released under the MIT license in April by Y Combinator's Garry Tan. The architecture has three layers: a Brain Repo of Markdown files as the human-readable source of truth, a retrieval layer in Postgres with pgvector running hybrid search, and a skills layer that exposes operations to Claude Code via a small set of slash commands. Tan's daily setup runs on 10,000 Markdown files, 3,000 pages on people he tracks, 280 meeting transcripts, 300 captured original ideas, 13 years of calendar data, 40 skills, and 20-plus cron jobs maintaining freshness. It is what Andrej Karpathy's "Software 3.0" looks like when one person builds it for themselves. The products in the table below follow the same pattern at company scale, with permission inheritance, multi-tenant isolation, and audit trails added.
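gbrain's retrieval layer runs hybrid search, which means merging a keyword ranking with a vector ranking. The source does not say how gbrain fuses the two, so the sketch below uses reciprocal rank fusion (RRF), one common technique, purely as an assumed stand-in. In a Postgres setup the keyword list might come from full-text search and the vector list from a pgvector distance ordering; here both are plain lists of hypothetical file names.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) for every document it contains
    (rank starts at 1), so items ranked well by both keyword and vector
    search rise to the top. k=60 is the constant from the original RRF
    paper; treat it as a tunable default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


# Usage with hypothetical results from the two retrievers
keyword_hits = ["meetings/2026-03-04.md", "people/jane-doe.md", "ideas/pricing.md"]
vector_hits = ["people/jane-doe.md", "ideas/pricing.md", "meetings/2026-01-10.md"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF only looks at ranks, not raw scores, so keyword relevance scores and vector distances never need to be put on a common scale.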
The output is the point. A context constructor produces a thing the harness can read. That is the contract.
Part Four
The shape of this category is easiest to read by reference to what dbt did for analytics. The two situations rhyme closely enough that the analogy actually informs the strategy.
Before dbt, the analytics workflow inside most companies was a swamp. Analysts wrote SQL directly against raw warehouse tables. The same business metric was defined three different ways in three different dashboards. There were no tests, no version control, no documentation, no shared semantics. Smart analysts shipped reports that contradicted each other and nobody could tell which one was right.
dbt did not store the data. The data lived in Snowflake, BigQuery, and Redshift. dbt did not visualize the data either; that was the job of Looker, Tableau, and Mode. What dbt did was sit between the warehouse and the BI tools and turn raw tables into a tested, modular, version-controlled, documented schema. It compiled SQL, ran tests, generated lineage, and handed the BI tools a clean, governed thing to read. The category dbt sits in is called the transformation layer of the modern data stack, and dbt is the product that defined it.
It worked. Forty thousand companies run dbt in production. The analytics engineer job title exists because of it. dbt went from "a script Tristan Handy wrote" to a billion-dollar company in roughly seven years, and Coalesce, SQLMesh, and a half-dozen others built products around the same idea.
Context constructors are the equivalent layer for AI applications, with two differences that matter.
The first is that the inputs are not just SQL. They include unstructured documents, conversations, semi-structured tickets, code, calendar events, and whatever else carries meaning inside a company. A constructor has to handle a much wider class of source material than dbt ever did. This is also why LLM-assisted extraction is necessary in a way it was not for dbt: deterministic parsers cannot make sense of a Slack thread the way a model can.
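The split between deterministic parsing and model-assisted extraction can be sketched as a router. This assumes each payload arrives tagged with a source type; `extract`, the parser table, and `fake_llm` are hypothetical, and `fake_llm` is a stub standing in for a real model call that returns a fixed, made-up fact.

```python
import csv
import io
import json
from typing import Callable

# Hypothetical signature for a model-backed extractor: free text in, list of facts out
LLMExtractor = Callable[[str], list[dict]]


def parse_csv(payload: str) -> list[dict]:
    return [dict(row) for row in csv.DictReader(io.StringIO(payload))]


def parse_json(payload: str) -> list[dict]:
    data = json.loads(payload)
    return data if isinstance(data, list) else [data]


# Sources that are actually structured get deterministic parsers
DETERMINISTIC_PARSERS: dict[str, Callable[[str], list[dict]]] = {
    "csv": parse_csv,
    "json": parse_json,
}


def extract(source_type: str, payload: str, llm_extract: LLMExtractor) -> list[dict]:
    """Route structured payloads to deterministic parsers, everything else to the model."""
    parser = DETERMINISTIC_PARSERS.get(source_type)
    if parser is not None:
        facts, method = parser(payload), "deterministic"
    else:
        facts, method = llm_extract(payload), "llm"
    # Tag every fact with how it was obtained so consumers can weigh it accordingly
    return [{**fact, "_extraction": method} for fact in facts]


def fake_llm(text: str) -> list[dict]:
    # Stub for illustration only: a real implementation would call a model here
    return [{"type": "pricing_policy", "summary": "Annual plans get 15% off"}]
```

Tagging each fact with how it was extracted lets downstream consumers treat model-derived facts with more caution than parsed ones.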
The second is that the consumer is not a BI tool but an agent that can act. The agent is faster than a human analyst and worse-calibrated. Bad analytics produces a wrong number on a dashboard that someone might catch. Bad context produces a wrong action that an agent takes against a real system before anyone notices. Same architectural shape, much higher operational stakes.
Part Five
Three forces converged in the last twelve months.
The model gap closed for the typical task. The frontier models cluster within a few points of each other on the benchmarks anyone cares about, and the gap shrinks on every release. Picking a model is rarely the bottleneck on a production task anymore. It is a decision about price, latency, integration surface, and which provider's safety policies are tolerable to your legal team.
The harness consolidated. OpenAI shipped its Agents SDK in early 2026. Google shipped ADK shortly after. Anthropic shipped its Agent SDK alongside Claude 4.6. LangGraph passed CrewAI in GitHub stars. Cursor and Claude Code defined the shape of an interactive coding harness. The vocabulary (agent loops, tool calling, structured outputs, memory) has stabilized across all of them. The marginal engineering hour spent picking a harness produces less than it used to.
Context emerged as the actual constraint. Anthropic positioned context engineering, not prompt engineering, as the next discipline, with a working definition that emphasizes finding the smallest set of high-signal tokens that produces the desired behavior. Chroma's "Context Rot" study evaluated 18 frontier models, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3, and showed that performance degrades as input length grows. More context is not better. Beyond a point, excess context degrades reasoning, which means curation matters more than volume. Datadog's 2026 State of AI Engineering report, drawn from real customer traces, found that 69% of input tokens are spent on system prompts repeated on every call, mostly ungoverned and unversioned, with only 28% of calls using prompt caching. The waste is enormous, and so is the quality penalty.
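One way to act on "curation matters more than volume" is to select context under an explicit token budget instead of filling the window. The sketch below is a greedy, knapsack-style heuristic, not Anthropic's or Chroma's method. It assumes relevance scores are already computed upstream and approximates token counts by word counts, where a real system would use the model's tokenizer. `Chunk` and `pack_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    text: str
    relevance: float  # assumed to come from upstream scoring, in [0, 1]


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word
    return max(1, len(text.split()))


def pack_context(chunks: list[Chunk], budget: int, min_relevance: float = 0.3) -> list[Chunk]:
    """Greedily pick high-signal chunks under a token budget.

    Chunks below min_relevance are dropped outright rather than used as filler.
    The rest are considered in order of relevance per token, and each is
    added only if it still fits in the remaining budget.
    """
    candidates = [c for c in chunks if c.relevance >= min_relevance]
    candidates.sort(key=lambda c: c.relevance / approx_tokens(c.text), reverse=True)
    selected: list[Chunk] = []
    used = 0
    for chunk in candidates:
        cost = approx_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


# Usage with hypothetical chunks
chunks = [
    Chunk("glossary:arr", "ARR is defined by finance as contracted recurring revenue", 0.9),
    Chunk("spec:v1", " ".join(["deprecated"] * 100), 0.8),
    Chunk("slack:offtopic", "lunch plans for friday", 0.1),
]
```

The relevance floor is the important design choice: low-signal chunks are dropped even when budget remains, rather than used as filler.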
Andrej Karpathy's framing makes the cost concrete. In his Software 3.0 talks, the LLM is the CPU. The context window is the RAM. Anything outside the window is on disk. The harness is the runtime, the thing that decides what to load into RAM before each step and what to evict. A context constructor is the thing that decides how the disk is organized in the first place: what gets paged in, with what schema, with what relationships preserved, with what provenance attached. Get that wrong and the CPU runs on garbage. Get it right and you get the behavior the benchmark numbers imply you should be getting.
At the March 2026 Gartner Data and Analytics Summit, Rita Sallam framed the moment in terms that actually move boards: only about one in five enterprise AI investments shows measurable ROI, and the determinant is whether the company has treated context as critical infrastructure on par with cybersecurity. The model is table stakes. The context is the moat. The products in the table below are positioning themselves around exactly that thesis.