A whitepaper for AI engineers

What Is a Context Constructor?

The third layer in any production LLM application, the one between the model and the work. What the transformation layer was to analytics, the context layer is becoming to AI.

Published: May 2026 · Reading time: 14 minutes · Format: Technical whitepaper

Three layers make up any serious LLM application in 2026, and two of them have names that stuck. The model generates tokens. The harness wraps the model with tools, an orchestration loop, prompt assembly, retries, and state, turning a useful conversational responder into something that can take action against real systems. The third layer, the structured context the harness feeds the model on every call, has no agreed-upon name yet. Most teams treat it as glue: a vector store somebody stood up in 2024, a Slack export, a Notion page that was half-finished, a system prompt that grew by accretion through six product reviews.

Treating the third layer as glue has consequences. In March, a16z partners Jason Cui and Jennifer Li argued that "data and analytics agents are essentially useless without the right context." MIT NANDA's 2025 study put a number on the failure mode: 95% of enterprise AI pilots returned no measurable ROI. The post-mortems converged on the same diagnosis. The models were competent. The harnesses were competent. The data was there. What was missing was the layer that turned that data into something an agent could actually use, consistently, on every call.

That layer needs a name. The honest one is the context constructor.

Part One

The two layers that already have names

Before defining the third, it helps to be precise about the first two.

A model is the thing that produces tokens. A frontier LLM by itself does plenty of useful work. Hundreds of millions of people use ChatGPT and Claude every day with no harness more elaborate than a chat box, and the answers are genuinely useful. The model is not the bottleneck. What a model on its own cannot do is take action against your systems, read from your files, execute tools, write back, or maintain state across turns. To make it do those things you wrap it in a runtime.

That runtime is the harness. Hashimoto coined the term in February 2026 with a pragmatic definition: anytime an agent makes a mistake, you engineer the environment so it does not make that mistake again. The mechanics are tool definitions, the system prompt, output parsing, error handling, retry policy, state, prompt caching, and the orchestration loop that ties them together. Anthropic calls a closely related practice context engineering, defined as the set of strategies for curating the tokens that enter the model on each step. The vocabulary is converging because the practice is converging.

Claude Code is a harness. Cursor is a harness. LangGraph is a harness. OpenAI's Agents SDK, Google's ADK, and Anthropic's own Agent SDK are harnesses. CrewAI, AutoGen, Letta, and AWS AgentCore are harnesses. The differences matter at the margins. Picking one is no longer where you spend your engineering budget. The category has settled.

The smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome.
Anthropic, "Effective context engineering for AI agents," 2025. The working definition of the discipline.

You have a model, wrapped by a harness. The harness manages tools, prompts, conversation state, and the execution loop. The obvious next question is what they read from on every call. That is the third layer. It does not have a settled name yet.

What the harness composes on every call
System prompt · version-controlled in code
Tool definitions · declared in code
Conversation state · managed by the harness
Output schema · structured-output APIs
User message · from the caller
Context · no settled production layer
Model · generates tokens
Five of the six inputs have settled patterns and well-known tooling. The sixth is the subject of this paper.

Part Two

What teams actually do today

The honest answer most teams give is that they wired it up. Somebody on the team picked a vector store, ran a folder of documents through an embedder, added a Slack connector, wrote a system prompt naming the tables that mattered, dropped in a glossary of business terms, and called it done. The harness pulls whatever cosine similarity returns. The agent is grounded in a soup of half-relevant chunks plus whatever the system prompt remembered to mention.

This works for the demo. In production it produces failure modes anyone who has shipped an agent will recognize. The agent uses the wrong definition of a business term because two departments define it differently and the embedded documents include both versions. It references stale schema because the embedder ran in March and nobody re-embedded after the migration in April. It misses entity resolution because "Acme Corp" in Salesforce is the same legal entity as "Acme, Inc." in Stripe, and cosine similarity has no way to know that. It hallucinates a product feature that was deprecated six months ago because the deprecation lives in a Confluence page that ranked below the original spec on the relevant query. None of these are model failures. All of them are context failures.

Gorgias's engineering team described this clearly in their write-up on building an internal data agent. The model and the harness were the easy parts. The hard part was giving the agent enough context to answer questions correctly and consistently across the company. They ended up building what they call a context layer: structured metadata, when-to-use guidance, and how-to-use examples, versioned alongside their dbt models.
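A layer like that is easier to reason about as data than as prose. The sketch below shows one hypothetical shape for an entry: a table with a description, when-to-use guidance, a how-to-use example, and a version tied to the model it documents. The `TableContext` record, its field names, and the `render` helper are illustrative assumptions, not Gorgias's actual format; the point is that guidance becomes structured data the harness can render into a prompt instead of prose buried in a system prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableContext:
    table: str
    description: str
    when_to_use: str
    how_to_use: str  # an example query kept next to the guidance
    version: str     # hypothetical: e.g. the git SHA of the dbt model it documents


def render(entry: TableContext) -> str:
    """Render one entry as a compact text block the harness can place in a prompt."""
    return (
        f"## {entry.table} (version {entry.version})\n"
        f"{entry.description}\n"
        f"When to use: {entry.when_to_use}\n"
        f"Example: {entry.how_to_use}"
    )


orders = TableContext(
    table="fct_orders",
    description="One row per completed order, net of refunds.",
    when_to_use="Revenue and order-count questions. Not for cart abandonment.",
    how_to_use="SELECT date_trunc('month', ordered_at), sum(net_amount) FROM fct_orders GROUP BY 1",
    version="a1b2c3d",
)
print(render(orders))
```

Because the version travels with the entry, a change to the underlying model can invalidate exactly the guidance that describes it.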

OpenAI's in-house data agent team made the same observation and reached the same conclusion. Their solution is multiple layers of grounding: table usage metadata for schema understanding, human annotations from domain experts to explain what datasets mean, Codex-powered code enrichment that extracts semantic meaning from how the data is built, and a memory that captures corrections from users so the agent improves over time. They built a context constructor before the term existed.

You can run a small AI product without one. You cannot run a serious one without one. The only real question is whether you build it yourself or buy a system that does the job for you.
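The stale-schema failure described earlier in this part is one of the easier ones to catch mechanically. A minimal sketch, assuming every embedded chunk records when it was embedded and every source records when it last changed (both fields, and the source IDs, are hypothetical): any chunk embedded before its source's most recent change is flagged for re-embedding.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Chunk:
    chunk_id: str
    source_id: str
    embedded_at: datetime  # hypothetical: when this chunk was last embedded


def stale_chunks(chunks: list[Chunk], source_updated_at: dict[str, datetime]) -> list[str]:
    """Return IDs of chunks that were embedded before their source last changed."""
    stale = []
    for chunk in chunks:
        updated = source_updated_at.get(chunk.source_id)
        # A chunk whose source changed after embedding no longer reflects that source.
        if updated is not None and chunk.embedded_at < updated:
            stale.append(chunk.chunk_id)
    return stale


chunks = [
    Chunk("orders-schema", "warehouse.orders", datetime(2026, 3, 1)),
    Chunk("glossary-arr", "notion.glossary", datetime(2026, 3, 1)),
]
# The orders table migrated in April; the glossary has not changed since February.
updates = {
    "warehouse.orders": datetime(2026, 4, 15),
    "notion.glossary": datetime(2026, 2, 10),
}
print(stale_chunks(chunks, updates))  # ['orders-schema']
```

Run on a schedule, a check like this turns "nobody re-embedded after the migration" into a queue of specific chunks to rebuild.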

Part Three

A working definition

A context constructor is a system that ingests heterogeneous sources, extracts the entities, relationships, definitions, and provenance worth keeping, resolves the same entity across sources so it appears once, maintains that artifact as the sources change, and exposes the result through an interface any harness can read from on every call.

The load-bearing word in that sentence is "any." If the artifact is bolted tightly to one agent framework, what you have built is a feature inside that framework. A context constructor is interface-shaped. The harness should not need to know whether the artifact was built by cognee, by Atlan, by gbrain, or by a Python script some engineer wrote on a Sunday. The artifact is the contract.
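One way to make "the artifact is the contract" concrete is to have the harness depend on a narrow interface rather than on a particular product. A sketch using Python's `typing.Protocol`; the `ContextSource` interface, its single `fetch` method, and both implementations are hypothetical illustrations, not the API of cognee, Atlan, gbrain, or any other product.

```python
from typing import Protocol


class ContextSource(Protocol):
    def fetch(self, query: str, limit: int) -> list[str]:
        """Return up to `limit` context snippets relevant to `query`."""
        ...


class MarkdownFolderSource:
    """Stand-in for a hand-rolled constructor: notes held in memory."""

    def __init__(self, notes: list[str]) -> None:
        self.notes = notes

    def fetch(self, query: str, limit: int) -> list[str]:
        words = query.lower().split()
        # Naive substring match, good enough for the illustration.
        hits = [n for n in self.notes if any(w in n.lower() for w in words)]
        return hits[:limit]


class StaticGlossarySource:
    """Stand-in for a vendor-backed constructor exposing a glossary."""

    def __init__(self, glossary: dict[str, str]) -> None:
        self.glossary = glossary

    def fetch(self, query: str, limit: int) -> list[str]:
        q = query.lower()
        return [f"{term}: {d}" for term, d in self.glossary.items() if term in q][:limit]


def compose_prompt(source: ContextSource, question: str) -> str:
    # The harness only knows the interface, never which implementation it holds.
    context = "\n".join(source.fetch(question, limit=3))
    return f"Context:\n{context}\n\nQuestion: {question}"


notes = MarkdownFolderSource(["ARR excludes one-time services revenue."])
glossary = StaticGlossarySource({"arr": "annual recurring revenue"})
print(compose_prompt(notes, "What was ARR in Q1?"))
print(compose_prompt(glossary, "What was ARR in Q1?"))
```

Swapping one constructor for another changes the object passed to `compose_prompt` and nothing else, which is the property the definition's "any" is asking for.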

The model is table stakes. The harness is converging. Context is the moat.

A few things this is decidedly not.

A vector store is not a context constructor. Pinecone, Weaviate, and Chroma are containers. They hold vectors. They do not, on their own, know that ARR and annual recurring revenue are the same metric, that the finance team's revenue definition diverges from the GTM team's, or that the Acme Corp account in Salesforce is the same legal entity as the Acme, Inc. deal mentioned in Slack. A constructor sits on top of a vector store and gives that store a schema.
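What "giving the store a schema" adds can be shown in a few lines: a glossary that maps synonyms to one canonical metric and keeps each team's definition separately, consulted before any similarity search runs. The metric, team names, and definitions below are invented for illustration, and the two dictionaries stand in for whatever store a real constructor would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    canonical: str
    definitions: dict[str, str]  # team -> definition; divergent definitions are kept, not merged


METRICS = {
    "arr": Metric(
        canonical="arr",
        definitions={
            "finance": "Contracted recurring revenue, annualized, excluding services.",
            "gtm": "Annualized value of all signed contracts, including services.",
        },
    ),
}
SYNONYMS = {"arr": "arr", "annual recurring revenue": "arr"}


def resolve_metric(term: str, team: str) -> str:
    """Map a user's term to one canonical metric and the asking team's definition."""
    key = SYNONYMS.get(term.strip().lower())
    if key is None:
        raise KeyError(f"unknown metric: {term!r}")
    metric = METRICS[key]
    definition = metric.definitions.get(team)
    if definition is None:
        # Refuse to guess when the asking team has no definition of its own.
        raise KeyError(f"no {team!r} definition for {metric.canonical!r}")
    return f"{metric.canonical} ({team}): {definition}"


print(resolve_metric("Annual Recurring Revenue", "finance"))
# arr (finance): Contracted recurring revenue, annualized, excluding services.
```

A vector store alone would return whichever definition embedded closest to the query; the schema makes the choice explicit and tied to who is asking.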

A RAG pipeline is not a context constructor. RAG is a retrieval strategy. A constructor decides what goes in the corpus to begin with, how the pieces relate, and what metadata travels with each chunk. "RAG is dead" became a fashionable claim through 2025 once enough teams discovered that top-k cosine similarity was insufficient on its own. RAG is not dead. It is subsumed. Retrieval is one operation a constructor exposes, alongside lookup, graph traversal, structured query, and increasingly, prompt-cached blocks served by an MCP server.
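Graph traversal answers questions that top-k similarity cannot, such as "everything within two hops of this account." A minimal breadth-first sketch over a list of (subject, relation, object) triples; the node names and relation labels are hypothetical, and this traversal ignores the relation labels and follows edges in both directions.

```python
from collections import deque

# (subject, relation, object) triples, as a constructor might emit them.
EDGES = [
    ("acme", "has_deal", "deal-42"),
    ("deal-42", "discussed_in", "slack-thread-9"),
    ("acme", "billed_by", "stripe-cus-7"),
    ("globex", "has_deal", "deal-51"),
]


def neighbors_within(start: str, max_hops: int) -> dict[str, int]:
    """Return every node reachable from `start` in at most `max_hops`, with its hop count."""
    adjacency: dict[str, list[str]] = {}
    for subj, _relation, obj in EDGES:
        adjacency.setdefault(subj, []).append(obj)
        adjacency.setdefault(obj, []).append(subj)  # traverse edges in both directions
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen


print(neighbors_within("acme", 2))
# {'deal-42': 1, 'stripe-cus-7': 1, 'slack-thread-9': 2}
```

The Slack thread is reached through the deal, not because its text resembles the query, which is exactly the kind of answer retrieval alone tends to miss.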

A connector library is not a context constructor. Pulling rows out of Salesforce or messages out of Slack is the easy half of the problem. Making those sources speak a common semantic language, resolving entities across them, and capturing the implicit rules a domain expert would carry in their head is the hard half. Connectors feed the constructor. They do not replace it.
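The simplest form of cross-source entity resolution is a normalized match key. The sketch below lowercases, strips punctuation, and drops a short, assumed list of trailing legal suffixes so that "Acme Corp" and "Acme, Inc." collide. It is a naive heuristic offered for illustration; production resolvers usually combine several signals (email domains, billing addresses, tax IDs) and route ambiguous matches to a human.

```python
import re

# Assumed, deliberately short list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd", "gmbh"}


def match_key(name: str) -> str:
    """Normalize a company name into a key for cross-source matching."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()  # drop trailing legal forms such as "Inc." or "Corp"
    return " ".join(tokens)


def group_by_entity(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (source, name) pairs that share a match key."""
    groups: dict[str, list[str]] = {}
    for source, name in records:
        groups.setdefault(match_key(name), []).append(f"{source}:{name}")
    return groups


records = [("salesforce", "Acme Corp"), ("stripe", "Acme, Inc."), ("slack", "Globex LLC")]
print(group_by_entity(records))
# {'acme': ['salesforce:Acme Corp', 'stripe:Acme, Inc.'], 'globex': ['slack:Globex LLC']}
```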

To see what a working constructor does concretely, start with a real one.

The clearest reference implementation in the open right now is gbrain, released under the MIT license in April by Y Combinator's Garry Tan. The architecture has three layers: a Brain Repo of Markdown files as the human-readable source of truth, a retrieval layer in Postgres with pgvector running hybrid search, and a skills layer that exposes operations to Claude Code via a small set of slash commands. Tan's daily setup runs on 10,000 Markdown files, 3,000 pages on people he tracks, 280 meeting transcripts, 300 captured original ideas, 13 years of calendar data, 40 skills, and 20-plus cron jobs maintaining freshness. It is what Andrej Karpathy's "Software 3.0" looks like when one person builds it for themselves. The products in the table below follow the same pattern at company scale, with permission inheritance, multi-tenant isolation, and audit trails added.
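Hybrid search merges a keyword ranking and a vector ranking into one list. A common way to do that is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) across the lists it appears in. The sketch below is an assumption for illustration: the description of gbrain above does not say how it fuses results, the file paths are invented, and k = 60 is simply the conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["people/jane.md", "meetings/2026-04-02.md", "ideas/pricing.md"]
vector_hits = ["meetings/2026-04-02.md", "ideas/pricing.md", "people/jane.md"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['meetings/2026-04-02.md', 'people/jane.md', 'ideas/pricing.md']
```

The meeting note wins because it ranks well in both lists, even though it tops only one of them. RRF needs only ranks, not comparable scores, which is why it is a popular choice when keyword and vector scores live on different scales.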

The output is the point. A context constructor produces a thing the harness can read. That is the contract.

How the context layer gets built
Sources → Constructor → Artifact → Harness
Sources (raw, heterogeneous): SQL, files, chat, code, tickets, email, calendar.
Constructor (builds the layer): ingests, extracts entities, resolves across sources, maintains and versions.
Artifact (is the layer): knowledge graph, schema, MCP server, queryable API.
Harness (consumer): reads the artifact on every call and composes it with the model.
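The flow from sources through constructor and artifact to harness can be read as a small pipeline. The sketch below is a deliberately minimal, in-memory illustration under stated assumptions: "extraction" is reduced to records that already name their entity and fact, resolution uses a hypothetical alias table, and "maintain" means rebuilding and bumping a version number. Every name in it is illustrative, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Record:
    source: str  # which system the record came from, e.g. "salesforce"
    entity: str  # stand-in for extraction: the entity name the record mentions
    fact: str    # stand-in for extraction: the fact the record carries


@dataclass
class Artifact:
    version: int
    # canonical entity key -> list of (fact, source) pairs; the source is the provenance
    facts: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def lookup(self, entity_key: str) -> list[tuple[str, str]]:
        """The read interface a harness would call on every request."""
        return self.facts.get(entity_key, [])


def build(records: list[Record], canonicalize: Callable[[str], str], version: int) -> Artifact:
    """Ingest records, resolve entity names to canonical keys, and emit a versioned artifact."""
    artifact = Artifact(version=version)
    for rec in records:
        key = canonicalize(rec.entity)
        artifact.facts.setdefault(key, []).append((rec.fact, rec.source))
    return artifact


ALIASES = {"acme corp": "acme", "acme, inc.": "acme"}  # hypothetical alias table


def canonical(name: str) -> str:
    return ALIASES.get(name.lower(), name.lower())


records = [
    Record("salesforce", "Acme Corp", "Renewal owner: J. Ortiz"),
    Record("stripe", "Acme, Inc.", "Plan: Enterprise, billed annually"),
]
v1 = build(records, canonical, version=1)
# "Maintain": when a source changes, rebuild and bump the version.
v2 = build(records + [Record("slack", "ACME CORP", "Asked about SSO")], canonical, version=2)
print(len(v1.lookup("acme")), len(v2.lookup("acme")))  # 2 3
```

A real constructor would extract with a model or parser, resolve with richer signals, and update incrementally rather than rebuilding, but the shape stays the same: the harness only ever touches `lookup`.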
§ § §

Part Four

The transformation-layer analogy, in full

The shape of this category is easiest to read by reference to what dbt did for analytics. The two situations rhyme closely enough that the analogy actually informs the strategy.

Before dbt, the analytics workflow inside most companies was a swamp. Analysts wrote SQL directly against raw warehouse tables. The same business metric was defined three different ways in three different dashboards. There were no tests, no version control, no documentation, no shared semantics. Smart analysts shipped reports that contradicted each other and nobody could tell which one was right.

dbt did not store the data. The data lived in Snowflake, BigQuery, Redshift. dbt did not visualize the data either. That was Looker, Tableau, Mode. What dbt did was sit between the warehouse and the BI tools and turn raw tables into a tested, modular, version-controlled, documented schema. It compiled SQL, ran tests, generated lineage, and handed the BI tools a clean, governed thing to read. The category dbt sits in is called the transformation layer of the modern data stack, and dbt is the product that defined it.

It worked. Forty thousand companies run dbt in production. The analytics engineer job title exists because of it. dbt went from "a script Tristan Handy wrote" to the core of a billion-dollar company in roughly seven years, and Coalesce, SQLMesh, and a half-dozen others built products around the same idea.

Context constructors are the equivalent layer for AI applications, with two differences that matter.

The first is that the inputs are not just SQL. They include unstructured documents, conversations, semi-structured tickets, code, calendar events, and whatever else carries meaning inside a company. A constructor has to handle a much wider class of source material than dbt ever did. This is also why LLM-assisted extraction is necessary in a way it was not for dbt: deterministic parsers cannot make sense of a Slack thread the way a model can.
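In practice, LLM-assisted extraction usually pairs a model call that requests structured output with a validation step before anything enters the artifact. The sketch below skips the model call (the `raw` string stands in for a response about a Slack thread) and shows only validation: items missing a required field, including the provenance pointer `source_ref`, are rejected rather than stored. The JSON shape and field names are assumptions, not any vendor's schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    source_ref: str  # provenance: where in the source the relation was read


REQUIRED = ("subject", "predicate", "object", "source_ref")


def parse_relations(raw: str) -> tuple[list[Relation], list[object]]:
    """Split a model's JSON output into valid relations and rejected items."""
    accepted: list[Relation] = []
    rejected: list[object] = []
    for item in json.loads(raw).get("relations", []):
        valid = isinstance(item, dict) and all(
            isinstance(item.get(f), str) and item[f].strip() for f in REQUIRED
        )
        if valid:
            accepted.append(
                Relation(item["subject"], item["predicate"], item["object"], item["source_ref"])
            )
        else:
            rejected.append(item)  # no fact enters the artifact without provenance
    return accepted, rejected


# Stand-in for a model response about a Slack thread.
raw = json.dumps({
    "relations": [
        {"subject": "Acme", "predicate": "requested", "object": "SSO",
         "source_ref": "slack:#sales/thread-123"},
        {"subject": "Acme", "predicate": "churn_risk", "object": "high"},
    ]
})
good, bad = parse_relations(raw)
print(len(good), len(bad))  # 1 1
```

The second item may well be true, but without a pointer back to where it was read, nobody can audit it later, so it is held back.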

The second is that the consumer is not a BI tool but an agent that can act. The agent is faster than a human analyst and worse-calibrated. Bad analytics produces a wrong number on a dashboard that someone might catch. Bad context produces a wrong action that an agent takes against a real system before anyone notices. Same architectural shape, much higher operational stakes.

Part Five

Why now

Three forces converged in the last twelve months.

The model gap closed for the typical task. The frontier models cluster within a few points of each other on the benchmarks anyone cares about, and the gap shrinks on every release. Picking a model is rarely the bottleneck on a production task anymore. It is a decision about price, latency, integration surface, and which provider's safety policies are tolerable to your legal team.

The harness consolidated. OpenAI shipped its Agents SDK in March 2025, and Google shipped ADK the following month. Anthropic renamed its Claude Code SDK to the Claude Agent SDK alongside Claude Sonnet 4.5. LangGraph passed CrewAI in GitHub stars. Cursor and Claude Code defined the shape of an interactive coding harness. The vocabulary (agent loops, tool calling, structured outputs, memory) has stabilized across all of them. The marginal engineering hour spent picking a harness produces less than it used to.

Context emerged as the actual constraint. Anthropic positioned context engineering, not prompt engineering, as the next discipline, with a working definition that emphasizes finding the smallest set of high-signal tokens that produces the desired behavior. Chroma's "Context Rot" study evaluated 18 frontier models, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3, and showed that performance degrades as input length grows. More context is not better. Beyond a point, excess context degrades reasoning, which means curation matters more than volume. Datadog's 2026 State of AI Engineering report, drawn from real customer traces, found that 69% of input tokens are spent on system prompts repeated on every call, mostly ungoverned and unversioned, with only 28% of calls using prompt caching. The waste is enormous. The quality penalty is enormous.
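"Curation over volume" can be made operational as a budget: score candidate snippets for relevance, drop anything below a floor, and pack the best of the rest into a fixed token budget instead of sending everything that matched. A minimal sketch; the scores are assumed to come from some upstream ranker, token counts are approximated by whitespace-separated words as a stand-in for a real tokenizer, and the packing is a greedy heuristic, not an optimal knapsack solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    text: str
    score: float  # assumed relevance score from an upstream ranker

    @property
    def tokens(self) -> int:
        return len(self.text.split())  # crude stand-in for a real tokenizer


def pack(snippets: list[Snippet], budget: int, min_score: float = 0.0) -> list[Snippet]:
    """Greedily keep the highest-scoring snippets that fit in `budget` tokens."""
    chosen, used = [], 0
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        if snip.score < min_score:
            break  # everything after this is lower-signal still
        if used + snip.tokens <= budget:
            chosen.append(snip)
            used += snip.tokens
    return chosen


candidates = [
    Snippet("ARR is contracted recurring revenue annualized", 0.92),
    Snippet("The Q3 offsite agenda and lunch options for the team", 0.15),
    Snippet("fct_orders has one row per completed order", 0.81),
    Snippet("Legacy ARR dashboard was deprecated in January", 0.40),
]
print([s.score for s in pack(candidates, budget=15, min_score=0.3)])  # [0.92, 0.81]
```

The third snippet is relevant enough to pass the floor but does not fit the budget; the fourth never gets considered. Both outcomes are deliberate.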

Andrej Karpathy's framing makes the cost concrete. In his Software 3.0 talks, the LLM is the CPU. The context window is the RAM. Anything outside the window is on disk. The harness is the runtime, the thing that decides what to load into RAM before each step and what to evict. A context constructor is the thing that decides how the disk is organized in the first place: what gets paged in, with what schema, with what relationships preserved, with what provenance attached. Get that wrong and the CPU runs on garbage. Get it right and you get the behavior the benchmark numbers imply you should be getting.
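The RAM half of that analogy, the harness deciding what stays in the window, can be sketched as a least-recently-used cache of context blocks under a token budget. The block IDs, the token counts, and the choice of LRU as the eviction policy are all assumptions for illustration; real harnesses may pin some blocks (the system prompt, say) and summarize rather than drop others.

```python
from collections import OrderedDict


class ContextWindow:
    """Keeps context blocks within a token budget, evicting the least recently used."""

    def __init__(self, budget_tokens: int) -> None:
        self.budget = budget_tokens
        self.blocks: OrderedDict[str, int] = OrderedDict()  # block id -> token count

    def load(self, block_id: str, tokens: int) -> list[str]:
        """Page a block in (or mark it recently used); return the IDs evicted to make room."""
        if tokens > self.budget:
            raise ValueError(f"{block_id} ({tokens} tokens) exceeds the whole budget")
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # already resident: just refresh recency
            return []
        evicted = []
        while sum(self.blocks.values()) + tokens > self.budget:
            old_id, _ = self.blocks.popitem(last=False)  # oldest first
            evicted.append(old_id)
        self.blocks[block_id] = tokens
        return evicted


window = ContextWindow(budget_tokens=100)
window.load("schema:orders", 40)
window.load("glossary:arr", 30)
window.load("schema:orders", 40)           # touched again, so it is now most recently used
print(window.load("thread:acme-sso", 50))  # ['glossary:arr']
```

This is the harness's job. The constructor's job is upstream of it: deciding what each block contains and how it is keyed, so that whatever gets paged in is worth the tokens.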

At the March 2026 Gartner Data and Analytics Summit, Rita Sallam framed the moment in the way that actually moves boards: only about one in five enterprise AI investments shows measurable ROI, and the determinant is whether the company has treated context as critical infrastructure on par with cybersecurity. The model is table stakes. The context is the moat. The products in the table below are positioning themselves around exactly that thesis.

Part Six

The players in the context constructor space

Pricing reflects publicly stated rates as of May 2026.

Lattice
What it is: An Apache 2.0 npm package framed as the database your AI has been missing. A context and data layer that sits as an observation sidecar on top of an existing file system. Two-tier model: a local-first CLI and GUI, and a multi-node enterprise tier with SSO and audit.
Best fit for: Mid-market tech companies (roughly 100 to 500 employees) where the CTO wants a structured context layer that developers can adopt bottom-up without a procurement cycle.
Pricing: Individual: free, open source. Org: enterprise pricing, custom.

Credal
What it is: A secure enterprise agent platform built around permission inheritance, sensitive-data redaction, and a control plane for agents. Connects to Google Workspace, Salesforce, Slack, Confluence, and similar systems, with SOC 2 Type 2 and on-prem deployment options. Founded by ex-Palantir engineers in 2022.
Best fit for: Security-conscious enterprises with strict compliance requirements (HIPAA, SOC 2, GDPR) that need agents to inherit existing access controls from source systems.
Pricing: Custom enterprise. Sales-led, no public tier.

cognee
What it is: An open-source memory engine that turns unstructured data into a persistent knowledge graph using an Extract, Cognify, Load pipeline. Connects to roughly 38 data sources. Used in production by Bayer and others. Berlin-based; recently closed a $7.5M seed round.
Best fit for: Engineering teams that want a graph-first, open-source constructor with cross-session persistence and the option to self-host. The closest pure-play reference implementation of the category.
Pricing: Free: open source, self-hosted. Developer: $35 / month. Team (cloud): $200 / month.

Glean
What it is: An enterprise AI platform that started as workplace search and now markets itself, explicitly, as a "system of context." 100+ native connectors, a dual-graph architecture (enterprise plus personal), hybrid search, and MCP-server interop with the major agent frameworks.
Best fit for: Large enterprises (1,000+ employees) with sprawling SaaS footprints and executive sponsorship for an organization-wide context program. Bundles the connectors, graph, governance, and a UX in one platform.
Pricing: $40 to $50 per user per month base, plus $15 per user per month for the AI suite. Typical first-year TCO $300K to $1M+.

mem0
What it is: A dedicated memory layer for agents, focused on extraction from conversations and runtime context. Has the largest community of the agent-memory frameworks, with broad ecosystem integrations and a managed cloud with compliance certifications.
Best fit for: Independent developers and product teams building consumer-facing agents where personalization is the use case. Less suitable when the primary data source is organizational documents rather than user interactions.
Pricing: Free: 10K memories. Pro: $19 / month for 50K. Pro+: $249 / month, includes graph.

LlamaIndex / LlamaCloud
What it is: An open-source framework plus a commercial managed platform (LlamaCloud) for parsing, extracting, and indexing documents into retrievable form. The most mature ingestion and extraction pipeline in the open ecosystem, with 160+ data source connectors via LlamaHub.
Best fit for: Teams that need industrial-grade document parsing and structured extraction as the foundation of their context layer, particularly in document-heavy domains (legal, financial, scientific).
Pricing: Framework: free, MIT licensed. LlamaCloud: credit-based, $1.25 per 1,000 credits; Starter: $50 / month.

Atlan
What it is: A metadata and active governance platform that has repositioned around the context layer for AI agents. Strong on business glossary, lineage, certification workflows, and the human-in-the-loop side of context. Atlan's internal benchmarks show a 38% accuracy lift on SQL generation when context is layered in.
Best fit for: Companies that already run dbt and a modern data stack and want to extend their existing semantic and metadata investments into an agent-ready context layer.
Pricing: Custom enterprise. Sales-led, tiered by data volume.

A few notes on the table that do not fit cleanly into cells.

Lattice is the most opinionated implementation of the Trojan-horse strategy: a free local tier that developers adopt for personal projects, with a paid organizational tier that becomes the obvious answer when the company eventually needs SSO, audit, and multi-node deployment. Credal sits at the opposite end of the distribution, sold top-down to security-conscious enterprises, with permission inheritance and PII redaction as the headline features. cognee is the closest pure-play memory engine in the open-source world, built by researchers drawing on cognitive science and knowledge engineering, and is currently the reference implementation of the graph-based constructor pattern.

Glean is the interesting case because it openly repositioned. The phrase "system of context" is in their navigation. The acquisition cost is high, a six-figure floor in practice, and the deployment is heavyweight, but the bundle includes the connectors, the graph, the policies, and a polished assistant UX in one purchase order. mem0 is the developer-facing memory layer that most independent agent builders default to, with a free tier generous enough for a real prototype and a Pro+ tier that adds graph features. LlamaIndex and its commercial sibling LlamaCloud are the most mature ingestion and extraction pipeline in the open ecosystem, though LlamaCloud stops short of full constructor functionality on its own and is usually paired with another product upstream of the harness. Atlan is the only player in the table that came at this from the metadata and governance direction, which gives it a different shape from the others and makes it the natural extension if your team already lives in dbt.

Part Seven

A short checklist for evaluating one

If you are about to commit three months and a real budget to this, here is the bar.

Most products in the market today fail at least two of these. Pick accordingly.

Part Eight

Where this goes

The pattern is familiar. A layer of infrastructure gets identified. A dozen companies fund themselves into the gap. Two or three consolidate the category. The rest become features inside larger platforms. The transformation layer took about six years to run that arc. Context constructors are roughly eighteen months in.

If you are an AI engineer, the practical implication is straightforward. The model you pick will be a commodity inside a year, and on most production tasks you will not feel the difference between this year's frontier model and next year's. The harness you pick will be one of three or four, and increasingly the choice is determined by which provider you are already buying inference from. The context constructor you pick, or build, will be the thing that decides whether the agents you ship inside your specific business actually work. It is the layer where domain knowledge gets encoded, where the agent meets the firm, where the work that distinguishes one company from another lives.

A model that does not know what your company means by "active customer" is a fast hallucination machine. A harness with no context is a tool with nothing to act on. The constructor is what makes the stack legible to itself.

That is the third layer. The category has been forming for eighteen months. Now it has a name.