What Is a Context Constructor?

Every serious LLM application has three layers. The model generates tokens. The harness wraps it with tools, state, and an orchestration loop. The context layer is the structured context the harness reads on every call. Two of the three have settled builder categories: foundation labs ship models, agent SDKs ship harnesses. The third does not. Most teams still build their context layer by hand.

The cost is visible in the data. a16z partners Jason Cui and Jennifer Li argued in March that "data and analytics agents are essentially useless without the right context." MIT NANDA's 2025 study put a number on it: 95% of enterprise AI pilots returned no measurable ROI. The post-mortems converged on the same diagnosis. The models, harnesses, and data were fine. The context layer wiring them together was not.

Part One

Two of the three layers have builders

A model produces tokens. A frontier LLM in a chat box does plenty of useful work; hundreds of millions of people use ChatGPT and Claude every day. What it cannot do alone is take action, read your files, execute tools, or hold state across turns. The category that builds models is mature: foundation labs sell them, or you run open weights.

A harness is the runtime that wraps the model and turns it into an agent. Hashimoto coined the term in February 2026: every time an agent makes a mistake, you engineer the environment so it does not make that mistake again. The mechanics are tools, the system prompt, output parsing, retries, state, prompt caching, and the orchestration loop. Anthropic calls a closely related practice context engineering. The category is the agent SDK: Claude Code, Cursor, LangGraph, OpenAI's Agents SDK, Google's ADK, Anthropic's Agent SDK, CrewAI, AutoGen, Letta, AWS AgentCore. Picking one is no longer where you spend your engineering budget.

The smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome.
Anthropic, "Effective context engineering for AI agents," 2025. The working definition of the discipline.

The context layer is the third: the structured context the harness reads on every call. Resolved entities, business definitions, lineage, metadata, provenance, the rules a domain expert keeps in their head. It exists in every production AI application. What does not exist around it is a settled category of tools that builds it. Most teams still hand-write the layer the way analysts hand-wrote SQL before dbt. That gap is the subject of this paper.

What the harness composes on every call

System prompt → version-controlled in code

Tool definitions → declared in code

Conversation state → managed by the harness

Output schema → structured-output APIs

User message → from the caller

Context layer → hand-built today; no settled tool category

↓

Model · generates tokens

Five of the six inputs have settled builder categories. The sixth, the context layer, does not. The AI-powered tools emerging to build it are what this paper calls context constructors.

Part Two

Twenty years of trying

This is not a new problem. Semantic layers tried to solve it in the 2000s by codifying metrics in MicroStrategy and Looker. Master data management tried it through the 2010s with canonical entities for customer, product, and employee. Knowledge graphs and ontology platforms tried it by modeling relationships. Data catalogs tried it by indexing metadata and curating glossaries. Enterprise search tried it by indexing every document. Memory systems tried it at the conversation level. Every category got real adoption. None produced a usable enterprise context layer at scale.

The failure mode was identical: human curation does not scale to a modern data estate. A semantic layer with two hundred metrics is maintainable; one that covers the analytical surface of a 5,000-person company is not. A knowledge graph that represents the org chart can be hand-curated; one that represents every customer, entitlement, and decision cannot. Catalog completeness above 30% is rare. Curation costs cross the data-growth curve early and stay there.

That is why most teams today still wire context together by hand. Somebody picks a vector store, runs documents through an embedder, adds a Slack connector, names the important tables in a system prompt, drops in a glossary, and calls it done. That is a context layer. It is built by humans in code and YAML and Notion pages, and the harness reads it on every call. It exists in every shipped AI product. What does not exist around it is a settled category of tools that builds it for you.

This works for the demo. In production it produces failures anyone who has shipped an agent recognizes. The agent uses the wrong definition of a business term because two departments define it differently and both versions are embedded. It references stale schema because the embedder ran in March and nobody re-embedded after April's migration. It misses entity resolution because "Acme Corp" in Salesforce is the same customer as "Acme, Inc." in Stripe and cosine similarity has no way to know. It hallucinates a deprecated product feature because the deprecation lives in a Confluence page that ranked below the original spec. None of these are model failures. All of them are context failures.

Gorgias's engineering team described the modern version clearly. The model and the harness were the easy parts. The hard part was giving the agent enough context to answer questions correctly across the company. They ended up building a context layer with structured metadata, when-to-use guidance, and examples, versioned alongside their dbt models.

OpenAI reached the same conclusion. Their published architecture has multiple layers of grounding: table-usage metadata, human annotations from domain experts, Codex-powered code enrichment, and a memory that captures user corrections. They built a context constructor before the term existed, because none of the off-the-shelf categories could deliver the layer they needed.

You can run a small AI product on a hand-built context layer. You cannot run a serious one. The question is whether to keep hand-building, in defiance of twenty years of evidence, or adopt the category that finally has the unit economics.

Part Three

A working definition, and a defense

A context constructor is an AI-powered system that builds and maintains a context layer. It ingests heterogeneous sources, extracts entities and relationships, resolves the same entity across systems, maintains the layer as sources change, surfaces conflicts for human review, and exposes the result through an interface any harness can read on every call. The constructor is the new category. The context layer is what it produces. Keeping the two straight is the point.

The load-bearing word is any. If the layer is bound to one agent framework, what the constructor produced is a feature inside that framework, not a layer. A real constructor is interface-shaped on the output side. The harness should not care whether the layer was built by Snowflake's Cortex stack, by Palantir's Ontology, by DataHub, or by a Python script. The layer is the contract.

One word deserves a defense. Constructor sounds build-centric, the way a constructor in object-oriented code runs once. That is not what the term means here. A context constructor is a continuous operational system. It runs daily ingest passes, refreshes definitions on every code commit, surfaces conflicts when sources disagree, and ships human corrections back into the layer. "Construct" is shorthand for the ongoing building and maintenance, not a bootstrap phase. Some analysts use "enterprise context platform" as the broader umbrella; context constructor is the specific AI-powered builder role within it.

The category deserves a longer defense, because the obvious objection is that all of this is just semantic layers and catalogs with an AI wrapper on top.

It is not. The older categories share components with a context constructor: semantics, ontology, lineage, metadata, retrieval. They are not the same category because they failed at the job. Atlan, Collibra, DataHub, Palantir, and Stardog spent the last decade proving that human curation does not scale. What an AI-powered constructor adds is the unit economics those categories never had. It can ingest a Slack thread and pull the pricing policy embedded in the eighth message. It can propose entity resolution across Salesforce, Stripe, and the warehouse at a scale no stewardship team could staff. It can refresh a metric definition the moment a dbt model rebuilds. It can ship corrections from human review faster than a steward could file a ticket. None of that was operationally possible before 2024. The convergence the older vendors are now executing is the evidence the category is real, not the counter-argument. Collibra ships semantic agents. Alation refounded around agents. Databricks shipped Genie Code in March. dbt exposes its semantic layer through an MCP server. They did not wake up in 2026 and decide to AI-wash; they moved because the unit economics of building on their existing primitives finally collapsed.

The model is table stakes. The harness is converging. Context is the moat.

A few things this is decidedly not.

A vector store is not a context constructor. Pinecone, Weaviate, and Chroma are containers. They hold vectors. They do not know that ARR and annual recurring revenue are the same metric, that finance's revenue definition diverges from GTM's, or that "Acme Corp" in Salesforce is the same legal entity as "Acme, Inc." in Slack. A constructor sits on top of a vector store and gives it a schema.

A RAG pipeline is not a context constructor. RAG is a retrieval strategy. A constructor decides what goes in the corpus to begin with, how the pieces relate, and what metadata travels with each chunk. "RAG is dead" became a fashionable claim through 2025 once teams discovered that top-k cosine similarity was insufficient on its own. RAG is not dead; it is subsumed. Retrieval is one operation a constructor exposes, alongside lookup, graph traversal, structured query, and prompt-cached blocks served by an MCP server.

A connector library is not a context constructor. Pulling rows out of Salesforce or messages out of Slack is the easy half. Making those sources speak a common semantic language, resolving entities across them, and capturing the implicit rules a domain expert keeps in their head is the hard half. Connectors feed the constructor; they do not replace it.

The output is the point. A context constructor produces a thing the harness can read. That is the contract. The diagram below shows it. Part Four describes what the layer has to do once it is built.

How a context layer gets built

Sources

SQL, files, chat, code, tickets, email, calendar.

raw, heterogeneous

→

Context constructor

AI-powered: ingests, extracts, resolves entities, maintains, versions.

builds the layer

→

Context layer

Knowledge graph, schema, MCP server, queryable API.

what the harness reads

→

Harness

Reads the layer on every call. Composes it with the model.

consumer

§ § §

Part Four

The four jobs of a context layer

The layer the constructor produces has to do four jobs, not one. A product that does one job well is a useful component; a product that does all four is a context constructor. Most vendors today do two.

Semantic resolution. What "active customer" means in finance is not what it means in growth. What "MRR" means at the start of the quarter is not what it means at the end. A real layer holds governed definitions, metric formulas, and glossary terms, and the agent reads them instead of guessing from column names. This is dbt's territory with the Semantic Layer and MCP server, Collibra's with its semantic agents, and Atlan's with governed business meaning.

Identity and relationship modeling. The "Acme Corp" account in Salesforce, the "Acme, Inc." customer in Stripe, the contract in Notion, and the ticket in Zendesk are the same entity. A real layer resolves them once and exposes the relationships. Palantir's Ontology has done this for a decade as an operational layer, with actions and decision lineage bound to canonical objects. Stardog centers the enterprise knowledge graph as the substrate agents need. Snowflake reports that adding even a plain-text data ontology lifted final answer accuracy by 20% in their internal tests. Without identity resolution, agents hallucinate relationships the source systems already know.

Permissions, policy, and provenance. A layer is only safe if the agent sees what the user is allowed to see, computed at query time, and only useful if every claim it surfaces traces back to a source. Credal built its product around permission inheritance from source systems. Databricks Unity Catalog is the governance layer Genie reads through. Microsoft Graph grounds Copilot in user-scoped data and refuses access the user does not have, returning traceable sources. Skip this job and your context layer is a leak surface.

Freshness and change management. Definitions, schemas, ownership, and policies change every day. A layer that does not change with them is wrong by Tuesday. DataHub frames this as living context: event-based sync from a hundred-plus systems so the graph reflects operational reality, not stale documentation. OpenAI runs a daily offline refresh plus runtime retrieval. Collibra's semantic agents keep the layer alive as circumstances change. A constructor that builds once and never updates is not a constructor; it is a snapshot.

A useful test is to walk any vendor pitch through these four jobs. Most are excellent at one, plausible at a second, aspirational about the rest. gbrain, Garry Tan's MIT-licensed personal stack, is interesting because at the scale of one person's life it visibly does all four: Markdown files for semantics, Postgres + pgvector retrieval that resolves people and projects across 13 years of calendar data and 3,000 contact pages, per-repo trust policies for access, and 20+ cron jobs for freshness. The vendors in Part Seven do the same at company scale, with permission inheritance, multi-tenant isolation, and audit trails added.

§ § §

Part Five

The transformation-layer analogy, in full

The shape of this category is easiest to read by reference to what dbt did for analytics. The two situations rhyme.

Before dbt, analytics inside most companies was a swamp. Analysts wrote SQL directly against raw warehouse tables. The same metric was defined three ways in three dashboards. No tests, no version control, no shared semantics. Reports contradicted each other and nobody could tell which one was right.

dbt did not store the data (Snowflake, BigQuery, Redshift did) or visualize it (Looker, Tableau, Mode did). What dbt did was sit between the warehouse and the BI tools and turn raw tables into a tested, modular, version-controlled, documented schema. It compiled SQL, ran tests, generated lineage, and handed the BI tools a clean, governed thing to read. The category is called the transformation layer of the modern data stack, and dbt defined it.

It worked. Forty thousand companies run dbt in production. The analytics-engineer job title exists because of it. dbt went from "a script Tristan Handy wrote" to a billion-dollar company in seven years, with Coalesce, SQLMesh, and others built around the same idea.

The analogy maps cleanly. The transformation layer is to analytics what the context layer is to AI. In both cases the layer existed before any tool that managed it. The transformation layer existed in stored procedures and views; the context layer exists today in glossaries, Notion pages, and bespoke ETL. dbt turned the transformation layer from a hand-built mess into a managed system. Context constructors are doing the same for the context layer.

The shape rhymes, with two differences that matter.

The first is that the inputs are not just SQL. They include documents, conversations, tickets, code, calendar events, and whatever else carries meaning inside a company. A constructor has to handle a much wider class of source material than dbt ever did. This is also why LLM-assisted extraction is necessary in a way it was not for dbt: a deterministic parser cannot make sense of a Slack thread the way a model can.

The second is that the consumer is not a BI tool but an agent that can act. The agent is faster than a human analyst and worse-calibrated. Bad analytics produces a wrong number on a dashboard that someone might catch. Bad context produces a wrong action against a real system before anyone notices. Same shape, much higher stakes.

Part Six

Why now

Three forces converged in the last twelve months.

Models got good enough to expose the context bottleneck. The frontier did not stop moving. Stanford's 2026 AI Index reports agent task success on OSWorld jumped from 12% to roughly 66% in a year, and SWE-bench Verified from 60% to nearly 100%. The same index describes the "jagged frontier": the model that wins an olympiad reads an analog clock correctly half the time, and agents still fail one task in three on real computer-use benchmarks. The defensible read is not that capability is solved. It is that capability has crossed a threshold where the next biggest gain on a production agent comes from fixing what the model reads, not from upgrading the model. Picking a model is now a decision about price, latency, integration surface, and which provider's safety policies your legal team will tolerate.

The harness consolidated. OpenAI shipped Agents SDK in early 2026. Google shipped ADK. Anthropic shipped Agent SDK with Claude 4.6. LangGraph passed CrewAI in GitHub stars. Cursor and Claude Code defined the shape of an interactive coding harness. The vocabulary stabilized across all of them. The marginal engineering hour spent picking a harness produces less than it used to.

Context emerged as the actual constraint. Anthropic positioned context engineering, not prompt engineering, as the next discipline, defined as finding the smallest set of high-signal tokens that produces the desired behavior. Chroma's "Context Rot" study showed performance degrading as input length grows across 18 frontier models. Stanford's "Lost in the Middle" showed long-context models use information unevenly, with retrieval failing when relevant facts sit in the middle. More context is not better; beyond a point, excess context degrades reasoning. Datadog's 2026 State of AI Engineering report, from real customer traces, found 69% of input tokens are spent on system prompts repeated on every call, with only 28% of calls using prompt caching. The waste and the quality penalty are both enormous.

Karpathy's Software 3.0 framing makes the cost concrete. The LLM is the CPU. The context window is the RAM. Anything outside the window is on disk. The harness is the runtime, deciding what to load into RAM before each step. A context constructor decides how the disk is organized in the first place: what gets paged in, with what schema, what relationships, what provenance. Get that wrong and the CPU runs on garbage.

A common mistake: MCP solves none of this. The Model Context Protocol is a transport: the wire connecting LLM applications to external data and tools. It does not produce semantic meaning, resolve identities, enforce permissions, or maintain freshness. "MCP-enabled" is not "context-ready." Shipping an MCP server is useful plumbing; whether the vendor has a context constructor depends on whether the data flowing through that pipe is structured, governed, and current. dbt's MCP server is interesting precisely because it sits on top of dbt's existing semantic layer, lineage, and tests.

At the March 2026 Gartner Data and Analytics Summit, Rita Sallam framed it in the way that moves boards: about one in five enterprise AI investments shows measurable ROI, and the determinant is whether context has been treated as critical infrastructure on par with cybersecurity. The model is table stakes. The context is the moat. The vendors in the table below are positioning around exactly that thesis, from six different starting points.

Product	What it is	Best fit for	Pricing
Enterprise context & work-AI platforms unified search and an assistant over the org's apps
Glean glean.com	Started as workplace search; now markets itself as a "system of context." 100+ native connectors, a dual-graph architecture (enterprise + personal), hybrid search, and MCP-server interop with the major agent frameworks.	Large enterprises (1,000+ employees) with sprawling SaaS footprints. Bundles the connectors, graph, governance, and a polished assistant UX in one platform.	$40–50 / user / month base. +$15 / user / month for AI suite. First-year TCO $300K–$1M+.
M365 Copilot microsoft.com	Copilot grounded in Microsoft Graph: user-scoped enterprise data from Outlook, Teams, SharePoint, OneDrive, and connected systems. Permission inheritance computed at query time. Traceable sources in every response.	Microsoft 365 shops that want the context layer wired through the systems employees already use, with existing identity and compliance boundaries.	$30 / user / month for Copilot. M365 E3 or E5 prerequisite.
Gemini Enterprise cloud.google.com	Agent platform with a Knowledge Catalog that builds a dynamic context graph across the business, plus Memory Bank for long-running agents and Agentspace as the search and answer surface for workspace data.	Google Workspace and Google Cloud customers who want Gemini wired through their full data estate with autonomous memory and cross-app actions.	Sales-led for Enterprise. Workspace add-ons via marketplace.
Metadata & governance, pivoted toward AI catalog and lineage heritage, now agent-facing
Atlan atlan.com	Metadata and active-governance platform repositioned around the context layer for AI agents. Strong on business glossary, lineage, certification workflows, and human-in-the-loop curation. Internal benchmarks report a 38% accuracy lift on SQL generation when context is layered in.	Companies already running dbt and a modern data stack that want to extend their semantic and metadata investments into an agent-ready layer.	Custom enterprise. Tiered by data volume.
DataHub datahub.com	Enterprise context platform with an event-based architecture that continuously syncs metadata across 100+ data systems and exposes a governed context graph through MCP servers and semantic search APIs. Made the "living context" framing central to its 2026 positioning.	Teams that need freshness as a first-class property: the context graph reflecting operational reality, not last quarter's documentation.	Open-source core. DataHub Cloud sales-led.
Collibra collibra.com	Long-running data-governance platform that shipped semantic agents in 2026: a Semantic Model Generation Agent that drafts the layer from glossaries and metric catalogs, and a Semantic Mapping Agent that proposes links between physical and conceptual assets for steward review.	Enterprises with established Collibra catalogs that want to convert decade-old metadata into an agent-ready semantic layer without rebuilding from scratch.	Custom enterprise. Existing-customer expansion play.
Ontology & knowledge-graph platforms identity and relationships as first-class infrastructure
Palantir palantir.com	Foundry + AIP, anchored by the Palantir Ontology: an operational layer that binds digital assets to real-world objects, with actions, decision lineage, and dynamic security as first-class concerns. AIP Logic and Chatbot Studio sit on top as the agent runtime.	Defense, industrials, healthcare, and other large enterprises where the context layer has to drive governed actions and decisions, not just answer questions.	Sales-led, multi-year. Top-of-market pricing.
Stardog stardog.com	Knowledge-graph-powered semantic layer. Federates across cloud, on-prem, and legacy sources without copying data, and exposes the graph to agents through APIs and MCP. Voicebox is a graph-grounded assistant; the platform is mostly evaluated as constructor infrastructure.	Regulated industries where data cannot be centralized and where graph-based identity resolution is the bottleneck for AI grounding.	Enterprise license. AWS Marketplace available.
Platform-native data context warehouse and transformation vendors shipping their own constructors
Snowflake snowflake.com	Cortex Agents and Cortex Analyst, anchored by Semantic Models, and a five-layer "agent context layer" framework introduced in March 2026 that unifies lineage, governance, semantics, and organizational knowledge. The stack reports a 20% accuracy lift when a plain-text data ontology is added.	Snowflake-native data stacks that want the constructor work to live where the data already lives, with governance and identity inherited from the warehouse.	Consumption-based on Cortex credits. Enterprise contracts on top.
Databricks databricks.com	Genie + Unity Catalog. Genie Code (March 2026) is an autonomous data-engineering agent that turns ideas into production pipelines; Unity Catalog is the governance plane that grounds it. Databricks reports Genie Code more than doubled coding-agent success rates on their internal benchmark.	Lakehouse customers who want context construction, transformation, and an agent runtime in one stack under a single governance model.	DBU consumption. Unity Catalog included; agents metered.
dbt getdbt.com	Semantic Layer + MCP server + agent skills. Exposes governed metrics, lineage, documentation, and discovery to any harness through MCP, with the same SQL compilation and testing pipeline that already runs the transformation layer. The cleanest example of an older category extending itself into the constructor role.	Analytics-engineering teams already running dbt that want agents to read the same governed surface their BI tools already do.	Core: free, Apache. Cloud: per-seat tiers. Semantic Layer + MCP on Teams / Enterprise.
Memory & runtime context building blocks, not full enterprise constructors
mem0 mem0.ai	Dedicated memory layer for agents, focused on extraction from conversations and runtime context. Largest community of the agent-memory frameworks, broad ecosystem integrations, managed cloud with compliance certifications.	Developers and product teams building consumer-facing agents where personalization is the use case. Less suitable when the primary data source is organizational documents.	Free: 10K memories. Pro: $19 / month for 50K. Pro+: $249 / month, includes graph.
Lattice latticesql.com	Apache 2.0 npm package framed as the database your AI has been missing. Context and data layer that sits as an observation sidecar on top of an existing file system. Two-tier model: a local-first CLI and GUI, and a multi-node enterprise tier with SSO and audit.	Mid-market tech companies where the CTO wants a structured context layer developers can adopt bottom-up without a procurement cycle.	Individual: free, open source. Org: enterprise pricing, custom.
Document intelligence & ingestion upstream input pipelines, not full constructors
LlamaIndex llamaindex.ai	Open-source framework plus LlamaCloud managed platform for parsing, extracting, and indexing documents. The most mature ingestion pipeline in the open ecosystem, with 160+ data-source connectors via LlamaHub.	Teams that need industrial-grade document parsing and structured extraction as the foundation of a context layer, especially in document-heavy domains (legal, financial, scientific).	Framework: free, MIT. LlamaCloud: $1.25 per 1,000 credits. Starter $50 / month.
Unstructured unstructured.io	Document-intelligence platform that turns 64+ file types into structured outputs for RAG and agent workflows. OCR, layout analysis, extraction, and 30+ enterprise connectors. SOC 2, ISO 27001, HIPAA, GDPR.	Enterprises with large unstructured-document estates (PDFs, scanned forms, contracts) that need to be made agent-readable at scale.	Open source + serverless. Enterprise API tier.

A few notes on the table that do not fit cleanly into cells.

The six archetypes are not equivalent. They are six starting points for the same destination. Work-AI platforms (Glean, Microsoft, Google) come from the end-user surface; they already own the assistant employees use and are racing to ground it in the org's full data. Metadata and governance vendors (Atlan, DataHub, Collibra) come from a decade of catalog and lineage work; their job is to convert that into something an agent can read. Ontology and knowledge-graph platforms (Palantir, Stardog) were working on identity and relationship modeling before the current wave; they have the strongest identity story and the steepest acquisition cost. Platform-native stacks (Snowflake, Databricks, dbt) have an obvious advantage: the data and governance already live there. Memory and runtime components (Mem0, Lattice) are building blocks that usually show up inside larger constructors. Document intelligence (LlamaIndex, Unstructured) is upstream; they make raw inputs structured enough to be useful.

A buyer chooses on four questions. What data stack does the company already run, and where is the easiest graft? Which of the four jobs is the immediate bottleneck: semantics, identity, governance, freshness? Does the team need a bundled assistant UX, or will the harness come from elsewhere? What budget owns this: analytics engineering, governance, platform, or work-AI? The right archetype usually falls out of those four. Credal, Moveworks, and Writer are not in the table but are worth flagging: Credal for permission inheritance, Moveworks for cross-app action orchestration, Writer for context-grounded agents in regulated industries.

Two footnotes. cognee, an open-source memory engine with an Extract-Cognify-Load pipeline, sits in the same archetype as Mem0 and Lattice. Alation, which announced an agentic platform in 2026, belongs with Atlan, DataHub, and Collibra; the metadata/governance archetype is fuller than this table can show without padding.

Part Eight

How to evaluate one

There is no standardized cross-vendor benchmark in this category. Most performance claims are vendor-authored, drawn from internal experiments on internal data with internal definitions of correctness. That does not make them wrong; it makes them uncomparable. Treat published numbers as directional and build your own task suite on your own data.

With that caveat, here is the bar. Ten criteria, organized around the four jobs from Part Four.

Coverage of the four jobs. Does the product do semantics, identity, governance, and freshness, or only one or two? The shortlist is the set that does all four well.
Answer accuracy on a canonical task suite. Define 100–200 questions on your data that you already know the right answer to. Measure how many the agent gets right with this constructor versus a hand-built baseline. The first cut tells you whether the product is real.
Identity-resolution accuracy. Pull 50 entities that exist in three or more systems and check the constructor resolves them as one with the right cross-references. If it cannot get Salesforce and Stripe to agree on a customer, it cannot answer revenue questions.
Permission correctness at query time. Run the same question as three users with different access scopes and confirm the agent's answer changes. Ingest-time permission models break the first time a Salesforce role changes.
Provenance completeness. Every claim the agent makes should trace to a specific source row, document, or message. Ask the product to surface its sources and confirm they are real and recoverable.
Freshness and conflict surfacing. When two sources disagree, does the system surface the conflict or pick a winner silently? When a source changes, how long until the layer reflects it? Event-based is the strong answer. A nightly batch is the weak answer. A quarterly re-ingest is not an answer.
Correction incorporation. When a user tells the agent it is wrong, can a steward turn that correction into a durable layer update, with attribution and a review trail? Without this, the same mistake gets relitigated every week.
Versioning and diff. If you cannot diff this week's context against last week's, you cannot debug the agent when it behaves strangely. A git-style history of the artifact is the right answer; "trust us, we keep it fresh" is not.
Latency and token economics under load. Measure the constructor's contribution to per-call latency and tokens at p50 and p95 under a representative workload. The Datadog 69%-of-tokens-on-system-prompts finding should not be your number after this product is in place.
Interface stability. The constructor should expose its output through both an HTTP API and an MCP server, with prompt-cache-friendly chunking. Anything that requires the harness to live inside one framework is a feature, not a context constructor.

Most products in the market today fail at least three of these. Pick accordingly, and document which gaps you are accepting for other tooling or your team to fill.

Part Nine

Open questions

The architectural direction is settled. The market structure is not. Five things remain genuinely open.

Who owns this layer. Each archetype maps to a different budget. Platform-native lives in data engineering. Metadata and governance live in data governance. Work-AI lives in IT or employee productivity. Ontology lives in platform engineering. Memory and document intelligence live wherever the application team buys infrastructure. Category formation follows budget ownership as much as architecture, and budget ownership is contested.

Buy versus build. OpenAI's published architecture is the gold standard: multiple layers of grounding, a daily refresh pipeline, runtime retrieval, memory for corrections. It is also bespoke, and most enterprises will not reproduce it. The strategic question is which parts of the four jobs to inherit from a warehouse platform, a metadata system, an ontology vendor, or a work-AI suite, and which must remain company-specific. The likely answer for most enterprises is a multi-vendor stitch, not a single product.

Where the layer stops and the action layer begins. A layer that answers questions is one thing. A layer that drives governed actions, with audit and rollback, is another. Palantir's Ontology binds actions to objects natively. Snowflake's framework includes operational playbooks. Microsoft Copilot increasingly takes action across the org's apps. The defensible position: actions belong adjacent to the layer, not inside it. The constructor produces context; the harness composes it with tools; the tools execute actions. The boundary is contested, and a category that absorbs action into the layer would be much larger than one that does not.

The MCP trajectory. If MCP becomes the universal transport, lock-in inside any one vendor falls and stitching becomes easier. If it bifurcates, or is replaced by a higher-level protocol that includes semantic guarantees, the market structure changes. Today every serious vendor ships an MCP server. That uniformity is a year old and may not last.

Whether the category consolidates or fragments. The dbt arc took six years from "script Tristan Handy wrote" to billion-dollar company plus a half-dozen competitors. Context constructors are eighteen months in. The likely shape, given six starting points: two or three companies consolidate the constructor role across archetypes (likely from platform-native and work-AI), metadata and ontology incumbents extend deeper into AI, and memory and document-intelligence components get absorbed as features. None of that is locked.

The architectural direction is on solid ground. Across Anthropic, OpenAI, Snowflake, a16z, DataHub, Atlan, Palantir, dbt, Stardog, and Collibra, the same finding keeps surfacing: agents become reliable only when business meaning, governance, identity, provenance, and retrieval are treated as first-class system-design problems instead of after-the-fact prompt patches. That is the context layer. The category that finally has the unit economics to build and maintain it at enterprise scale is the context constructor. The market is messier than a clean greenfield, the convergence is real, and the new category is real anyway. Now it has a name.