Why We Put Provenance Inside Claude

A frontier language model is the best reasoning engine ever built and one of the least trustworthy sources of fact. It can walk you through a discounted-cash-flow model, then invent the cash flows. It can explain survivorship bias, then commit it. The intelligence is real. The grounding is not.

We spent four years building the grounding: 152 million classified SEC-filing sentences, 127 million news sentences, quarterly fundamentals back to 2009, 13F ownership trails, activist campaign histories, and survivorship-complete biotech pipelines — every fact carrying a link back to the primary filing that produced it. The question was never whether the data was good. It was how to get it into the place where the reasoning happens. The answer was MCP.

The three things scale will never fix

Frontier models share three structural limits, and no amount of additional training data closes them. They are properties of how the model is built, not how big it is.

01 · Frozen cutoff

The model knows nothing after its training date. This morning's 8-K, the latest M&A headline — invisible.

02 · No as-of

Ask what a company's pipeline looked like in Q3 2018 and you get today's pipeline, or a confident guess. It can't pin a fact to a date.

03 · Survivorship bias

Training text is dominated by the companies and drugs that got written about. Any base rate the model infers from that text overstates success, because the failures are missing from the denominator.

Each of these is exactly the kind of error that is fatal in research and invisible in casual use. A model that confabulates a plausible base rate is more dangerous than one that admits it doesn't know.
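To make the second and third limits concrete, here is a minimal sketch of an as-of filter and a survivorship-complete base rate. Everything in it is an assumption made for illustration: the Program record, its field names, and the five toy rows are invented and do not reflect Provenance's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Program:
    name: str
    disclosed: date            # first filing date that mentions the program
    resolved: Optional[date]   # date the outcome became known, if it has
    succeeded: Optional[bool]  # None while the outcome is still open
    sponsor_active: bool       # False if the sponsoring company no longer exists


# Toy records invented for illustration; not real programs or outcomes.
PROGRAMS = [
    Program("A", date(2015, 3, 1), date(2018, 6, 1), True, True),
    Program("B", date(2016, 5, 1), date(2017, 9, 1), False, False),
    Program("C", date(2017, 1, 1), date(2019, 2, 1), False, False),
    Program("D", date(2018, 8, 1), date(2021, 4, 1), True, True),
    Program("E", date(2019, 2, 1), None, None, True),
]


def known_as_of(programs: List[Program], as_of: date) -> List[Program]:
    """Programs that had been disclosed on or before the as-of date."""
    return [p for p in programs if p.disclosed <= as_of]


def base_rate(programs: List[Program], as_of: date) -> Optional[float]:
    """Success rate over programs whose outcome was known by as_of.

    Failures stay in the denominator; unresolved programs are excluded.
    """
    resolved = [p for p in programs
                if p.resolved is not None and p.resolved <= as_of]
    if not resolved:
        return None
    return sum(1 for p in resolved if p.succeeded) / len(resolved)


CUTOFF = date(2021, 12, 31)
survivors = [p for p in PROGRAMS if p.sponsor_active]
complete_rate = base_rate(PROGRAMS, CUTOFF)   # 0.5: two successes, two failures
survivor_rate = base_rate(survivors, CUTOFF)  # 1.0: the failures vanished with their sponsors
```

On the toy data, dropping the programs whose sponsors no longer exist doubles the apparent success rate, and moving the as-of date back to the end of 2018 hides both program E and the outcomes of C and D.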

Why MCP, and not an API

We already had an API. APIs are for the software you write. The Model Context Protocol is for the reasoning the model does on your behalf — it lets Claude discover our tools, decide which to call, chain them together, and fold the structured results back into its answer, all inside a single conversation. No glue code. No bespoke integration. One connector URL and an OAuth handshake.

That distinction matters more than it sounds. With an API, a human decides in advance which endpoints to hit. With MCP, the model decides at question time, and because each of our tools returns provenance on every fact, the model's freedom to choose never costs us verifiability. It can take any path through the data and still cite every step.
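At the wire level, discovery and invocation are two JSON-RPC 2.0 requests defined by the MCP specification: tools/list and tools/call. The sketch below only builds those messages. The tool name pipeline_as_of, the ticker XYZ, and the argument names are hypothetical placeholders rather than entries from the Provenance Connect catalog, and transport and the OAuth handshake are left out.

```python
import json
from typing import Any, Dict


def list_tools_request(request_id: int) -> Dict[str, Any]:
    """Build the MCP request a client sends to discover available tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str,
                      arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build the MCP request that invokes one tool with structured arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
discover = list_tools_request(1)
invoke = call_tool_request(
    2, "pipeline_as_of", {"ticker": "XYZ", "as_of": "2020-12-31"}
)
wire_payload = json.dumps(invoke)  # what would travel over the transport
```

The model, not the developer, chooses the name and arguments in the second message, which is the practical meaning of deciding at question time.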

Most financial data products return filings. Provenance Connect returns signal — with a source reference, a verbatim snippet, and an as-of date attached to every number it hands back.
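One way to picture that contract is a record type that cannot be constructed without its citation. This is a sketch under stated assumptions: the CitedFact class, its field names, and the placeholder accession string are invented here and are not the schema Provenance Connect returns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CitedFact:
    value: float
    unit: str
    source_ref: str  # e.g. a filing accession number
    snippet: str     # verbatim text the value was extracted from
    as_of: date      # the date the fact was true or knowable

    def __post_init__(self) -> None:
        # Refuse to create a number that has lost its citation.
        for field_name in ("source_ref", "snippet"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


def render(fact: CitedFact) -> str:
    """Format a fact so the citation travels with the number."""
    return (f"{fact.value:g} {fact.unit} "
            f"(as of {fact.as_of.isoformat()}, source {fact.source_ref})")


# Placeholder values, invented for illustration.
example = CitedFact(
    value=42.0,
    unit="USD millions",
    source_ref="0000000000-00-000000",
    snippet="Revenue was $42.0 million.",
    as_of=date(2020, 12, 31),
)
```

Making the source fields mandatory at construction time means an uncited number fails loudly instead of reaching the answer.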

The proof: one connection, measured

We ran the Finance Agent Benchmark — a public 50-question set built around exactly the "as-of corporate filing" queries that expose the three limits above — twice. Once with Claude alone. Once with Claude connected to Provenance over MCP. Nothing else changed.

Claude alone: 0 / 50 questions correct (0%). Contradictions: 21 / 50.

Claude + Provenance Connect: 10 / 50 questions correct (20%). Contradictions: 10 / 50.

Two things are worth reading carefully. First, every correct answer is Provenance-attributed. The questions sit past Claude's training cutoff; without external retrieval the model scores zero. The entire scorecard is the lift. Second, contradictions roughly halve, from 21 to 10. When the model has real data to ground in, it stops confabulating. That is the citation effect: not just more facts, but fewer invented ones.
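The arithmetic behind those two readings is short enough to check by hand. The snippet below uses only the four counts reported above; the variable names are ours.

```python
TOTAL_QUESTIONS = 50

correct_alone, correct_connected = 0, 10
contradictions_alone, contradictions_connected = 21, 10


def pct(count: int, total: int) -> float:
    """Share of the question set, in percent."""
    return 100.0 * count / total


accuracy_alone = pct(correct_alone, TOTAL_QUESTIONS)          # 0.0
accuracy_connected = pct(correct_connected, TOTAL_QUESTIONS)  # 20.0
lift_points = accuracy_connected - accuracy_alone             # 20.0 percentage points

# Relative drop in contradictions: (21 - 10) / 21, about 52%.
contradiction_drop = (
    (contradictions_alone - contradictions_connected) / contradictions_alone
)
```

Because the baseline is zero, the lift is only meaningful in percentage points; a relative improvement cannot be computed against 0 / 50.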

How the MCP layer benefits the whole system

Internally we think of Provenance as a stack. Core is the classifier engine. Stream is the multi-channel data layer built on top of it. The MCP server — Provenance Connect — is the layer that makes all of it reachable by a reasoning agent in plain language. It changes what the rest of the system is worth in three concrete ways.

It turns a database into an analyst. A screen for "industrials with rising distress signals and restructuring-language 8-Ks in the last 90 days" used to be a query someone had to know how to write. Over MCP it's a sentence, and the answer comes back sub-second against pre-aggregated signals — each row cited to a filing accession.

It makes the provenance contract enforceable end-to-end. Because every tool returns its source on every fact, the chain from the model's sentence back to the primary SEC filing is never broken. The audit trail is complete before anyone asks for it — which is the difference between a signal a compliance team can defend and one they can't.

It lets the surface grow without getting worse. We ship a meta-tool, discover_tools, that the model calls when it's unsure which tool fits. New capabilities can be added behind it without overwhelming the model's tool selection, so the catalog can grow beyond today's 32 tools across seven domains without quality regression. Most MCP servers don't have this; it's the difference between a connector that stays sharp and one that degrades as it grows.
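The article does not describe how discover_tools works internally, so the following is only a toy illustration of the general idea: rank a catalog against the question and hand back a short list. The keyword-overlap scoring, the ToolSpec record, and all three tool names are assumptions invented for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Set

STOPWORDS = {"a", "an", "the", "for", "of", "in", "by", "and", "as", "to"}


@dataclass(frozen=True)
class ToolSpec:
    name: str
    domain: str
    description: str


# Hypothetical catalog entries, not the real Provenance Connect tools.
CATALOG = [
    ToolSpec("pipeline_as_of", "biotech",
             "reconstruct a drug pipeline as of a past date"),
    ToolSpec("ownership_trail", "ownership",
             "quarterly 13F share positions for a manager"),
    ToolSpec("distress_screen", "filings",
             "screen companies by distress and restructuring language in 8-K filings"),
]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with common stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def discover_tools(query: str, catalog: List[ToolSpec] = CATALOG,
                   limit: int = 2) -> List[str]:
    """Return up to `limit` tool names, ranked by keyword overlap with the query."""
    query_tokens = _tokens(query)
    scored = []
    for tool in catalog:
        overlap = len(query_tokens & _tokens(f"{tool.name} {tool.description}"))
        if overlap:
            scored.append((overlap, tool.name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

Whatever the real ranking method is, the design point is the same: the model sees a handful of relevant tools per question rather than the whole catalog, so adding a tool does not crowd the choices it has to weigh.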

Eight things the connection makes possible

The clearest way to see the benefit is the set of questions that are simply unavailable to a model on its own, and routine once it can reach the data:

Point-in-time reconstruction

A pipeline as it was knowable at end-2020 — 18 assets, each cited to a 10-K accession.

Survivorship-complete base rates

37.5% of TTR programs succeeded — failures in the denominator, not just the winners.

Multi-year position trails

26 quarters of Berkshire's AAPL arc, share-based actuals — not price-inflated.

Cross-corpus aggregation

A multi-hour read reduced to one sub-second query across filings, news, and ownership.

Canonical entity resolution

"Drugs against PD-1" returns the right genes with HGNC IDs — deterministic, not a guess.

Structured comparables

Nearest neighbors by actual situation, with forward outcomes — leak-safe by construction.

Derived panel signals

A manager's replication return vs. SPY — computed from joined panels, labeled as replication.

Real-time + deep history

Daily-refreshed news measured against a multi-year baseline in a single answer.
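The "share-based actuals" distinction in the position-trail item can be shown with a few lines of arithmetic. This is a sketch on invented numbers: the Holding record and the three toy quarters are assumptions for illustration, not Berkshire's positions or any real 13F data.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Holding:
    quarter: str
    shares: int
    price: float  # quarter-end price per share

    @property
    def value(self) -> float:
        return self.shares * self.price


# Toy trail invented for illustration.
TRAIL = [
    Holding("2020Q1", 1_000, 50.0),
    Holding("2020Q2", 1_000, 80.0),  # price rose, no trading
    Holding("2020Q3", 900, 100.0),   # manager sold 10% while the price rose
]


def quarter_changes(trail: List[Holding]) -> List[Dict[str, Union[str, float]]]:
    """Quarter-over-quarter change measured by shares and by market value."""
    changes = []
    for prev, cur in zip(trail, trail[1:]):
        changes.append({
            "quarter": cur.quarter,
            "share_change": cur.shares / prev.shares - 1,
            "value_change": cur.value / prev.value - 1,
        })
    return changes


changes = quarter_changes(TRAIL)
```

In the last toy quarter the market value of the position rises by 12.5% even though the manager sold a tenth of the shares. A trail built on reported value would read that as buying; a trail built on share counts reads it correctly as selling.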

Why the connection is the moat

The natural objection is: couldn't anyone point an LLM at EDGAR and do this? The short answer is no, and the reason is that none of the hard parts live in the model. They live in the layer the model connects to. The extraction pipelines took years. The survivorship-complete denominator required reconstructing companies that no longer exist. The panel signals — risk score, manager track record — are computed from joined data, not retrievable from any single source. And the provenance contract means every one of those facts arrives verifiable.

MCP is what turns that moat into a product. The data was always the asset. Putting it one tool-call away from the best reasoning engine available is what makes the asset usable — and what makes the answers trustworthy enough to act on.

One question. One tool. One citation. That is the whole idea: a model that reasons like an analyst and sources like one too.

Provenance Connect is live in the Claude connector directory, by invitation. If you want to put it to work, read how it works or request access.