Now Available — By Invitation

Turn Claude Into a Citation-Bearing Analyst

Provenance Connect is a Model Context Protocol (MCP) server that gives Claude the structured data layer it's missing. Ask Claude about companies, biotech pipelines, fund managers, or news, and on its own it tends to hallucinate. Connect it to Provenance and it answers from a graph it can cite, as of any date, failures included.

Request Access → Technical Docs

Claude · Provenance Connect

› "Which industrials have rising distress signals in their last two 10-Qs?"

// screen_companies

// distress_echo > 10.0, streak >= 2

GEO echo 14.2 ↑ 10-Q · Mar 2026

ACCO echo 11.7 ↑ 10-Q · Mar 2026

AMRC echo 9.4 ↑ 10-Q · Feb 2026

TRNS echo 8.1 ↑ 10-Q · Feb 2026

// each result cited to a filing accession + snippet

152M Filing Sentences

127M News Sentences

32 Production Tools

7 Research Domains

529 Signal Classifiers

The Problem

Three limits no amount of model scale fixes

Frontier LLMs are extraordinary at language, reasoning, and pattern-matching across their training text. But three structural gaps make them untrustworthy for primary-source research.

01 · Frozen Cutoff

No knowledge after training

Claude knows nothing that happened after its training cutoff. Real-time facts, the latest SEC filing, this morning's M&A announcement: none of it is in the model.

02 · No As-Of

No concept of "true on date X"

Ask Claude to reconstruct a company's pipeline in Q3 2018 and it returns either today's pipeline (wrong) or a confident reconstruction (untrustworthy). It can't pin a fact to a date.

03 · Survivor Bias

Trained on the winners

Training data is dominated by the companies and drugs that got written about. The failures rarely make it into the corpus, so any base rate Claude computes is the success rate of the survivors, not of the population.

The Proof

One hard number

We ran the public 50-question set of the Finance Agent Benchmark: exactly the kind of as-of corporate-filing queries Provenance Connect was built for.

Claude (Sonnet 4.6) — alone

0 / 50

questions correct

Contradictions: 21 / 50

Claude + Provenance Connect

10 / 50

questions correct · 20%

Contradictions: 10 / 50

Every correct answer is attributed to Provenance. The questions fall after Claude's training cutoff, so without external retrieval the model gets none of them right; with Provenance it gets 20% right. The lift is the entire scorecard.

Contradictions halve. Grounded in real data, Claude confabulates far less: contradictions drop from 21 to 10. That's the citation effect, not just the data.

What Provenance Adds

Eight things Claude can't do alone

Each one is a query Claude either gets wrong or can't attempt without a structured, citation-bearing layer underneath it.

1 · Point-in-time reconstruction, cited

Show me BridgeBio's drug pipeline as it was knowable at the end of 2020.

Claude alone returns today's pipeline or a hallucination. Provenance returns 18 assets cited to a specific 10-K accession, with the verbatim snippet that established each fact.
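The core of an as-of query is a date filter applied to every fact before anything is returned. Below is a minimal sketch, assuming each pipeline fact is stored with the date of the filing that disclosed it and, where relevant, the date of the filing that disclosed its discontinuation. The `PipelineFact` type, its fields, and the sample data are hypothetical, not Provenance's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PipelineFact:
    """One pipeline asset and the filing dates that bound when it was knowable (hypothetical schema)."""
    asset: str
    disclosed_on: date                       # date of the filing that first disclosed the asset
    discontinued_on: Optional[date] = None   # date of the filing that disclosed discontinuation
    source_accession: str = ""               # accession number of the disclosing filing


def pipeline_as_of(facts: list[PipelineFact], as_of: date) -> list[PipelineFact]:
    """Assets disclosed on or before `as_of` and not yet discontinued at that date.

    Anything first filed after `as_of` is ignored, so later knowledge
    cannot leak into the reconstruction.
    """
    return [
        f for f in facts
        if f.disclosed_on <= as_of
        and (f.discontinued_on is None or f.discontinued_on > as_of)
    ]


# Illustrative data only: asset names, dates, and accession numbers are made up.
facts = [
    PipelineFact("Asset A", date(2019, 3, 1), None, "0000000000-19-000001"),
    PipelineFact("Asset B", date(2020, 2, 15), date(2022, 5, 10), "0000000000-20-000002"),
    PipelineFact("Asset C", date(2021, 8, 9), None, "0000000000-21-000003"),
]

for fact in pipeline_as_of(facts, date(2020, 12, 31)):
    print(fact.asset, fact.source_accession)
```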

2 · Survivorship-complete base rates

What fraction of drug programs targeting TTR have historically succeeded?

Not the success rate of the winners, but the real one: 37.5% (12 of 32 resolved programs), with the 20 discontinued programs counted in the denominator. Reconstructed from 17,296 trajectories across 1,676 biotech CIKs, including delisted companies.
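The arithmetic behind a survivorship-complete base rate is simple; the hard part is having the failures in the data at all. This sketch uses the TTR counts quoted above (12 successes and 20 discontinuations among 32 resolved programs) and contrasts them with a survivor-only sample in which the discontinued programs were never captured. The status labels are hypothetical.

```python
from collections import Counter


def base_rate(outcomes: list[str]) -> float:
    """Success rate over resolved programs: succeeded / (succeeded + discontinued).

    Any other label (e.g. a still-active program) is unresolved and is
    excluded from both numerator and denominator. Labels are hypothetical.
    """
    counts = Counter(outcomes)
    resolved = counts["succeeded"] + counts["discontinued"]
    if resolved == 0:
        raise ValueError("no resolved programs")
    return counts["succeeded"] / resolved


# Resolved TTR programs from the example above: 12 succeeded, 20 discontinued.
complete = ["succeeded"] * 12 + ["discontinued"] * 20

# The extreme survivor-only case: none of the discontinued programs were captured.
survivors_only = ["succeeded"] * 12

print(f"{base_rate(complete):.1%}")        # 37.5%
print(f"{base_rate(survivors_only):.1%}")  # 100.0%
```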

3 · Multi-year position trails

How did Berkshire build, hold, and trim its AAPL position?

Claude knows Berkshire owns AAPL; it can't reconstruct the arc. Provenance returns 26 quarters back to 2019 (the Q4-2019 entry, the Q2-2023 peak, the 2025 trims), all measured in actual shares held, so price moves don't inflate the position.
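Share-based actuals matter because a position's reported market value can rise in a quarter when the manager sold shares, simply because the price went up. The sketch below labels each quarter-over-quarter change from share counts alone. The `classify_trail` helper is hypothetical, and the quarterly figures are invented for illustration; they are not Berkshire's holdings.

```python
def classify_trail(shares_by_quarter: list[tuple[str, int]]) -> list[tuple[str, str]]:
    """Label each quarter's change in a position from reported share counts (hypothetical helper).

    Comparing shares rather than reported market value means a price
    rally cannot make a trim look like a build.
    """
    labels = []
    previous = 0
    for quarter, shares in shares_by_quarter:
        if shares == previous:
            label = "hold" if shares else "none"
        elif previous == 0:
            label = "entry"
        elif shares == 0:
            label = "exit"
        elif shares > previous:
            label = "build"
        else:
            label = "trim"
        labels.append((quarter, label))
        previous = shares
    return labels


# Invented share counts, not actual 13F data.
trail = [("Q4-2019", 100), ("Q1-2020", 250), ("Q2-2020", 250), ("Q3-2020", 200), ("Q4-2020", 0)]
for quarter, label in classify_trail(trail):
    print(quarter, label)
```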

4 · Cross-corpus aggregation, instantly

Which industrials have rising distress and filed restructuring-language 8-Ks in the last 90 days?

For a human, this is a multi-hour batch job. Provenance answers in under a second with one query against pre-aggregated signals over 152M filing sentences, 127M news sentences, and 73M institutional holdings.
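Conceptually, this query joins two pre-aggregated views: a per-company distress series and an 8-K event table. Here is a minimal sketch, assuming both are already materialized in memory. The field names, the "rose over each of the last two filings" rule, the `restructuring_language` flag, and the tickers are hypothetical stand-ins, not Provenance's schema.

```python
from datetime import date, timedelta


def rising(series: list[float], periods: int = 2) -> bool:
    """True if each of the last `periods` values is higher than the one before it."""
    if len(series) < periods + 1:
        return False
    tail = series[-(periods + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))


def screen(distress: dict[str, list[float]],
           events: list[dict],
           today: date,
           window_days: int = 90) -> list[str]:
    """Tickers with rising distress and a restructuring-language 8-K inside the window."""
    cutoff = today - timedelta(days=window_days)
    flagged = {
        e["ticker"] for e in events
        if e["form"] == "8-K" and e["restructuring_language"] and e["filed"] >= cutoff
    }
    return sorted(t for t, series in distress.items() if rising(series) and t in flagged)


# Hypothetical pre-aggregated views: distress score per filing, oldest first.
distress = {"AAA": [3.0, 5.5, 7.1], "BBB": [6.0, 4.0, 8.0], "CCC": [1.0, 2.0, 3.0]}
events = [
    {"ticker": "AAA", "form": "8-K", "restructuring_language": True, "filed": date(2026, 3, 2)},
    {"ticker": "CCC", "form": "8-K", "restructuring_language": True, "filed": date(2025, 6, 1)},
]
print(screen(distress, events, today=date(2026, 4, 1)))  # ['AAA']
```

CCC's distress is rising too, but its 8-K falls outside the 90-day window, so only AAA passes both conditions.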

5 · Canonical entity resolution

Who's developing drugs against PD-1?

Claude can conflate PD-1 with PD-L1 or miss that "PD-1" was once a Parkinson's gene (now SNCA). Provenance returns both canonical genes with HGNC IDs, confidence scores, and full alias lists. Deterministic.

6 · Comparables on structured snapshots

What happened to other biotechs that looked like this one's current setup?

Narrative similarity finds companies that sound alike. Provenance embeds each (company, quarter) as a structured vector and finds true nearest neighbors with forward outcomes attached — leak-safe, retrieving only snapshots before the query date.
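The leak-safe part can be sketched as a filter that runs before the neighbor search. In this minimal version, each snapshot is a hand-written feature vector, distance is plain Euclidean, and only snapshots dated before the query date are eligible. The `Snapshot` type, the feature layout, and the data are hypothetical; Provenance's actual embedding is not described here.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """A (company, quarter) snapshot as a feature vector with its forward outcome (hypothetical)."""
    company: str
    as_of: date
    features: tuple[float, ...]
    forward_outcome: str


def comparables(query: tuple[float, ...], query_date: date,
                snapshots: list[Snapshot], k: int = 2) -> list[Snapshot]:
    """k nearest snapshots by Euclidean distance, using only snapshots dated before query_date."""
    eligible = [s for s in snapshots if s.as_of < query_date]  # leak-safe filter
    return sorted(eligible, key=lambda s: math.dist(query, s.features))[:k]


# Invented snapshots; Co3 is dated after the query date and must be excluded.
snapshots = [
    Snapshot("Co1", date(2019, 6, 30), (0.9, 0.2, 0.4), "acquired"),
    Snapshot("Co2", date(2021, 3, 31), (0.8, 0.3, 0.5), "Phase 3 failure"),
    Snapshot("Co3", date(2025, 9, 30), (0.9, 0.2, 0.5), "approval"),
    Snapshot("Co4", date(2018, 12, 31), (0.1, 0.9, 0.9), "delisted"),
]
result = comparables((0.9, 0.25, 0.45), date(2024, 1, 1), snapshots)
print([(s.company, s.forward_outcome) for s in result])
```

Co3 is as close to the query as Co1, but it postdates the query date, so the filter drops it before ranking.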

7 · Derived signals from joined panels

What's a fund manager's actual track record vs. the market?

Claude can quote AUM and famous trades; it can't compute returns. Provenance returns a replication of Berkshire's disclosed long book: +12.4% over the trailing year, −4.9% excess vs. SPY, clearly labeled as a 13F-clone proxy with its methodology surfaced.
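A 13F-clone proxy can be sketched as a value-weighted return on the disclosed long positions, minus the benchmark return. This sketch assumes the book is weighted by reported value at the start of the period and held unchanged. The helper names, tickers, weights, and returns are invented and do not reproduce the Berkshire figures above.

```python
def replication_return(positions: dict[str, tuple[float, float]]) -> float:
    """Value-weighted return of disclosed long positions (hypothetical helper).

    positions maps ticker -> (reported value at period start, period return).
    Assumes the book is held unchanged over the period: a clone proxy,
    not an audited fund return.
    """
    total = sum(value for value, _ in positions.values())
    return sum(value / total * ret for value, ret in positions.values())


def excess_return(portfolio: float, benchmark: float) -> float:
    """Simple excess return over a benchmark such as SPY."""
    return portfolio - benchmark


# Invented book: three positions with start-of-period values and period returns.
book = {"XXX": (600.0, 0.10), "YYY": (300.0, 0.20), "ZZZ": (100.0, -0.05)}
clone = replication_return(book)
print(f"clone {clone:+.1%}, excess {excess_return(clone, 0.15):+.1%}")  # clone +11.5%, excess -3.5%
```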

8 · Real-time fact + deep history

Most recent FDA approval activity, vs. the 2020–2024 baseline?

The "recent + historical" combination is structurally unavailable to a frozen model. Provenance pairs daily-refreshed news (event-typed, with materiality scores) with 2020–2024 history and continuously updated filing signals.

What's in the Box

32 production tools across seven domains

Every tool returns provenance — source reference, snippet, and as-of date — on every fact. Every derived number carries its methodology version and caveats.

Domain	What it covers
SEC Filing Signals	Echo, distress, mutation, and entropy across millions of filings · 529 themed classifiers over 152M classified sentences.
Quarterly Fundamentals	XBRL revenue, COGS, capex, lease, interest expense, dividends, growth rates — 50 columns, 4,400 tickers, back to 2009.
8-K Event Timeline	Item-code filters — 4.01 auditor change, 4.02 non-reliance, 1.03 bankruptcy, 5.02 officer departure, and more.
News & Press Releases	127M sentences (2020–2026) plus 4M article-level rows with event type, materiality score, and post-publication returns.
13F Institutional Ownership	Top holders, full portfolios (not capped), position history, cohort flow, manager track records, all-notable consensus.
13D/13G Activist Positions	Per-stake campaign status, intent classification, and accumulation trail.
Biotech Pipelines	Point-in-time pipeline reconstruction, survivorship-complete base rates, target landscapes, comparables, and risk scores — across 1,676 CIKs, including delisted.
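Item-code filtering on the 8-K timeline amounts to a set intersection over the items each filing reports. Below is a minimal sketch, assuming events are rows carrying a ticker, filing date, and list of item codes. The row shape and tickers are hypothetical; the item titles are abbreviated forms of the standard Form 8-K item names.

```python
from datetime import date

# Form 8-K item codes named in the table above (titles abbreviated).
ITEMS = {
    "1.03": "Bankruptcy or receivership",
    "4.01": "Change in certifying accountant",
    "4.02": "Non-reliance on previously issued financial statements",
    "5.02": "Departure or appointment of directors or certain officers",
}


def filter_events(events: list[dict], wanted: set[str]) -> list[dict]:
    """Keep 8-K events that report at least one of the wanted item codes."""
    unknown = wanted - set(ITEMS)
    if unknown:
        raise ValueError(f"unsupported item codes: {sorted(unknown)}")
    return [e for e in events if wanted & set(e["items"])]


# Hypothetical event rows.
events = [
    {"ticker": "AAA", "filed": date(2026, 2, 3), "items": ["5.02", "9.01"]},
    {"ticker": "BBB", "filed": date(2026, 2, 9), "items": ["4.02"]},
    {"ticker": "CCC", "filed": date(2026, 3, 1), "items": ["8.01"]},
]
hits = filter_events(events, {"4.01", "4.02"})
print([(e["ticker"], e["items"]) for e in hits])  # [('BBB', ['4.02'])]
```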

🔗

Plus discover_tools, a semantic-catalog meta-tool that Claude calls when it's unsure which tool to use, so the tool surface can grow without degrading answer quality. Most MCP servers don't have one. Magnitude signals are labeled as magnitude, not direction; replication returns are labeled as replication, not audited fund returns. Every fact is verifiable.
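A catalog meta-tool ranks tool descriptions against a natural-language need. The real discover_tools is described as semantic; this sketch swaps in plain word-overlap (Jaccard) scoring so it stays dependency-free. `screen_companies` is the tool from the demo at the top of this page; the other catalog entries and all descriptions are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def discover(query: str, catalog: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank tools by Jaccard overlap between the query and each description.

    A stand-in for semantic (embedding) search; ties keep catalog order.
    """
    q = tokens(query)

    def score(item: tuple[str, str]) -> float:
        d = tokens(item[1])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    ranked = sorted(catalog.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical catalog: only screen_companies appears on this page.
catalog = {
    "screen_companies": "screen companies by filing signals such as distress echo",
    "pipeline_as_of": "reconstruct a biotech drug pipeline as of a date",
    "holder_history": "institutional 13F position history for a holder",
}
query = "reconstruct drug pipeline for a biotech"
print(discover(query, catalog, top_k=1))  # ['pipeline_as_of']
```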

How It Works

From EDGAR to your AI assistant in one hop

No pipeline glue. No bespoke integration. One OAuth handshake and you're querying a corpus built over four years.

Find Provenance Connect in Claude

Search for "Provenance Connect" in the Claude connector directory. Click Connect — that's the only configuration step.

Authenticate with a Magic Link

We send a one-time sign-in link to your allowlisted email. One click issues an OAuth token. No passwords, no SDK, no API keys to manage.

Ask Research Questions

Claude calls the Provenance Connect tools automatically. Ask in plain English and get structured, signal-level, cited results.

Universal Endpoint

https://mcp.kscope.io/mcp/

Protocol

OAuth 2.1 · MCP

Access Model

Invitation · Read-Only
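Under the hood, Claude talks to the endpoint with MCP's JSON-RPC 2.0 messages. The sketch below only builds the request body for a `tools/call`. It does not perform the OAuth flow or the MCP `initialize` handshake a real client must complete first, and it sends nothing over the network. The `screen_companies` tool name comes from the demo at the top of this page, but the argument names here are hypothetical.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical arguments mirroring the screen shown in the demo.
body = tools_call(1, "screen_companies", {
    "sector": "industrials",
    "min_distress_echo": 10.0,
    "min_streak": 2,
})
print(body)
```

A client would send this body to the universal endpoint with its OAuth access token in an `Authorization: Bearer` header; inside Claude, the connector handles all of this automatically.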

The Moat

Why this is hard to replicate

"Couldn't someone just point an LLM at SEC EDGAR?" The short answer is no — and the long answer is five things that took years to build.

Extraction pipelines that took years

The 529-classifier taxonomy over 152M sentences and 17K resolved biotech program trajectories aren't a corpus you can spin up — they're the output of an extraction system tuned over years against real outcomes.

The survivorship-complete denominator

Including delistings, failed Phase 3s, and acquired-for-pennies trajectories requires reconstructing companies that no longer exist, from filings no aggregator surfaces. The base rate is the moat.

Joined-panel signals

Risk score = XBRL × 13F × extraction graph × an out-of-sample harness. Manager track record = 13F × price × outcome resolution. These aren't retrievable — they're computed from the joined panel, then served as a single pre-validated number.

The provenance contract

Every fact carries its source filing. Most APIs return data; we return data plus verifiability, and that is exactly what makes an LLM agent built on top of it trustworthy.

Continuous freshness

New filings flow through the pipeline as the SEC publishes them; news refreshes daily; biotech extractions run continuously. A competitor would need to match not just the data, but the cadence.

Who It's For

Built for research professionals who need data they can defend

🔬

Quantitative Researchers

Testing hypotheses on filing language, signal decay, and base rates at scale.

👔

Equity Analysts & PMs

An LLM that grounds claims in source filings rather than confabulating.

📋

Credit & Distressed Debt

Early-warning signal patterns before price reflects credit stress.

🧬

Biotech Investors

Point-in-time pipeline reconstruction and survivorship-complete base rates.

📡

Event-Driven Desks

13D/13G campaign trails with stated intent and accumulation history.

🤖

Agent Builders

A citation-bearing retrieval layer their users can actually trust.

Invitation-Based Access

One question. One tool. One citation.

Access is currently by invitation. Request access and we'll add you to the allowlist; then find Provenance Connect in Anthropic's Claude connector directory and authenticate in under a minute.

Request Access → Read the Docs

Currently free for invited users · No credit card required

Every signal. Every source. Every time.

Looking for direct data access? Explore the data stream →
