Now Available — By Invitation

Turn Claude Into a Citation-Bearing Analyst

Provenance Connect is a Model Context Protocol server that gives Claude the structured layer it's missing. On its own, Claude tends to hallucinate around the things analysts ask about: companies, biotech pipelines, fund managers, news. Connected to Provenance, it answers from a graph it can cite, as of any date, including the failures.

Claude · Provenance Connect
"Which industrials have rising distress signals in their last two 10-Qs?"
// screen_companies
// distress_echo > 10.0, streak >= 2
GEO echo 14.2 10-Q · Mar 2026
ACCO echo 11.7 10-Q · Mar 2026
AMRC echo 9.4 10-Q · Feb 2026
TRNS echo 8.1 10-Q · Feb 2026
// each result cited to a filing accession + snippet
152M Filing Sentences
127M News Sentences
32 Production Tools
7 Research Domains
529 Signal Classifiers

Three limits no amount of model scale fixes

Frontier LLMs are extraordinary at language, reasoning, and pattern-matching across their training text. But three structural gaps make them untrustworthy for primary-source research.

01 · Frozen Cutoff
No knowledge after training
Claude knows nothing that happened after its training date. Real-time facts, the latest SEC filing, this morning's M&A announcement — none of it is in the model.
02 · No As-Of
No concept of "true on date X"
Ask Claude to reconstruct a company's pipeline in Q3 2018 and it returns either today's pipeline (wrong) or a confident reconstruction (untrustworthy). It can't pin a fact to a date.
03 · Survivor Bias
Trained on the winners
Training is dominated by the companies and drugs that got written about. The failures rarely make the corpus, so any base rate Claude computes is the success rate of the survivors — not the population.

One hard number

We ran the Finance Agent Benchmark public 50-question set — exactly the kind of "as-of corporate filing" queries we built Provenance Connect for.

Claude (Sonnet 4.6) — alone
0 / 50
questions correct
Contradictions: 21 / 50
Claude + Provenance Connect
10 / 50
questions correct  ·  20%
Contradictions: 10 / 50
Every correct answer is Provenance-attributed. The questions sit past Claude's training cutoff — without external retrieval, the model gets nothing right. Provenance recovers 20%. The lift is the entire scorecard.
Contradictions roughly halve. When Claude has real data to ground in, it confabulates less: 21 contradictions drop to 10. That's the citation effect, not just the data.
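The scorecard arithmetic can be checked in a few lines. This is a minimal sketch that uses only the four counts quoted in this section; the variable and function names are illustrative and are not part of any Provenance Connect API.

```python
# Counts quoted in the scorecard above; all names here are illustrative only.
TOTAL_QUESTIONS = 50
correct_alone, correct_with_connect = 0, 10
contradictions_alone, contradictions_with_connect = 21, 10


def rate(count: int, total: int) -> float:
    """Return count / total as a percentage."""
    return 100.0 * count / total


# Accuracy lift in percentage points (20% with Connect vs 0% alone).
accuracy_lift = rate(correct_with_connect, TOTAL_QUESTIONS) - rate(correct_alone, TOTAL_QUESTIONS)

# Relative drop in contradictions, measured against the Claude-alone run.
contradiction_drop = rate(contradictions_alone - contradictions_with_connect, contradictions_alone)

print(f"accuracy lift: {accuracy_lift:.0f} percentage points")
print(f"contradictions down by {contradiction_drop:.1f}%")
```

The relative drop works out to a little over half, which is why the scorecard describes contradictions as roughly halving rather than disappearing.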

Eight things Claude can't do alone

Each one is a query Claude either gets wrong or can't attempt without a structured, citation-bearing layer underneath it.

1 · Point-in-time reconstruction, cited
Show me BridgeBio's drug pipeline as it was knowable at the end of 2020.
Claude alone returns today's pipeline or a hallucination. Provenance returns 18 assets cited to a specific 10-K accession, with the verbatim snippet that established each fact.
2 · Survivorship-complete base rates
What fraction of drug programs targeting TTR have historically succeeded?
Not the success rate of the winners — the real one. 37.5% (12 of 32 resolved), with the 20 discontinued programs in the denominator. Reconstructed from 17,296 trajectories across 1,676 biotech CIKs, including delisted.
3 · Multi-year position trails
How did Berkshire build, hold, and trim its AAPL position?
Claude knows Berkshire owns AAPL; it can't reconstruct the arc. Provenance returns 26 quarters back to 2019 (the Q4-2019 entry, the Q2-2023 peak, the 2025 trims), all based on reported share counts rather than price-inflated position values.
4 · Cross-corpus aggregation, instantly
Which industrials have rising distress and filed restructuring-language 8-Ks in the last 90 days?
A multi-hour batch job for a human analyst. Provenance answers in under a second with one query against pre-aggregated signals over 152M filing sentences, 127M news sentences, and 73M institutional holdings.
5 · Canonical entity resolution
Who's developing drugs against PD-1?
Claude can conflate PD-1 with PD-L1 or miss that "PD-1" was once a Parkinson's gene (now SNCA). Provenance returns both canonical genes with HGNC IDs, confidence scores, and full alias lists. Deterministic.
6 · Comparables on structured snapshots
What happened to other biotechs that looked like this one's current setup?
Narrative similarity finds companies that sound alike. Provenance embeds each (company, quarter) as a structured vector and finds true nearest neighbors with forward outcomes attached — leak-safe, retrieving only snapshots before the query date.
7 · Derived signals from joined panels
What's a fund manager's actual track record vs. the market?
Claude can quote AUM and famous trades; it can't compute returns. Provenance returns Berkshire's disclosed-long-book replication: +12.4% trailing 1-year, −4.9% excess vs SPY, labeled clearly as a 13F-clone proxy, with methodology surfaced.
8 · Real-time fact + deep history
Most recent FDA approval activity, vs. the 2020–2024 baseline?
The "recent + historical" combination is structurally unavailable to a frozen model. Provenance pairs daily-refreshed news (event-typed, with materiality scores) against 2020–2024 history and continuously-updated filing signals.
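To make the "leak-safe" idea from item 6 concrete, here is a minimal sketch of a nearest-neighbor lookup that only considers snapshots dated before the query. It is not Provenance's implementation: the `Snapshot` fields, the two-dimensional feature vectors, the company labels, and the outcomes are all made up for illustration, and plain Euclidean distance stands in for whatever similarity measure the real system uses.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    company: str                 # made-up label, not a real ticker
    as_of: date                  # date the snapshot was knowable
    features: tuple[float, ...]  # hypothetical structured vector
    outcome: str                 # forward outcome attached to the snapshot


def nearest_before(query: Snapshot, pool: list[Snapshot], k: int = 2) -> list[Snapshot]:
    """Return the k closest snapshots dated strictly before the query date."""
    # Leak-safety filter: drop anything not yet knowable at query time.
    eligible = [s for s in pool if s.as_of < query.as_of]
    return sorted(eligible, key=lambda s: math.dist(s.features, query.features))[:k]


pool = [
    Snapshot("AAA", date(2019, 12, 31), (1.0, 0.0), "discontinued"),
    Snapshot("BBB", date(2020, 6, 30), (0.9, 0.1), "approved"),
    # Identical to the query vector, but dated after it, so it must be excluded.
    Snapshot("CCC", date(2021, 3, 31), (1.0, 0.05), "approved"),
]
query = Snapshot("QRY", date(2020, 12, 31), (1.0, 0.05), "unknown")

neighbors = nearest_before(query, pool)
for s in neighbors:
    print(s.company, s.as_of, s.outcome)
```

The snapshot labeled CCC is the closest match by distance, yet it never appears in the result because it postdates the query. Filtering on date before ranking is the design choice that keeps forward outcomes from leaking into the comparison.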

32 production tools across seven domains

Every tool returns provenance — source reference, snippet, and as-of date — on every fact. Every derived number carries its methodology version and caveats.
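One way to picture that contract is as a pair of small record types. The sketch below is an assumption about shape only: the class names, field names, and the `is_citable` helper are hypothetical and do not describe the actual response schema. The placeholder strings stand in for real accession numbers and snippets.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CitedFact:
    claim: str
    source_ref: str  # e.g. a filing accession number (placeholder below)
    snippet: str     # verbatim text that established the fact
    as_of: date      # date on which the fact was knowable


@dataclass(frozen=True)
class DerivedNumber:
    value: float
    methodology_version: str
    caveats: tuple[str, ...] = ()


def is_citable(fact: CitedFact) -> bool:
    """A fact is usable only if it carries both a source reference and a snippet."""
    return bool(fact.source_ref.strip()) and bool(fact.snippet.strip())


fact = CitedFact(
    claim="placeholder claim",
    source_ref="ACCESSION-PLACEHOLDER",
    snippet="placeholder sentence from the filing",
    as_of=date(2020, 12, 31),
)
# The +12.4% figure is the replication return quoted earlier on this page.
replication = DerivedNumber(
    value=12.4,
    methodology_version="placeholder-version",
    caveats=("13F-clone proxy, not audited fund returns",),
)

print(is_citable(fact), replication.caveats)
```

Keeping the caveats on the number itself, rather than in separate documentation, means an agent cannot quote the value without also having the label in hand.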

Domain | What it covers
SEC Filing Signals | Echo, distress, mutation, and entropy across millions of filings · 529 themed classifiers over 152M classified sentences.
Quarterly Fundamentals | XBRL revenue, COGS, capex, lease, interest expense, dividends, growth rates; 50 columns, 4,400 tickers, back to 2009.
8-K Event Timeline | Item-code filters: 4.01 auditor change, 4.02 non-reliance, 1.03 bankruptcy, 5.02 officer departure, and more.
News & Press Releases | 127M sentences (2020–2026) plus 4M article-level rows with event type, materiality score, and post-publication returns.
13F Institutional Ownership | Top holders, full portfolios (not capped), position history, cohort flow, manager track records, all-notable consensus.
13D/13G Activist Positions | Per-stake campaign status, intent classification, and accumulation trail.
Biotech Pipelines | Point-in-time pipeline reconstruction, survivorship-complete base rates, target landscapes, comparables, and risk scores across 1,676 CIKs, including delisted.

Plus discover_tools, a semantic catalog meta-tool Claude calls when it's uncertain which tool fits, so the tool surface can scale without quality regression. Most MCP servers don't have one.

Magnitude signals are labeled as magnitude, not direction; replication returns are labeled as replication, not audited fund returns. Every fact is verifiable.

From EDGAR to your AI assistant in one hop

No pipeline glue. No bespoke integration. One OAuth handshake and you're querying a corpus built over four years.

01
Find Provenance Connect in Claude
Search for "Provenance Connect" in the Claude connector directory. Click Connect — that's the only configuration step.
02
Authenticate with a Magic Link
We send a one-time sign-in link to your allowlisted email. One click issues an OAuth token. No passwords, no SDK, no API keys to manage.
03
Ask Research Questions
Claude calls the Provenance Connect tools automatically. Ask in plain English and get structured, signal-level, cited results.
Universal Endpoint: https://mcp.kscope.io/mcp/
Protocol: OAuth 2.1 · MCP
Access Model: Invitation · Read-Only
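Inside Claude, the connector handles all of this for you. For readers curious what travels over the wire, the sketch below builds (but does not send) the kind of JSON-RPC 2.0 `tools/call` message that MCP defines. The tool name `screen_companies` comes from the demo at the top of this page; the argument names `min_distress_echo` and `min_streak` are hypothetical, as is the token value, and a real client would first complete the MCP `initialize` handshake and the OAuth flow.

```python
import json

MCP_ENDPOINT = "https://mcp.kscope.io/mcp/"  # endpoint listed on this page


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def build_headers(access_token: str) -> dict:
    """HTTP headers for an authenticated MCP request over HTTP."""
    return {
        "Authorization": f"Bearer {access_token}",  # OAuth bearer token
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }


# Argument names are assumptions for illustration, not the tool's real schema.
request = build_tool_call(
    request_id=1,
    tool="screen_companies",
    arguments={"min_distress_echo": 10.0, "min_streak": 2},
)
headers = build_headers("TOKEN-PLACEHOLDER")

print(f"POST {MCP_ENDPOINT}")
print(json.dumps(request, indent=2))
```

Nothing here opens a network connection, so it is safe to run as written. The real argument schema for each tool is whatever the server advertises through MCP's `tools/list` method.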

Why this is hard to replicate

"Couldn't someone just point an LLM at SEC EDGAR?" The short answer is no — and the long answer is five things that took years to build.

1
Extraction pipelines that took years
The 529-classifier taxonomy over 152M sentences and 17K resolved biotech program trajectories aren't a corpus you can spin up — they're the output of an extraction system tuned over years against real outcomes.
2
The survivorship-complete denominator
Including delistings, failed Phase 3s, and acquired-for-pennies trajectories requires reconstructing companies that no longer exist, from filings no aggregator surfaces. The base rate is the moat.
3
Joined-panel signals
Risk score = XBRL × 13F × extraction graph × an out-of-sample harness. Manager track record = 13F × price × outcome resolution. These aren't retrievable — they're computed from the joined panel, then served as a single pre-validated number.
4
The provenance contract
Every fact carries its source filing. Most APIs return data; we return data plus verifiability, which is what makes an LLM agent working on top of it trustworthy.
5
Continuous freshness
New filings flow through the pipeline as the SEC publishes them; news refreshes daily; biotech extractions run continuously. A competitor would need to match not just the data, but the cadence.
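The survivorship-complete denominator in item 2 is easiest to see with numbers. The sketch below uses the TTR counts quoted earlier on this page (12 successes among 32 resolved programs, with 20 discontinued) and compares the base rate with and without the failures in the denominator. The outcome labels and the function name are illustrative, not part of the product.

```python
from collections import Counter

# Counts from the TTR example quoted earlier on this page.
outcomes = ["succeeded"] * 12 + ["discontinued"] * 20


def base_rate(resolved: list[str], include_failures: bool = True) -> float:
    """Share of resolved programs that succeeded.

    With include_failures=False the denominator holds only the survivors,
    which is the biased view a survivor-only corpus would produce.
    """
    counts = Counter(resolved)
    successes = counts["succeeded"]
    denominator = sum(counts.values()) if include_failures else successes
    return successes / denominator if denominator else float("nan")


complete = base_rate(outcomes)
survivors_only = base_rate(outcomes, include_failures=False)

print(f"survivorship-complete: {complete:.1%}")
print(f"survivors only:        {survivors_only:.1%}")
```

With the discontinued programs dropped, every remaining program is a success by construction, so the survivor-only rate carries no information. The 37.5% figure only exists once the failures are counted.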

Built for research professionals who need data they can defend

🔬
Quantitative Researchers
Testing hypotheses on filing language, signal decay, and base rates at scale.
👔
Equity Analysts & PMs
An LLM that grounds claims in source filings rather than confabulating.
📋
Credit & Distressed Debt
Early-warning signal patterns before price reflects credit stress.
🧬
Biotech Investors
Point-in-time pipeline reconstruction and survivorship-complete base rates.
📡
Event-Driven Desks
13D/13G campaign trails with stated intent and accumulation history.
🤖
Agent Builders
A citation-bearing retrieval layer their users can actually trust.

One question. One tool. One citation.

Access is currently by invitation. Request access and we'll add you to the allowlist. You'll then find Provenance Connect listed in Anthropic's Claude connector directory and can authenticate in under a minute.

Currently free for invited users  ·  No credit card required

Every signal. Every source. Every time.

Looking for direct data access? Explore the data stream →