Every signal. Every source. Every time.
Alpha that can't be explained doesn't survive the next audit. Provenance links every signal to the exact filing sentence that triggered it — frozen at EDGAR acceptance, auditable on demand, backtestable without lookahead. Verifiable edge. Defensible process.
Most alternative data gives you a signal and asks you to trust it. Provenance gives you the signal — and the exact sentence, filing, and timestamp behind it.
Full Auditability
Every signal traces to a specific sentence in a specific filing. Not a vendor claim — a documented chain of evidence.
No Look-Ahead Bias
Scores are frozen at acceptance. Your backtest sees exactly what existed at each point in time — by design, not by promise.
Explainable Factors
When someone asks why a stock was flagged, you have an answer — down to the sentence. Named signals, not opaque model outputs.
Defensible Process
DDQs, IC presentations, allocator reviews — every signal-driven decision comes with a paper trail. Source-linked, timestamped, ready to share.
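To make the no-look-ahead idea concrete, here is a minimal sketch of a point-in-time filter: a backtest run for a given date sees only records whose acceptance timestamp is on or before that date. The `SignalRecord` fields and the `visible_as_of` helper are hypothetical names chosen for illustration, not the Provenance schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class SignalRecord:
    ticker: str
    classifier: str
    score: float
    accepted_at: datetime  # assumed: EDGAR acceptance timestamp, in UTC


def visible_as_of(records, as_of):
    """Return only the records accepted at or before `as_of`."""
    return [r for r in records if r.accepted_at <= as_of]


# Illustrative values only, not real Provenance output.
records = [
    SignalRecord("AAA", "demand_accelerating", 0.91,
                 datetime(2026, 2, 23, 21, 5, tzinfo=timezone.utc)),
    SignalRecord("BBB", "demand_accelerating", 0.77,
                 datetime(2026, 4, 1, 20, 30, tzinfo=timezone.utc)),
]

backtest_date = datetime(2026, 3, 1, tzinfo=timezone.utc)
visible = visible_as_of(records, backtest_date)
print([r.ticker for r in visible])
```

Filtering on the acceptance timestamp, rather than the fiscal period a filing covers, is what keeps later information out of earlier backtest dates.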
Product Architecture
Each layer is independent and useful on its own. Together they form a complete intelligence stack — from raw filing to consumable signal.
The Classifier Engine
A Provenance classifier is a trained NLP model that has been taught to recognize a specific, named business signal in free-form regulatory text. Each classifier answers a single question: does this sentence contain evidence of a particular business condition?
The classifiers are not keyword matchers or rule-based filters. Every sentence in a filing is encoded into a 768-dimensional vector that captures its meaning rather than its keywords. Two differently worded sentences such as "orders have surged beyond our production capacity" and "we cannot fulfill backlog fast enough to meet customer commitments" land in the same region of this semantic space, even though they share zero keywords. The classifier finds that region. A keyword matcher built around either phrasing would miss the other.
This is the defining capability: semantically equivalent sentences with entirely different words are caught with equal precision. A classifier for Demand Accelerating does not search for the word 'demand' — it identifies the region of semantic space that genuine demand inflection occupies, and distinguishes it from routine mentions, prior-period comparisons, and hedged forward guidance, regardless of how management chooses to phrase it in any given filing.
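The idea can be sketched in a few lines. The example below uses toy 4-dimensional vectors as stand-ins for the 768-dimensional embeddings, and assumes the classifier is a logistic regression over the embedding (suggested by the per-sentence logistic regression probability mentioned in the report contents). All vectors, weights, and the bias are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fire_probability(embedding, weights, bias):
    """Logistic-regression probability that a sentence carries the signal."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins for sentence embeddings (made-up numbers).
surge = [0.9, 0.8, 0.1, 0.0]    # "orders have surged beyond our production capacity"
backlog = [0.8, 0.9, 0.0, 0.1]  # "we cannot fulfill backlog fast enough ..."
routine = [0.1, 0.0, 0.9, 0.8]  # assumed: a routine mention of demand

# Hypothetical trained parameters for a "Demand Accelerating" classifier.
weights, bias = [2.0, 2.0, -2.0, -2.0], -1.0

print("surge vs backlog:", round(cosine(surge, backlog), 3))
print("surge vs routine:", round(cosine(surge, routine), 3))
for name, vec in [("surge", surge), ("backlog", backlog), ("routine", routine)]:
    print(name, round(fire_probability(vec, weights, bias), 3))
```

The two paraphrases sit close together and both clear a 0.50 threshold, while the routine mention does not, even though no keyword comparison takes place.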
Named signals
Trained classifiers that read SEC filings the way an analyst would — detecting what management is actually saying, not just what words appear on the page.
Sentence-level links
Every signal surfaces the exact sentence that triggered it, with a live link to the original filing on SEC EDGAR. No claim without evidence.
Institutional-grade
Asset managers, quant funds, and trading desks need data they can explain to compliance. Every Provenance Core output is auditable to a primary source.
"Consistent with our prior expectations, tendering activity and contract awards increased during the latter part of 2025."
Source: RIG 10-K · Filed 2026-02-23 · SEC Accession 0001451505-26-000018
The Converged Intelligence Platform
A single classifier firing once tells you something happened. The pattern across quarters (building, persisting, breaking) tells you what it means.
Consider a company where the Financial Distress category echo is rising, the credit spread z-score is in the 95th percentile of its own history, an activist_campaign classifier fired in the most recent 8-K, and a historically proven insider executed a conviction buy.
Four independent channels confirming the same inflection point simultaneously. No single channel produces that picture.
SEC Classifiers
Named business signals from 10-K, 10-Q, 8-K filings. Sentence-level evidence. Every signal linked to its source on SEC EDGAR. The foundational layer.
8-K Item Codes
Each disclosure type tracked with echo and streak, plus empirically backtested price-impact ratios. Filing density vs. company baseline is itself a signal.
Insider Transactions
Seven signal types including conviction buys, sustained accumulation windows, and historically proven buyers with verified track records.
Credit Spreads
Widening and tightening events with 4/8/12-week trajectory windows and spread z-scores vs. the company's own 12-month history.
Category Composites
11 thematic groups (Financial Distress, Governance Crisis, Recovery Signal, Catalyst Positive…) with category-level echo and cross-category interaction signals.
XBRL Financials
Structured financial statement data (revenue, margins, balance sheet items) parsed directly from XBRL filings and processed with the same echo and streak framework as every other channel.
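As one concrete piece of the channel math, a z-score against a company's own history can be sketched as follows. This is the standard sample z-score, assumed here to be what "spread z-scores vs. the company's own 12-month history" refers to. The function name and the spread values are hypothetical.

```python
from statistics import mean, stdev


def spread_z_score(history, current):
    """Z-score of `current` against the sample mean and std dev of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    sigma = stdev(history)
    if sigma == 0:
        raise ValueError("history has zero variance; z-score is undefined")
    return (current - mean(history)) / sigma


# Hypothetical month-end credit spreads in basis points (trailing 12 months).
history_bps = [180, 175, 190, 185, 200, 195, 210, 205, 198, 202, 215, 220]
z = spread_z_score(history_bps, current=260)
print(round(z, 2))
```

Measuring each company against its own history, rather than a cross-sectional average, means a structurally wide-spread issuer is not flagged merely for being wide.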
Source-Linked Company Analysis Reports
Structured signal intelligence derived from Provenance Stream — formatted for analysts, LLM training, and institutional data rooms.
Three delivery formats
Human-readable .HTML/.MD for analysts, machine-readable .JSON for pipelines, and source filing links — all derived from the same underlying stream data.
Source linked at every level
Every signal in every report surfaces the verbatim sentence that triggered it and a live URL to the original SEC filing. No output without evidence.
LLM training alignment
The (classifier, sentence, confidence, source) structure maps directly to supervised fine-tuning, RAG construction, and instruction-following training objectives.
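A minimal sketch of how a (classifier, sentence, confidence, source) record might be turned into a supervised fine-tuning pair is shown below. The `SignalExample` class, the prompt wording, the confidence value, and the source placeholder are assumptions for illustration. The actual report schema may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class SignalExample:
    classifier: str
    sentence: str
    confidence: float
    source: str


def to_sft_pair(example):
    """Turn one record into a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        f"Does this sentence contain evidence of '{example.classifier}'?\n"
        f"Sentence: {example.sentence}"
    )
    # Assumed labeling rule: a stored probability of 0.50 or more is a positive.
    completion = "yes" if example.confidence >= 0.50 else "no"
    return {"prompt": prompt, "completion": completion, "source": example.source}


# Hypothetical record: the confidence and source are placeholders.
example = SignalExample(
    classifier="Demand Accelerating",
    sentence="orders have surged beyond our production capacity",
    confidence=0.93,
    source="<link to filing on SEC EDGAR>",
)

pair = to_sft_pair(example)
print(json.dumps(pair))  # one JSON object per line suits JSONL training files
```

Carrying the source field through to the training pair keeps each example traceable to its filing.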
What every report contains
Classifiers fired
Named signals with echo, streak, and source quotes
Filing reference
Form type, date, accession number, direct SEC link
Source sentences
Verbatim text from the filing that triggered each classifier
Confidence scores
Per-sentence logistic regression probability (≥0.50 stored)
Filing history table
Multi-period signal history across all prior filings
Ranked signals
Up to 10 top signals ranked by echo persistence score
"Gross margin for the third quarter of fiscal 2026 decreased 130 basis points to 40.2% primarily due to higher tariffs in North America."
Source: NKE 10-Q · Filed 2026-04-01 · SEC Accession 0000320187-26-000032 · Confidence: 0.91 · Echo: 4.74
The Native Integration Layer
Kaleidoscope ships a native MCP server — AI tools query Provenance Core and Stream directly inside your workflow. No bespoke integration required.
Works where you work
Any MCP-compatible AI client — coding assistants, analyst workspaces, quant tools — can call Provenance signal queries natively without leaving the application.
No custom integration
No separate API credentials. No bespoke client library. No data pipeline to maintain. The MCP server is the integration — plug in and query.
Full Provenance access
Screen filings, look up classifier primitives, run signal queries, detect convergence across channels — all the same data available through the direct API.
The Native MCP Tools
screen_filings
Run the classifier library against any filing or set of filings — returns triggered signals with source sentences and confidence scores.
lookup_primitives
Retrieve the current echo, streak, stickiness, and z-score for any classifier across any company in the coverage universe.
run_signal_query
Filter and rank companies by signal criteria: e.g., 'Financial Distress echo > 3.0 AND credit spread z-score > 2.0 AND insider buying streak ≥ 2'.
get_filing_history
Retrieve the full temporal signal history for a company-classifier pair across all stored filings.
convergence_alert
Query for companies where two or more independent channels are showing elevated signal in the same 30/60/90-day window.
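The example criteria given for run_signal_query can be read as a simple filter-and-rank step. The sketch below applies that logic to in-memory rows. The field names, tickers, and values are hypothetical, and this is not the tool's actual query syntax or implementation.

```python
# Hypothetical per-company signal snapshot; field names are assumptions.
rows = [
    {"ticker": "AAA", "financial_distress_echo": 3.4,
     "credit_spread_z": 2.6, "insider_buy_streak": 2},
    {"ticker": "BBB", "financial_distress_echo": 4.1,
     "credit_spread_z": 1.2, "insider_buy_streak": 3},
    {"ticker": "CCC", "financial_distress_echo": 5.0,
     "credit_spread_z": 2.1, "insider_buy_streak": 4},
]


def matches(row):
    """Financial Distress echo > 3.0 AND spread z-score > 2.0 AND streak >= 2."""
    return (
        row["financial_distress_echo"] > 3.0
        and row["credit_spread_z"] > 2.0
        and row["insider_buy_streak"] >= 2
    )


# Keep rows that satisfy every condition, then rank by echo, highest first.
hits = sorted(
    (r for r in rows if matches(r)),
    key=lambda r: r["financial_distress_echo"],
    reverse=True,
)
print([r["ticker"] for r in hits])
```

Ranking by echo is one choice among several. Any of the three fields could serve as the sort key depending on the thesis being screened.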
Example
"What are the most persistent Financial Distress signals across my coverage list this quarter, and which companies have credit spread widening confirming the same thesis?"
→ Provenance Connect queries Core + Stream, returns ranked results with source citations in seconds.
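Behind a natural-language request like this, an MCP client issues tool calls as JSON-RPC 2.0 messages using the `tools/call` method. The sketch below builds such a message for the convergence_alert tool. The argument names (`min_channels`, `window_days`) are hypothetical, since the tool's input schema is not described here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are assumptions, not the documented tool schema.
request = build_tool_call(
    request_id=1,
    tool_name="convergence_alert",
    arguments={"min_channels": 2, "window_days": 30},
)
print(json.dumps(request, indent=2))
```

In practice an MCP-compatible client constructs and sends these messages itself, so users normally never write them by hand.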
Provenance serves three distinct workflows. Same data product — different leverage points depending on how you use financial signals.
Audience 01
Systematic factor research & model building
How Provenance Helps
Audience 02
Fundamental research & filing triage
How Provenance Helps
Audience 03
Event-driven execution & risk management
How Provenance Helps
A live slice of the last 7 days — real companies, real filings, real signals.
14 signals fired
74 signals fired
24 signals fired
30 signals fired
From the Lab
Three diverging paths across IT services, staffing, and consulting as AI reshapes demand, based on signals from 300+ companies over 20 years of filings.
29 classifiers, 1.2M sentences. COVID breakdown at -2.4σ, now recovering.
22 classifiers, 4.2M sentences. COVID shock at -4.7σ, commodity boom at +3.3σ.
Quality biotech oscillators. 82% win rate, 75-day median hold, +62.5% avg return per wave.
Get Started
Get in touch and we'll follow up with details on coverage, delivery format, and pricing.