Structured Signal Intelligence · SEC Filings

Provenance:
Alpha You Can Trace.

Every signal. Every source. Every time.

Alpha that can't be explained doesn't survive the next audit. Provenance links every signal to the exact filing sentence that triggered it: frozen at EDGAR acceptance, auditable on demand, backtestable without look-ahead. Verifiable edge. Defensible process.

500+ Classifiers · 0.956 Median AUC · IC 0.095 (145-Week Out-of-Sample) · 4,831 Tickers · Coverage Since 2014

The Alpha You Can Actually Trace.

Most alternative data gives you a signal and asks you to trust it. Provenance gives you the signal — and the exact sentence, filing, and timestamp behind it.

Full Auditability

Every signal traces to a specific sentence in a specific filing. Not a vendor claim — a documented chain of evidence.

No Look-Ahead Bias

Scores are frozen at EDGAR acceptance. Your backtest sees exactly what existed at each point in time, by design rather than by promise.

Explainable Factors

When someone asks why a stock was flagged, you have an answer — down to the sentence. Named signals, not opaque model outputs.

Defensible Process

DDQs, IC presentations, allocator reviews — every signal-driven decision comes with a paper trail. Source-linked, timestamped, ready to share.

Product Architecture

Four layers.
One source of truth.

Each layer is independent and useful on its own. Together they form a complete intelligence stack — from raw filing to consumable signal.

01 Foundation

Provenance Core

The Classifier Engine

A Provenance classifier is an NLP model trained to recognize a specific, named business signal in free-form regulatory text. Each classifier answers a single question: does this sentence contain evidence of a particular business condition?

The classifiers are not keyword matchers or rule-based filters. Every sentence in a filing is encoded into a 768-dimensional vector that captures its meaning, not its keywords. Two completely different sentences, such as "orders have surged beyond our production capacity" and "we cannot fulfill backlog fast enough to meet customer commitments", land in the same region of this semantic space even though they share no words at all. The classifier finds that region. A keyword matcher looking for "demand" would miss both.

This is the defining capability: semantically equivalent sentences with entirely different words are caught with equal precision. A classifier for Demand Accelerating does not search for the word 'demand' — it identifies the region of semantic space that genuine demand inflection occupies, and distinguishes it from routine mentions, prior-period comparisons, and hedged forward guidance, regardless of how management chooses to phrase it in any given filing.
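The scoring step can be sketched as a logistic-regression head over sentence embeddings, consistent with the per-sentence logistic regression probability and the 0.50 storage cutoff listed under "What every report contains". This is a minimal sketch, not the production model: the encoder that produces the 768-dimensional vectors is not shown, the toy vectors here are 4-dimensional, and the class name, weights, and bias are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SentenceClassifier:
    """Hypothetical logistic-regression head over sentence embeddings."""
    name: str
    weights: list[float]     # one weight per embedding dimension (768 in production)
    bias: float
    threshold: float = 0.50  # probabilities below this are discarded

    def probability(self, embedding: list[float]) -> float:
        """Sigmoid of the weighted sum of embedding dimensions plus bias."""
        if len(embedding) != len(self.weights):
            raise ValueError("embedding dimension does not match classifier weights")
        z = sum(w * x for w, x in zip(self.weights, embedding)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def fire(self, sentences: dict[str, list[float]]) -> list[tuple[str, float]]:
        """(sentence, probability) pairs at or above the threshold, highest first."""
        scored = [(text, self.probability(emb)) for text, emb in sentences.items()]
        kept = [(text, p) for text, p in scored if p >= self.threshold]
        return sorted(kept, key=lambda h: h[1], reverse=True)


# Toy 4-dimensional vectors standing in for encoder output (hypothetical values).
toy_embeddings = {
    "orders have surged beyond our production capacity": [0.9, 0.8, 0.1, 0.3],
    "we cannot fulfill backlog fast enough to meet customer commitments": [0.8, 0.9, 0.0, -0.2],
    "demand was consistent with the prior year": [0.1, 0.0, 0.9, 0.5],
}
clf = SentenceClassifier("Demand Accelerating", weights=[2.0, 1.5, -1.0, 0.0], bias=-1.0)
hits = clf.fire(toy_embeddings)
```

Because the weights act on meaning-bearing dimensions rather than on tokens, the two paraphrases with no shared words both clear the threshold, while the routine prior-period comparison does not.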

Named signals

Trained classifiers that read SEC filings the way an analyst would — detecting what management is actually saying, not just what words appear on the page.

Sentence-level links

Every signal surfaces the exact sentence that triggered it, with a live link to the original filing on SEC EDGAR. No claim without evidence.

Institutional-grade

Asset managers, quant funds, and trading desks need data they can explain to compliance. Every Provenance Core output is auditable to a primary source.

Backlog Building · RIG / Transocean

"Consistent with our prior expectations, tendering activity and contract awards increased during the latter part of 2025."

Source: RIG 10-K · Filed 2026-02-23 · SEC Accession 0001451505-26-000018

02 Temporal Layer

Provenance Stream

The Converged Intelligence Platform

A classifier firing once tells you something happened. The pattern across quarters (building, persisting, breaking) tells you what it means.

Consider a company where the Financial Distress category echo is rising, the credit spread z-score sits in the 95th percentile of its own history, an activist_campaign classifier fired in the most recent 8-K, and a historically proven insider has executed a conviction buy.

Four independent channels confirming the same inflection point simultaneously. No single channel produces that picture.
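One way to express that kind of convergence check is a trailing-window count of distinct channels per company, the same idea the convergence_alert tool describes with its 30/60/90-day windows. A minimal sketch, assuming each channel's "elevated" events have already been computed upstream; the tickers, channel labels, and dates are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical record: (ticker, channel, date on which that channel showed elevated signal)
Event = tuple[str, str, date]


def converging_tickers(events: list[Event], as_of: date,
                       window_days: int = 30, min_channels: int = 2) -> dict[str, set[str]]:
    """Tickers with elevated signal on at least `min_channels` distinct channels
    inside the trailing window (as_of - window_days, as_of]."""
    start = as_of - timedelta(days=window_days)
    seen: dict[str, set[str]] = {}
    for ticker, channel, when in events:
        if start < when <= as_of:
            seen.setdefault(ticker, set()).add(channel)
    return {t: chans for t, chans in seen.items() if len(chans) >= min_channels}


events = [
    ("AAA", "sec_classifier", date(2026, 3, 2)),
    ("AAA", "credit_spread", date(2026, 3, 20)),
    ("AAA", "insider", date(2026, 3, 28)),
    ("BBB", "sec_classifier", date(2026, 1, 5)),
    ("BBB", "credit_spread", date(2026, 3, 25)),
]
as_of = date(2026, 3, 31)
```

Widening the window from 30 to 90 days pulls BBB's January classifier event into range, so it joins AAA as a two-channel convergence.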

SEC Classifiers · 8-K Item Codes · Insider Transactions · Credit Spreads · Category Composites · XBRL Financials

SEC Classifiers

Named business signals from 10-K, 10-Q, and 8-K filings. Sentence-level evidence. Every signal linked to its source on SEC EDGAR. The foundational layer.

8-K Item Codes

Each disclosure type tracked with echo and streak, plus empirically backtested price-impact ratios. Filing density vs. company baseline is itself a signal.

Insider Transactions

Seven signal types including conviction buys, sustained accumulation windows, and historically proven buyers with verified track records.

Credit Spreads

Widening and tightening events with 4/8/12-week trajectory windows and spread z-scores vs. the company's own 12-month history.

Category Composites

11 thematic groups (Financial Distress, Governance Crisis, Recovery Signal, Catalyst Positive…) with category-level echo and cross-category interaction signals.

XBRL Financials

Structured financial statement data (revenue, margins, balance sheet items) parsed directly from XBRL filings and processed with the same echo and streak framework as every other channel.
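The credit-spread primitives above reduce to simple arithmetic over a weekly spread series. A minimal sketch, assuming weekly observations in basis points and approximating the 12-month history as the 52 weeks before the latest reading; the function names and sample series are hypothetical, and the production definitions may differ.

```python
import statistics


def spread_zscore(weekly_spreads_bp: list[float], lookback_weeks: int = 52) -> float:
    """Z-score of the latest weekly spread against the `lookback_weeks`
    observations before it (52 weeks approximating a 12-month history)."""
    if len(weekly_spreads_bp) < lookback_weeks + 1:
        raise ValueError("need at least lookback_weeks + 1 observations")
    history = weekly_spreads_bp[-(lookback_weeks + 1):-1]
    sd = statistics.stdev(history)
    if sd == 0:
        return 0.0
    return (weekly_spreads_bp[-1] - statistics.mean(history)) / sd


def trajectories(weekly_spreads_bp: list[float], windows=(4, 8, 12)) -> dict[int, float]:
    """Change in spread (bp) over each trailing window; positive means widening."""
    latest = weekly_spreads_bp[-1]
    return {w: latest - weekly_spreads_bp[-1 - w]
            for w in windows if len(weekly_spreads_bp) > w}


# Hypothetical series: a year oscillating between 100 and 110 bp, then a jump to 140 bp.
series = [100.0, 110.0] * 26 + [140.0]
z = spread_zscore(series)
moves = trajectories(series)
```

Excluding the latest observation from its own baseline keeps a sudden jump from diluting the z-score it is measured against.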

03 Analyst Output

Provenance Reports

Source-Linked Company Analysis Reports

Structured signal intelligence derived from Provenance Stream — formatted for analysts, LLM training, and institutional data rooms.

Three delivery formats

Human-readable .HTML/.MD for analysts, machine-readable .JSON for pipelines, and source filing links — all derived from the same underlying stream data.

Source linked at every level

Every signal in every report surfaces the verbatim sentence that triggered it and a live URL to the original SEC filing. No output without evidence.

LLM training alignment

The (classifier, sentence, confidence, source) structure maps directly to supervised fine-tuning, RAG construction, and instruction-following training objectives.

What every report contains

Classifiers fired

Named signals with echo, streak, and source quotes

Filing reference

Form type, date, accession number, direct SEC link

Source sentences

Verbatim text from the filing that triggered each classifier

Confidence scores

Per-sentence logistic regression probability (only scores ≥ 0.50 are stored)

Filing history table

Multi-period signal history across all prior filings

Ranked signals

The top signals (up to 10), ranked by echo persistence score
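Put together, a report entry is a (classifier, sentence, confidence, source) record plus its echo, and the ranked list is a filter-then-sort over those records. A minimal sketch, assuming hypothetical field names rather than the published JSON schema; the first record reuses the NKE example that follows, and the other two are invented placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SignalRecord:
    # Hypothetical field names; the published report schema may differ.
    ticker: str
    classifier: str
    sentence: str
    confidence: float
    echo: float
    form_type: str
    filed: str
    accession: str


def top_signals(records: list[SignalRecord], limit: int = 10,
                min_confidence: float = 0.50) -> list[SignalRecord]:
    """Keep records at or above the confidence floor, rank by echo, highest first."""
    kept = [r for r in records if r.confidence >= min_confidence]
    return sorted(kept, key=lambda r: r.echo, reverse=True)[:limit]


records = [
    SignalRecord("NKE", "Gross Margin Compression",
                 "Gross margin for the third quarter of fiscal 2026 decreased 130 basis "
                 "points to 40.2% primarily due to higher tariffs in North America.",
                 0.91, 4.74, "10-Q", "2026-04-01", "0000320187-26-000032"),
    # The two records below are invented placeholders for illustration only.
    SignalRecord("XYZ", "Demand Softening", "<placeholder sentence>",
                 0.62, 1.20, "10-Q", "<placeholder>", "<placeholder>"),
    SignalRecord("XYZ", "Pricing Power", "<placeholder sentence>",
                 0.45, 2.10, "10-Q", "<placeholder>", "<placeholder>"),
]
payload = json.dumps([asdict(r) for r in top_signals(records)], indent=2)
```

The placeholder with confidence 0.45 is dropped before ranking even though its echo is higher than the other placeholder's, mirroring the ≥ 0.50 storage rule.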

Gross Margin Compression · NKE / Nike

"Gross margin for the third quarter of fiscal 2026 decreased 130 basis points to 40.2% primarily due to higher tariffs in North America."

Source: NKE 10-Q · Filed 2026-04-01 · SEC Accession 0000320187-26-000032 · Confidence: 0.91 · Echo: 4.74

.HTML, .MD, and .JSON for every report · 500K+ reports available

04 Integration Layer

Provenance Connect

The Native Integration Layer

Kaleidoscope ships a native MCP server — AI tools query Provenance Core and Stream directly inside your workflow. No bespoke integration required.

Works where you work

Any MCP-compatible AI client — coding assistants, analyst workspaces, quant tools — can call Provenance signal queries natively without leaving the application.

No custom integration

No separate API credentials. No bespoke client library. No data pipeline to maintain. The MCP server is the integration — plug in and query.

Full Provenance access

Screen filings, look up classifier primitives, run signal queries, detect convergence across channels — all the same data available through the direct API.
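MCP-compatible clients build these calls for you, but it can help to see what travels over the wire. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. A minimal sketch, assuming hypothetical argument names for get_filing_history: the actual input schemas are whatever the Provenance server advertises through `tools/list`, and transport and session initialization are omitted.

```python
import json
from itertools import count

_request_ids = count(1)


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Argument names are hypothetical; real schemas come from the server's tools/list response.
request = mcp_tool_call("get_filing_history",
                        {"ticker": "NKE", "classifier": "Gross Margin Compression"})
```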

The Native MCP Tools

screen_filings

Run the classifier library against any filing or set of filings — returns triggered signals with source sentences and confidence scores.

lookup_primitives

Retrieve the current echo, streak, stickiness, and z-score for any classifier across any company in the coverage universe.

run_signal_query

Filter and rank companies by signal criteria: e.g., 'Financial Distress echo > 3.0 AND credit spread z-score > 2.0 AND insider buying streak ≥ 2'.

get_filing_history

Retrieve the full temporal signal history for a company-classifier pair across all stored filings.

convergence_alert

Query for companies where two or more independent channels are showing elevated signal in the same 30/60/90-day window.
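The run_signal_query example reads as a conjunction of threshold tests over per-company primitives. A minimal sketch of those semantics, assuming the primitives have already been fetched (for instance via lookup_primitives); the Snapshot fields, the ranking key, and the sample universe are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical per-company primitives, as lookup_primitives might report them.
    ticker: str
    financial_distress_echo: float
    credit_spread_z: float
    insider_buying_streak: int


def matches(s: Snapshot) -> bool:
    """Financial Distress echo > 3.0 AND credit spread z-score > 2.0
    AND insider buying streak >= 2."""
    return (s.financial_distress_echo > 3.0
            and s.credit_spread_z > 2.0
            and s.insider_buying_streak >= 2)


def run_query(universe: list[Snapshot]) -> list[str]:
    """Tickers passing every condition, ranked by Financial Distress echo."""
    hits = sorted((s for s in universe if matches(s)),
                  key=lambda s: s.financial_distress_echo, reverse=True)
    return [s.ticker for s in hits]


universe = [
    Snapshot("AAA", 3.5, 2.4, 2),
    Snapshot("BBB", 4.1, 1.8, 3),   # fails the spread condition
    Snapshot("CCC", 3.2, 2.9, 1),   # fails the insider streak condition
    Snapshot("DDD", 5.0, 2.1, 2),
]
```

Every condition must hold: BBB fails on spread and CCC on insider streak, even though both clear the echo bar.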

Example

"What are the most persistent Financial Distress signals across my coverage list this quarter, and which companies have credit spread widening confirming the same thesis?"

→ Provenance Connect queries Core + Stream, returns ranked results with source citations in seconds.

Use Cases by Audience

Provenance serves three distinct workflows. Same data product — different leverage points depending on how you use financial signals.

Audience 01

Quantitative Analysts

Systematic factor research & model building

Scenario

You're extending an existing equity factor model and need interpretable, structured signals from regulatory filings — with a clean point-in-time history and a paper trail that survives a DDQ.

Core · Stream · Connect

How Provenance Helps

  • Add classifiers to existing factor libraries with a single daily table join
  • Echo weighting amplifies signals that persist across multiple consecutive filings
  • Streak metadata separates first occurrences from entrenched patterns — critical for alpha decay modeling
  • Frozen point-in-time scores guarantee no look-ahead bias in walk-forward backtests
  • DDQ-ready audit trail: every factor value traceable to a source sentence and EDGAR accession number
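The point-in-time guarantee amounts to an as-of join: on each backtest date, use only the latest score whose EDGAR acceptance timestamp is at or before that moment. A minimal sketch with the standard library, assuming naive timestamps in a single timezone and a history already sorted by acceptance time; the function name and sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def asof_scores(history: list[tuple[datetime, float]],
                query_times: list[datetime]) -> list[Optional[float]]:
    """For each query time, the most recent score accepted at or before it;
    None if nothing had been accepted yet. `history` must be sorted by time."""
    accepted = [t for t, _ in history]
    out: list[Optional[float]] = []
    for q in query_times:
        i = bisect_right(accepted, q)  # number of scores accepted at or before q
        out.append(history[i - 1][1] if i else None)
    return out


# Hypothetical acceptance timestamps and classifier scores for one company.
history = [
    (datetime(2025, 11, 3, 16, 5), 1.8),
    (datetime(2026, 2, 2, 17, 30), 2.6),
]
queries = [datetime(2025, 10, 1), datetime(2026, 2, 2, 16, 0), datetime(2026, 2, 3, 9, 30)]
factor = asof_scores(history, queries)
```

The 16:00 query returns the November score because the February filing was not accepted until 17:30. On larger panels, `pandas.merge_asof`, whose default `direction="backward"` gives the same at-or-before semantics, performs this join in one call.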

Audience 02

Investment Analysts

Fundamental research & filing triage

Scenario

You cover 25–40 names. Every quarter, dozens of 10-Ks and 10-Qs drop within days of each other. You need to know which filings matter before you read a single page.

Core · Stream · Reports

How Provenance Helps

  • Screen your entire coverage universe in minutes to see which filings had unexpected signal changes
  • Streak tracking distinguishes first-mention disclosures from language management has repeated for six quarters
  • Source sentences link directly to EDGAR for instant one-click verification
  • Build source-linked investment notes — every claim backed by the exact filing sentence
  • Detect thesis drift early: track which bullish classifiers are fading or reversing quarter-over-quarter
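Streak tracking can be sketched as a run-length count over a company's filing sequence. A minimal sketch, assuming a streak is the number of consecutive filings in which a classifier fired and that it resets to zero when the classifier is absent; the exact Provenance definition is not stated here, and the quarter data is hypothetical.

```python
def streak_history(filings: list[set[str]], classifier: str) -> list[int]:
    """Streak after each filing, oldest first: consecutive filings in which
    `classifier` fired, resetting to 0 whenever it is absent."""
    out, run = [], 0
    for fired in filings:
        run = run + 1 if classifier in fired else 0
        out.append(run)
    return out


def current_streak(filings: list[set[str]], classifier: str) -> int:
    """Streak as of the most recent filing (0 if there are no filings)."""
    return streak_history(filings, classifier)[-1] if filings else 0


# Hypothetical eight quarters of fired classifiers for one company, oldest first.
quarters = [
    {"Pricing Power"},
    set(),
    {"Gross Margin Compression"},
    {"Gross Margin Compression"},
    {"Gross Margin Compression"},
    {"Gross Margin Compression"},
    {"Gross Margin Compression"},
    {"Gross Margin Compression", "Demand Softening"},
]
```

Under this assumption a classifier that returns after a gap restarts at 1, so a streak-1 flag means "new since the last filing", not necessarily "first ever".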

Audience 03

Traders & Trading Desks

Event-driven execution & risk management

Scenario

You run event-driven strategies around SEC filings. You need structured, named signals in near real time after EDGAR acceptance, with enough context to size a position and justify it to risk management.

Stream · Connect · Core

How Provenance Helps

  • Classifiers fire in near real time after EDGAR acceptance, before the price fully adjusts
  • Echo delta identifies genuinely new signals vs. confirmation of already-known management themes
  • Streak 1 flags mark first occurrences — highest surprise value, strongest pricing implications
  • Source sentences provide risk-desk justification for position sizing on every event trade
  • Classifier company profiles provide archetype context: the same signal means different things for a biotech than for an energy major

Live Feed

What's Firing Right Now

A live slice of the last 7 days — real companies, real filings, real signals.

GSBC GREAT SOUTHERN BANCORP, INC. 8-K Apr 16
Burn Rate Controlled Reimbursement Improving Quarterly Loan Deposit Growth Comparison Allowance For Loan Losses Disclosure Forward Looking Statement Definition +9 more

14 signals fired

BNAI Brand Engagement Network Inc. 10-K Apr 16
Burn Rate Controlled Management Exodus Institutional Investment Demand Softening Utilization Improving +10 more

74 signals fired

USB US BANCORP \DE\ 8-K Apr 16
Demand Softening Cost Structure Improving Same Client Revenue Growing Efficient Growth Recurring Revenue Growing +10 more

24 signals fired

IIIN INSTEEL INDUSTRIES INC 10-Q Apr 16
Demand Softening Pricing Power Services Dso Deteriorating Utilization Declining Services Capex Acceleration +10 more

30 signals fired

Get Started

Request Access

Get in touch and we'll follow up with details on coverage, delivery format, and pricing.

Quant-First · Transparent · Reproducible