Every SEC filing is a wall of legal text: thousands of sentences, hundreds of disclosures, endless ambiguity. We extract 164 binary signals from each filing and plot them over time, so that trends emerge from the noise.
Each periodic report (annual 10-K, quarterly 10-Q) flows through a four-stage pipeline. No human reads every sentence, but every sentence gets read. The result: a time series of binary signals that reveals what companies are really saying beneath the boilerplate.
We use 7 distress signals and 7 recovery signals, each normalized by document length. The resulting "signal intensity" is comparable across companies and time periods.
The scatter plots tell stories that earnings calls can't hide. Each dot is a classifier firing in a specific filing. The regression line cuts through the noise to reveal the underlying trajectory.
Sometimes you just want one number. Net Health = (sum of recovery signals) - (sum of distress signals), normalized per 100 sentences. Positive = healthier, negative = distressed. Track it over time.
Every signal count is divided by the filing's sentence count and multiplied by 100. The result is "signal intensity per 100 sentences", which is comparable across 50-sentence and 500-sentence filings.
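The normalization can be sketched in a few lines. This is a minimal illustration, not the production code; the function name `signal_intensity` and the example counts are made up for the example.

```python
def signal_intensity(signal_count: int, sentence_count: int) -> float:
    """Return the number of signal firings per 100 sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 100 * signal_count / sentence_count


# Hypothetical counts: 3 firings in a 50-sentence filing and 30 firings in a
# 500-sentence filing describe the same intensity once normalized.
short_filing = signal_intensity(3, 50)
long_filing = signal_intensity(30, 500)
```

Both values come out equal, which is the point of the normalization: a long filing does not look more distressed simply because it has more sentences.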
A simple linear regression (OLS) is fitted to all data points in each cluster. The slope indicates trend direction and magnitude. Shaded confidence bands show the uncertainty of the fit.
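For readers who want to see the trend line concretely, here is a minimal closed-form OLS fit in plain Python. It is a sketch under assumptions: the function name `fit_ols`, the choice of "quarters since first filing" as the x-axis, and the data points are all hypothetical, and the confidence bands are not computed here.

```python
from typing import Sequence


def fit_ols(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x = quarters since the first filing, y = signal intensity.
quarters = [0, 1, 2, 3]
intensity = [1.0, 2.0, 2.0, 3.0]
slope, intercept = fit_ols(quarters, intensity)
```

A positive slope means the signal is intensifying over time and a negative slope means it is fading; the absolute value gives the change in intensity per unit of x.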
Charts use 10-K and 10-Q filings from 2018 onwards. This captures pre-COVID, COVID, and post-COVID periods for meaningful trend analysis.
Signals are extracted using fine-tuned transformers trained on labeled SEC sentences. Each classifier outputs a binary yes/no per sentence, and these outputs are then aggregated per filing.
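The aggregation step can be sketched as follows. The classifiers themselves are not shown; their per-sentence outputs are assumed to arrive as one dictionary per sentence, and the signal names (`liquidity_stress`, `cost_reduction`) are hypothetical placeholders rather than the real signal set.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_filing(sentence_labels: Iterable[Mapping[str, bool]]) -> Counter:
    """Count, for each signal, how many sentences in one filing fired it."""
    counts: Counter = Counter()
    for labels in sentence_labels:
        for signal, fired in labels.items():
            if fired:
                counts[signal] += 1
    return counts


# Hypothetical classifier outputs for a three-sentence filing.
outputs = [
    {"liquidity_stress": True, "cost_reduction": False},
    {"liquidity_stress": False, "cost_reduction": False},
    {"liquidity_stress": True, "cost_reduction": True},
]
filing_counts = aggregate_filing(outputs)
```

Signals that never fire are simply absent from the counter, and looking them up returns 0.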
Sum of all 7 recovery signals minus sum of all 7 distress signals, after normalization. Positive values indicate recovery dominance; negative values indicate distress dominance.
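As a worked example, the score can be computed like this. The function name `net_health` and the counts are hypothetical. Because every signal shares the same denominator, normalizing each signal first and then subtracting gives the same result as normalizing the difference, which is what this sketch does.

```python
from typing import Sequence


def net_health(
    recovery_counts: Sequence[int],
    distress_counts: Sequence[int],
    sentence_count: int,
) -> float:
    """Return (recovery firings - distress firings) per 100 sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    raw = sum(recovery_counts) - sum(distress_counts)
    return 100 * raw / sentence_count


# Hypothetical filing: 7 recovery counts, 7 distress counts, 200 sentences.
recovery = [2, 0, 1, 0, 3, 0, 0]   # 6 firings in total
distress = [1, 1, 0, 0, 0, 0, 0]   # 2 firings in total
score = net_health(recovery, distress, 200)
```

Here the score is positive, so recovery language outweighs distress language in this made-up filing.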
Filings with fewer than 50 sentences are excluded to ensure statistical stability. Company name changes are tracked to maintain continuity.
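Taken together with the charting window described above, the inclusion rules amount to a simple filter. The sketch below is illustrative only: the `Filing` record, its field names, and the sample rows are hypothetical, and it assumes "from 2018 onwards" refers to the filing year. Name-change tracking is not shown.

```python
from dataclasses import dataclass

ALLOWED_FORMS = {"10-K", "10-Q"}
START_YEAR = 2018       # assumption: compared against the filing year
MIN_SENTENCES = 50      # filings with fewer sentences are excluded


@dataclass(frozen=True)
class Filing:
    company: str
    form: str
    year: int
    sentence_count: int


def is_eligible(filing: Filing) -> bool:
    """Return True if the filing passes the form, date, and length rules."""
    return (
        filing.form in ALLOWED_FORMS
        and filing.year >= START_YEAR
        and filing.sentence_count >= MIN_SENTENCES
    )


# Hypothetical filings for a made-up company.
filings = [
    Filing("ExampleCo", "10-K", 2019, 480),
    Filing("ExampleCo", "10-Q", 2017, 210),  # filed before 2018
    Filing("ExampleCo", "10-Q", 2020, 42),   # fewer than 50 sentences
    Filing("ExampleCo", "8-K", 2021, 120),   # not a 10-K or 10-Q
]
eligible = [f for f in filings if is_eligible(f)]
```

Only the first filing survives the filter; each of the others fails exactly one rule.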