We built a transformer to read SEC filings and find hidden biotech signals. We thought we were building a moonshot finder. We actually built something better.
The original thesis: find beaten-down biotech companies with 8+ quarters of strong model conviction, buy at the low, hold until FDA approval sends the stock to the moon. One big bet, 12+ months of waiting. The backtest data told a different story.
The 19 companies that passed our conviction gate over 2.5 years don't just go up once and stay up. The market beats them down while the model stays bullish, because the SEC filings still show strong fundamentals. The stock eventually recovers, hits our exit target, and then the market forgets again. The wave resets.
When a stock drops 50% on no fundamental news, our transformer still reads the same SEC filings it read last quarter. Cash runway unchanged. Clinical progress unchanged. Risk flags unchanged. Same quality company at a cheaper price. Fresh wave entry.
Across the out-of-sample period (July 2023 → March 2026), 11 of the 19 screened companies returned for a second entry window. Some returned 4–6 times. AQST appeared 6 separate times. Each time: beaten-down price, model still bullish. Each time: recovery.
High-conviction names at sub-$1 can still explode on FDA approvals, clinical data, or partnership announcements. The waves pay us while we wait for the bigger event. If the moonshot never comes, we still made money riding the oscillations. If it does come while we're holding, that's the bonus.
Every quarterly report from 378 biotech companies flows through a four-stage pipeline. No free-text NLP, no price data in the model — just binary yes/no questions about what the company actually filed with regulators.
Each primitive is a deterministic yes/no question applied to a company's SEC filing. No machine learning in the extraction — just hand-engineered rules applied at scale across 194,000 filings. The transformer learns which combinations of these primitives, sustained over time, predict outperformance.
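To make the idea of a primitive concrete, here is a minimal sketch of two rule-based yes/no extractors. The `Filing` fields, the rule names, and the four-quarter runway threshold are all hypothetical illustrations; the actual 164 primitives and their inputs are not spelled out here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Filing:
    # Hypothetical, already-parsed fields; the real filing schema is an assumption.
    cash_usd: float
    quarterly_burn_usd: float
    has_going_concern_warning: bool


Primitive = Callable[[Filing], bool]

# Each primitive is a pure function of the filing: same input, same answer.
PRIMITIVES: Dict[str, Primitive] = {
    # Illustrative threshold: at least four quarters of cash at the current burn rate.
    "runway_at_least_4_quarters": lambda f: (
        f.quarterly_burn_usd <= 0 or f.cash_usd / f.quarterly_burn_usd >= 4
    ),
    "no_going_concern_warning": lambda f: not f.has_going_concern_warning,
}


def extract(filing: Filing) -> Dict[str, int]:
    """Turn one filing into a binary feature vector (1 = yes, 0 = no)."""
    return {name: int(rule(filing)) for name, rule in PRIMITIVES.items()}
```

Stacking one such vector per quarter gives the kind of time series a sequence model could be trained on, with no price data involved.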
Aquestive Therapeutics appeared on our screener 6 separate times over 2.5 years. Six separate entry windows. Each time: beaten-down price. Each time: model still bullish on the SEC filings. Each time: recovery to exit target. This is the wave thesis in its purest form.
The recycle backtest revealed a critical insight: some stocks oscillate around genuine quality (AQST, 6/6 wins), while others are in structural decline and just keep generating false re-entries (SYBX: 4 attempts, 0 wins). A simple filter separates them.
Treating every recycle window equally: 70% win rate, +45.5% average return. Structural decliners keep sneaking in. SYBX appeared 4 times after its 2023 spike, and all 4 were losses averaging −61%. The model was right that SYBX had historically been a quality company, but the market had structurally moved on.
Rule: only re-enter a recycle window if the previous window hit its exit target. If the last attempt failed (stock never recovered), skip the next entry. Result: 82% win rate, +62.5% average return. 7 trades blocked — those 7 had an average return of −34%.
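The rule is simple enough to sketch in a few lines. The version below assumes windows arrive in chronological order as `(ticker, hit_exit_target)` pairs, that a ticker's first window is always eligible, and that a skipped window's outcome still counts as the "previous window" for the next one. Those three choices are our reading of the rule for illustration, not a transcript of the backtest code.

```python
from typing import Dict, Iterable, List, Tuple

# (ticker, hit_exit_target), listed in chronological order.
Window = Tuple[str, bool]


def apply_recycle_filter(windows: Iterable[Window]) -> List[Window]:
    """Keep a window only if the same ticker's previous window hit its target."""
    last_hit: Dict[str, bool] = {}
    taken: List[Window] = []
    for ticker, hit in windows:
        # Assumption: a ticker with no history is allowed its first entry.
        if last_hit.get(ticker, True):
            taken.append((ticker, hit))
        # Assumption: record the outcome even when the window was skipped.
        last_hit[ticker] = hit
    return taken
```

Under these assumptions a name that keeps hitting its target is re-entered every time, while a name that fails once is locked out until one of its windows recovers.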
We ran an ablation study — systematically removing model layers to measure what each one contributes. Same backtest period (2022–2025), same price gate, same time stop. Only the conviction filters change. The answer is not subtle.
Four candidates currently pass all three gate conditions simultaneously. Note AQST — wave 6. The same company that has returned to target 5 previous times, now back below our entry threshold with the model still bullish.
| Ticker | Price | % of 52wk Hi | IC | mean_q8 | Upside to Target |
|---|---|---|---|---|---|
| ATYR | $0.89 | 13.5% | 0.698 | 0.897 | +529% |
| SYBX | $0.62 | 34.0% | 0.581 | 0.793 | +151% |
| AQST wave 6 ↑ | $3.65 | 48.7% | 0.554 | 0.793 | +75% |
| MRKR | $1.43 | 65.3% | 0.502 | 0.860 | +30% |
All results shown use a train/test cutoff of June 30, 2023. No data from July 2023 onward was used in model training. The screener results are fully out-of-sample.
Backtests use end-of-day prices. Real execution on sub-$1 names with thin floats would include slippage. We use a 30-day entry delay after the gate opens to reduce premature entries.
19 unique tickers over 2.5 years is a narrow but curated sample. The gate is intentionally selective: not every beaten-down biotech qualifies. Selectivity is a feature.
The transformer model sees zero price data. All 164 primitives come from SEC filing text and financial tables. The price dip filter is applied after scoring — signal and entry condition are fully independent.
For repeat entries on the same ticker: only re-enter if the previous entry window hit its exit target. This eliminates structural decliners that repeatedly fail to recover. Validated: blocks trades averaging −34%.
Exit when price reaches 85% of the 52-week high. A conservative proxy for "thesis has resolved." Does not require predicting FDA decisions or specific catalysts — just that the market reprices the quality we identified.
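The "Upside to Target" column follows directly from this rule. If a stock trades at a fraction f of its 52-week high, the distance to the 85% exit level is 0.85 / f − 1. A small sketch of that arithmetic (the function name and default argument are ours):

```python
def upside_to_target(pct_of_52wk_high: float, exit_fraction: float = 0.85) -> float:
    """Fractional gain from the current price to the exit target.

    pct_of_52wk_high is the current price divided by the 52-week high,
    expressed as a fraction (13.5% -> 0.135).
    """
    if pct_of_52wk_high <= 0:
        raise ValueError("pct_of_52wk_high must be positive")
    return exit_fraction / pct_of_52wk_high - 1.0


# Inputs taken from the "% of 52wk Hi" column of the candidate table.
for ticker, frac in [("ATYR", 0.135), ("SYBX", 0.340), ("AQST", 0.487), ("MRKR", 0.653)]:
    print(f"{ticker}: {upside_to_target(frac):+.1%}")
```

The results land within about a percentage point of the table's figures; the small gaps are presumably due to rounding in the displayed percentages.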
486-configuration parameter sweep (2022–2025, includes bear market). Best config: Sharpe 1.44, +205.7% total return, 73-day avg hold, 45 trades, 55.6% win rate. SPY baseline over the same period: +36.7%. The 73-day portfolio avg hold is consistent with the 75-day per-trade median: two methods, nearly the same answer.
The 90-day time stop (primary strategy) maximises Sharpe at 1.44 and captures full wave oscillations. A 14-day time stop is a distinct product — high-frequency oscillator with 141 trades at 13-day avg hold and Sharpe 1.18. Same screener, one parameter change. The primary strategy is the right baseline; the 14-day config is worth separate exploration at smaller size.
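Since the two variants differ by a single parameter, the exit decision can be sketched as one function. The function name, the return labels, and the choice to check the price target before the time stop are assumptions made for illustration.

```python
from typing import Optional


def exit_reason(
    price: float,
    high_52wk: float,
    days_held: int,
    exit_fraction: float = 0.85,
    time_stop_days: int = 90,
) -> Optional[str]:
    """Return why a position should be closed today, or None to keep holding."""
    # Assumption: the price target takes priority if both conditions hold.
    if price >= exit_fraction * high_52wk:
        return "target"
    if days_held >= time_stop_days:
        return "time_stop"
    return None
```

Calling it with `time_stop_days=14` instead of the default gives the shorter-hold variant with no other change to the logic.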