Consolidation Pass — from drills to reusable package
Context
After completing Levels 1-5 of the math curriculum (probability → statistics → linear algebra → calculus → stochastic calculus) and extracting techniques from Halls-Moore’s Successful Algorithmic Trading, we had:
- Seven one-off drill scripts that each rewrote the same yfinance / pct_change / dropna boilerplate
- No out-of-sample discipline — we were evaluating on the same window we calibrated on
- No shared performance reporting
- No formal bias audit on any strategy
- No event-driven backtester architecture, which will be required for the prediction-market track
The consolidation pass fixes all five of those before we start building actual strategies.
What got built
A new Python package, `autoinv`, installed in editable mode under 01-projects/automated-investing/autoinv/, along with a pytest suite and a full-pipeline demo script.
Package structure:
```
autoinv/
├── __init__.py          # version + module overview
├── README.md            # quickstart and design principles
├── data.py              # yfinance wrapper, returns, lagged features, Fama-French
├── stats.py             # normality tests, t-MLE, factor regression, permutation
├── validation.py        # TimeSeriesSplit helper, score_walk_forward, BiasAudit
├── metrics.py           # Sharpe, drawdown decomposition, Brier score, strategy_report
├── portfolio.py         # markowitz_with_cost, efficient_frontier, pca_factors
├── pricing.py           # black_scholes, greeks, monte_carlo_call, put_call_parity
└── engine.py            # event-driven backtester skeleton
tests/
├── test_stats.py        # 6 tests
├── test_validation.py   # 4 tests
├── test_metrics.py      # 7 tests
├── test_portfolio.py    # 5 tests
├── test_pricing.py      # 5 tests
└── test_engine.py       # 3 tests
scripts/
└── demo_buy_after_up_day_v2.py   # full pipeline in ~100 lines using the package
pyproject.toml           # hatch build, autoinv>=0.1.0 editable install
```
Test status: 30/30 passing.
Key design decisions
1. Time-series-aware cross-validation is the ONLY kind we expose
From the Halls-Moore gotcha (Ch 15, p. 275) proven out in level-2b-time-series-cv: shuffled k-fold leaks future data into training, which inflates measured accuracy and can flip a worthless model into one that “looks like” it has a small edge.
autoinv.validation.timeseries_cv() is the only CV constructor the package exposes. There is intentionally no shuffled-KFold helper. If someone using the package wants shuffled CV they have to reach around us into sklearn directly — making it a deliberate violation rather than an accident.
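A minimal sketch of what that single CV constructor could look like — a thin wrapper over sklearn's `TimeSeriesSplit`. The function name matches the text, but the signature and `gap` behavior here are assumptions, not the package's confirmed API:

```python
# Sketch of a time-series-only CV constructor; the real
# autoinv.validation.timeseries_cv may differ.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def timeseries_cv(n_splits: int = 5, gap: int = 0) -> TimeSeriesSplit:
    """Expanding-window CV: every training fold strictly precedes its test fold.

    `gap` drops that many observations between train and test, guarding
    against leakage from lagged/overlapping features.
    """
    return TimeSeriesSplit(n_splits=n_splits, gap=gap)

# No shuffling: each test fold starts only after the training window ends,
# so future returns can never leak into the fit.
X = np.arange(20).reshape(-1, 1)
for train_idx, test_idx in timeseries_cv(n_splits=3).split(X):
    assert train_idx.max() < test_idx.min()
```

The point of exposing only this constructor is that the safe choice is the zero-keystroke choice; shuffled CV requires going around the package.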
2. Every performance number is reported with a baseline
The score_walk_forward helper returns a CVComparison dataclass that includes the majority-class baseline and the lift over it — so the reviewer can never accidentally celebrate a 55% classifier that’s worse than always-predict-up.
metrics.strategy_report produces a one-shot summary including max drawdown, duration, and time-to-recover. The demo compares against buy-and-hold explicitly.
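The baseline-always-attached idea can be shown with a toy version of the comparison object. Field names here are illustrative assumptions, not the package's actual `CVComparison` definition:

```python
# Hypothetical shape of the CVComparison result -- field names are
# assumptions, not the package's confirmed API.
from dataclasses import dataclass
import numpy as np

@dataclass
class CVComparison:
    model_accuracy: float
    baseline_accuracy: float   # majority-class: always predict the modal label

    @property
    def lift(self) -> float:
        return self.model_accuracy - self.baseline_accuracy

# A 55% classifier on a market that goes up 58% of days has *negative* lift:
y = np.array([1] * 58 + [0] * 42)          # 58% up-days
baseline = max(y.mean(), 1 - y.mean())     # majority-class baseline = 0.58
cmp = CVComparison(model_accuracy=0.55, baseline_accuracy=baseline)
assert cmp.lift < 0   # "looks like an edge" but loses to always-predict-up
```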
3. Bias audit as a first-class object
validation.BiasAudit is a dataclass with four flags (optimisation / look-ahead / survivorship / cognitive) and a passed property that requires all four to be False. It has a .report() method that prints a PASS/FAIL checklist. The demo pipeline runs this as the final gate.
This turns the Halls-Moore Ch 3 checklist from a thing to remember into a thing that’s forced on every strategy by the type system. If you don’t fill in the audit, the strategy doesn’t ship.
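A minimal sketch of the audit object, assuming the four flags described above; the real field names and `report()` output format in autoinv.validation may differ:

```python
# Minimal sketch of BiasAudit -- field/method shapes are assumptions.
from dataclasses import dataclass, fields

@dataclass
class BiasAudit:
    optimisation: bool    # tuned on the same data we scored on?
    look_ahead: bool      # any feature computed with future information?
    survivorship: bool    # universe filtered by who survived to today?
    cognitive: bool       # would I actually trade this (drawdowns, boredom)?

    @property
    def passed(self) -> bool:
        # All four flags must be False for the strategy to ship.
        return not any(getattr(self, f.name) for f in fields(self))

    def report(self) -> str:
        lines = [f"{'FAIL' if getattr(self, f.name) else 'PASS'}  {f.name}"
                 for f in fields(self)]
        lines.append("AUDIT " + ("PASSED" if self.passed else "FAILED"))
        return "\n".join(lines)

audit = BiasAudit(optimisation=False, look_ahead=False,
                  survivorship=False, cognitive=True)
assert not audit.passed          # one flagged bias fails the whole audit
```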
4. Event-driven backtester is a skeleton, not a platform
engine.py is Halls-Moore Ch 13’s architecture in ~200 lines: abstract Strategy / Portfolio / ExecutionHandler bases, a concrete HistoricalCSVDataHandler, SinglePositionPortfolio, and SimulatedExecutionHandler (with commission + slippage), plus a Backtest orchestrator.
Differences from Halls-Moore:
- Uses `collections.deque` instead of `queue.Queue` (single-threaded, so no need for a thread-safe primitive)
- Fewer classes, simpler event dispatch
- Explicitly labeled as a research skeleton with a NOT-PRODUCTION warning in the docstring
When we hit the PM track, the same Backtest orchestrator plugs into a Polymarket CLOB WebSocket DataHandler without changing the strategy or portfolio code. That’s the whole point of the event-driven decomposition.
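The single-threaded drain-the-queue pattern can be sketched with a toy event chain. Event kinds and handlers here are illustrative, not autoinv's actual classes:

```python
# Toy event loop showing why a plain deque suffices in a single-threaded
# backtest: each bar pushes a SIGNAL, the loop drains FIFO until quiet.
from collections import deque

events: deque = deque()

def on_market_bar(price: float) -> None:
    # Strategy reacts to the bar; a hypothetical threshold rule.
    events.append(("SIGNAL", "LONG" if price > 100 else "FLAT"))

def dispatch() -> list:
    handled = []
    while events:                       # drain until quiet, then next bar
        kind, payload = events.popleft()
        if kind == "SIGNAL" and payload == "LONG":
            events.append(("ORDER", 1))     # portfolio sizes the signal
        elif kind == "ORDER":
            events.append(("FILL", payload))  # execution simulates the fill
        handled.append(kind)
    return handled

on_market_bar(101.5)
assert dispatch() == ["SIGNAL", "ORDER", "FILL"]
```

Swapping the `on_market_bar` source from a CSV iterator to a WebSocket callback leaves the dispatch loop, and everything downstream of it, untouched.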
5. No shrinkage estimator or Black-Litterman prior yet
The L3 Markowitz drill showed that sample means are a terrible expected-return estimate (JNJ 55.6% “annualized expected”). The fix is Ledoit-Wolf shrinkage on the covariance and Black-Litterman posteriors for the means. We haven’t built those yet — they’re coming when the first real strategy needs them. For now markowitz_with_cost is exposed with a loud comment in the docstring about the estimation-error trap.
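As a preview of the planned covariance fix (not part of autoinv yet), sklearn ships a Ledoit-Wolf estimator that shrinks the noisy sample covariance toward a structured target — the fake data below is only for illustration:

```python
# Ledoit-Wolf shrinkage: tames the estimation error that makes raw
# sample-covariance Markowitz weights unstable. Not in autoinv yet.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=(250, 5)) * 0.01   # ~1yr, 5 fake assets

lw = LedoitWolf().fit(returns)
sample_cov = np.cov(returns, rowvar=False)

# shrinkage_ in [0, 1]: how far we moved away from the noisy sample estimate
assert 0.0 <= lw.shrinkage_ <= 1.0
assert lw.covariance_.shape == sample_cov.shape
```

Note this only addresses the covariance half; the expected-return half still needs Black-Litterman or an equivalent prior.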
The demo script — proof the abstraction works
../scripts/demo_buy_after_up_day_v2 rebuilds the L2 “buy after up day” permutation-test drill using only autoinv imports. It:
- Pulls data (`data.get_returns`, `data.get_prices`)
- Runs a normality audit (`stats.normality_report`, `stats.fit_student_t`)
- Runs the permutation test (`stats.permutation_test`)
- Runs the strategy through the event-driven backtester (`engine.Backtest`)
- Reports performance (`metrics.strategy_report`)
- Compares to buy-and-hold baseline
- Runs the four-bias audit (`validation.BiasAudit`)
The full pipeline fits in ~100 lines, almost all of it the one custom piece (the BuyAfterUpDayStrategy class). The data plumbing and metrics plumbing are gone.
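The permutation test at the heart of the demo can be illustrated standalone; the real `stats.permutation_test` signature is assumed, and the data below is synthetic:

```python
# Standalone sketch of the demo's permutation test: does the strategy's
# entry timing beat random timing of the same number of entries?
import numpy as np

def permutation_test(signal, returns, n_perm=2000, seed=0):
    """One-sided p-value: P(random timing does at least as well as ours)."""
    rng = np.random.default_rng(seed)
    observed = returns[signal].mean()
    perm_means = np.array([
        returns[rng.permutation(signal)].mean() for _ in range(n_perm)
    ])
    return float((perm_means >= observed).mean())

rng = np.random.default_rng(1)
rets = rng.normal(0.0005, 0.01, size=1000)   # i.i.d. fake daily returns
signal = rng.random(1000) < 0.5              # stand-in "buy after up day" mask
p = permutation_test(signal, rets)
assert 0.0 <= p <= 1.0   # with no real edge, p is typically not small
```

A large p (like the demo's 0.94) says the strategy's timing adds nothing over random entries.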
What the demo proved:
- The permutation test correctly fails the strategy (5.7th percentile of shuffled timings, p = 0.94)
- Event-driven backtest ran 1569 bars with 793 fills
- Strategy Sharpe 0.110 vs buy-and-hold Sharpe 0.717 — the backtest confirms the permutation test’s warning and quantifies just how bad it is after costs
- Max drawdown: strategy -25.11% vs buy-and-hold -33.72% (the one silver lining — flat periods dampen drawdowns)
- Annualized return: strategy +1.48% vs buy-and-hold +15.85%
- Bias audit correctly flagged cognitive bias (a strategy that underperforms buy-and-hold this badly is not one I’d stomach)
This is the template for every future strategy. New ideas get a script that looks structurally identical to this demo but plugs in different strategy logic.
What the consolidation pass did NOT do (deferred)
- Shrinkage covariance and Black-Litterman priors — deferred until the first real strategy needs them
- Walk-forward optimization (rolling-window re-calibration) — wait for the PM track
- Full pairs-trading with CADF / Hurst — Halls-Moore Ch 9 techniques, add when we need them
- Kelly criterion and position sizing — Halls-Moore Ch 12, add when the first profitable strategy is ready to size
- Alpha Vantage / Polygon data sources — stick with yfinance until it’s the bottleneck
- MySQL securities master — overkill at our scale
What this unlocks
The package is now the foundation for:
- Starting the Prediction Markets track (the PM levels) — the event-driven engine is ready to accept a Polymarket CLOB WebSocket `DataHandler`
- Rebuilding the L3 Markowitz and L4 portfolio drills using the package API (optional cleanup)
- New strategy experiments — template script is ~100 lines, no more copy-paste boilerplate
- Running strategies through proper walk-forward evaluation rather than in-sample-only scoring
Next actions
- Start PM1 — build a Polymarket CLOB WebSocket `DataHandler` that plugs into the existing `engine.Backtest` orchestrator. The Monitor tool is the right primitive here (see ../../../06-reference/2026-04-10-claude-code-monitor-tool).
- Build `metrics.plot_equity_curve` and `plot_drawdown_chart` helpers so the demo script produces the same plots the one-off drills did
- Write a `level-6-consolidation.md` concept doc if we want to cite this pass as a curriculum milestone
- Consider a `strategies/` subfolder for concrete strategy classes (Moving Average Crossover, Pairs, etc.) once we have more than 2-3 of them
Related
- Infrastructure plan — this consolidation was explicitly proposed there
- ../../../06-reference/2026-04-10-halls-moore-algo-trading — the source of most architectural decisions
- ../../../06-reference/2026-04-10-gemchange-quant-from-scratch — the math foundation
- ../../../06-reference/2026-04-10-gemchange-simulate-like-quant-desk — the PM track blueprint this package will support