# autoinv

Ray Data Co's Automated Investing toolkit: a small library of reusable utilities for quant research and strategy development, built from the L1-L5 curriculum drills (gemchange roadmap) and Halls-Moore's *Successful Algorithmic Trading*.
## Design principles
- Time-series aware by default. Cross-validation utilities use `TimeSeriesSplit`, not a shuffled `KFold`. We do not expose `shuffle=True` on financial data.
- Honest metrics. Every performance number is reported alongside the relevant baseline (majority class, buy-and-hold, random permutation).
- Bias-aware. The validation module exposes a four-bias audit checklist (optimisation, look-ahead, survivorship, cognitive) that should gate every strategy before it trades real capital.
- Small surface area. Each module is focused. Functions, not classes, unless state is genuinely required.
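The first principle can be sketched directly with scikit-learn: `TimeSeriesSplit` keeps every test fold strictly after its training fold, so no future bar leaks into training. (Standalone illustration on synthetic data; this is not the library's `score_walk_forward` API.)

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))       # synthetic daily features
y = rng.normal(size=500) > 0        # synthetic up/down labels

# Walk forward: each fold trains on the past, tests on the strictly-later fold.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    assert train_idx.max() < test_idx.min()   # no look-ahead by construction
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"walk-forward accuracy: {np.mean(scores):.3f}")
```

On random labels the mean score should hover near the 0.5 coin-flip baseline, which is exactly the kind of honest-baseline comparison the second principle demands.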
## Modules
| Module | Purpose |
|---|---|
| `data` | Price pulls (yfinance), returns, lagged features, Fama-French factors (direct from Dartmouth) |
| `stats` | Normality tests, Student-t MLE, factor regression with Newey-West HAC, permutation test |
| `validation` | `TimeSeriesSplit` helper, `score_walk_forward`, `BiasAudit` checklist |
| `metrics` | Sharpe, drawdown decomposition, Brier score, one-shot strategy report |
| `portfolio` | Cost-aware Markowitz, efficient frontier, PCA factor decomposition |
| `pricing` | Black-Scholes, Greeks, Monte Carlo, put-call parity |
| `engine` | Minimal event-driven backtester skeleton (DataHandler / Strategy / Portfolio / ExecutionHandler) |
| `polymarket` | Anonymous read-only client for Polymarket Gamma + CLOB + Data APIs (markets, prices-history, orderbook) |
| `kalshi` | Anonymous read-only client for Kalshi trading API v2 (markets, events, candlesticks, orderbook). Mirrors the `polymarket` module structure so the two can be used interchangeably in arbitrage scans. |
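As a taste of the `pricing` module's territory, here is the textbook closed-form Black-Scholes price plus a put-call parity check. This is written from the standard formulas, not from the module's actual API; the function name and signature are illustrative.

```python
import math
from statistics import NormalDist

def black_scholes(S, K, T, r, sigma, kind="call"):
    """Closed-form Black-Scholes price for a European option (no dividends)."""
    N = NormalDist().cdf
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        return S * N(d1) - K * math.exp(-r * T) * N(d2)
    return K * math.exp(-r * T) * N(-d2) - S * N(-d1)

S, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.2
call = black_scholes(S, K, T, r, sigma, "call")
put = black_scholes(S, K, T, r, sigma, "put")

# Put-call parity: C - P = S - K * exp(-rT); the gap should be ~0.
parity_gap = (call - put) - (S - K * math.exp(-r * T))
print(f"call={call:.4f} put={put:.4f} parity gap={parity_gap:.2e}")
```

Parity checks like this make good unit tests for any pricer, which is why the module lists put-call parity alongside the pricers themselves.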
## Quickstart
```python
from autoinv import data, stats, validation, metrics, engine

# 1. Pull prices and returns
prices = data.get_prices("SPY", start="2020-01-01", end="2026-04-01")
returns = data.get_returns("SPY", start="2020-01-01", end="2026-04-01")

# 2. Normality audit
report = stats.normality_report(returns.to_numpy())
print(f"Fat tails? {report.rejected_at_5pct}")

# 3. Test a signal with a permutation test
signal = (returns.shift(1) > 0).fillna(False).to_numpy()
perm = stats.permutation_test(returns.to_numpy(), signal, n_permutations=10_000)
print(f"Permutation p: {perm.p_one_sided:.4f}")

# 4. Run a strategy through the backtester
# (see scripts/demo_buy_after_up_day_v2.py for a full example)

# 5. Performance report
strategy_returns = ...  # from the backtest
r = metrics.strategy_report(strategy_returns)
print(f"Sharpe: {r.sharpe:.3f}, MaxDD: {r.max_drawdown:.2%}")

# 6. Gate with the bias audit
audit = validation.BiasAudit(
    optimisation_bias=False,
    look_ahead_bias=False,
    survivorship_bias=False,
    cognitive_bias=False,
)
assert audit.passed, audit.report()
```
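What `stats.permutation_test` does in step 3 can be sketched in plain NumPy under the usual null hypothesis that signal timing carries no information: shuffle the signal, recompute the mean selected return, and count how often the shuffled version matches or beats the observed one. (Illustrative sketch only; the library's actual implementation and return object may differ.)

```python
import numpy as np

def permutation_p(returns, signal, n_permutations=10_000, seed=0):
    """One-sided p-value: fraction of shuffled signals whose mean selected
    return is >= the observed mean selected return."""
    rng = np.random.default_rng(seed)
    observed = returns[signal].mean()
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(signal)   # break any timing information
        if returns[shuffled].mean() >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)   # add-one smoothing

rng = np.random.default_rng(1)
rets = rng.normal(0, 0.01, size=1000)
sig = rets > 0   # deliberately look-ahead: selects same-day winners
print(f"p = {permutation_p(rets, sig, 2000):.4f}")
```

The deliberately look-ahead signal (same-day sign) produces a near-zero p on pure noise, which is exactly the failure mode the look-ahead item in the step 6 bias audit exists to catch.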
## Tests

```shell
.venv/bin/python -m pytest tests/
```
All 30 tests green as of 2026-04-10.
## What’s intentionally NOT here
- Live trading infrastructure. `engine.SimulatedExecutionHandler` is a skeleton with fixed commission and linear slippage. Do not point it at a broker without replacing it.
- ML model zoo. sklearn covers this already; we don’t wrap it.
- Intraday data / order book handling. We’re starting with daily bars and prediction markets. Intraday comes later if at all.
- Full MySQL securities master à la Halls-Moore Ch 6. yfinance + parquet is enough at our scale.
- Kelly criterion position sizing. Coming when we hit L4 risk-management curriculum or the PM track.
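For reference, the fixed-commission plus linear-slippage assumption called out above can be sketched as follows. Class and parameter names here are hypothetical, not the engine's actual interface:

```python
from dataclasses import dataclass

@dataclass
class NaiveFillModel:
    """Illustrative cost model: a flat fee per order plus price impact that
    grows linearly with order size. Not a substitute for a real broker model."""
    commission: float = 1.0           # flat fee per order, in dollars
    impact_per_share: float = 0.0001  # adverse price move per share of size

    def fill_price(self, mid, quantity, side):
        # Buys fill above mid, sells below; impact scales linearly with size.
        sign = 1.0 if side == "BUY" else -1.0
        return mid + sign * self.impact_per_share * quantity

    def buy_cost(self, mid, quantity):
        # Total cash out for a buy: fill price times shares, plus the flat fee.
        return self.fill_price(mid, quantity, "BUY") * quantity + self.commission

model = NaiveFillModel()
print(model.buy_cost(100.0, 200))   # 200 shares at mid 100
```

Real fills depend on the order book, venue fees, and timing, none of which a linear model captures; that gap is why the list above insists the skeleton be replaced before any live use.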