
level 2 statistics

Thu Apr 09 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·experiment-writeup ·status: complete

Level 2 — Statistics Drills

Three drills from the quant-from-scratch roadmap Level 2 homework. Where Level 1 was vibe-check math, Level 2 is where the “most of what looks like signal is noise” lesson lands. Each drill is built specifically to expose a place where naive statistical intuition gets burned on real market data.

Drill 1 — Normality test and Student-t MLE on SPY returns

Script: ../scripts/level_2_normality_and_t_fit.py

Setup: 1,568 days of SPY daily returns (2020-01-02 → 2026-03-31, auto-adjusted). Test normality with Jarque-Bera, D’Agostino-Pearson, and Shapiro-Wilk. Fit both a Normal and a Student-t distribution via MLE, compare log-likelihoods, and run a likelihood ratio test.
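The pipeline described in this setup can be sketched with SciPy as follows. This is a minimal sketch, not the contents of level_2_normality_and_t_fit.py: the function name `normality_and_t_fit` is hypothetical, the demo input is simulated Student-t noise rather than SPY data, and using a chi-square with 1 degree of freedom as the reference for the likelihood ratio is an assumption (the Normal is the boundary case df → ∞, so that p-value is only approximate).

```python
import numpy as np
from scipy import stats


def normality_and_t_fit(returns):
    """Run three normality tests and compare Normal vs Student-t MLE fits."""
    r = np.asarray(returns, dtype=float)

    # Each test returns a (statistic, p-value) pair.
    jb_stat, jb_p = stats.jarque_bera(r)
    dp_stat, dp_p = stats.normaltest(r)  # D'Agostino-Pearson omnibus test
    sw_stat, sw_p = stats.shapiro(r)

    # MLE fits: Normal has (loc, scale); Student-t has (df, loc, scale).
    n_loc, n_scale = stats.norm.fit(r)
    t_df, t_loc, t_scale = stats.t.fit(r)

    ll_normal = stats.norm.logpdf(r, n_loc, n_scale).sum()
    ll_t = stats.t.logpdf(r, t_df, t_loc, t_scale).sum()

    # Likelihood ratio: the Student-t has one extra parameter (df).
    lr_stat = 2.0 * (ll_t - ll_normal)
    lr_p = stats.chi2.sf(lr_stat, df=1)  # approximate reference distribution

    return {
        "jb_stat": jb_stat, "jb_p": jb_p,
        "dp_stat": dp_stat, "dp_p": dp_p,
        "sw_stat": sw_stat, "sw_p": sw_p,
        "normal_scale": n_scale, "ll_normal": ll_normal,
        "t_df": t_df, "t_scale": t_scale, "ll_t": ll_t,
        "lr_stat": lr_stat, "lr_p": lr_p,
    }


# Demo on simulated fat-tailed data (NOT SPY): Student-t with 3 degrees of freedom.
rng = np.random.default_rng(0)
fake_returns = 0.0076 * rng.standard_t(3, size=1568)
fit = normality_and_t_fit(fake_returns)
print(f"t df={fit['t_df']:.2f}  LR stat={fit['lr_stat']:.2f}  JB p={fit['jb_p']:.2e}")
```

Swapping `fake_returns` for a real return series is the only change needed to run it on market data.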

Results:

Mean return:  0.000584
Std dev:      0.012932
Skewness:    -0.2524
Kurtosis:    13.0531  (normal = 0, excess)

Normality tests:
  Jarque-Bera:        stat=11148.39, p=0.00e+00
  D'Agostino-Pearson: stat=344.54,   p=1.53e-75
  Shapiro-Wilk:       stat=0.8752,   p=1.11e-33

MLE fits:
  Normal:    scale=0.012932, log-lik=4592.81
  Student-t: df=2.91, scale=0.007603, log-lik=4851.12
  LR test:   stat=516.62, p=0.00e+00
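Two of the printed statistics can be re-derived from the other printed numbers as a sanity check. The sketch below uses the standard Jarque-Bera formula, JB = n/6 · (S² + K²/4) with K the excess kurtosis, and the likelihood ratio 2 · (ll_t − ll_normal). The chi-square(1) tail used for the p-value is an assumption about the reference distribution, not something the results state.

```python
import math

# Values copied from the results block above.
n = 1568
skew = -0.2524
excess_kurt = 13.0531
ll_normal = 4592.81
ll_t = 4851.12

# Jarque-Bera statistic from sample skewness and excess kurtosis.
jb = n / 6.0 * (skew**2 + excess_kurt**2 / 4.0)

# Likelihood-ratio statistic for Student-t vs Normal.
lr = 2.0 * (ll_t - ll_normal)

# Survival function of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
lr_p = math.erfc(math.sqrt(lr / 2.0))

print(f"JB = {jb:.2f}")
print(f"LR = {lr:.2f}, p = {lr_p:.2e}")
```

Both statistics agree with the reported values to within the rounding of the printed inputs, and the p-value is effectively zero.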

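Interpretation:

All three tests reject normality at any conventional significance level. Excess kurtosis is about 13 against a Gaussian value of 0, and the Student-t fit (df = 2.91) beats the Normal fit by roughly 258 log-likelihood points for one extra parameter. SPY daily returns are fat-tailed, and not marginally so.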

Why this matters for trading: if you assume returns are Gaussian and compute position sizes (or VaR, or option prices) under that assumption, you systematically underestimate tail risk by orders of magnitude. Black Monday (1987) was a roughly 22-sigma event under Gaussian assumptions, something that should essentially never happen, not even once over the age of the universe. It happened. Sizing full-Kelly off a Gaussian, or trusting default OLS standard errors, is how accounts get killed.
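To put rough numbers on "orders of magnitude", the sketch below compares two-sided tail probabilities under the two fits reported in the results. It plugs in the printed scale and df values, and assumes both distributions are centered at the same point (the Student-t location is not printed), so treat the output as indicative rather than exact. The function name `two_sided_tail_probs` is hypothetical.

```python
from scipy import stats

# Fitted values copied from the Drill 1 results.
NORMAL_SCALE = 0.012932  # Normal MLE standard deviation
T_DF = 2.91              # Student-t degrees of freedom
T_SCALE = 0.007603       # Student-t scale


def two_sided_tail_probs(k):
    """P(|move| > k Normal sigmas) under the Normal fit and the Student-t fit.

    Assumes both fits share the same center, so only the scales matter.
    """
    move = k * NORMAL_SCALE  # size of the move in return units
    p_normal = 2.0 * stats.norm.sf(k)
    p_t = 2.0 * stats.t.sf(move / T_SCALE, T_DF)
    return p_normal, p_t


for k in (3, 5, 10, 22):
    p_n, p_t = two_sided_tail_probs(k)
    print(f"{k:>2} sigma: normal={p_n:.2e}  student-t={p_t:.2e}  ratio={p_t / p_n:.1e}")
```

The gap widens quickly with k, which is the practical reason a Gaussian VaR looks fine on ordinary days and fails on the days that matter.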

Plot: outputs/level_2_normality_and_t_fit.png — SPY return histogram (log-y) with Normal and Student-t density overlays. The tail difference is visually obvious.

Drill 2 — Fama-French 3-factor regression on AAPL with Newey-West SEs

Script: ../scripts/level_2_fama_french_regression.py

Setup: AAPL daily returns 2020-01-03 → 2026-02-27 (1,546 obs). Regress daily excess returns on the three Fama-French factors (Mkt-RF, SMB, HML) pulled directly from Ken French’s data library. Compare OLS default SEs against Newey-West HAC SEs with lag=5 to show why you need HAC on financial data.
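The difference between the two sets of standard errors can be sketched from scratch with NumPy. This is an illustration of the Newey-West estimator with Bartlett weights and no small-sample correction, not the code in level_2_fama_french_regression.py. The function name `ols_with_newey_west` is hypothetical, and the demo uses one simulated autocorrelated regressor instead of the three Fama-French factors. In statsmodels the same kind of covariance is normally requested with `fit(cov_type="HAC", cov_kwds={"maxlags": 5})`; whether the script does it that way is an assumption.

```python
import numpy as np


def ols_with_newey_west(X, y, lags=5):
    """OLS coefficients with classical and Newey-West (Bartlett kernel) SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical SEs: assume i.i.d. homoskedastic errors.
    sigma2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))

    # HAC "meat": lag-0 term plus Bartlett-weighted autocovariances of x_t * e_t.
    xe = X * resid[:, None]
    meat = xe.T @ xe
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        gamma = xe[lag:].T @ xe[:-lag]
        meat += weight * (gamma + gamma.T)

    se_hac = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_hac


def simulate_ar1(rng, rho, size):
    """Simulated AR(1) series, used only to create autocorrelation for the demo."""
    out = np.zeros(size)
    shocks = rng.standard_normal(size)
    for t in range(1, size):
        out[t] = rho * out[t - 1] + shocks[t]
    return out


# Demo on simulated data (NOT AAPL or Fama-French factors).
rng = np.random.default_rng(0)
n_obs = 1546
x = simulate_ar1(rng, 0.8, n_obs)
errors = simulate_ar1(rng, 0.8, n_obs)
X = np.column_stack([np.ones(n_obs), x])
y = 0.5 + 1.2 * x + errors

beta, se_ols, se_hac = ols_with_newey_west(X, y, lags=5)
print("slope:", round(beta[1], 4))
print("OLS SE:", round(se_ols[1], 4), " HAC SE:", round(se_hac[1], 4))
```

When regressor and errors are both positively autocorrelated, as in this demo, the classical standard error is too small and the HAC one is wider, which is the effect the drill is designed to expose.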

Note on data source: pandas-datareader has a known incompatibility with recent pandas versions (missing deprecate_kwarg argument), so I fetch the Ken French CSV zip directly from Dartmouth and parse it. That’s the fallback documented in the script.

Results (Newey-West HAC SEs — the correct ones):

alpha:   0.031542  (p=0.3187)
Mkt-RF:  1.1734    (p~0)
SMB:    -0.3361    (p=9.19e-12)  →  large-cap tilt
HML:    -0.3235    (p=3.27e-28)  →  growth tilt
R²:      0.6416
Annualized alpha: 7.95%

Interpretation:

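This is the “your edge is factor exposure” lesson in action. Once you regress out the market, size, and value factors, the remaining alpha has a point estimate of 7.95% annualized but a p-value of 0.32, so it is statistically indistinguishable from zero. The loadings carry the explanation: a market beta of 1.17 plus negative SMB and HML loadings (large-cap, growth). The quants who actually make money are the ones whose alpha survives a multi-factor regression, and even then they have to worry about momentum, quality, low-vol, liquidity, and a dozen other known factors.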
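The annualized alpha in the results is consistent with simple scaling of the daily intercept. The sketch below assumes the intercept is expressed in percent per day (the Ken French factors are published in percent) and that a year has 252 trading days; neither is stated explicitly in the results. The compounded figure is shown only for comparison.

```python
TRADING_DAYS = 252  # assumed trading days per year

alpha_daily_pct = 0.031542  # regression intercept, assumed to be percent per day

# Simple (arithmetic) annualization.
alpha_annual_pct = alpha_daily_pct * TRADING_DAYS

# Compounded annualization, for comparison only.
alpha_compounded_pct = ((1.0 + alpha_daily_pct / 100.0) ** TRADING_DAYS - 1.0) * 100.0

print(f"simple:     {alpha_annual_pct:.2f}%")
print(f"compounded: {alpha_compounded_pct:.2f}%")
```

Simple scaling reproduces the reported 7.95%, so that is presumably the convention used; the compounded number is somewhat higher.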

Drill 3 — Permutation test on a synthetic momentum strategy

Script: ../scripts/level_2_permutation_test.py

Setup: Test a toy “buy after up day” strategy on SPY: if yesterday’s return was positive, hold SPY today; otherwise sit in cash. Compare the observed strategy’s mean daily return against 10,000 random permutations of the entry signal.
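The procedure can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the contents of level_2_permutation_test.py: the function name `permutation_test` is hypothetical, cash days are assumed to earn exactly zero, the p-value is the plain fraction of shuffles at or above the observed mean, and the demo runs on simulated returns rather than SPY.

```python
import numpy as np


def permutation_test(returns, signal, n_perm=10_000, seed=0):
    """Compare a long/flat strategy's mean daily return with shuffled signals."""
    r = np.asarray(returns, dtype=float)
    s = np.asarray(signal, dtype=bool)
    rng = np.random.default_rng(seed)

    # Strategy return is r on held days and 0 on cash days.
    observed = np.mean(r * s)

    # Shuffling keeps the number of held days fixed but destroys the timing.
    perm_means = np.empty(n_perm)
    for i in range(n_perm):
        perm_means[i] = np.mean(r * rng.permutation(s))

    p_value = np.mean(perm_means >= observed)            # one-sided
    percentile = 100.0 * np.mean(perm_means < observed)  # rank among shuffles
    return observed, perm_means, p_value, percentile


# Demo on simulated returns (NOT SPY data).
rng = np.random.default_rng(42)
sim = 0.0076 * rng.standard_t(3, size=1569)
today = sim[1:]      # return earned on day t
held = sim[:-1] > 0  # signal: the previous day's return was positive

observed, perm_means, p_value, percentile = permutation_test(today, held)
print(f"observed mean={observed:.6f}  percentile={percentile:.1f}%  p={p_value:.4f}")
```

Because the simulated returns are independent, the demo signal has no real edge and its p-value carries no meaning beyond showing the mechanics.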

Results:

Observations:         1568
Days in market:        858 (54.7%)
Strategy mean return:  0.000063/day  (1.60% annualized)
Buy-and-hold mean:     0.000584/day  (14.72% annualized)

Permutation test (10,000 shuffles):
  Permuted mean of means:  0.000322
  Strategy percentile:     5.7%
  One-sided p-value:       0.9431
  Verdict: FAIL — strategy is WORSE than 94% of random entry signals.

Interpretation: this is better than expected as a teaching moment. The strategy doesn’t just fail to beat buy-and-hold; it actively underperforms random entry timing. The observed strategy sits at the 5.7th percentile of the random shuffles, meaning 94.3% of random entry signals with the same number of days in the market produced higher returns than the “buy after up day” logic.

Why this happens: short-term reversal is a well-documented anomaly, and it shows up in this sample. At the one-day horizon, up days were followed by lower average returns than the typical day. A trader who naively thinks “momentum = buy after up days” is fighting that short-term mean reversion. The permutation test catches this cleanly: no prior factor model needed, no normality assumption, just shuffle the signal and count.
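The size of the reversal can be backed out from the printed results with a little arithmetic. This is a back-of-envelope sketch with two assumptions that the results do not state: the strategy mean is averaged over all 1,568 days with cash days counted as zero, and the buy-and-hold mean covers the same days. The inputs are rounded, so only the first couple of digits are meaningful.

```python
# Values copied from the Drill 3 results.
n_days = 1568
days_in_market = 858
strategy_mean = 0.000063       # per day, averaged over all days (assumed)
buy_and_hold_mean = 0.000584   # per day

days_in_cash = n_days - days_in_market

# Total return earned on held days equals the strategy's total return.
mean_after_up = strategy_mean * n_days / days_in_market

# What buy-and-hold earned on the days the strategy sat in cash.
mean_after_non_up = (buy_and_hold_mean - strategy_mean) * n_days / days_in_cash

print(f"mean return after an up day:     {mean_after_up:.6f}")
print(f"mean return after a non-up day:  {mean_after_non_up:.6f}")
```

Under these assumptions, days following a non-positive return earned roughly ten times as much on average as days following an up day, so the signal selected the weaker half of the sample.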

The lesson for us: the permutation test is the fastest BS detector we have. No assumptions about distribution, no need to trust standard errors. It costs 10,000 passes over the data, but the logic is hard to fool: the only thing it assumes is that, under the null, the signal's labels are exchangeable across days. Every strategy we test in this project will get a permutation test as a final gate.

Plot: outputs/level_2_permutation_test.png — histogram of permuted strategy means with the observed strategy and buy-and-hold both marked.

Cross-drill observations

  1. Fat tails are real and they’re not a minor correction. A Student-t fit with df < 3 on daily SPY is a giant signal that Gaussian assumptions anywhere downstream (position sizing, VaR, option pricing, backtesting confidence intervals) will be systematically wrong on the tails.

  2. Apparent alpha is usually factor exposure. Even on a mega-cap stock as obviously “good” as AAPL, a three-factor regression explains 64% of the variance and leaves alpha statistically indistinguishable from zero. Our first hundred strategy ideas will all look like this.

  3. Permutation tests are our friend. They’re distribution-free, assumption-light, and they kill bad strategies dead. The fact that a “buy after up day” strategy failed the permutation test is exactly the kind of result that saves you from deploying a bad idea with real money.

  4. Multiple-comparisons corrections are coming. We haven’t run into the problem yet because we tested one strategy, once. But as soon as we start sweeping parameter grids or trying different signals, Bonferroni or Benjamini-Hochberg corrections become essential. That’s on the list for Level 3 or early Level 4.
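As a preview of the corrections named in the last item, here is a minimal sketch of both procedures in NumPy. The function names and the ten p-values are hypothetical, chosen only to show that the two corrections can disagree; nothing here comes from the drills.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Reject only p-values at or below alpha / m (controls family-wise error)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure that controls the false discovery rate at alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m

    # Find the largest rank k with p_(k) <= alpha * k / m, then reject ranks 1..k.
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values from ten strategy variants (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print("uncorrected rejections:", sum(p <= 0.05 for p in pvals))
print("Bonferroni rejections: ", int(bonferroni(pvals).sum()))
print("BH rejections:         ", int(benjamini_hochberg(pvals).sum()))
```

Five of the ten variants look significant without correction, Bonferroni keeps one, and Benjamini-Hochberg keeps two, which is the trade-off between strictness and power that a parameter sweep will force on us.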

Gate check → proceed to Level 3?

Yes. L2 drills all pass:

Next action (Level 3): linear algebra. S&P 500 PCA to reproduce the “5 eigenvectors explain ~70% of variance” result, and a Markowitz mean-variance optimizer from scratch with cvxpy. This is where we add cvxpy and scikit-learn to the requirements.