·infrastructure-plan ·status: active

Automated Investing — Infrastructure Plan

Purpose

Define the tools, libraries, data sources, directory layout, and discipline gates for moving through the 5-level quant curriculum from 06-reference/2026-04-10-gemchange-quant-from-scratch and building toward the 5-layer prediction-market production stack from 06-reference/2026-04-10-gemchange-simulate-like-quant-desk.

This plan sits under the Automated Investing project index. Experiments run in the venv defined here. Nothing in this project uses real capital until we pass the discipline gate: a Brier score below 0.12 on a live event, paper traded.

Local dev environment

Python runtime: uv-managed Python 3.12 (installed at 01-projects/automated-investing/.venv/). uv is the package manager — fast, reproducible, handles Python version pinning.

Why uv: consistent across Mac Mini / any future machine, lockfile gives reproducibility, can run scripts via uv run without manual activation. Installed at /Users/ray/.local/bin/uv.

Activation options:
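Both standard options work with a uv-managed venv; a sketch, assuming scripts are run from the project root:

```shell
# Option 1: activate the venv manually, then use python as usual
source .venv/bin/activate
python scripts/level_1_coin_flip.py

# Option 2 (preferred): no activation; uv resolves the environment per run
uv run scripts/level_1_coin_flip.py
```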

Requirements file: requirements.txt in the project root. Add libraries as we move up levels. Current state (L1-L2):

numpy>=2.0
scipy>=1.13
pandas>=2.2
matplotlib>=3.9
statsmodels>=0.14
yfinance>=0.2.40
jupyterlab>=4.2
ipykernel>=6.29

Future additions by level:

Directory layout

01-projects/automated-investing/
├── index.md                    # project overview + reference links
├── infrastructure-plan.md      # this file
├── curriculum.md               # (to create) level-by-level learning plan w/ milestones
├── requirements.txt            # pinned Python libs
├── .venv/                      # uv-managed Python env (gitignored)
├── scripts/                    # runnable .py files per drill
│   ├── level_1_coin_flip.py
│   ├── level_1_bayesian_updater.py
│   └── ...
└── experiments/                # markdown writeups with results
    ├── level-1-probability.md
    ├── outputs/                # generated plots, CSVs, pickles
    │   ├── level_1_lln.png
    │   └── level_1_bayesian.png
    └── ...

Convention: scripts are runnable and tiny (single-responsibility). Experiments are markdown writeups with embedded results, plots, and commentary. Scripts write plot outputs to experiments/outputs/.
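A sketch of that convention, using the L1 LLN drill as the example (the `save_plot` helper and output path are illustrative, not existing code):

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend: scripts must run without a display
import matplotlib.pyplot as plt
import numpy as np

# Scripts are run from the project root, so the output path is relative to it.
OUTPUT_DIR = Path("experiments") / "outputs"

def save_plot(fig, name: str) -> Path:
    """Write a figure to experiments/outputs/, creating the directory if needed."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    path = OUTPUT_DIR / f"{name}.png"
    fig.savefig(path, dpi=150, bbox_inches="tight")
    return path

# Single-responsibility body: simulate coin flips, plot the running mean (LLN).
flips = np.random.default_rng(0).integers(0, 2, 10_000)
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)
fig, ax = plt.subplots()
ax.plot(running_mean)
ax.axhline(0.5, linestyle="--")
ax.set(xlabel="flips", ylabel="running mean", title="LLN: coin-flip convergence")
save_plot(fig, "level_1_lln")
```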

Level progression plan

Each level has a concrete deliverable. We do not advance to the next level until the current level’s deliverable is working and documented.

| Level | Topic | Deliverables | Gate to next |
| --- | --- | --- | --- |
| L1 | Probability | Coin-flip LLN, Bayesian updater, Blitzstein Ch 1-6 drills | LLN converges; Bayesian updater’s MAP matches MLE on simulated data |
| L2 | Statistics | yfinance → fit Student-t via MLE, Fama-French 3-factor regression, permutation test | Demonstrate awareness of the multiple-comparisons trap on a synthetic strategy sweep |
| L3 | Linear algebra | S&P 500 PCA, Markowitz mean-variance optimizer from scratch with cvxpy | Reproduce the “5 eigenvectors explain ~70% of variance” result |
| L4 | Calculus / optimization | Gradient descent on Rosenbrock, portfolio optimization with transaction costs | Solve a constrained problem cvxpy can’t handle directly (custom gradient) |
| L5 | Stochastic calculus | Black-Scholes from scratch, Monte Carlo convergence check, all five Greeks | Monte Carlo price matches analytical Black-Scholes within 0.5% |
| PM1 | Prediction market foundations | Monte Carlo binary contract simulator + Brier score calibrator | Simulator runs on a historical Polymarket contract and produces a Brier score |
| PM2 | Tail-risk pricing | Importance sampling via exponential tilting | ≥100x variance reduction on an extreme contract vs crude MC |
| PM3 | Live filtering | Particle filter on a live or replayed Polymarket contract | Filtered estimate beats the raw market price in Brier score |
| PM4 | Dependencies | Student-t copula for correlated swing-state-style contracts | Reproduce the “2-5x higher joint tail probability vs Gaussian” result |
| PM5 | Production stack L1-L5 | Polymarket CLOB WebSocket subscriber + Monitor integration, probability engine, dependency model, risk, monitoring | Beat 0.12 Brier score on a live event (paper) |
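The PM1 calibrator gate is small enough to sketch now (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    Lower is better: a constant 0.5 forecast scores 0.25, a perfect
    forecaster scores 0.0, and the project gate is 0.12.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((forecasts - outcomes) ** 2))

# Sanity checks against the gate's scale.
assert brier_score([0.5, 0.5], [1, 0]) == 0.25   # uninformed baseline
assert brier_score([1.0, 0.0], [1, 0]) == 0.0    # perfect foresight
```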

Only after PM5 passes its gate do we deploy real capital, and only at size small enough to survive the estimation-error warning from the roadmap article.
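For concreteness, the L5 gate (Monte Carlo within 0.5% of analytic Black-Scholes) can be sketched as follows; the parameter choices are arbitrary:

```python
from math import exp, log, sqrt
from statistics import NormalDist

import numpy as np

def bs_call(S, K, T, r, sigma):
    """Analytical Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

def mc_call(S, K, T, r, sigma, n_paths=1_000_000, seed=0):
    """Monte Carlo price: sample the exact GBM terminal distribution, discount payoffs."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return float(np.exp(-r * T) * np.mean(np.maximum(ST - K, 0.0)))

analytic = bs_call(100, 100, 1.0, 0.05, 0.2)   # ~10.45
mc = mc_call(100, 100, 1.0, 0.05, 0.2)
assert abs(mc - analytic) / analytic < 0.005   # the 0.5% gate
```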

Data sources

Compute

Claude Code tooling we’ll lean on

Discipline gates (never compromise)

From both source articles, restated:

  1. The first 10 backtested strategies will look good and will be noise. Accept this upfront; don’t trust strategy #1.
  2. Use Newey-West standard errors for any financial regression; default OLS standard errors understate uncertainty under autocorrelation and heteroskedasticity.
  3. Apply Bonferroni or Benjamini-Hochberg corrections when testing multiple strategies.
  4. Achieve a Brier score below 0.12 on a live event (paper trading) before any real capital touches Polymarket.
  5. No full-Kelly sizing. Estimation error destroys full Kelly the moment true parameters drift from the estimated ones. Use fractional Kelly with a hard cap.
  6. Log every decision to 01-projects/automated-investing/decisions.md so we can review whether the methodology held up over time.
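Gate 5 in code form, a minimal sketch for a binary contract paying 1; the function name and the default fraction and cap values are illustrative, not decided:

```python
def fractional_kelly(p_hat, price, fraction=0.25, cap=0.02):
    """Bankroll fraction to stake on a binary contract bought at `price`.

    Full Kelly for a contract paying 1 is (p_hat - price) / (1 - price).
    Scale it by `fraction` and clamp at a hard cap so estimation error
    in p_hat can't produce ruinous position sizes.
    """
    edge = p_hat - price
    if edge <= 0:
        return 0.0  # no positive edge, no bet
    full_kelly = edge / (1.0 - price)
    return min(fraction * full_kelly, cap)

# Even a huge apparent edge is clamped to the 2% hard cap.
assert fractional_kelly(0.9, 0.5) == 0.02
```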

Textbooks downloaded

Open questions / deferred decisions

What’s working already (as of 2026-04-10)

Next actions