How to Simulate Like a Quant Desk — @gemchange_ltd
Why this is in the vault
This is the implementation-side companion to 2026-04-10-gemchange-quant-from-scratch. The first article is the 18-month math curriculum; this one is the simulation engine you build once you have the math. It walks from naive Monte Carlo through importance sampling, particle filters, copulas, and agent-based models — using Polymarket-style prediction markets as the running example rather than equities.
For the Automated Investing small bet, this is the technical blueprint. It also validates my earlier flag that prediction markets are the lower-stakes sandbox for RDCO — binary outcomes, bounded losses, transparent resolution, and the full quant stack still applies.
TL;DR
Crude Monte Carlo is the foundation but it breaks in three places that matter:
- Tail events — 100K samples can’t measure a 0.3% probability contract
- Real-time updates — as data arrives during a live event, you need a filter, not a rerun
- Correlated portfolios — Gaussian correlation misses tail dependence and blows up in exactly the scenarios that matter
Each failure mode has a fix — importance sampling, sequential Monte Carlo (particle filters), and copulas — and the fixes compose. On top of that, agent-based models give you the emergent order-book dynamics that no closed-form SDE can capture. The final Part VIII is a five-layer production stack covering ingestion → probability engine → dependency modeling → risk → monitoring.
Part I — The core reframe
Retail traders treat a Polymarket contract like a biased coin flip: estimate p, compare to the listed price, buy the edge. That’s wrong because a prediction market contract embedded in a portfolio of correlated events, with time-varying information flow and order book dynamics, has dozens of parameters — not one.
The four questions retail can’t answer: how confident to be in your estimate, how the estimate should update when new data arrives, how the contract correlates with other contracts, and whether the price path lets you exit at a profit even when you’re eventually right.
Part II — Monte Carlo foundations
Every simulation in the article reduces to Monte Carlo: draw samples, compute a statistic, repeat. The sample mean converges at O(N^(-1/2)) per the CLT, with variance p(1-p)/N.
Key insight: variance is maximized at p = 0.5. The most actively traded contracts (those near 50¢) are exactly where crude MC estimates are least precise.
To hit ±0.01 precision at 95% confidence for p = 0.5, you need ~9,604 samples. Manageable for terminal-value estimates, but it scales badly for path-dependent payoffs.
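The arithmetic behind that number is worth keeping handy. A minimal helper (the function name is mine, not the article's):

```python
import numpy as np

def required_samples(p, half_width, z=1.96):
    """Samples needed so the CI half-width at z-score z is <= half_width,
    given estimator variance p*(1-p)/N (worst case at p = 0.5)."""
    return int(np.ceil(z**2 * p * (1 - p) / half_width**2))

required_samples(0.5, 0.01)   # -> 9604, the figure quoted above
```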
Runnable: binary contract simulator
```python
import numpy as np

def simulate_binary_contract(S0, K, mu, sigma, T, N_paths=100_000):
    # GBM terminal prices under drift mu and volatility sigma
    Z = np.random.standard_normal(N_paths)
    S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    # binary payoff: the contract resolves YES if the terminal price exceeds K
    payoffs = (S_T > K).astype(float)
    p_hat = payoffs.mean()
    se = np.sqrt(p_hat * (1 - p_hat) / N_paths)   # standard error of a proportion
    return {
        'probability': p_hat,
        'std_error': se,
        'ci_95': (p_hat - 1.96 * se, p_hat + 1.96 * se),
        'N_paths': N_paths,
    }
```
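Illustrative usage (parameter values are mine, not from the article):

```python
result = simulate_binary_contract(S0=100, K=105, mu=0.05, sigma=0.30, T=0.25)
print(result['probability'], result['ci_95'])   # roughly 0.38 with a ±0.003 band
```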
Brier score — how you measure simulation calibration:
```python
def brier_score(predictions, outcomes):
    return np.mean((np.array(predictions) - np.array(outcomes))**2)
```
Benchmarks: below 0.20 is good, below 0.10 is excellent. 538 and the Economist historically land 0.06–0.12 on presidential races. If our simulator beats that, we have edge.
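Usage is one line; the inputs below are made-up numbers purely to show the call:

```python
preds = [0.62, 0.18, 0.91, 0.45]   # illustrative model probabilities
outcomes = [1, 0, 1, 0]            # resolved YES = 1, NO = 0
brier_score(preds, outcomes)       # ~0.097, under the 0.10 "excellent" bar
```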
Part III — Importance sampling for rare events
Polymarket hosts tail-risk contracts — “S&P down 20% in one week” at 0.3¢. With 100K crude samples you get 0 or 1 hit and your estimate is useless.
Fix: exponential tilting. Replace the original probability measure with one that oversamples the rare region, then correct the bias with a likelihood ratio. For a contract paying off when a sum exceeds a threshold, the tilt parameter γ solves the Lundberg equation M(γ) = 1.
Impact: 100–10,000× variance reduction on extreme contracts. 100 IS samples can beat 1M crude samples. This is the difference between “we can’t price this” and “we’re trading it.”
Runnable: IS for tail-risk binary contracts — full code in the source article. The tilted drift shifts the distribution so the crash threshold sits ~1 stddev away instead of ~4 stddevs, then the likelihood ratio corrects back to the original measure.
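The full implementation stays in the source article; a minimal sketch of the drift-tilt idea, assuming the GBM driver from Part II (function name and tilt choice are mine), looks like this:

```python
import numpy as np

def crash_probability_is(S0, K, mu, sigma, T, n=100_000, seed=0):
    """Importance-sampled estimate of P(S_T < K) for a GBM terminal price:
    shift the normal driver so crashes are common, then reweight by the
    likelihood ratio back to the original measure. Sketch, not the article's code."""
    rng = np.random.default_rng(seed)
    # z-value where the crash threshold sits under the original measure
    z_star = (np.log(K / S0) - (mu - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    theta = z_star                        # tilt: center the proposal on the threshold
    Z = rng.standard_normal(n) + theta    # sample Z ~ N(theta, 1)
    S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    hit = (S_T < K).astype(float)
    lr = np.exp(-theta * Z + 0.5 * theta**2)   # N(0,1) density / N(theta,1) density
    estimate = (hit * lr).mean()
    std_error = (hit * lr).std(ddof=1) / np.sqrt(n)
    return estimate, std_error
```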
Part IV — Sequential Monte Carlo (particle filters) for live updates
Election night, 8:01 PM EST, Florida just closed with a 3-point shift. Your estimates for Ohio, Pennsylvania, Michigan, and every correlated state need to update now, not on a rerun schedule.
This is the filtering problem. Tool: particle filter.
State-space model:
- Hidden state x_t: the “true” probability (unobserved)
- Observations y_t: market prices, polls, vote counts, news signals
- State evolves via a logit random walk (keeps probabilities bounded)
- Observations are noisy readings of the true state
Bootstrap particle filter algorithm:
- Initialize N particles from prior, equal weights
- For each new observation: propagate (random walk), reweight by likelihood, normalize, resample if effective sample size drops below N/2
Runnable: production-grade particle filter for a live prediction market — the full class from the article is worth copying to the curriculum doc. Key methods: update(observed_price), estimate() (weighted mean), credible_interval(alpha) (weighted quantiles), _systematic_resample() (lower variance than multinomial).
Why this beats using the market price directly: the filter smooths noise and propagates uncertainty. When the market spikes from 58¢ to 65¢ on a single trade, the filter tempers the update based on how volatile the observation process has been historically.
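The article's class is the one to copy; for reference, a stripped-down bootstrap filter with the same shape (logit random-walk state, Gaussian observation likelihood, multinomial rather than systematic resampling, noise scales purely illustrative) would look like:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

class BootstrapPF:
    """Minimal bootstrap particle filter sketch, not the article's production class."""
    def __init__(self, n_particles=5000, state_noise=0.05, obs_noise=0.03, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state_noise, self.obs_noise = state_noise, obs_noise
        self.x = self.rng.normal(0.0, 1.0, n_particles)      # prior on the logit scale
        self.w = np.full(n_particles, 1.0 / n_particles)

    def update(self, observed_price):
        # 1. propagate: logit random walk keeps the implied probability in (0, 1)
        self.x = self.x + self.rng.normal(0.0, self.state_noise, self.x.size)
        # 2. reweight by the Gaussian likelihood of the observed market price
        lik = np.exp(-0.5 * ((observed_price - inv_logit(self.x)) / self.obs_noise) ** 2)
        self.w *= lik + 1e-300            # tiny floor avoids an all-zero weight vector
        self.w /= self.w.sum()
        # 3. resample when effective sample size drops below N/2
        if 1.0 / np.sum(self.w ** 2) < self.x.size / 2:
            idx = self.rng.choice(self.x.size, size=self.x.size, p=self.w)
            self.x = self.x[idx]
            self.w = np.full(self.x.size, 1.0 / self.x.size)

    def estimate(self):
        return float(np.sum(self.w * inv_logit(self.x)))     # weighted posterior mean
```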
Part V — Variance reduction tricks that stack
Three techniques that compose multiplicatively with everything above:
- Antithetic variates — for monotone payoffs (all binary contracts are monotone), pair each random draw Z with -Z. Guaranteed 50–75% variance reduction at zero extra cost. (Sketch at the end of this part.)
- Control variates — if you’re simulating under stochastic volatility, use the closed-form Black-Scholes digital price as a control. Correct the MC estimate by the known error of the control.
- Stratified sampling — partition probability space into J strata, sample within each, combine. Variance is always ≤ crude MC. Neyman allocation (n_j ∝ ω_j σ_j) maximizes the gain.
Stacked impact: 100–500× variance reduction over crude MC. The author calls this “table stakes” for production — not optional.
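A sketch of how the antithetic trick slots into the Part II simulator (my adaptation, not the article's code):

```python
import numpy as np

def simulate_binary_contract_antithetic(S0, K, mu, sigma, T, N_pairs=50_000, seed=0):
    """Antithetic version of simulate_binary_contract: each draw Z is paired
    with -Z and the two payoffs are averaged before estimating the mean."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(N_pairs)
    drift, vol = (mu - 0.5 * sigma**2) * T, sigma * np.sqrt(T)
    up = (S0 * np.exp(drift + vol * Z) > K)       # original path payoff
    down = (S0 * np.exp(drift - vol * Z) > K)     # antithetic path payoff
    pair_mean = 0.5 * (up + down)                 # negatively correlated pair average
    p_hat = pair_mean.mean()
    se = pair_mean.std(ddof=1) / np.sqrt(N_pairs)
    return p_hat, se
```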
Part VI — Copulas and tail dependence
Linear correlation misses tail dependence — the tendency for extreme co-movements. The Gaussian copula’s failure to model this contributed to the 2008 crisis. For prediction markets the same trap exists: a Gaussian copula says P(all 5 swing states flip together | one of them surprises) ≈ 0, which is catastrophically wrong.
Sklar’s theorem decomposes a joint distribution into marginals F_i and a copula C that captures the pure dependency structure.
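In symbols, F(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d)): the marginals carry the per-contract probabilities, and C carries how they move together.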
The copula zoo:
- Gaussian: tail dependence λ_U = λ_L = 0 — wrong for financial events
- Student-t: symmetric tail dependence, ~0.18 with ν=4, ρ=0.6 — good default for swing states
- Clayton: lower tail dependence only — “when one crashes, others follow”
- Gumbel: upper tail dependence only — correlated positive resolutions
- Vine copulas: for d > 5 contracts, decompose into d(d-1)/2 bivariate conditional copulas in a tree (C-vine, D-vine, R-vine). Implementations: pyvinecopulib (Python), VineCopula (R)
Empirical impact: a t-copula with ν=4 routinely shows 2–5× higher probability of extreme joint outcomes than Gaussian. A prediction market portfolio without tail-dependence modeling will blow up in exactly the scenarios that matter most.
Runnable: the article gives three complete functions — simulate_correlated_outcomes_gaussian, simulate_correlated_outcomes_t (Student-t via chi-squared scaling), simulate_correlated_outcomes_clayton (Marshall-Olkin algorithm). Worth preserving in full in the curriculum.
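Those functions aren't reproduced here; a compact sketch of the Gaussian-vs-t comparison using scipy (helper name and parameter values are mine) shows where the tail-dependence gap comes from:

```python
import numpy as np
from scipy import stats

def joint_yes_probability(p, corr, nu=None, n_sims=200_000, seed=0):
    """P(all contracts resolve YES) under a Gaussian copula (nu=None)
    or a Student-t copula with nu degrees of freedom. Sketch only."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    Z = rng.standard_normal((n_sims, len(p))) @ L.T        # correlated normals
    if nu is None:
        U = stats.norm.cdf(Z)                              # Gaussian copula
    else:
        W = rng.chisquare(nu, size=(n_sims, 1))
        U = stats.t.cdf(Z / np.sqrt(W / nu), df=nu)        # t copula via chi-squared scaling
    return (U < np.asarray(p)).all(axis=1).mean()          # joint YES frequency

corr = np.full((5, 5), 0.6); np.fill_diagonal(corr, 1.0)
p = np.full(5, 0.15)                                       # five unlikely joint events
print(joint_yes_probability(p, corr))                      # Gaussian copula
print(joint_yes_probability(p, corr, nu=4))                # t copula, fatter joint tail
```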
Part VII — Agent-based models
Everything above assumes you know the data-generating process. In reality, markets are populated by heterogeneous agents whose interactions produce emergent dynamics no closed-form SDE captures.
The Gode-Sunder result (1993): markets can be efficient even when every trader is completely irrational. Zero-intelligence agents submitting random orders (subject only to budget constraints) achieve near-100% allocative efficiency in a continuous double auction.
Farmer, Patelli & Zovko (2005): zero-intelligence extended to limit order books — one parameter explained 96% of cross-sectional spread variation on the LSE.
Runnable: PredictionMarketABM class — the article’s full implementation has three agent types (informed, noise, market maker) with a Kyle-lambda price impact model. Worth preserving in the curriculum because it’s small and self-contained.
What the model surfaces: how fast prices converge depends on the informed/noise ratio; how market maker spread responds to information flow; and why informed traders extract profit at noise traders’ expense. Kyle (1985) and Glosten-Milgrom (1985) are the theoretical backbone.
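The article's PredictionMarketABM class isn't copied here; a toy version of the informed/noise/market-maker loop with linear (Kyle-style) price impact gives the flavor, with all parameters illustrative:

```python
import numpy as np

def run_toy_prediction_market(true_p=0.7, n_steps=2000, n_informed=10,
                              n_noise=90, kyle_lambda=0.002, seed=0):
    """Toy agent-based market, not the article's class: informed traders push the
    price toward the true probability, noise traders trade at random, and a
    market maker moves the price by kyle_lambda times net order flow."""
    rng = np.random.default_rng(seed)
    price, prices = 0.5, []
    for _ in range(n_steps):
        informed_flow = n_informed * np.sign(true_p - price)       # trade on the edge
        noise_flow = rng.choice([-1, 1], size=n_noise).sum()       # zero-mean noise
        price = float(np.clip(price + kyle_lambda * (informed_flow + noise_flow),
                              0.01, 0.99))                         # keep a valid probability
        prices.append(price)
    return np.array(prices)   # convergence speed depends on the informed/noise ratio
```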
Part VIII — The production stack (5 layers)
This is the architecture I want on the Automated Investing small bet roadmap:
| Layer | Purpose | Components |
|---|---|---|
| L1: Data ingestion | Get real-time signals | Polymarket CLOB WebSocket, news/poll NLP signals, on-chain event data (Alchemy/Polygon) |
| L2: Probability engine | Turn signals into posterior probabilities | Hierarchical Bayesian model (Stan/PyMC), particle filter, jump-diffusion SDE paths, ensemble averaging |
| L3: Dependency modeling | Capture cross-contract dependencies | Vine copula pairwise dependencies, factor model for shared risk, Student-t copula tail dependence |
| L4: Risk management | Size positions and protect capital | EVT-based VaR / Expected Shortfall, reverse stress testing, correlation stress, order-book depth monitoring |
| L5: Monitoring | Know when the system is broken | Brier score tracking, P&L attribution by model component, drawdown alerts, model drift detection |
The theme both articles share
The author is blunt about estimation error (in the roadmap article) and variance (in this one) being the real enemies. The first-order lesson is the same in both: the math works with true parameters and true probabilities; you never have those. Every layer of the stack above exists to bound, estimate, or correct for that gap.
What this means for RDCO automated investing
This article concretizes the “build on math we understand” reframe from the roadmap article. Specifically:
- Prediction markets are the right first target, not equities. Binary resolution, bounded losses, transparent mechanism, the full toolkit above still applies, and Polymarket has a public CLOB WebSocket API. This is the cleanest sandbox we have.
- The first milestone is not a trading strategy. It’s a working particle filter that produces a calibrated probability estimate on a live Polymarket contract and beats 0.12 Brier. No capital at risk, just a prediction pipeline we can evaluate honestly.
- Monitor fits L1. Per 2026-04-10-claude-code-monitor-tool, if we wrap the Polymarket CLOB WebSocket in a CLI subscriber that echoes ticks to stdout, Claude Code’s Monitor tool is the right primitive to stream them into the agent session.
- The five-layer stack is the project milestone structure. Each layer becomes a discrete milestone on the automated-investing roadmap:
- L1 ingestion (Monitor + Polymarket WebSocket wrapper)
- L2 probability engine (Bayesian model + particle filter)
- L3 dependency modeling (copula-based portfolio joint distribution)
- L4 risk management (VaR + stress tests)
- L5 monitoring (Brier tracking + drift detection)
- Don’t trade anything until we beat the Brier benchmark. The article gives us a specific bar — beat 0.12 on a live event — before any capital is deployed. This is the discipline gate.
Reference list from the article (worth having in the vault)
Key papers the author cites. Most are on arXiv or in open-access journals:
- Dalen (2025). “Toward Black-Scholes for Prediction Markets.” arXiv:2510.15205
- Saguillo et al. (2025). “Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets.” arXiv:2508.03474
- Madrigal-Cianci et al. (2026). “Prediction Markets as Bayesian Inverse Problems.” arXiv:2601.18815
- Farmer, Patelli & Zovko (2005). “The Predictive Power of Zero Intelligence.” PNAS
- Gode & Sunder (1993). “Allocative Efficiency of Markets with Zero-Intelligence Traders.” JPE
- Kyle (1985). “Continuous Auctions and Insider Trading.” Econometrica
- Glosten & Milgrom (1985). “Bid, Ask, and Transaction Prices.” JFE
- Hoffman & Gelman (2014). “The No-U-Turn Sampler.” JMLR
- Merton (1976). “Option Pricing When Underlying Stock Returns Are Discontinuous.” JFE
- Linzer (2013). “Dynamic Bayesian Forecasting of Presidential Elections.” JASA
- Gelman et al. (2020). “Updated Dynamic Bayesian Forecasting Model.” HDSR
- Aas, Czado, Frigessi & Bakken (2009). “Pair-Copula Constructions of Multiple Dependence.” Insurance: Mathematics and Economics
- Wiese et al. (2020). “Quant GANs: Deep Generation of Financial Time Series.” Quantitative Finance
- Kidger et al. (2021). “Neural SDEs as Infinite-Dimensional GANs.” ICML
Related
- 2026-04-10-gemchange-quant-from-scratch — the 18-month math curriculum this article sits on top of
- 01-projects/automated-investing/index — the active project this article feeds
- 2026-04-10-claude-code-monitor-tool — Monitor is the right primitive for L1 ingestion
- 2026-04-04-swing-trading-guide — Kevin Xu’s complementary style (equities, catalysts)
Action items (for triage into the board alongside article #1 actions)
- Scope L1 of the production stack as the first concrete milestone — Polymarket CLOB WebSocket subscriber + Monitor integration
- Reproduce the author’s particle-filter class in 01-projects/automated-investing/experiments/ as a learning exercise on a historical Polymarket contract
- Set a discipline gate: no real capital until we beat 0.12 Brier score on a live event
- Capture the reference list above into 06-reference/papers/ with arXiv URLs for the three newest ones
- Evaluate pyvinecopulib vs copulas Python libraries for Layer 3