

2026-04-09 · experiment-writeup · status: superseded-in-part

PM1 — Polymarket Baseline Calibration

⚠ Correction (2026-04-10): The conclusion in this document — that Polymarket is efficient across its entire top-650 volume range — was correct for what it measured but misleading as a statement about the whole venue. Our pagination approach capped us at markets with $7M+ volume (Gamma’s /markets endpoint 500s beyond offset ~950). A follow-up analysis using volume-band filters (pm1b-polymarket-long-tail-correction) found that three out of five volume bands between $10K and $100M fail the 0.12 discipline gate. Read the correction for the current verdict. The methodology and top-volume numbers below remain valid.

Question asked: can we find a winning strategy on a live prediction market?

Short answer: not with a directional model on the top-650 markets by volume (all at $7M+ volume). But markets in the $100K–$2M volume range show Brier scores around 0.12–0.15, above the 0.12 discipline gate, meaning the price itself would fail the gate there; that band is the plausible target for directional strategies. See pm1b-polymarket-long-tail-correction for details.

This experiment is the first real measurement of whether Polymarket is even beatable from our current toolkit. Before testing any strategy idea, we need to know what “trust the market price” does on average — that’s the baseline any of our strategies has to beat. If Polymarket is hyperefficient, we’re either wasting time on directional prediction or we need a totally different angle (informational edge, market-making, niche markets).

Setup

Data source: Polymarket public Gamma + CLOB APIs (anonymous, no wallet). Client code at autoinv/polymarket.py.
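A minimal sketch of how the paginated Gamma fetch could be structured. The base URL and query parameter names (`closed`, `order`, `ascending`, `limit`, `offset`) are assumptions inferred from this write-up, not verified against current API docs; the real client lives in autoinv/polymarket.py.

```python
# Build (url, params) pairs for paging Gamma's /markets endpoint by volume.
# Parameter names are assumptions; verify against the live API before use.
GAMMA_BASE = "https://gamma-api.polymarket.com"  # assumed base URL


def market_pages(limit=100, max_offset=900):
    """Yield (url, params) for each page of resolved markets, sorted by volume.

    The write-up notes /markets 500s beyond offset ~950, so we stop early
    rather than paging until the API errors out.
    """
    offset = 0
    while offset <= max_offset:
        yield (
            f"{GAMMA_BASE}/markets",
            {"closed": "true", "order": "volumeNum", "ascending": "false",
             "limit": limit, "offset": offset},
        )
        offset += limit
```

Each pair would be passed to an HTTP client (e.g. `requests.get(url, params=params)`); keeping the pagination logic pure makes the offset cap easy to test.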

Sample: all resolved binary markets in the top 650 by total volume (volumeNum sort), resolved between roughly 2023 and early 2026. Final N = 611 markets with usable daily price history.

Methodology:

  1. For each market, fetch daily mid-price history (/prices-history with fidelity=1440)
  2. For each of several “days before resolution” windows (1, 3, 7, 14, 30), snapshot the market’s latest price at or before that time
  3. Compute the Brier score of those snapshots against the realized binary outcome
  4. Compare to naive baselines (always predict 0.5, always predict the outcome base rate) and to the discipline gate of Brier ≤ 0.12
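Steps 2–3 above can be sketched in a few lines, assuming the price history arrives as a time-sorted list of (timestamp, price) pairs (the data shapes and names here are illustrative, not the actual autoinv code):

```python
from bisect import bisect_right


def snapshot_price(history, cutoff):
    """Latest price at or before `cutoff`.

    history: time-sorted list of (timestamp, price) pairs.
    Returns None if the market has no price before the cutoff.
    """
    i = bisect_right([ts for ts, _ in history], cutoff)
    return history[i - 1][1] if i else None


def brier(forecasts):
    """Mean squared error of probability forecasts vs. binary outcomes.

    forecasts: iterable of (probability, outcome) with outcome in {0, 1}.
    Lower is better; 0.25 is the score of always saying 0.5.
    """
    forecasts = list(forecasts)
    return sum((p - y) ** 2 for p, y in forecasts) / len(forecasts)
```

For the 7-day window, for example, the cutoff would be the market's resolution timestamp minus `7 * 86400` seconds.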

Scripts:

Results — top 100 markets, calibration across time windows

Window                Brier    Mean price    Win rate
--------------------------------------------------------
 1 day before        0.0348     0.199        20.2%
 3 days before       0.0325     0.198        20.2%
 7 days before       0.0361     0.194        20.2%
14 days before       0.0425     0.184        19.4%
30 days before       0.0610     0.158        20.0%

Baselines:
  Always predict 0.5:             Brier = 0.2500
  Always predict base rate (0.202):  Brier = 0.1612
  Discipline gate:                 Brier = 0.12
  Excellent forecaster:            Brier < 0.10
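The two naive baselines follow from a closed form: a constant forecast p against outcomes that come up 1 with frequency q scores q(1-p)² + (1-q)p². A quick check of the numbers above:

```python
def brier_constant(p, base_rate):
    """Brier score of always forecasting p when outcomes are 1 with frequency base_rate."""
    return base_rate * (p - 1) ** 2 + (1 - base_rate) * p ** 2

# Always forecasting 0.5 scores 0.25 regardless of the base rate.
# Forecasting the base rate q itself scores q * (1 - q); with q = 0.202
# that is 0.202 * 0.798 ≈ 0.1612, matching the table.
```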

Interpretation:

Results — Brier by volume tier

Tier                          N    Median vol    Brier    Majority    Lift
-------------------------------------------------------------------------
Top 1-50 (ultra-liquid)      49   $130M        0.0537    0.1741    +0.1204
Top 51-150 (large)           99   $ 53M        0.0148    0.1286    +0.1138
Top 151-350 (mid)           190   $ 24M        0.0262    0.1257    +0.0994
Top 351-650 (small)         273   $ 12M        0.0749    0.1609    +0.0860
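Lift here is the base-rate baseline Brier minus the market's Brier, i.e. how much information the price adds over always forecasting the base rate. Recomputing the column from the table's rounded entries (the mid tier differs in the last digit because the table was rounded before subtracting):

```python
# (market Brier, base-rate baseline Brier) per tier, copied from the table above.
tiers = {
    "ultra-liquid": (0.0537, 0.1741),
    "large":        (0.0148, 0.1286),
    "mid":          (0.0262, 0.1257),
    "small":        (0.0749, 0.1609),
}

# Lift = baseline Brier minus market Brier; positive means the price beats the baseline.
lifts = {name: round(majority - b, 4) for name, (b, majority) in tiers.items()}
```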

Interpretation:

The honest verdict

We did not find a winning strategy, and we quantified why that was unlikely from the start.

Polymarket is efficient. Not just “reasonably calibrated” — actively better than the best professional human forecasters across the entire 650-market volume range we tested. A strategy that buys contracts whose price disagrees with a model estimate will lose, on average, because the model is worse than the market price.

This is the honest negative result the roadmap article warned about: “tools have democratized, conviction hasn’t. Edge lives in unique data, unique models, or unique execution — not better pip installs.”

Five things this does NOT rule out:

  1. Very small / niche markets. Our sample stopped at volume rank 650 (the smallest tier's median volume was $12M). There's a long tail of markets with $1K–$100K volume where fewer sophisticated traders participate. Plausibly less efficient, but also plausibly too illiquid to trade.
  2. Short-horizon / intraday opportunities. We only looked at daily snapshots. There’s a 12-hour fidelity cap on resolved markets that prevents us from studying intraday price action without on-chain indexing. Some of the alpha might live in short bursts we can’t see from daily data.
  3. Informational edge strategies. If we build a data source the market doesn’t have — real-time news NLP, sensor data, scraped primary sources — we’d be competing with information rather than modeling noise. This is what the article means by “unique data.”
  4. Structural / market-making strategies. Providing liquidity (spreads, rebates) rather than directional prediction. This is a different game entirely and requires latency and capital, not forecasting accuracy.
  5. Other venues. Kalshi, Manifold, niche venues — different user bases, different efficiency characteristics. Polymarket’s advantage is that it’s the deepest crypto prediction market and attracts professional traders. Smaller venues may not.

Discipline gate status

From the simulate-like-quant-desk article: beat 0.12 Brier on a live event before deploying real capital.

Current status: the market itself beats 0.12 Brier at every window we measured. To pass the gate with a strategy, we’d have to build something that’s better than the market price. Our current stack (Black-Scholes for binary contracts + Monte Carlo + no alternative information sources) has zero chance of doing that.

Implication: the right next step is NOT to start tuning strategies on this data. It’s to figure out what our edge would be before writing another line of backtest code.

What I’d do next (pending your input)

Ranked roughly by effort:

  1. Confirm the negative result is not an artifact. Run the same analysis on Kalshi to see whether the result generalizes, or whether Polymarket has some weird selection effect. If Kalshi shows similar efficiency, we’re looking at “prediction markets are hard” as a property of the venue class, not of Polymarket specifically.

  2. Scan below volume-rank 650. Our sample may have missed the truly inefficient tail. Pulling the $1K-$500K volume band and repeating the analysis would either confirm or refute the “efficiency holds on tiny markets” finding. ~2 hours of work, low risk, potentially reveals where alpha lives.

  3. Pick a specific market category where we have a plausible edge — e.g., crypto price markets (because we can build a live options-implied-volatility view that retail doesn’t have), or weather markets (because NOAA data is structured and underused), or niche sports markets in leagues the big traders ignore. Define the edge first, then build the strategy.

  4. Pivot from forecasting to market-making. Polymarket’s CLOB has rebate programs for liquidity providers. A market-making strategy doesn’t need directional edge — it needs spread capture + inventory management + latency. Totally different skill stack but potentially more forgiving than directional prediction.

  5. Abandon Polymarket and look at equities again with the autoinv toolkit. Equity markets are also efficient, but we have stronger data sources (fundamentals, earnings, alternative data) and more developed infrastructure. The PM track was supposed to be the “lower-stakes sandbox” — if it’s not meaningfully easier than equities, the rationale weakens.

Plots

What I added to the package