PM1b — Polymarket Long-Tail Correction
Context
The original PM1 baseline experiment concluded that Polymarket is hyperefficient across its top 650 markets by volume and that directional strategies have nowhere to go. That conclusion was correct for the markets it measured but wrong as a statement about the venue, because our pagination approach capped us at the top ~650 volume rank — which corresponded to markets with $7M+ total volume.
The founder asked us to validate Option 1 (scan below volume rank 650) first. We did, and the picture flipped.
What went wrong the first time
Polymarket’s Gamma /markets endpoint has a server-side pagination limit around offset 950. Beyond that, requests return 500 Internal Server Error. Our long-tail scan using pure offset-based pagination hit this wall and silently misreported the volume distribution — we thought we were reaching into the long tail but were still looking at $6.4M–$8.8M markets.
The fix is to use the volume_num_min / volume_num_max query params, which let us target specific volume bands directly without fighting pagination. That’s now exposed in autoinv.polymarket.list_markets as dedicated parameters.
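A minimal sketch of what a band-targeted request looks like, assuming the Gamma API lives at the base URL below and accepts `closed` and `limit` alongside the two volume params. The base URL, `closed`, `limit`, and the helper name `band_query_url` are illustrative assumptions, not the actual signature of autoinv.polymarket.list_markets.

```python
from urllib.parse import urlencode

# Assumed base URL for the Gamma API; it is not stated in this note.
GAMMA_MARKETS_URL = "https://gamma-api.polymarket.com/markets"


def band_query_url(volume_min: int, volume_max: int, limit: int = 200) -> str:
    """Build a /markets URL that targets one volume band directly.

    Filtering by volume means no offset is needed, so the request
    never approaches the pagination wall near offset 950.
    """
    params = {
        "volume_num_min": volume_min,  # from this note
        "volume_num_max": volume_max,  # from this note
        "closed": "true",              # assumption: restricts to closed markets
        "limit": limit,                # assumption: page-size parameter
    }
    return f"{GAMMA_MARKETS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example: a $100K to $500K volume band.
    print(band_query_url(100_000, 500_000))
```

The point of the design is that each band is a separate filtered query rather than a deeper page of one global, volume-sorted listing.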
Methodology v2
Five volume bands spanning four orders of magnitude:
| Band | Range |
|---|---|
| tiny | $10K–$100K |
| small | $100K–$500K |
| mid-small | $500K–$2M |
| mid | $2M–$10M |
| large | $10M–$100M |
For each band: pull up to 200 resolved markets via volume_num_min/volume_num_max, fetch daily price history, snapshot the mid 3 days before resolution, compute Brier against the realized outcome.
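The per-market scoring step can be sketched as follows. This is a simplified illustration, not the contents of the script itself: the `(date, mid)` history format, the helper names, and the choice to fall back to the latest earlier day when the exact snapshot day is missing are all assumptions.

```python
from datetime import date, timedelta


def snapshot_price(history, resolution_day, days_before=3):
    """Return the daily mid on the snapshot day, or None if unavailable.

    history: list of (date, mid) tuples, where mid is the YES price in [0, 1].
    Assumption: if the exact snapshot day has no data point, use the
    latest earlier one.
    """
    cutoff = resolution_day - timedelta(days=days_before)
    eligible = [(d, p) for d, p in history if d <= cutoff]
    if not eligible:
        return None
    return max(eligible, key=lambda dp: dp[0])[1]


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    history = [
        (date(2025, 6, 5), 0.60),
        (date(2025, 6, 6), 0.62),
        (date(2025, 6, 8), 0.90),  # after the cutoff, so ignored
    ]
    mid = snapshot_price(history, resolution_day=date(2025, 6, 10))
    print(mid, brier_score([mid], [1]))
```

A market with no usable snapshot returns `None` and would be dropped from the band's sample, which is one source of the selection effects discussed under the caveats.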
Script: ../scripts/pm1_brier_volume_bands
Results
| Band | N | Median vol | Brier | vs discipline gate |
|---|---|---|---|---|
| $10M–$100M (large) | 194 | $40,887,805 | 0.0258 | PASS (below 0.12) |
| $2M–$10M (mid) | 183 | $8,645,680 | 0.0731 | PASS (below 0.12) |
| $500K–$2M (mid-small) | 176 | $1,931,099 | 0.1487 | FAIL (above 0.12) |
| $100K–$500K (small) | 143 | $495,907 | 0.1202 | FAIL (right at gate) |
| $10K–$100K (tiny) | 81 | $99,793 | 0.1493 | FAIL (above 0.12) |
Corrected verdict:
- $10M+ markets: Polymarket is hyperefficient. Brier 0.026. Directional strategies lose to the “trust the market” baseline. No alpha from modeling.
- $2M–$10M: Still below the gate at 0.073. Marginally beatable in principle but narrow.
- $100K–$2M (the small and mid-small bands): Brier sits between 0.12 and 0.15, above the discipline gate. This is the first real opportunity we’ve measured where a directional model could plausibly add value over the market price.
- Below $10K: bulk of this band is $0-volume markets that were never traded — not useful for strategy research even though the numerical Brier is nominally above 0.12.
Interpretation
The picture is a real “liquidity → efficiency” curve:
- Above $10M: enough liquidity to attract sophisticated traders → hyperefficient
- $2M–$10M: enough liquidity for informed traders but not as deep → still efficient
- $100K–$2M: the mispricing zone — real volume (some trading, so prices are meaningful) but below the threshold where serious players compete
- Below $100K: too illiquid to reliably price anyway
This is consistent with the classic microstructure result that efficiency scales with attention and capital. The top markets are where the smart money lives; the small markets run on retail intuition and information asymmetries.
Caveats I’m flagging explicitly
- The N is small in the tiny band (81 markets) so the 0.149 point estimate has meaningful uncertainty. Would want to see it hold up on a larger sample before committing real capital.
- The non-monotonic pattern ($500K–$2M at 0.149 is worse than $100K–$500K at 0.120) is weird. It could be a real microstructure regime where the mid-small band has particularly bad calibration, or it could be sampling noise. N=143 vs N=176 doesn’t resolve this cleanly — we need more data or category stratification.
- The lift over the majority baseline is smaller than it looks. The majority baseline in these bands is 0.22–0.24 (win rate skewed toward NO), so the lift is only 7–11 points. A strategy would need to capture a meaningful fraction of that lift after costs.
- Actually trading these markets is harder. Bid-ask spreads in the $100K volume band are likely much wider than the $10M band. A Brier-score advantage on paper may vanish once you pay the spread.
- Survivorship effects. The markets in these bands that made it into our sample are ones that resolved. Markets that were listed but never generated enough volume to have a meaningful midpoint history don’t appear. That’s a form of selection bias.
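Two of the caveats above can be made concrete with a short sketch: a percentile bootstrap to put an interval around a band's Brier estimate, and the expected Brier of a constant forecast for comparing against a baseline. The function names are illustrative, and since this note does not spell out exactly how the majority baseline was computed, `constant_forecast_brier` covers both "always forecast NO" and "always forecast the base rate".

```python
import random


def bootstrap_brier_ci(sq_errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-market squared errors."""
    rng = random.Random(seed)
    n = len(sq_errors)
    # Resample markets with replacement and record each resample's mean.
    means = sorted(sum(rng.choices(sq_errors, k=n)) / n for _ in range(n_boot))
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return means[lo_idx], means[hi_idx]


def constant_forecast_brier(base_rate, forecast):
    """Expected Brier of always predicting `forecast` when YES occurs
    with probability `base_rate`."""
    return base_rate * (1 - forecast) ** 2 + (1 - base_rate) * forecast ** 2


if __name__ == "__main__":
    # Synthetic squared errors for an N=81 band; NOT data from this experiment.
    rng = random.Random(1)
    fake_errors = [rng.random() ** 2 * 0.5 for _ in range(81)]
    print(bootstrap_brier_ci(fake_errors))
    print(constant_forecast_brier(0.23, 0.0))  # "always NO" with a 23% YES rate
```

If the interval for the tiny band comfortably spans the 0.12 gate, the FAIL verdict there should be read as provisional; the same check on the small and mid-small bands would show whether the non-monotonic pattern is distinguishable from noise.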
What this unlocks
The corrected finding means Option 1 was worth checking. There IS a plausible target for directional strategies; it just sits in the $100K–$2M volume band, not the top 650 we looked at first.
Three concrete next steps, in order:
- Category slicing. Within the $100K–$2M band, does the mispricing split by category? Sports (where crowd wisdom might be strong), crypto (where our own data is better), politics (where edge is hard), weather, etc. Halls-Moore’s backtesting biases framework says to look at category × volume cells, not just volume.
- Stability check. Pull 3 separate samples across time (markets resolved in H1 2025, H2 2025, Q1 2026) and confirm the Brier gap is persistent, not a 2026-specific regime effect. If it’s stable over time, it’s a real pattern.
- Define the informational edge first. Per the founder’s feedback: “Define the edge first, then build the strategy.” The next step after this is NOT to write a better probability model. It is to identify which specific data source (X real-time sentiment, weather data, crypto options volatility, etc.) could plausibly produce an edge on one category, and then measure whether that edge actually exists.
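The category slicing and stability check both reduce to the same operation: group resolved markets into cells and report Brier per cell. A minimal sketch, assuming each record carries a cell key such as `(category, band)` or `(category, period)`; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def brier_by_cell(records):
    """Group squared errors by cell and return {cell: (brier, n)}.

    records: iterable of (cell_key, forecast, outcome), where cell_key is
    any hashable label, e.g. ("crypto", "small") or ("crypto", "H1 2025").
    """
    cells = defaultdict(list)
    for cell_key, forecast, outcome in records:
        cells[cell_key].append((forecast - outcome) ** 2)
    # Report N next to each Brier so thin cells are visible, not hidden.
    return {cell: (sum(errs) / len(errs), len(errs)) for cell, errs in cells.items()}


if __name__ == "__main__":
    # Toy records for illustration only; NOT data from this experiment.
    toy = [
        (("crypto", "small"), 0.8, 1),
        (("crypto", "small"), 0.6, 0),
        (("sports", "small"), 0.5, 1),
    ]
    for cell, (brier, n) in brier_by_cell(toy).items():
        print(cell, round(brier, 4), n)
```

Returning N with every cell matters here because slicing an N=143 band by category and period will leave some cells with only a handful of markets.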
What I’d pitch as Strategy Candidate #1
A crypto-market probability engine driven by real-time options-implied volatility and X sentiment:
- Target: Polymarket short-horizon crypto contracts (“Will BTC be above $X on date Y?”) in the $100K–$2M band
- Data inputs: (a) real-time BTC/ETH options implied volatility from Deribit, (b) X search stream filtered for relevant keywords, classified by LLM, (c) the market’s current midpoint for comparison
- Signal: synthesize a probability estimate from IV and sentiment that disagrees with the midpoint by > some threshold, trade the delta
- Paper-test discipline: beat Brier 0.10 on a held-out month of resolved crypto markets before any real money
This is the concrete expression of the founder’s Option 3 (“informational edge”) — and it has the nice property that the input data (structured IV curves, filtered sentiment streams) is potentially salable as a data product to other agents.
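To make the signal step concrete, here is a rough sketch of the IV half only. It assumes a plain lognormal model with zero drift, so the probability of finishing above the strike is N(d2) in Black-Scholes terms. That model choice, the 0.05 threshold, and every function name are placeholders I'm introducing for illustration; the sentiment input and any real Deribit data access are left out.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_above(spot, strike, iv, days_to_expiry, drift=0.0):
    """P(price > strike at expiry) under a lognormal model.

    iv is annualized implied volatility (e.g. 0.5 for 50%).
    Assumption: zero drift by default, which is a modeling choice.
    """
    t = days_to_expiry / 365.0
    d2 = (math.log(spot / strike) + (drift - 0.5 * iv**2) * t) / (iv * math.sqrt(t))
    return norm_cdf(d2)


def trade_signal(model_prob, market_mid, threshold=0.05):
    """Compare the model probability with the market midpoint.

    threshold is a placeholder; the note only says "some threshold".
    """
    edge = model_prob - market_mid
    if edge > threshold:
        return "BUY_YES"
    if edge < -threshold:
        return "BUY_NO"
    return "NO_TRADE"


if __name__ == "__main__":
    p = prob_above(spot=100_000, strike=105_000, iv=0.5, days_to_expiry=14)
    print(round(p, 3), trade_signal(p, market_mid=0.40))
```

Any threshold would need to exceed the half-spread in the target band, otherwise the paper edge disappears on execution, as flagged in the caveats.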
Related
- pm1-polymarket-baseline: the original (superseded-in-part) baseline analysis
- polymarket client: added volume_num_min/volume_num_max support as part of this experiment
- ../architecture-vision: the 5-agent target; the data pipeline described above would be owned by the “Strategy Research” agent
- ../../../06-reference/2026-04-10-gemchange-simulate-like-quant-desk: the 5-layer production stack pattern the eventual strategy would follow