

2026-04-09 · experiment-writeup · status: complete

PM1b — Polymarket Long-Tail Correction

Context

The original PM1 baseline experiment concluded that Polymarket is hyperefficient across its top 650 markets by volume and that directional strategies have nowhere to go. That conclusion was correct for the markets it measured but wrong as a statement about the venue, because our pagination approach capped us at roughly the top 650 markets by volume rank, which corresponded to markets with $7M+ total volume.

The founder asked to validate Option 1 (scan below volume rank 650) first. We did, and the picture flipped.

What went wrong the first time

Polymarket’s Gamma /markets endpoint has a server-side pagination limit around offset 950. Beyond that, requests return 500 Internal Server Error. Our long-tail scan using pure offset-based pagination hit this wall and silently misreported the volume distribution — we thought we were reaching into the long tail but were still looking at $6.4M–$8.8M markets.

The fix is to use the volume_num_min / volume_num_max query params, which let us target specific volume bands directly without fighting pagination. That’s now exposed in autoinv.polymarket.list_markets as dedicated parameters.
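
For reference, a minimal sketch of a band query against the Gamma endpoint. The base URL and the closed=true filter are my assumptions here; volume_num_min / volume_num_max are the params described above, and autoinv.polymarket.list_markets is assumed to simply forward them.

```python
# Sketch: fetch resolved markets inside a volume band via the Gamma API,
# sidestepping the ~offset-950 pagination wall by filtering server-side.
# Base URL and the closed=true filter are assumptions, not confirmed facts.
import requests

GAMMA = "https://gamma-api.polymarket.com/markets"

def fetch_band(vol_min: float, vol_max: float, limit: int = 200) -> list[dict]:
    resp = requests.get(GAMMA, params={
        "closed": "true",            # resolved markets only (assumed filter)
        "volume_num_min": vol_min,   # lower edge of the volume band
        "volume_num_max": vol_max,   # upper edge of the volume band
        "limit": limit,
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()

# e.g. the "small" band from Methodology v2
small = fetch_band(100_000, 500_000)
```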

Methodology v2

Five volume bands spanning four orders of magnitude:

Band         Range
tiny         $10K–$100K
small        $100K–$500K
mid-small    $500K–$2M
mid          $2M–$10M
large        $10M–$100M

For each band: pull up to 200 resolved markets via volume_num_min/volume_num_max, fetch daily price history, snapshot the mid 3 days before resolution, compute Brier against the realized outcome.

Script: ../scripts/pm1_brier_volume_bands
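
For reference, a minimal sketch of the scoring step the script performs, with the market loading left out; the history and outcome shapes below are assumptions, not a confirmed schema.

```python
# Sketch of the scoring step: snapshot the mid 3 days before resolution,
# score it against the realized outcome, and take the band-level mean Brier.
# `history` is assumed to be daily (timestamp, mid) points and `resolved_yes`
# a boolean for the realized outcome; both come from hypothetical loaders.
from datetime import datetime, timedelta

def brier_for_market(history: list[tuple[datetime, float]],
                     resolution_time: datetime,
                     resolved_yes: bool) -> float | None:
    cutoff = resolution_time - timedelta(days=3)
    # last observed mid at or before the 3-days-out cutoff
    prior = [p for t, p in history if t <= cutoff]
    if not prior:
        return None  # no usable price history 3 days out
    p = prior[-1]
    y = 1.0 if resolved_yes else 0.0
    return (p - y) ** 2

def band_brier(scores: list[float | None]) -> float:
    # assumes at least one market in the band produced a score
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept)
```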

Results

Band                           N      Median vol    Brier    vs discipline gate
-----------------------------------------------------------------------------
$10M–$100M (large)           194   $40,887,805    0.0258    PASS  (below 0.12)
$2M–$10M  (mid)              183   $ 8,645,680    0.0731    PASS  (below 0.12)
$500K–$2M (mid-small)        176   $ 1,931,099    0.1487    FAIL  (above 0.12)
$100K–$500K (small)          143   $   495,907    0.1202    FAIL  (right at gate)
$10K–$100K (tiny)             81   $    99,793    0.1493    FAIL  (above 0.12)

Corrected verdict: the original “hyperefficient venue” conclusion holds only for the head of the distribution. Bands above roughly $2M in volume clear the 0.12 Brier discipline gate comfortably; every band below that fails it.

Interpretation

The picture is a real “liquidity → efficiency” curve: Brier error climbs from 0.026 in the large band to roughly 0.15 in the tiny band as median volume falls from about $41M to about $100K.

This is consistent with the classic microstructure result that efficiency scales with attention and capital. The top markets are where the smart money lives; the small markets run on retail intuition and information asymmetries.

Caveats I’m flagging explicitly

  1. The N is small in the tiny band (81 markets), so the 0.149 point estimate carries meaningful uncertainty. I would want to see it hold up on a larger sample before committing real capital.
  2. The non-monotonic pattern ($500K–$2M at 0.149 is worse than $100K–$500K at 0.120) is weird. It could be a real microstructure regime where the mid-small band has particularly bad calibration, or it could be sampling noise. N=143 vs N=176 doesn’t resolve this cleanly — we need more data or category stratification.
  3. The lift over the majority baseline is smaller than it looks. The majority baseline in these bands is 0.22–0.24 (the win rate is skewed toward NO), so the lift is only 7–11 points; see the arithmetic sketch after this list. A strategy would need to capture a meaningful fraction of that lift after costs.
  4. Actually trading these markets is harder. Bid-ask spreads in the $100K volume band are likely much wider than the $10M band. A Brier-score advantage on paper may vanish once you pay the spread.
  5. Survivorship effects. The markets in these bands that made it into our sample are ones that resolved. Markets that were listed but never generated enough volume to have a meaningful midpoint history don’t appear. That’s a form of selection bias.
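
On caveat 3, the arithmetic under one assumption: if the majority baseline means always predicting the majority outcome (NO) with certainty, its Brier score is just the YES base rate, so a ~23% YES rate puts the baseline near 0.23.

```python
# Arithmetic sketch for caveat 3, assuming "majority baseline" = always
# predict p = 0 (NO). Its Brier then reduces to the YES base rate.
def majority_baseline_brier(outcomes: list[bool]) -> float:
    # (0 - y)^2 averaged over markets equals the fraction that resolved YES
    return sum(outcomes) / len(outcomes)

# With a ~23% YES base rate the baseline sits near 0.23; a market Brier of
# 0.12-0.15 in the failing bands is therefore only a ~0.08-0.11 improvement.
```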

What this unlocks

The corrected finding means Option 1 was worth checking. There IS a plausible target for directional strategies; it just sits in the $100K–$2M volume band, not in the top-650 markets we looked at first.

Three concrete next steps, in order:

  1. Category slicing. Within the $100K–$2M band, does the mispricing split by category? Sports (where crowd wisdom might be strong), crypto (where our own data is better), politics (where edge is hard), weather, etc. Halls-Moore’s backtesting biases framework says to look at category × volume cells, not just volume.

  2. Stability check. Pull three separate samples across time (markets resolved in H1 2025, H2 2025, and Q1 2026) and confirm the Brier gap is persistent, not a 2026-specific regime effect. If it’s stable over time, it’s a real pattern; a sketch of this check follows the list.

  3. Define the informational edge first. Per the founder’s feedback: “Define the edge first, then build the strategy.” The next step after this is NOT to write a better probability model — it’s to identify which specific data source (X real-time sentiment, weather data, crypto options volatility, etc.) could plausibly produce an edge on one category, and then measure whether that edge actually exists.
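
A minimal sketch of the stability check from step 2, assuming each market record carries a resolution date and a precomputed Brier score (field names are hypothetical, not a confirmed schema).

```python
# Sketch of the time-stability check: bucket resolved markets in the
# $100K-$2M band into three resolution windows and compare Brier scores.
from datetime import date

WINDOWS = {
    "H1 2025": (date(2025, 1, 1), date(2025, 6, 30)),
    "H2 2025": (date(2025, 7, 1), date(2025, 12, 31)),
    "Q1 2026": (date(2026, 1, 1), date(2026, 3, 31)),
}

def brier_by_window(markets: list[dict]) -> dict[str, float]:
    out = {}
    for name, (start, end) in WINDOWS.items():
        scores = [m["brier"] for m in markets
                  if start <= m["end_date"] <= end]
        if scores:
            out[name] = sum(scores) / len(scores)
    return out
```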

What I’d pitch as Strategy Candidate #1

A crypto-market probability engine driven by real-time options-implied volatility and X sentiment.

This is the concrete expression of the founder’s Option 3 (“informational edge”) — and it has the nice property that the input data (structured IV curves, filtered sentiment streams) is potentially salable as a data product to other agents.
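
To make the options-implied-volatility leg concrete: under a lognormal, flat-vol assumption (my simplification, not a design decision from this writeup), an IV maps to a probability for a “price above K at time T” market via the standard risk-neutral N(d2) term. A sketch:

```python
# Sketch: map an options-implied volatility into an event probability for a
# "price above K at time T" market, under lognormal / flat-vol assumptions.
# This is the textbook N(d2) term, not a claim about the final engine design.
from math import log, sqrt, erf

def prob_above(spot: float, threshold: float, iv: float, t_years: float,
               r: float = 0.0) -> float:
    d2 = (log(spot / threshold) + (r - 0.5 * iv**2) * t_years) / (iv * sqrt(t_years))
    return 0.5 * (1.0 + erf(d2 / sqrt(2.0)))  # standard normal CDF of d2

# e.g. BTC at 70k, an "above 80k in 30 days" market, 55% annualized IV
p = prob_above(70_000, 80_000, 0.55, 30 / 365)
```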