

2026-04-09 · experiment-writeup · status: complete

PM1c + PM1d — Category slicing and stability check

Context

PM1b identified the $100K-$2M volume band as the first place we’ve measured mispricing on Polymarket (Brier 0.12-0.15 vs a 0.12 discipline gate). The founder asked to bundle the next three steps: category slicing (which kinds of markets drive it), stability check (is it persistent across time), and an X-sentiment prototype (can we build an edge).

This doc covers category slicing and stability. The X-sentiment prototype (PM1e) is deliberately NOT executed yet — the stability results changed my read of whether it’s worth the xmcp/LLM spend. See the “Decision point” section below.

Cost so far: $0. All analysis used the free Polymarket Gamma and CLOB endpoints.

PM1c — Category slicing

Setup. Pulled the top 500 resolved markets in the $100K-$2M band via volume_num_min=100000, volume_num_max=2000000, order=volumeNum. Polymarket's own category field is empty on 490 of 491 markets, so we infer categories from the slug plus the events[0].ticker prefix via a small regex classifier. The rule set lives in pm1c_category_slicing.py and covers sports, crypto, politics, tennis, esports, cricket, olympics, macro-fed, weather, entertainment, space-tech, tech, and "other".
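The pull-and-classify step can be sketched roughly as below. The volume and ordering parameters are the ones named above; the `closed` flag and the specific regex patterns are illustrative assumptions, not the actual rule set from pm1c_category_slicing.py:

```python
import re

GAMMA_MARKETS = "https://gamma-api.polymarket.com/markets"

# Illustrative subset of the rule set in pm1c_category_slicing.py;
# these patterns are examples, not the production rules.
CATEGORY_RULES = [
    ("sports",   re.compile(r"\b(nba|nfl|mlb|nhl|vs)\b")),
    ("crypto",   re.compile(r"\b(btc|bitcoin|eth|ethereum|solana)\b")),
    ("politics", re.compile(r"\b(election|president|senate)\b")),
    ("tennis",   re.compile(r"\b(atp|wta|wimbledon)\b")),
]

def classify(market: dict) -> str:
    """Infer a category from slug + events[0].ticker, since Polymarket's
    own `category` field is empty on almost every market in the band."""
    slug = market.get("slug") or ""
    events = market.get("events") or [{}]
    ticker = events[0].get("ticker") or ""
    text = f"{slug} {ticker}".lower()
    for name, pattern in CATEGORY_RULES:
        if pattern.search(text):
            return name
    return "other"

def fetch_band(limit: int = 500) -> list:
    """Top resolved markets in the $100K-$2M band, ordered by volume.
    `closed=true` is an assumed filter for resolved markets."""
    import requests  # network dependency kept local; classify() works offline
    resp = requests.get(GAMMA_MARKETS, params={
        "closed": "true",
        "volume_num_min": 100_000,
        "volume_num_max": 2_000_000,
        "order": "volumeNum",
        "ascending": "false",
        "limit": limit,
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()
```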

Category distribution (N=491 total):

sports      220  (45%)
other       174  (35%)
politics     30   (6%)
esports      25   (5%)
crypto       25   (5%)
olympics      4
tech          4
tennis        4
space-tech    3
weather       1
macro-fed     1

Brier by category (categories with N ≥ 15 only; prices snapshotted 3 days before resolution):

Category              N     Median vol    Brier    Gate    Win rate
sports              200  $1,857,763    0.1989    FAIL    50.50%
crypto               21  $1,882,647    0.1876    FAIL    61.90%
other               163  $1,861,514    0.1159    PASS    31.90%
politics             28  $1,881,371    0.0057    PASS    21.43%
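The per-category numbers above reduce to a simple aggregation. A minimal sketch, assuming each record carries the YES midpoint 3 days before resolution and a 0/1 outcome, and reading "win rate" as the fraction of markets that resolved YES (consistent with the table):

```python
from statistics import median

# Each record: (category, volume, p_yes, outcome), where p_yes is the YES
# midpoint snapshotted 3 days before resolution and outcome is 1 if YES won.
def category_stats(records, min_n=15, gate=0.12):
    """Brier score, median volume, gate verdict, and YES rate per category;
    categories below the min-N noise floor are dropped."""
    by_cat = {}
    for cat, vol, p, y in records:
        by_cat.setdefault(cat, []).append((vol, p, y))
    out = {}
    for cat, rows in by_cat.items():
        if len(rows) < min_n:
            continue
        brier = sum((p - y) ** 2 for _, p, y in rows) / len(rows)
        out[cat] = {
            "n": len(rows),
            "median_vol": median(v for v, _, _ in rows),
            "brier": brier,
            "gate": "PASS" if brier <= gate else "FAIL",
            "win_rate": sum(y for _, _, y in rows) / len(rows),
        }
    return out
```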

Key observations:

  1. Sports and crypto both fail the 0.12 gate (Brier 0.1989 and 0.1876), so the band-level mispricing PM1b found is concentrated in those categories.
  2. Politics is extremely well calibrated (Brier 0.0057, N=28): no measurable edge there.
  3. "Other" passes the gate (0.1159), but it's a regex catch-all, so the result is hard to act on until its composition is understood.

PM1d — Stability across time

Setup. Same $100K-$2M band. Bucket markets into three windows by endDate: H1 2025, H2 2025, Q1 2026. Compute Brier per (window × category) cell, require N ≥ 15 per cell. Only two categories clear the noise floor in at least two windows: sports and other.
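The bucketing logic is straightforward. A sketch, assuming Gamma-style ISO endDate strings:

```python
from datetime import datetime

def window_of(end_date: str):
    """Bucket a market into the three PM1d windows by endDate.
    Assumes Gamma-style ISO strings like '2025-07-01T00:00:00Z'."""
    d = datetime.fromisoformat(end_date.replace("Z", "+00:00"))
    if d.year == 2025:
        return "H1 2025" if d.month <= 6 else "H2 2025"
    if d.year == 2026 and d.month <= 3:
        return "Q1 2026"
    return None  # outside the three analysis windows

def stability_table(records, min_n=15):
    """Brier per (window x category) cell; cells with N < min_n are dropped
    as below the noise floor. Records: (category, endDate, p_yes, outcome)."""
    cells = {}
    for cat, end_date, p, y in records:
        w = window_of(end_date)
        if w is None:
            continue
        cells.setdefault((w, cat), []).append((p - y) ** 2)
    return {cell: sum(e) / len(e) for cell, e in cells.items() if len(e) >= min_n}
```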

Results:

Brier by window × category
category    H1 2025    H2 2025    Q1 2026
sports       0.1073     0.2216     0.1987
other        0.0796     0.1179     0.1562

Sample sizes (window × category):

category    H1 2025    H2 2025    Q1 2026
sports         22         96         80
other          27         57         56

Stability verdict:

  1. Sports is not stable: it passes the 0.12 gate in H1 2025 (0.1073) but fails in H2 2025 (0.2216) and Q1 2026 (0.1987).
  2. "Other" degrades monotonically (0.0796 → 0.1179 → 0.1562) and crosses the gate in Q1 2026.

What this means

The “sports has alpha in the $100K-$2M band” hypothesis is weaker than it looked from PM1c alone. Two plausible explanations:

  1. Sample composition effect. The top-500 markets from each window are biased toward whatever events had high volume at that time. In H1 2025 the top sports markets might be elite playoff games (hyperefficient); in H2 2025 they include a broader sweep of regular-season games (less efficient). The “mispricing” isn’t a property of the band, it’s a property of the mix of games within the band.

  2. Regime change. Polymarket’s user base grew and changed across 2025. Calibration may genuinely have degraded because new, less sophisticated users joined and bid prices off-fair.

I can’t distinguish these two explanations from this data alone, but both argue against a simple “buy all sports markets in this band” strategy. If it’s (1), the alpha lives in a specific sub-population we haven’t identified yet. If it’s (2), the alpha exists but is regime-dependent, and we’d need continuous recalibration.

The “other” category also trends unstable, which is less interpretable because “other” is whatever didn’t match our regex — it’s a catch-all with unclear composition.

Politics and crypto have too few markets per window to produce meaningful per-window Brier numbers.

Decision point — why I’m pausing before PM1e

The founder’s guidance was to be cost-conscious: “if our margin is thin it could quietly eat into our return.” Given what PM1d just showed, here’s my read:

Evidence for spending on PM1e (X-sentiment prototype):

  1. PM1c confirms a large, measurable calibration gap in sports and crypto (Brier ~0.19 vs the 0.12 gate), so there is something to model against.
  2. Everything to date has cost $0; PM1e is the first step that tests whether we can actually build an edge rather than just measure mispricing.

Evidence against spending on PM1e right now:

  1. PM1d shows the sports signal is not persistent across windows, so a prototype built today could be calibrated against a regime that has already passed.
  2. We haven’t identified which sub-population carries the alpha, so any prototype market chosen now would be chosen blind.
  3. The founder’s margin concern applies directly: xmcp/LLM spend against a thin, possibly regime-dependent edge could quietly eat into the return.

What I want to do BEFORE spending on PM1e:

  1. Sub-category analysis within sports — split sports into NBA / NFL / MLB / soccer / esports / tennis / other. Which specific leagues drive the high Brier? That could reveal the real alpha population. (Free.)
  2. Regular-season vs playoff split — hypothesis: playoff markets are well-priced because volume and attention concentrate there, while regular-season markets are where the mispricing hides. (Free.)
  3. Spread-aware Brier — account for bid-ask spread at the time of the snapshot. Even if the midpoint shows Brier 0.20, if the spread is 10 cents wide, the actual tradeable prices may be well within discipline range. The narrow-margin concern applies doubly. (Cheap — another Polymarket API call per market.)

Only after those three are done would I feel comfortable spending xmcp budget on PM1e. And at that point I’d pick the prototype market deliberately — from the specific sub-population the analysis identifies as the alpha target.

Cost budget for future PM1e

When we do run PM1e, here’s the cost frame I’m planning against:

Per-prototype budget (single market):

Scaled-strategy budget (what it’d cost to run continuously):

This budget math is going to be a key part of any go/no-go decision. Putting it here so we have a reference.

Plots