PM1c3 — Breaking down the “other” category
Context
PM1c + PM1d found that the $100K-$2M band’s “other” bucket (163 markets, Brier 0.116) looked potentially interesting but was opaque because “other” was a catch-all for whatever didn’t match our sports/crypto/politics/esports regex. This script drills into that residual with a second-pass classifier covering crypto-price-threshold, elon-tweets, fed-rates, political-event, entertainment, weather, tech-launch, geopolitical, and several other patterns.
Cost: $0. Polymarket endpoints only.
Method
- Re-pull the top 500 markets in the $100K-$2M band
- Apply the PM1c coarse classifier (which tags 174 as “other”)
- Apply a second-pass classifier with 14 refined regex rules
- Compute Brier per refined category with MIN_N=10
- Mark any category as a viable candidate if its Brier is above the gate (a FAIL, meaning the market price looks miscalibrated) AND its win rate is skewed (not 50/50)
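A minimal sketch of how a second-pass classifier like the one described above might be structured, assuming first-match-wins ordering and an "unknown" fallthrough. The rule list and regex patterns are hypothetical stand-ins for illustration, not the 14 rules in the actual script.

```python
import re
from collections import Counter

# HYPOTHETICAL rules: (label, pattern). Illustrative guesses only,
# not the 14 refined rules used in the real second pass.
REFINED_RULES = [
    ("elon-tweets", re.compile(r"\belon\b.*\btweets?\b", re.I)),
    ("fed-rates", re.compile(r"\b(fed|fomc)\b.*\b(rate|cut|hike)", re.I)),
    ("crypto-adjacent", re.compile(r"\b(bitcoin|btc|ethereum|eth|solana)\b", re.I)),
    ("weather", re.compile(r"\b(hurricane|temperature|snowfall|rainfall)\b", re.I)),
]


def classify_refined(question: str) -> str:
    """Return the first matching refined label, or 'unknown' for the residual."""
    for label, pattern in REFINED_RULES:
        if pattern.search(question):
            return label
    return "unknown"


def category_counts(questions):
    """Tally refined labels across a list of market questions."""
    return Counter(classify_refined(q) for q in questions)


# Made-up questions, used only to exercise the rules.
sample = [
    "Will Elon Musk post 280-299 tweets from April 3 to April 10?",
    "Will the Fed cut rates in June?",
    "Will Bitcoin close above $100k on Friday?",
    "Who wins the tennis final?",
]
for label, count in category_counts(sample).most_common():
    print(count, label)
```

Because the first match wins, rule order matters: narrower patterns should come before broader ones so that, for example, a crypto price question is not swallowed by a more general rule.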
Results
Refined category distribution (N=174 “other” markets):
109 unknown (residual — couldn't classify even with finer rules)
21 elon-tweets
13 crypto-adjacent
10 geopolitical
8 political-event
7 tech-launch
2 entertainment-film
2 election
1 weather
1 fed-rates
Brier per category (MIN_N=10):
| Category | N | Median vol | Brier | Win% | Gate |
|---|---|---|---|---|---|
| elon-tweets | 21 | $1,881,446 | 0.2459 | 28.57% | FAIL ← target |
| crypto-adjacent | 12 | $1,791,255 | 0.1419 | 50.00% | FAIL |
| unknown | 102 | $1,861,599 | 0.0977 | 32.35% | PASS |
Viable candidates — above the gate AND not 50/50:
- elon-tweets: N=21, Brier=0.2459, win rate=28.57%
The unknown residual (109 markets, 102 of them scored in the table above) passes the gate at 0.0977. It's a mix of tennis finals, soccer matches, celebrity predictions, crypto price thresholds (missed by my regex), sovereign leader questions, and miscellaneous events. It looks reasonably calibrated in aggregate, and no obvious sub-population drives a gap.
The elon-tweets finding is the real result. Worth digging into.
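The viable-candidate rule from the Method section can be sketched as a small filter over the per-category rows. This note does not state the gate threshold or how far from 50% counts as "skewed", so both constants below are assumptions chosen only to be consistent with the PASS/FAIL labels in the table.

```python
from dataclasses import dataclass


@dataclass
class CategoryStats:
    name: str
    n: int
    brier: float
    win_rate: float  # fraction of markets resolving YES


MIN_N = 10
# ASSUMPTION: the real gate value is not given in this note. 0.12 is a
# placeholder that sits between the PASS row (0.0977) and the FAIL rows.
GATE_BRIER = 0.12
# ASSUMPTION: placeholder tolerance for "not 50/50".
SKEW_TOLERANCE = 0.05


def is_viable(c: CategoryStats) -> bool:
    """Above-gate Brier (a FAIL) plus a win rate that is not roughly 50/50."""
    enough_data = c.n >= MIN_N
    fails_gate = c.brier > GATE_BRIER
    skewed = abs(c.win_rate - 0.5) > SKEW_TOLERANCE
    return enough_data and fails_gate and skewed


# Rows copied from the Brier-per-category table.
rows = [
    CategoryStats("elon-tweets", 21, 0.2459, 0.2857),
    CategoryStats("crypto-adjacent", 12, 0.1419, 0.5000),
    CategoryStats("unknown", 102, 0.0977, 0.3235),
]
viable = [c.name for c in rows if is_viable(c)]
print(viable)
```

The skew check is what excludes crypto-adjacent: a Brier well under 0.25 at a 50% win rate can reflect genuinely coin-flip markets rather than a pricing error.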
The Elon-tweets finding, in detail
All 21 markets are structurally similar: “Will Elon Musk post [X-Y] tweets from [date1] to [date2]?” — narrow-bucket predictions of Elon’s weekly tweet count. Each market covers a ~20-tweet-wide bucket (e.g., 280-299, 300-319, 320-339).
Outcome distribution: 6 YES (28.6%), 15 NO (71.4%). Skewed toward NO, which is expected because any given narrow bucket has low prior probability — there are usually ~20 buckets per week covering the plausible range, so most resolve NO.
The Brier math:
- Majority baseline (always predict the base rate, 28.6%): Brier = 0.2041
- Polymarket’s midpoint 3 days before resolution: Brier = 0.2459
- Lift: -0.0419 — the market is worse than always predicting the base rate.
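The baseline figure can be reproduced from the outcome counts alone. The sketch below assumes the standard mean-squared-error Brier score and a constant forecast at the observed base rate, for which the score reduces to p(1 - p).

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# Outcomes from the section above: 6 YES and 15 NO across 21 markets.
outcomes = [1] * 6 + [0] * 15
base_rate = sum(outcomes) / len(outcomes)  # 6/21

# Constant forecast at the base rate; algebraically equal to p * (1 - p).
baseline_brier = brier([base_rate] * len(outcomes), outcomes)

market_brier = 0.2459  # reported value, already rounded to four places
lift = baseline_brier - market_brier  # negative means the market is worse

print(f"base rate      = {base_rate:.4f}")
print(f"baseline Brier = {baseline_brier:.4f}")
print(f"lift           = {lift:.4f}")
```

Because the market Brier entered here is itself rounded, the last digit of the computed lift may differ slightly from the figure quoted above.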
Read: the market is doing worse than a “know nothing, use base rate” strategy on these narrow-bucket markets. This is the first place in our analysis where we’ve found Polymarket’s own price to be measurably underperforming a trivial baseline.
Structural reason this is plausible:
- These are retail entertainment markets, not high-stakes political/financial prediction. Professional traders are elsewhere.
- The ranges are narrow (20 tweets wide). Small errors in estimating Elon’s posting rate translate into large errors in bucket prediction.
- Most traders are guessing. Nobody with the actual tweet history is bothering to price these accurately.
- Critically, the required data is 100% public and accessible via the X API. This isn’t a “we need insider information” edge — it’s a “we need to actually look at the data” edge.
Caveats I’m flagging honestly:
- N=21 is small. On a larger sample the 0.2459 point estimate could easily land anywhere in roughly 0.20 ± 0.08. The direction (market worse than the majority baseline) is more reliable than the magnitude.
- Stability unknown. We haven’t checked whether elon-tweets calibration has been consistently bad across time windows. Could be a 2026-specific artifact.
- Elon is structurally unpredictable. He’s been known to tweet 100 times in a day and 0 the next. Our frequency model could whiff on regime-change events.
- Narrow buckets amplify errors. A forecaster that’s “close” on the actual count can still be catastrophically wrong on the bucket prediction if the count lands near a boundary.
Why this is the ideal PM1e prototype target
- Live markets exist right now. Four active Elon tweet count events on Polymarket:
  - elon-musk-of-tweets-april-3-april-10: resolves today at 16:00 UTC (30 markets)
  - elon-musk-of-tweets-april-7-april-14: resolves April 14 (30 markets)
  - elon-musk-of-tweets-april-9-april-11: resolves April 11 (10 markets)
  - elon-musk-of-tweets-april-10-april-17: resolves April 17 (30 markets)

  We can start forward-testing immediately.
- No LLM required. This is pure frequency analysis. We pull Elon’s recent tweet history, fit a distribution (Poisson or Negative Binomial) to his weekly counts, compute P(count in bucket) for each market. No sentiment, no LLM inference, no Claude calls. Pennies per run.
- xmcp gives us the data. The X API's getUsersPosts endpoint returns Elon's timestamped tweets. One call pulls ~100-200 tweets, enough to fit a distribution. Cost: ~$0.02 per snapshot.
- Forward-testable. New event drops every few days. We can predict on each bucket, wait for resolution, score Brier over weeks. Build a track record as a side effect.
- Failure modes are informative. If our Brier is worse than the market’s, we know the market has information we don’t. If our Brier is better, we’ve found a repeatable edge. Either answer is useful.
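The "either answer is useful" comparison comes down to scoring two probability columns against the same resolved outcomes. A minimal sketch, assuming predictions are logged to a CSV; the column names (our_prob, market_mid, resolved_yes) and the demo rows are hypothetical.

```python
import csv
import io


def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def compare(csv_text):
    """Return (our Brier, market Brier, edge); edge > 0 means we beat the market."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    outcomes = [int(r["resolved_yes"]) for r in rows]
    ours = brier([float(r["our_prob"]) for r in rows], outcomes)
    market = brier([float(r["market_mid"]) for r in rows], outcomes)
    return ours, market, market - ours


# Made-up rows purely to exercise the function; not real predictions.
demo = """market_slug,our_prob,market_mid,resolved_yes
bucket-280-299,0.20,0.30,0
bucket-300-319,0.50,0.35,1
bucket-320-339,0.15,0.25,0
"""

ours, market, edge = compare(demo)
print(f"ours={ours:.4f} market={market:.4f} edge={edge:+.4f}")
```

Scoring both columns on exactly the same rows matters: buckets within one event are mutually exclusive, so dropping some of them from one side would bias the comparison.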
Proposed PM1e prototype structure
Skill: scripts/pm1e_elon_tweet_forecast.py
- Pull active elon-musk-of-tweets-* events via /events endpoint
- For each event, list its bucket markets and get current midpoints
- Pull Elon's last ~200 tweets via xmcp getUsersPosts(elonmusk)
- Compute weekly tweet counts over the last 12 weeks
- Fit a Negative Binomial distribution to weekly counts (robust to overdispersion; Elon's variance > mean)
- For each bucket market [X, Y], compute our P(tweets ∈ [X, Y]) = nbinom.cdf(Y, μ, α) - nbinom.cdf(X-1, μ, α), scaled to the bucket's time window
- Compare our probability to the market midpoint, record the delta
- Write predictions + midpoints to a dated CSV for later scoring
- Wait for resolution, then compute Brier for our predictions and for the market's midpoints
- Run /loop 6h /pm1e-elon-forecast to refresh predictions as the market evolves
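The fit-and-bucket steps above can be sketched in pure Python. This assumes a method-of-moments fit with Var = μ + αμ², which is one common way to fit a Negative Binomial and not necessarily what the prototype will use. The weekly counts are invented for illustration. Note that scipy.stats.nbinom is parameterized by (n, p) rather than (μ, α); the same conversion used below, n = 1/α and p = n/(n + μ), applies there. Twelve weekly counts at roughly 300 tweets per week also imply far more than ~200 tweets of history, so the counts are assumed to come from paginated pulls.

```python
import math
from statistics import mean, variance


def fit_nb_moments(weekly_counts):
    """Method-of-moments fit: returns (mu, alpha) with Var = mu + alpha * mu**2."""
    mu = mean(weekly_counts)
    var = variance(weekly_counts)  # sample variance
    if var <= mu:
        raise ValueError("no overdispersion; a Poisson fit is more appropriate")
    alpha = (var - mu) / mu ** 2
    return mu, alpha


def nb_pmf(k, mu, alpha):
    """Negative Binomial P(count == k) in the (mu, alpha) parameterization."""
    r = 1.0 / alpha        # size parameter (scipy's n)
    p = r / (r + mu)       # success probability (scipy's p)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def bucket_prob(lo, hi, mu, alpha):
    """P(lo <= count <= hi), i.e. cdf(hi) - cdf(lo - 1)."""
    return sum(nb_pmf(k, mu, alpha) for k in range(lo, hi + 1))


# HYPOTHETICAL weekly counts, not real tweet data.
weekly_counts = [312, 274, 401, 356, 298, 333, 260, 445, 310, 289, 367, 322]

mu, alpha = fit_nb_moments(weekly_counts)
for lo, hi in [(280, 299), (300, 319), (320, 339)]:
    print(f"P({lo}-{hi}) = {bucket_prob(lo, hi, mu, alpha):.4f}")
```

Scaling to a shorter window (for example the April 9 to April 11 event) is left out here; multiplying μ by window_days / 7 while holding α fixed would be one simple assumption, and it would need checking against actual shorter-window counts.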
Cost estimate:
- Per snapshot: ~$0.02 (one xmcp getUsersPosts call) + free Polymarket calls
- Daily cost running every 6h: ~$0.08/day
- Full two-week forward-test (until all active events resolve): under $2 total
- Zero LLM inference cost.
This is well within the “free analysis” envelope you agreed to. The ~$2 bounded cost is a rounding error against any plausible trading outcome, and even if the prototype finds zero edge we’ve built and validated a pipeline we can point at other data-rich markets.
What this means for the broader PM1 story
Recap of the full arc:
- PM1: top-650 markets look hyperefficient → we were wrong, pagination capped us
- PM1b: volume bands reveal $100K-$2M is the mispricing zone (Brier 0.12-0.15)
- PM1c: splitting by category shows sports (N=200, Brier 0.20) dominates the band
- PM1c2: sports Brier is stuck at ~0.20 for inherent 50/50 reasons; spread impact is zero
- PM1cd (stability): sports Brier swings 0.11-0.22 across time windows, signal not persistent
- PM1c3 (this doc): drilling into “other” reveals elon-tweets as the first real mispriced sub-population, with a plausible structural explanation and live markets to test on
The overall shape: most of Polymarket is efficient, most “mispricing” is actually game variance, but there are specific small retail-dominated sub-categories where the market is meaningfully worse than trivial baselines. Elon-tweets is the first one we’ve found. There may be others (crypto-price-threshold markets missed by our regex, narrow-bucket social-media metrics, long-tail celebrity events).
The strategy implication isn’t “trade everything in the $100K-$2M band.” It’s “find the specific retail-entertainment niches where the crowd is guessing and systematic data access gives us an edge.” That’s a much narrower but potentially repeatable thesis.
Related
- pm1cd-category-and-stability — the PM1c analysis this drills into
- pm1b-polymarket-long-tail-correction — the volume-band finding
- pm1-polymarket-baseline — the original (superseded) baseline
- ../../../06-reference/concepts/brier-score — reference
- ../architecture-vision — 5-agent target; PM1e would be a Strategy Research + Paper Testing cycle