PM1e — Elon Tweet-Count Forward-Test
Setup
Per PM1c3: Elon tweet-count markets showed Brier 0.2459 against a majority baseline of 0.2041 on N=21 resolved markets. Lower Brier is better, so the market was actively worse than predicting the base rate. Hypothesized structural explanation: retail traders don’t pull the actual posting history.
This script tests that hypothesis live on four active events by building an empirical / parametric forecast from Elon’s recent non-reply tweet history and comparing it to the market’s current midpoint. Discipline: record predictions NOW before resolution, then score Brier when each event closes.
Actual cost: $0.02 for two xmcp getUsersPosts calls with exclude=["replies"], 100 tweets each. Zero LLM calls.
Data
- Source: xmcp getUsersPosts for @elonmusk (user ID 44196397), exclude=["replies"]
- Pages fetched: 2 × 100 tweets = 200 (199 after dedup)
- Coverage: 2026-04-05 06:08 UTC → 2026-04-10 18:44 UTC (5.52 days)
- Rate: 36.0 non-reply posts per day
- Breakdown: 12 originals, 77 quotes, 110 retweets, 0 replies
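The rate figure follows directly from the post count and the coverage span. Below is a minimal sketch of that arithmetic, assuming each fetched post is a dict with an `id` field (a hypothetical shape; the actual xmcp response format is not shown in this note). The helper names are illustrative only.

```python
from datetime import datetime, timezone

def dedupe_by_id(posts):
    """Keep the first occurrence of each post ID (fetched pages can overlap)."""
    seen = set()
    unique = []
    for post in posts:
        if post["id"] not in seen:  # "id" is an assumed field name
            seen.add(post["id"])
            unique.append(post)
    return unique

def posts_per_day(n_posts, first_ts, last_ts):
    """Average daily rate over the span between the oldest and newest post."""
    span_days = (last_ts - first_ts).total_seconds() / 86400
    return n_posts / span_days

# Figures reported in the Data list: 199 posts between these two timestamps.
first_ts = datetime(2026, 4, 5, 6, 8, tzinfo=timezone.utc)
last_ts = datetime(2026, 4, 10, 18, 44, tzinfo=timezone.utc)

rate = posts_per_day(199, first_ts, last_ts)
weekly_mu = rate * 7
print(f"{rate:.2f} posts/day, 7-day mu = {weekly_mu:.1f}")
```

Using the unrounded rate is what makes the 7-day projection come out at 252.1 rather than 36.0 × 7 = 252.0.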
Definition alignment: Polymarket’s resolution rule (verified from the event description) counts main feed posts + quote posts + reposts and excludes replies. Our xmcp filter exclude=["replies"] matches this rule.
Active events forecast
| Event | Window | Hours | Model |
|---|---|---|---|
| April 3-10 | 2026-04-03 16:00 → 2026-04-10 16:00 UTC | 168 | Poisson(μ=252.1) — insufficient history for 7-day empirical |
| April 7-14 | 2026-04-07 16:00 → 2026-04-14 16:00 UTC | 168 | Poisson(μ=252.1) — same |
| April 9-11 | 2026-04-09 16:00 → 2026-04-11 16:00 UTC | 48 | Empirical — 15 rolling 2-day windows, mean=72.8, std=5.0 |
| April 10-17 | 2026-04-10 16:00 → 2026-04-17 16:00 UTC | 168 | Poisson(μ=252.1) — same |
Key limitation: with only 5.5 days of history we cannot build an empirical distribution of 7-day rolling windows. For the three 7-day events we fell back to a Poisson projection using the observed daily rate (36.02/day unrounded × 7 ≈ 252.1 expected). Poisson underestimates variance relative to a proper Negative Binomial fit, so our probability mass is too concentrated around the mean.
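As a sketch of the Poisson fallback, the bucket probabilities can be computed by summing the Poisson mass function over each inclusive bucket range. This uses only the standard library; the function names are illustrative, and the bucket bounds are assumed to be inclusive on both ends.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def bucket_prob(lo, hi, mu):
    """P(lo <= X <= hi), assuming both bucket bounds are inclusive."""
    return sum(poisson_pmf(k, mu) for k in range(lo, hi + 1))

mu = 252.1  # projected 7-day count from the observed daily rate
for lo, hi in [(220, 239), (240, 259), (260, 279)]:
    print(f"{lo}-{hi}: {bucket_prob(lo, hi, mu):.3f}")
```

The three buckets printed here are the ones that appear in the disagreement table, so the output can be checked against the "Ours" column.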
Fix for next iteration: pull 2-3 more pages of history to get 15-20 days of data, enabling empirical 7-day rolling windows on all four events.
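For reference, the empirical rolling-window approach might look like the sketch below. The 6-hour step matches the one mentioned for the April 9-11 event; the function names and the evenly spaced timeline are made up for illustration and are not the real posting history.

```python
from datetime import datetime, timedelta, timezone

def rolling_window_counts(timestamps, window_hours, step_hours):
    """Count posts in every full window [start, start + window) that fits
    between the oldest and newest timestamp, advancing by step_hours."""
    ts = sorted(timestamps)
    window = timedelta(hours=window_hours)
    step = timedelta(hours=step_hours)
    counts = []
    start = ts[0]
    while start + window <= ts[-1]:
        end = start + window
        counts.append(sum(1 for t in ts if start <= t < end))
        start += step
    return counts

def bucket_share(counts, lo, hi):
    """Fraction of windows whose count lands in the inclusive bucket [lo, hi]."""
    return sum(1 for c in counts if lo <= c <= hi) / len(counts)

# SYNTHETIC timeline (not real data): one post every 40 minutes, 199 posts,
# starting at the reported coverage start.
t0 = datetime(2026, 4, 5, 6, 8, tzinfo=timezone.utc)
synthetic = [t0 + timedelta(minutes=40 * i) for i in range(199)]

counts = rolling_window_counts(synthetic, window_hours=48, step_hours=6)
print(len(counts), bucket_share(counts, 65, 89))
```

With about 5.5 days of coverage this produces 15 windows of 48 hours, the same window count reported for April 9-11. A 168-hour window would produce none, which is why the 7-day events need more history.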
Predictions vs market mid (top disagreements)
| Event | Bucket | Ours | Market | Delta |
|---|---|---|---|---|
| april-3-april-10 (resolves today) | 240-259 | 0.467 | 1.000 | -0.533 |
| april-9-april-11 | 65-89 | 0.933 | 0.655 | +0.278 |
| april-3-april-10 | 260-279 | 0.274 | 0.000 | +0.274 |
| april-10-april-17 | 240-259 | 0.467 | 0.225 | +0.242 |
| april-3-april-10 | 220-239 | 0.196 | 0.000 | +0.196 |
| april-9-april-11 | 40-64 | 0.067 | 0.205 | -0.138 |
| april-7-april-14 | 240-259 | 0.467 | 0.335 | +0.132 |
| april-9-april-11 | 90-114 | 0.000 | 0.125 | -0.125 |
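Delta here is our probability minus the market midpoint, and the rows are ordered by absolute delta. A small sketch of that ranking, using the values from the table above (the tuple layout and function name are illustrative):

```python
# (event, bucket, ours, market) copied from the disagreement table.
rows = [
    ("april-9-april-11", "65-89", 0.933, 0.655),
    ("april-3-april-10", "240-259", 0.467, 1.000),
    ("april-7-april-14", "240-259", 0.467, 0.335),
    ("april-3-april-10", "260-279", 0.274, 0.000),
    ("april-9-april-11", "90-114", 0.000, 0.125),
    ("april-10-april-17", "240-259", 0.467, 0.225),
    ("april-9-april-11", "40-64", 0.067, 0.205),
    ("april-3-april-10", "220-239", 0.196, 0.000),
]

def rank_disagreements(rows):
    """Attach delta = ours - market and sort by largest absolute delta first."""
    with_delta = [(e, b, ours, mkt, round(ours - mkt, 3)) for e, b, ours, mkt in rows]
    return sorted(with_delta, key=lambda r: abs(r[4]), reverse=True)

ranked = rank_disagreements(rows)
for event, bucket, ours, mkt, delta in ranked:
    print(f"{event:20s} {bucket:8s} {ours:.3f} {mkt:.3f} {delta:+.3f}")
```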
Interpretation — what each row means
April 3-10 @ 240-259, market at 1.000: this event resolves within hours. The market is already sure Elon posted 240-259 tweets in the window because the tracker has been updating live. Our Poisson(μ=252.1) gives 46.7% to that bucket, which is mathematically correct for an ex-ante forecast but irrelevant because the market already knows the answer. Don’t count this as a miss in the comparison; count it as ground truth confirmation that our rate estimate (36/day × 7 ≈ 252) was accurate, since the realized count fell in the bucket containing our projected mean.
April 3-10 @ 260-279 and 220-239, market at 0.000: corollary — market knows the count is NOT in these buckets. Our Poisson gives them non-trivial probability because we don’t know the actual count. Our loss here is the same story: model uncertainty vs market certainty, no actual mispricing.
April 9-11 @ 65-89, ours 0.933 vs market 0.655 (+0.278): this is the real signal. The event resolves in ~21 hours. We have 15 rolling 2-day windows in our data (5.5 days can produce that many 48-hour windows with 6-hour step). Out of those 15 windows, 14 landed in 65-89. Our empirical confidence is 93%. The market says 65.5%. If our model is right, this is the biggest edge we’d act on.
April 9-11 @ 40-64, ours 0.067 vs market 0.205 (-0.138): the complementary bet — market gives too much weight to this lower bucket, our empirical distribution rules it out almost entirely.
April 7-14 @ 240-259 and April 10-17 @ 240-259: both events still have days to run. Our Poisson model assigns 46.7% to this bucket vs the market’s 33.5% and 22.5% respectively. We disagree with the market’s distribution over these 7-day-window events, but our confidence is weak because it’s Poisson-from-projected-rate, not empirical.
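To put a rough range around the 14-of-15 empirical estimate for April 9-11, one option is a Wilson score interval. This is a sketch, not part of the original analysis, and it treats the 15 windows as independent draws. They are not (48-hour windows on a 6-hour step overlap heavily), so the true uncertainty is wider than what this returns.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(14, 15)
print(f"14/15 -> {lo:.3f} to {hi:.3f}")
```

Even under the optimistic independence assumption the interval runs from roughly 0.70 to 0.99, so the lower end sits close to the market’s 0.655.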
Cost budget status
- Spent this run: $0.02 (2 xmcp calls)
- Per-snapshot cost: ~$0.02
- If we snapshot once per day through April 17: 7 × $0.02 = $0.14
- If we snapshot every 6h: 28 × $0.02 = $0.56
- Full two-week forward test target from PM1c3 proposal: under $2
Well within budget. No LLM inference cost; pure frequency analysis.
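The budget lines reduce to snapshots × cost per snapshot. A tiny sketch, working in integer cents to avoid floating-point drift (the function name is illustrative):

```python
COST_PER_SNAPSHOT_CENTS = 2  # two xmcp calls per snapshot, as reported for this run

def projected_cost_usd(days, snapshots_per_day):
    """Total xmcp spend in dollars for a given cadence."""
    return days * snapshots_per_day * COST_PER_SNAPSHOT_CENTS / 100

print(projected_cost_usd(7, 1))   # daily through April 17
print(projected_cost_usd(7, 4))   # every 6 hours through April 17
print(projected_cost_usd(14, 4))  # every 6 hours for a full two weeks
```

Even the two-week, every-6-hours cadence stays under the $2 target, provided the per-snapshot cost holds at two calls. Pulling extra history pages would raise it.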
Honest caveats
- 5.5 days of history is too thin for 7-day rolling windows. Three of four events are running with Poisson projection, not empirical. The real signal for these events depends on more data.
- Poisson underestimates variance. Elon’s actual variance > mean, meaning Poisson gives too-narrow spreads. A Negative Binomial fit would give more conservative probability estimates.
- The market at/near resolution has an information advantage we can’t match. Events resolving today or tomorrow will always show the market “correcting” toward the true answer faster than our ex-ante model. Our genuine tests are events with 2+ days remaining.
- N=15 rolling 2-day windows for the April 9-11 empirical is small enough that the 93% point estimate has meaningful uncertainty, and because the windows overlap (48-hour windows on a 6-hour step) they are not independent observations. A larger sample would narrow that confidence interval.
- Definition risk. Polymarket’s resolution source is xtracker.polymarket.com, not the X API. Minor discrepancies between our count and the tracker’s count are possible — deleted posts, main-feed reply handling, timezone edge cases.
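To illustrate the Poisson-variance caveat, the sketch below compares the 240-259 bucket probability under Poisson and under a Negative Binomial with the same mean. The dispersion value `r = 50` is an arbitrary assumption chosen only to show the direction of the effect; it is not fitted to Elon’s data.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, r):
    """Negative Binomial with mean mu and dispersion r.
    Variance is mu + mu**2 / r, so smaller r means fatter tails."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)

def bucket_prob(pmf, lo, hi):
    """Sum a one-argument pmf over the inclusive bucket [lo, hi]."""
    return sum(pmf(k) for k in range(lo, hi + 1))

mu = 252.1
r = 50.0  # ASSUMED dispersion, for illustration only (not fitted)

p_pois = bucket_prob(lambda k: poisson_pmf(k, mu), 240, 259)
p_nb = bucket_prob(lambda k: negbin_pmf(k, mu, r), 240, 259)
print(f"Poisson: {p_pois:.3f}  NegBin(r={r:g}): {p_nb:.3f}")
```

Any finite `r` spreads mass away from the central bucket, so the 0.467 figures in the 7-day rows would shrink toward the market’s numbers under an overdispersed fit. By how much depends on a dispersion estimate that needs more history.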
Scored bets (to be filled in after resolution)
Not betting real money — this is a paper test. Recording what we WOULD have bet and will score against the resolutions.
| Event | Bucket | Direction | Ours | Market | Outcome | Brier (ours) | Brier (market) |
|---|---|---|---|---|---|---|---|
| April 3-10 | 240-259 | market already at 1.0 | 0.467 | 1.000 | (resolving now) | | |
| April 9-11 | 65-89 | BUY YES (+0.278 edge) | 0.933 | 0.655 | (resolves April 11) | | |
| April 9-11 | 40-64 | BUY NO (-0.138 edge) | 0.067 | 0.205 | (resolves April 11) | | |
| April 7-14 | 240-259 | BUY YES (+0.132 edge, weak) | 0.467 | 0.335 | (resolves April 14) | | |
| April 10-17 | 240-259 | BUY YES (+0.242 edge, weak) | 0.467 | 0.225 | (resolves April 17) | | |
Scoring: we’ll compute Brier of our predictions vs realized outcomes, and market-midpoint Brier for the same bets, and compare. If our Brier beats the market’s by a meaningful margin across all bets, the edge is real.
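The scoring step can be sketched as follows. For a single binary bucket the Brier score is the squared difference between the forecast probability and the 0/1 outcome, and lower is better. The outcome used below is HYPOTHETICAL, purely to show the mechanics; no event has resolved at the time of writing.

```python
def brier(prob, outcome):
    """Brier score for one binary forecast; outcome is 1 (YES) or 0 (NO)."""
    return (prob - outcome) ** 2

def mean_brier(forecasts):
    """Average Brier over (prob, outcome) pairs."""
    return sum(brier(p, o) for p, o in forecasts) / len(forecasts)

# HYPOTHETICAL scenario: suppose April 9-11 resolves inside 65-89.
# Then the 65-89 bucket is YES (1) and the 40-64 bucket is NO (0).
ours = [(0.933, 1), (0.067, 0)]
market = [(0.655, 1), (0.205, 0)]

print(f"ours:   {mean_brier(ours):.4f}")
print(f"market: {mean_brier(market):.4f}")
```

The same two functions apply unchanged once real outcomes are filled into the table. Only the 0/1 values change.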
Next actions
- Immediate (this session): save predictions CSV to experiments/outputs/pm1e_predictions_20260410-191825.csv. Done.
- Within hours: April 3-10 resolves around 16:00 UTC (already past). Pull the final tracker count and verify our Poisson μ projection was close.
- Tomorrow: April 9-11 resolves 16:00 UTC. First real scored bet against the market.
- Before April 14 resolution: pull more tweet history (3-4 more xmcp pages = ~500 more tweets = ~14+ days total) to enable empirical 7-day rolling windows.
- Weekly: re-run the forecast with updated data, record new predictions, compare to the previous snapshot.
Related
- pm1c3-other-breakdown — the finding that motivated this experiment
- ../autoinv/polymarket — Polymarket client
- ../../../06-reference/concepts/brier-score — evaluation metric reference
- ../architecture-vision — PM1e is a Strategy Research + Paper Testing cycle per the 5-agent vision