PM1e — Elon Tweet-Count Forward-Test

Setup

Per PM1c3: Elon tweet-count markets showed Brier 0.2459 against a majority baseline of 0.2041 on N=21 resolved markets. Since lower Brier is better, the market was actively worse than predicting the base rate. Hypothesized structural explanation: retail traders don’t pull the actual posting history.

This script tests that hypothesis live on four active events by building an empirical / parametric forecast from Elon’s recent non-reply tweet history and comparing it to the market’s current midpoint. Discipline: record predictions NOW before resolution, then score Brier when each event closes.

Actual cost: $0.02 for two xmcp getUsersPosts calls with exclude=["replies"], 100 tweets each. Zero LLM calls.

Data

Source: xmcp getUsersPosts for @elonmusk (user ID 44196397), exclude=["replies"]
Pages fetched: 2 × 100 tweets = 200 (199 after dedup)
Coverage: 2026-04-05 06:08 UTC → 2026-04-10 18:44 UTC (5.52 days)
Rate: 36.0 non-reply posts per day
Breakdown: 12 originals, 77 quotes, 110 retweets, 0 replies

Definition alignment: Polymarket’s resolution rule (verified from the event description) counts main feed posts + quote posts + reposts and excludes replies. Our xmcp filter exclude=["replies"] matches this rule.
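The dedup, filter, and rate steps above can be sketched in a few lines. This is a minimal version that assumes the two xmcp pages have already been parsed into records with an ID, a UTC timestamp, and a post-kind label; the `Post` type, the kind labels, and the helper names are hypothetical and are not the xmcp response schema. Coverage is measured from the first to the last counted post, which is how 199 posts over 5.52 days gives about 36.0 per day.

```python
from dataclasses import dataclass
from datetime import datetime

# Post kinds that count under the resolution rule as stated above:
# main feed posts, quote posts, and reposts. Replies are excluded.
# These labels are hypothetical, not xmcp field values.
COUNTED_KINDS = {"original", "quote", "retweet"}


@dataclass(frozen=True)
class Post:
    """Hypothetical parsed record; real API responses carry more fields."""
    post_id: str
    created_at: datetime  # UTC timestamp
    kind: str             # "original", "quote", "retweet", or "reply"


def dedup(posts: list[Post]) -> list[Post]:
    """Keep the first occurrence of each post_id (consecutive pages can overlap)."""
    seen: set[str] = set()
    unique = []
    for p in posts:
        if p.post_id not in seen:
            seen.add(p.post_id)
            unique.append(p)
    return unique


def daily_rate(posts: list[Post]) -> float:
    """Counted posts per day, measured over the first-to-last counted post span."""
    counted = sorted((p for p in posts if p.kind in COUNTED_KINDS),
                     key=lambda p: p.created_at)
    if len(counted) < 2:
        raise ValueError("need at least two counted posts to measure a span")
    span_days = (counted[-1].created_at - counted[0].created_at).total_seconds() / 86400
    return len(counted) / span_days
```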

Active events forecast

Event	Window	Hours	Model
April 3-10	2026-04-03 16:00 → 2026-04-10 16:00 UTC	168	Poisson(μ=252.1) — insufficient history for 7-day empirical
April 7-14	2026-04-07 16:00 → 2026-04-14 16:00 UTC	168	Poisson(μ=252.1) — same
April 9-11	2026-04-09 16:00 → 2026-04-11 16:00 UTC	48	Empirical — 15 rolling 2-day windows, mean=72.8, std=5.0
April 10-17	2026-04-10 16:00 → 2026-04-17 16:00 UTC	168	Poisson(μ=252.1) — same
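A sketch of the Poisson fallback used for the three 7-day rows, assuming buckets are inclusive on both ends (240-259 means 240 ≤ count ≤ 259). The pmf is evaluated in log space because `mu**k / k!` overflows a float near k = 250. The function names are illustrative, not taken from the experiment script; with μ = 252.1 the output should land close to the bucket probabilities quoted in the disagreement table further down (0.467, 0.274, 0.196).

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_bucket(lo: int, hi: int, mu: float) -> float:
    """P(lo <= X <= hi), inclusive on both ends like the market buckets."""
    return sum(poisson_pmf(k, mu) for k in range(lo, hi + 1))


MU_7D = 252.1  # observed daily rate x 7 days, as in the table

for lo, hi in [(220, 239), (240, 259), (260, 279)]:
    print(f"{lo}-{hi}: {poisson_bucket(lo, hi, MU_7D):.3f}")
```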

Key limitation: with only 5.5 days of history we cannot build an empirical distribution of 7-day rolling windows. For the three 7-day events we fell back to a Poisson projection from the observed daily rate (36.0/day × 7 ≈ 252.1 expected, using the unrounded rate of 36.02/day). If Elon’s counts are overdispersed (variance > mean), as we expect, Poisson understates the variance relative to a proper Negative Binomial fit, so our probability mass is too concentrated around the mean.
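To see how much the Poisson assumption narrows the spread, a Negative Binomial can be matched by method of moments to the same mean with a larger variance. This is a sketch: the variance below (twice the mean) is a hypothetical placeholder, not something estimated from our 5.5 days of data, and the parameterization used (failures before the r-th success) is one of several common conventions.

```python
import math


def nb_pmf(k: int, mean: float, var: float) -> float:
    """P(X = k) for a Negative Binomial matched to (mean, var) by method of moments.

    Parameterized as failures before the r-th success:
    r = mean^2 / (var - mean), p = r / (r + mean), so E[X] = mean and Var[X] = var.
    Only defined for var > mean (overdispersion).
    """
    if var <= mean:
        raise ValueError("Negative Binomial needs var > mean; use Poisson instead")
    r = mean * mean / (var - mean)
    p = r / (r + mean)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_bucket(lo: int, hi: int, mean: float, var: float) -> float:
    """P(lo <= X <= hi) under the moment-matched Negative Binomial."""
    return sum(nb_pmf(k, mean, var) for k in range(lo, hi + 1))


MEAN_7D = 252.1
VAR_7D = 2.0 * MEAN_7D  # HYPOTHETICAL: variance = 2x mean, a placeholder, not measured

print(f"240-259 under NB: {nb_bucket(240, 259, MEAN_7D, VAR_7D):.3f}")
```

Under that placeholder variance the 240-259 bucket falls well below the Poisson figure of 0.467, which is the "too concentrated" effect described above; the real dispersion has to come from a longer history.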

Fix for next iteration: pull 4-5 more pages of history (~400-500 more posts, about 11-14 more days at ~36/day) to reach roughly 16-19 days of data, enabling empirical 7-day rolling windows on all four events.

Predictions vs market mid (top disagreements)

Event                                  Bucket      Ours   Market    Delta (ours - market)
april-3-april-10  (resolves today)    240-259   0.467    1.000   -0.533
april-9-april-11                       65-89    0.933    0.655   +0.278
april-3-april-10                      260-279   0.274    0.000   +0.274
april-10-april-17                     240-259   0.467    0.225   +0.242
april-3-april-10                      220-239   0.196    0.000   +0.196
april-9-april-11                       40-64    0.067    0.205   -0.138
april-7-april-14                      240-259   0.467    0.335   +0.132
april-9-april-11                       90-114   0.000    0.125   -0.125
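The Delta column is ours minus the market midpoint, and the rows are ordered by the size of the disagreement. A small sketch that reproduces that ordering from the rows above (values copied from the table; the function name is illustrative).

```python
# (event, bucket, ours, market mid), copied from the table above in arbitrary order.
ROWS = [
    ("april-7-april-14", "240-259", 0.467, 0.335),
    ("april-9-april-11", "40-64", 0.067, 0.205),
    ("april-3-april-10", "220-239", 0.196, 0.000),
    ("april-9-april-11", "65-89", 0.933, 0.655),
    ("april-10-april-17", "240-259", 0.467, 0.225),
    ("april-3-april-10", "240-259", 0.467, 1.000),
    ("april-9-april-11", "90-114", 0.000, 0.125),
    ("april-3-april-10", "260-279", 0.274, 0.000),
]


def ranked_disagreements(rows):
    """Attach delta = ours - market and sort by |delta|, largest disagreement first."""
    with_delta = [(event, bucket, ours, mkt, round(ours - mkt, 3))
                  for event, bucket, ours, mkt in rows]
    return sorted(with_delta, key=lambda row: abs(row[4]), reverse=True)


for event, bucket, ours, mkt, delta in ranked_disagreements(ROWS):
    print(f"{event:<20} {bucket:>8} {ours:6.3f} {mkt:6.3f} {delta:+7.3f}")
```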

Interpretation — what each row means

April 3-10 @ 240-259, market at 1.000: this event resolves within hours. The market is already certain that the count landed in 240-259 because the tracker has been updating live. Our Poisson(μ=252.1) gives 46.7% to that bucket, a reasonable ex-ante forecast but irrelevant because the market already knows the answer. Don’t count this as a miss in the comparison; treat it as a consistency check: our projected mean (36/day × 7 ≈ 252) sits inside the bucket the market has settled on. It is not an independent check, because our history (April 5-10) covers most of the event window.

April 3-10 @ 260-279 and 220-239, market at 0.000: the corollary. The market knows the count is NOT in these buckets, while our Poisson gives them non-trivial probability because we don’t know the actual count. These gaps have the same cause as above: model uncertainty versus market certainty, not an actual mispricing.

April 9-11 @ 65-89, ours 0.933 vs market 0.655 (+0.278): this is the real signal. The event resolves in ~21 hours. We have 15 rolling 2-day windows in our data (5.5 days can produce that many 48-hour windows with 6-hour step). Out of those 15 windows, 14 landed in 65-89. Our empirical confidence is 93%. The market says 65.5%. If our model is right, this is the biggest edge we’d act on.
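A sketch of how the rolling-window counts can be produced, assuming half-open windows [t, t + 48h) stepped every 6 hours from the first post timestamp and kept only when they fit entirely inside the coverage span. Under those boundary conventions (an assumption about the script, not confirmed by it) the 5.52-day coverage yields exactly 15 windows. The function names are illustrative.

```python
from datetime import datetime, timedelta


def rolling_window_counts(times: list[datetime], start: datetime, end: datetime,
                          window: timedelta = timedelta(hours=48),
                          step: timedelta = timedelta(hours=6)) -> list[int]:
    """Post counts in half-open windows [t, t + window), t = start, start + step, ...

    Only windows lying entirely inside [start, end] are kept. A linear scan per
    window is fine for a few hundred posts.
    """
    counts = []
    t = start
    while t + window <= end:
        counts.append(sum(1 for x in times if t <= x < t + window))
        t += step
    return counts


def bucket_frequencies(counts: list[int],
                       buckets: list[tuple[int, int]]) -> dict[tuple[int, int], float]:
    """Fraction of windows whose count falls in each inclusive (lo, hi) bucket."""
    if not counts:
        raise ValueError("no complete windows: coverage is shorter than the window")
    n = len(counts)
    return {(lo, hi): sum(lo <= c <= hi for c in counts) / n for lo, hi in buckets}
```

Called with `window=timedelta(days=7)` on the same 5.52-day span, `rolling_window_counts` returns an empty list, which is the 7-day limitation noted earlier in concrete form.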

April 9-11 @ 40-64, ours 0.067 vs market 0.205 (-0.138): the complementary bet — market gives too much weight to this lower bucket, our empirical distribution rules it out almost entirely.

April 7-14 @ 240-259 and April 10-17 @ 240-259: both events still have days to run. Our Poisson puts 46.7% on this modal bucket versus the market’s 33.5% and 22.5%, respectively. We disagree with the market’s distribution over these 7-day events, but our confidence is weak because the forecast is a Poisson projection from the observed rate, not an empirical distribution.

Cost budget status

Spent this run: $0.02 (2 xmcp calls)
Per-snapshot cost: ~$0.02
If we snapshot once per day through April 17: 7 × $0.02 = $0.14
If we snapshot every 6h: 28 × $0.02 = $0.56
Full two-week forward test target from PM1c3 proposal: under $2

Well within budget. No LLM inference cost; pure frequency analysis.

Honest caveats

5.5 days of history is too thin for 7-day rolling windows. Three of four events are running with Poisson projection, not empirical. The real signal for these events depends on more data.
Poisson likely underestimates variance. We expect Elon’s posting counts to be overdispersed (variance > mean), in which case Poisson gives too-narrow spreads and a Negative Binomial fit would give more conservative probability estimates.
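The market at or near resolution has an information advantage we can’t match. Events resolving today or tomorrow will tend to show the market “correcting” toward the true answer faster than our ex-ante model. This partly applies to April 9-11 as well: more than half of its 48-hour window had elapsed at this snapshot, and our empirical forecast does not condition on the posts already counted in it. Our genuine tests are events with 2+ days remaining.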
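N=15 rolling 2-day windows for the April 9-11 empirical forecast is a small sample, so the 93% point estimate carries meaningful uncertainty. The windows also overlap (48-hour windows on a 6-hour step), so they are far from 15 independent draws: 5.5 days holds only two non-overlapping 2-day windows. More history would narrow that uncertainty.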
Definition risk. Polymarket’s resolution source is xtracker.polymarket.com, not the X API. Minor discrepancies between our count and the tracker’s count are possible — deleted posts, main-feed reply handling, timezone edge cases.
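To put a number on the 14-of-15 caveat, a Wilson score interval is one standard choice for a binomial proportion at small n. The sketch treats the 15 windows as independent trials, which overlapping windows are not, so the true uncertainty is wider than this interval suggests.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% coverage at z = 1.96).

    Assumes n independent trials; overlapping rolling windows violate this,
    so for them the interval is optimistic (too narrow).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


lo, hi = wilson_interval(14, 15)
print(f"14/15 = {14 / 15:.3f}, Wilson interval ({lo:.2f}, {hi:.2f})")
```

For 14/15 this gives roughly 0.70 to 0.99. The market’s 0.655 sits just below that range, and accounting for the overlap would widen it further.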

Scored bets (to be filled in after resolution)

Not betting real money — this is a paper test. Recording what we WOULD have bet and will score against the resolutions.

Event	Bucket	Direction	Ours	Market	Outcome
April 3-10	240-259	market already at 1.0	0.467	1.000	(resolving now)
April 9-11	65-89	BUY YES (+0.278 edge)	0.933	0.655	(resolves April 11)
April 9-11	40-64	BUY NO (-0.138 edge)	0.067	0.205	(resolves April 11)
April 7-14	240-259	BUY YES (+0.132 edge, weak)	0.467	0.335	(resolves April 14)
April 10-17	240-259	BUY YES (+0.242 edge, weak)	0.467	0.225	(resolves April 17)

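Scoring: we’ll compute the Brier score of our predictions against realized outcomes, and the market-midpoint Brier for the same bets, and compare. If our Brier beats the market’s by a meaningful margin across all bets, that is evidence of a real edge, though four bets, two of them on the same April 9-11 event, are a small and correlated sample.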
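The scoring step is short enough to sketch. It assumes each bet is scored on the YES probability of its bucket, with outcome 1 if the bucket resolved YES and 0 otherwise; because (p - o)² equals ((1 - p) - (1 - o))², scoring the BUY NO row on its YES probability or its complement gives the same Brier. The April 3-10 row is left out since it is not a bet. Names are illustrative.

```python
def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def compare(ours: list[float], market: list[float],
            outcomes: list[int]) -> tuple[float, float]:
    """Brier of our forecasts and of the market midpoints on the same bets."""
    return brier(ours, outcomes), brier(market, outcomes)


# YES probabilities from the scored-bets table, in this order:
# April 9-11 65-89, April 9-11 40-64, April 7-14 240-259, April 10-17 240-259.
OURS = [0.933, 0.067, 0.467, 0.467]
MARKET = [0.655, 0.205, 0.335, 0.225]

# After resolution, set outcomes to 1 (bucket resolved YES) or 0 per bet, then:
# ours_brier, market_brier = compare(OURS, MARKET, outcomes)
```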

Next actions

Immediate (this session): save predictions CSV to experiments/outputs/pm1e_predictions_20260410-191825.csv. Done.
Now: the April 3-10 window closed at 16:00 UTC today (already past). Pull the final tracker count once it resolves and check how close our Poisson μ projection (252.1) was.
Tomorrow: April 9-11 resolves 16:00 UTC. First real scored bet against the market.
Before April 14 resolution: pull more tweet history (3-4 more xmcp pages = ~300-400 more posts = ~14-17 days total at ~36/day) to enable empirical 7-day rolling windows.
Weekly: re-run the forecast with updated data, record new predictions, compare to the previous snapshot.
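For the record-before-resolution discipline, a sketch of serializing one snapshot. The column names are hypothetical (the saved CSV’s actual schema is not described here), and the file-name pattern is inferred from the single saved example, pm1e_predictions_20260410-191825.csv, on the assumption that its timestamp is UTC.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column layout; the real CSV's schema may differ.
FIELDS = ["snapshot_utc", "event", "bucket", "ours", "market_mid"]


def predictions_filename(snapshot: datetime) -> str:
    """File name in the pattern of the saved file; snapshot must be timezone-aware."""
    return snapshot.astimezone(timezone.utc).strftime("pm1e_predictions_%Y%m%d-%H%M%S.csv")


def predictions_csv_text(rows: list[tuple[str, str, float, float]],
                         snapshot: datetime) -> str:
    """Serialize (event, bucket, ours, market_mid) rows, stamped with the snapshot time."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    stamp = snapshot.astimezone(timezone.utc).isoformat()
    for event, bucket, ours, mid in rows:
        writer.writerow({
            "snapshot_utc": stamp,
            "event": event,
            "bucket": bucket,
            "ours": f"{ours:.3f}",
            "market_mid": f"{mid:.3f}",
        })
    return buf.getvalue()
```

Writing the text under experiments/outputs/ with `predictions_filename(snapshot)` before any event resolves keeps each weekly re-run as a separate, timestamped record that later snapshots can be compared against.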

pm1c3-other-breakdown — the finding that motivated this experiment
../autoinv/polymarket — Polymarket client
../../../06-reference/concepts/brier-score — evaluation metric reference
../architecture-vision — PM1e is a Strategy Research + Paper Testing cycle per the 5-agent vision