pm1e elon forecast

Thu Apr 09 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · experiment-writeup · status: running (forward-test in progress)

PM1e — Elon Tweet-Count Forward-Test

Setup

Per PM1c3: Elon tweet-count markets showed Brier 0.2459 against a majority baseline of 0.2041 on N=21 resolved markets. Lower Brier is better, so the market was actively worse than predicting the base rate. Hypothesized structural explanation: retail traders don’t pull the actual posting history.

This script tests that hypothesis live on four active events by building an empirical / parametric forecast from Elon’s recent non-reply tweet history and comparing it to the market’s current midpoint. Discipline: record predictions NOW before resolution, then score Brier when each event closes.

Actual cost: $0.02 for two xmcp getUsersPosts calls with exclude=["replies"], 100 tweets each. Zero LLM calls.

Data

Definition alignment: Polymarket’s resolution rule (verified from the event description) counts main feed posts + quote posts + reposts and excludes replies. Our xmcp filter exclude=["replies"] matches this rule.

Active events forecast

Event        Window (UTC)                           Hours  Model
April 3-10   2026-04-03 16:00 → 2026-04-10 16:00    168    Poisson(μ=252.1); insufficient history for 7-day empirical
April 7-14   2026-04-07 16:00 → 2026-04-14 16:00    168    Poisson(μ=252.1); same
April 9-11   2026-04-09 16:00 → 2026-04-11 16:00     48    Empirical; 15 rolling 2-day windows, mean=72.8, std=5.0
April 10-17  2026-04-10 16:00 → 2026-04-17 16:00    168    Poisson(μ=252.1); same

Key limitation: with only 5.5 days of history we cannot build an empirical distribution of 7-day rolling windows. For the three 7-day events we fell back to a Poisson projection using the observed daily rate (about 36.0/day × 7 ≈ 252.1 expected). Poisson forces variance equal to the mean, so it underestimates variance relative to a proper Negative Binomial fit, and our probability mass is too concentrated around the mean.
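The bucket probabilities for the 7-day events follow from summing the Poisson pmf over each bucket. A minimal sketch of that calculation, using only the standard library; the function names are ours, and the Negative Binomial dispersion r=50 is a purely hypothetical value chosen to illustrate the wider spread, not a fitted parameter:

```python
import math

def poisson_logpmf(k, mu):
    # log P(X = k) for X ~ Poisson(mu)
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def poisson_bucket_prob(lo, hi, mu):
    # P(lo <= X <= hi), inclusive on both ends like the market buckets
    return sum(math.exp(poisson_logpmf(k, mu)) for k in range(lo, hi + 1))

def negbin_logpmf(k, mu, r):
    # Negative Binomial with mean mu and dispersion r:
    # variance = mu + mu**2 / r, so smaller r means fatter tails.
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def negbin_bucket_prob(lo, hi, mu, r):
    return sum(math.exp(negbin_logpmf(k, mu, r)) for k in range(lo, hi + 1))

MU = 252.1               # projected 7-day count from the writeup
HYPOTHETICAL_R = 50.0    # ASSUMPTION: illustrative dispersion, not fitted

p_poisson = poisson_bucket_prob(240, 259, MU)
p_negbin = negbin_bucket_prob(240, 259, MU, HYPOTHETICAL_R)
print(f"Poisson  P(240-259) = {p_poisson:.3f}")
print(f"NegBin   P(240-259) = {p_negbin:.3f}  (r={HYPOTHETICAL_R}, assumed)")
```

With the same mean, any finite r moves probability out of the central bucket and into the neighbors, which is the direction of the correction a Negative Binomial fit would make.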

Fix for next iteration: pull 3-4 more pages of history (100 tweets each, roughly 2.8 days per page at the current rate) to reach about 14-17 days of data, enabling empirical 7-day rolling windows on all four events.

Predictions vs market mid (top disagreements)

Event                                  Bucket      Ours   Market    Delta
april-3-april-10  (resolves today)    240-259   0.467    1.000   -0.533
april-9-april-11                       65-89    0.933    0.655   +0.278
april-3-april-10                      260-279   0.274    0.000   +0.274
april-10-april-17                     240-259   0.467    0.225   +0.242
april-3-april-10                      220-239   0.196    0.000   +0.196
april-9-april-11                       40-64    0.067    0.205   -0.138
april-7-april-14                      240-259   0.467    0.335   +0.132
april-9-april-11                       90-114   0.000    0.125   -0.125
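Delta in the table is ours minus market, and rows are ordered by absolute disagreement. A small sketch of that ranking, with the rows copied from the table above; `rank_disagreements` is a hypothetical helper name, not part of the actual script:

```python
# (event, bucket, ours, market) copied from the table above
ROWS = [
    ("april-3-april-10", "240-259", 0.467, 1.000),
    ("april-9-april-11", "65-89", 0.933, 0.655),
    ("april-3-april-10", "260-279", 0.274, 0.000),
    ("april-10-april-17", "240-259", 0.467, 0.225),
    ("april-3-april-10", "220-239", 0.196, 0.000),
    ("april-9-april-11", "40-64", 0.067, 0.205),
    ("april-7-april-14", "240-259", 0.467, 0.335),
    ("april-9-april-11", "90-114", 0.000, 0.125),
]

def rank_disagreements(rows):
    # Attach delta = ours - market, then sort by |delta|, largest first.
    with_delta = [(e, b, ours, mkt, round(ours - mkt, 3)) for e, b, ours, mkt in rows]
    return sorted(with_delta, key=lambda row: abs(row[4]), reverse=True)

for event, bucket, ours, mkt, delta in rank_disagreements(ROWS):
    print(f"{event:<20}{bucket:>8}{ours:>8.3f}{mkt:>8.3f}{delta:>+8.3f}")
```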

Interpretation — what each row means

April 3-10 @ 240-259, market at 1.000: this event resolves within hours. The market is already sure Elon posted 240-259 tweets in the window because the tracker has been updating live. Our Poisson(μ=252.1) gives 46.7% to that bucket, which is mathematically correct for an ex-ante forecast but irrelevant because the market already knows the answer. Don’t count this as a miss in the comparison; count it as ground-truth confirmation that our rate estimate (about 36/day × 7 ≈ 252) was accurate.

April 3-10 @ 260-279 and 220-239, market at 0.000: corollary — market knows the count is NOT in these buckets. Our Poisson gives them non-trivial probability because we don’t know the actual count. Our loss here is the same story: model uncertainty vs market certainty, no actual mispricing.

April 9-11 @ 65-89, ours 0.933 vs market 0.655 (+0.278): this is the real signal. The event resolves in ~21 hours. We have 15 rolling 2-day windows in our data (5.5 days can produce that many 48-hour windows with 6-hour step). Out of those 15 windows, 14 landed in 65-89. Our empirical confidence is 93%. The market says 65.5%. If our model is right, this is the biggest edge we’d act on.
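The window count checks out: 5.5 days is 132 hours, and (132 − 48) / 6 + 1 = 15 windows. A minimal sketch of how the rolling counts and bucket frequencies could be computed from a list of post timestamps; the function names and the half-open window convention [t, t + 48h) are assumptions for illustration, not the actual script:

```python
from datetime import datetime, timedelta, timezone

def rolling_window_counts(timestamps, start, end, window_hours=48, step_hours=6):
    # Count posts in each half-open window [t, t + window), sliding t
    # from `start` in `step_hours` increments while the window fits before `end`.
    window = timedelta(hours=window_hours)
    step = timedelta(hours=step_hours)
    counts = []
    t = start
    while t + window <= end:
        counts.append(sum(1 for ts in timestamps if t <= ts < t + window))
        t += step
    return counts

def bucket_probs(counts, buckets):
    # Empirical probability per bucket: the share of windows whose count
    # falls inside the inclusive range (lo, hi).
    if not counts:
        raise ValueError("need at least one window")
    return {
        (lo, hi): sum(1 for c in counts if lo <= c <= hi) / len(counts)
        for lo, hi in buckets
    }

# SYNTHETIC demo data: one post every 40 minutes (36/day) over 5.5 days.
start = datetime(2026, 4, 5, tzinfo=timezone.utc)
end = start + timedelta(hours=132)
posts = [start + timedelta(minutes=40 * i) for i in range(198)]

counts = rolling_window_counts(posts, start, end)
print(len(counts), "windows")
print(bucket_probs(counts, [(40, 64), (65, 89), (90, 114)]))
```

On the real history, 14 of the 15 window counts fell in 65-89, which is where the 0.933 comes from.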

April 9-11 @ 40-64, ours 0.067 vs market 0.205 (-0.138): the complementary bet — market gives too much weight to this lower bucket, our empirical distribution rules it out almost entirely.

April 7-14 @ 240-259 and April 10-17 @ 240-259: both events still have days to run. Our Poisson model predicts 46.7% vs market 33.5% and 22.5% respectively. We disagree with the market’s distribution over these 7-day-window events, but our confidence is weak because it’s Poisson-from-projected-rate, not empirical.

Cost budget status

Well within budget. No LLM inference cost; pure frequency analysis.

Honest caveats

  1. 5.5 days of history is too thin for 7-day rolling windows. Three of four events are running with Poisson projection, not empirical. The real signal for these events depends on more data.
  2. Poisson underestimates variance. Elon’s actual variance > mean, meaning Poisson gives too-narrow spreads. A Negative Binomial fit would give more conservative probability estimates.
  3. The market at/near resolution has an information advantage we can’t match. Events resolving today or tomorrow will always show the market “correcting” toward the true answer faster than our ex-ante model. Our genuine tests are events with 2+ days remaining.
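  4. N=15 rolling 2-day windows for the April 9-11 empirical is small enough that the 93% probability point estimate has meaningful uncertainty. The windows also overlap (48-hour windows on a 6-hour step), so they are not 15 independent observations. A larger sample would narrow that confidence interval.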
  5. Definition risk. Polymarket’s resolution source is xtracker.polymarket.com, not the X API. Minor discrepancies between our count and the tracker’s count are possible — deleted posts, main-feed reply handling, timezone edge cases.
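Caveat 4 can be made concrete with a binomial interval on 14 of 15 windows. A minimal sketch using the Wilson score interval, which is our choice for illustration and not something the original analysis computed. It treats the windows as independent draws, so the real uncertainty is wider than what it reports:

```python
import math

def wilson_interval(successes, n, z=1.96):
    # Wilson score interval for a binomial proportion.
    # z=1.96 corresponds to a two-sided 95% interval.
    if n <= 0:
        raise ValueError("n must be positive")
    phat = successes / n
    denom = 1.0 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 14 of the 15 rolling 2-day windows landed in the 65-89 bucket.
lo, hi = wilson_interval(14, 15)
print(f"95% Wilson interval for 14/15: [{lo:.3f}, {hi:.3f}]")
```

The interval runs from about 0.70 to 0.99, so the market's 0.655 sits below even the optimistic independent-sample lower bound, but not by much.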

Scored bets (to be filled in after resolution)

Not betting real money — this is a paper test. Recording what we WOULD have bet and will score against the resolutions.

Event        Bucket   Direction                    Ours   Market  Outcome              Brier (ours)  Brier (market)
April 3-10   240-259  market already at 1.0        0.467  1.000   (resolving now)
April 9-11   65-89    BUY YES (+0.278 edge)        0.933  0.655   (resolves April 11)
April 9-11   40-64    BUY NO (-0.138 edge)         0.067  0.205   (resolves April 11)
April 7-14   240-259  BUY YES (+0.132 edge, weak)  0.467  0.335   (resolves April 14)
April 10-17  240-259  BUY YES (+0.242 edge, weak)  0.467  0.225   (resolves April 17)

Scoring: we’ll compute Brier of our predictions vs realized outcomes, and market-midpoint Brier for the same bets, and compare. If our Brier beats the market’s by a meaningful margin across all bets, the edge is real.
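The scoring step is the squared error between each stated probability and the 0/1 outcome, averaged over bets. A minimal sketch of it; the helper names are ours, and the outcomes in the demo are HYPOTHETICAL placeholders, since no event in the table has been scored yet:

```python
def brier(prob, outcome):
    # Squared error between a forecast probability and a 0/1 outcome.
    return (prob - outcome) ** 2

def mean_brier(probs, outcomes):
    # Average Brier score over a set of binary bets; lower is better.
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum(brier(p, o) for p, o in zip(probs, outcomes)) / len(probs)

# April 9-11 bets from the table: buckets 65-89 and 40-64.
ours = [0.933, 0.067]
market = [0.655, 0.205]

# HYPOTHETICAL outcomes for illustration only (65-89 hits, 40-64 misses).
# Replace with the real resolutions once the event closes.
hypothetical_outcomes = [1, 0]

print("ours  :", round(mean_brier(ours, hypothetical_outcomes), 6))
print("market:", round(mean_brier(market, hypothetical_outcomes), 6))
```

Scoring both forecasts on the same bets and the same outcomes keeps the comparison apples to apples.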

Next actions

  1. Immediate (this session): save predictions CSV to experiments/outputs/pm1e_predictions_20260410-191825.csv. Done.
  2. Within hours: April 3-10 resolves around 16:00 UTC (already past). Pull the final tracker count and verify our Poisson μ projection was close.
  3. Tomorrow: April 9-11 resolves 16:00 UTC. First real scored bet against the market.
  4. Before April 14 resolution: pull more tweet history (3-4 more xmcp pages at 100 tweets each = ~300-400 more tweets = ~14+ days total) to enable empirical 7-day rolling windows.
  5. Weekly: re-run the forecast with updated data, record new predictions, compare to the previous snapshot.