PM1 (Kalshi) — Calibration Baseline
⚠ Correction (2026-04-10): The MLB verdict in this document (“biggest red flag, real mispricing target”) was wrong. The baseline used an adaptive snapshot at market midlife, which for short-lived (~48h) MLB markets lands around 24h before close — before price discovery matures. A follow-up deep dive at 1h before close shows MLB actually has positive lift of +0.058. The “Kalshi MLB” parallel to the Polymarket Elon tweet finding was an artifact, not a real alpha lead. See pm1-kalshi-mlb-deepdive for the corrected analysis. The rest of this document (NBA as surprisingly efficient, CPI as hyperefficient, overall aggregate) still holds; only the MLB/EPL/LaLiga/MLB-Total negative-lift claims are suspect.
Mirror of the Polymarket PM1 analysis, run against Kalshi. Same question: how well does Kalshi’s own midpoint predict outcomes at ~3 days before close, and where is the mispricing?
Cost: $0. Anonymous Kalshi read endpoints only.
Methodology
Sample: 11 target series ticker prefixes covering sports (KXNBAGAME, KXNFLGAME, KXMLBGAME, KXNHLGAME, KXEPLGAME, KXLALIGAGAME), totals markets (KXNBATOTAL, KXNFLTOTAL, KXMLBTOTAL), and economic indicators (KXECONSTATCPIYOY, KXECONSTATCORECPIYOY). Final total: 346 settled markets with Brier-scoreable snapshots.
Why series-ticker targeting instead of pagination: Kalshi’s default status=settled sort returns a flood of multi-variate parlay markets (KXMVE*) that aren’t clean binary events. Filtering by specific series gives us the sports and econ markets we actually want to compare against Polymarket.
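The series filter can be sketched as a small allow-list check. This is a minimal illustration, not the client's actual code: it assumes a market ticker has the shape `<SERIES>-<suffix>` and that each market record carries a `ticker` key, both of which are assumptions made here for the example.

```python
from typing import Optional

# The 11 series prefixes listed in the Sample paragraph.
TARGET_SERIES = frozenset({
    "KXNBAGAME", "KXNFLGAME", "KXMLBGAME", "KXNHLGAME", "KXEPLGAME", "KXLALIGAGAME",
    "KXNBATOTAL", "KXNFLTOTAL", "KXMLBTOTAL",
    "KXECONSTATCPIYOY", "KXECONSTATCORECPIYOY",
})


def target_series(market_ticker: str) -> Optional[str]:
    """Return the target series a market belongs to, or None if it is not targeted."""
    # Assumption: the series is everything before the first "-" in the ticker.
    series = market_ticker.split("-", 1)[0]
    return series if series in TARGET_SERIES else None


def keep_target_markets(markets):
    """Drop parlay (KXMVE*) and any other non-target markets from a settled sweep."""
    # Assumption: each market is a dict with a "ticker" key.
    return [m for m in markets if target_series(m["ticker"]) is not None]
```

Matching on the exact series name rather than `startswith` avoids accidental collisions between series whose names share a prefix.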
Adaptive snapshot methodology: NBA/MLB/NHL markets open only ~48 hours before close, so “3 days before close” is before the market exists. I scale the snapshot offset to the market’s lifetime:
- Markets living ≥6 days → snapshot at close - 3 days (original Polymarket methodology)
- Shorter markets → snapshot at approximately the market’s midpoint (lifetime/2)
This keeps the comparison fair across different market duration regimes — we’re asking “what did the market price when it was about halfway through its life?” rather than insisting on a fixed offset that doesn’t exist for short-lived markets.
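The adaptive rule above can be sketched as follows. The function and constant names are illustrative; only the 6-day threshold, the 3-day offset, and the lifetime/2 midpoint come from the description.

```python
from datetime import datetime, timedelta, timezone

LONG_MARKET = timedelta(days=6)   # markets living at least this long use the fixed offset
FIXED_OFFSET = timedelta(days=3)  # original Polymarket methodology


def snapshot_time(open_time: datetime, close_time: datetime) -> datetime:
    """Pick the target snapshot time for one market."""
    lifetime = close_time - open_time
    if lifetime >= LONG_MARKET:
        return close_time - FIXED_OFFSET
    # Short-lived market: snapshot at the midpoint of its life.
    return open_time + lifetime / 2


# A 48-hour market is snapshotted 24 hours before close.
opened = datetime(2026, 4, 1, tzinfo=timezone.utc)
closed = datetime(2026, 4, 3, tzinfo=timezone.utc)
assert snapshot_time(opened, closed) == closed - timedelta(hours=24)
```

Note the discontinuity at the threshold: a market living just under 6 days is snapshotted almost exactly 3 days before close as well, so the two branches agree at the boundary.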
Resolution source: Kalshi’s result field (‘yes’/‘no’). Snapshot price is price.close_dollars from the hourly candlestick closest to the target time, falling back to the yes_bid/yes_ask midpoint.
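A minimal sketch of the price selection, under stated assumptions: candles are dicts with a nested `price.close_dollars` value and a timestamp key, here called `end_period_ts`, which is a name chosen for this example. The fallback is interpreted as "use the bid/ask midpoint when no candle carries a trade price"; the note above does not pin that detail down.

```python
from typing import Optional


def snapshot_price(candles, target_ts, yes_bid=None, yes_ask=None) -> Optional[float]:
    """Return the snapshot price for one market, or None if nothing is usable."""
    # Keep only candles that actually carry a traded close price.
    priced = [
        c for c in candles
        if (c.get("price") or {}).get("close_dollars") is not None
    ]
    if priced:
        # Assumption: "end_period_ts" is the candle timestamp in epoch seconds.
        nearest = min(priced, key=lambda c: abs(c["end_period_ts"] - target_ts))
        return float(nearest["price"]["close_dollars"])
    # Fallback: midpoint of the quoted yes bid and ask.
    if yes_bid is not None and yes_ask is not None:
        return (float(yes_bid) + float(yes_ask)) / 2
    return None
```

Markets for which this returns None have no Brier-scoreable snapshot and would drop out of the sample.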
Results
Ranked by Brier (worst calibration = biggest opportunity):
| Label | Series | N | Median volume ($) | Brier | Gate (0.12) | Win% | Lift |
|---|---|---|---|---|---|---|---|
| MLB | KXMLBGAME | 40 | 1,239,290 | 0.3051 | FAIL | 47.5% | -0.056 |
| EPL | KXEPLGAME | 38 | 383,954 | 0.2477 | FAIL | 39.5% | -0.009 |
| MLB-Total | KXMLBTOTAL | 28 | 46,782 | 0.2400 | FAIL | 35.7% | -0.010 |
| LaLiga | KXLALIGAGAME | 43 | 267,242 | 0.2289 | FAIL | 34.9% | -0.002 |
| NFL | KXNFLGAME | 13 | 28,045,224 | 0.1942 | FAIL | 46.2% | +0.054 |
| NHL | KXNHLGAME | 40 | 271,104 | 0.1828 | FAIL | 45.0% | +0.065 |
| NFL-Total | KXNFLTOTAL | 29 | 512,179 | 0.1787 | FAIL | 34.5% | +0.047 |
| NBA-Total | KXNBATOTAL | 42 | 95,260 | 0.1609 | FAIL | 33.3% | +0.061 |
| Core-CPI-YoY | KXECONSTATCORECPIYOY | 11 | 2,742 | 0.1472 | FAIL | 18.2% | +0.002 |
| NBA | KXNBAGAME | 40 | 3,097,706 | 0.1132 | PASS | 55.0% | +0.134 |
| CPI-YoY | KXECONSTATCPIYOY | 22 | 15,674 | 0.0495 | PASS | 9.1% | +0.033 |
Aggregate (N=346): Brier 0.1942 vs majority baseline 0.2243. Overall lift of +0.030.
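For reference, the Brier and lift columns can be sketched as below. The reported rows are consistent with a majority baseline equal to the Brier score of always predicting the in-sample win rate r, which is r(1 - r); that reading is inferred from the table rather than stated in the methodology, and the function names are illustrative.

```python
def brier(prices, outcomes):
    """Mean squared error between snapshot prices (0..1) and outcomes (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(prices, outcomes)) / len(outcomes)


def majority_baseline(outcomes):
    """Brier score of always predicting the sample win rate r, i.e. r * (1 - r)."""
    rate = sum(outcomes) / len(outcomes)
    return rate * (1 - rate)


def lift(prices, outcomes):
    """Positive lift means the market price beats the base-rate forecast."""
    return majority_baseline(outcomes) - brier(prices, outcomes)


# Consistency check against the MLB row: win rate 47.5%, Brier 0.3051.
print(round(0.475 * (1 - 0.475) - 0.3051, 3))  # -0.056
```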
The headline findings
1. Kalshi’s NBA game markets are remarkably well-calibrated. Brier 0.1132 with +0.134 lift over majority baseline, the only sports series that passes the 0.12 discipline gate. At median volume $3.1M per market (N=40), this is a substantial, liquid, accurately-priced market. If you simply take Kalshi’s own NBA price as your forecast, you beat the base-rate baseline by 0.134 Brier points (lift is a difference of Brier scores, not percentage points of accuracy).
2. CPI-YoY is hyperefficient. Brier 0.0495 on a 9.1% base rate (majority baseline 0.083, so lift is +0.033). This mirrors the Polymarket finding that political/economic markets are the best-calibrated across both venues. Informational edge in these markets is extremely hard to get.
3. MLB is the biggest red flag. Brier 0.3051 with negative lift of -0.056 — the market is actively worse than always predicting the base rate. Similar structural shape to the Elon tweet-count finding on Polymarket: the market is doing worse than a trivial baseline. Two explanations:
- Baseball has high inherent variance (162 games, pitcher rotation, bullpen management, etc.) and the market is overconfident relative to that variance
- The snapshot timing is wrong for MLB (opens ~48h before close, snapshot lands in the early market life before meaningful price discovery)
The first explanation makes MLB an actual alpha target worth investigating further. The second is a methodology bug we should rule out by probing MLB snapshots specifically.
4. EPL, MLB-Total, and LaLiga show mild negative lift (-0.009, -0.010, and -0.002), small enough that they are likely sampling noise rather than real signal.
5. NBA-Total, NFL, NHL, NFL-Total sit in the “inherently coin-flippy” middle: Brier 0.16-0.20 with positive lift of +0.05 to +0.07. The market beats the baseline by a small margin. Inherent variance of individual games limits how low Brier can go: a forecaster who quotes the true probability p has expected Brier p(1-p), which is 0.25 on a true coin flip and 0.21 on a 70/30 game, so these numbers are roughly what an efficient market pricing moderately lopsided games should look like.
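The floor argument can be checked directly. The sketch below computes the expected Brier score of a forecaster who always quotes the true probability p; the function name is illustrative.

```python
def calibrated_floor(p):
    """Expected Brier score when the forecast equals the true probability p."""
    # With probability p the outcome is 1 and the squared error is (1 - p)^2;
    # with probability 1 - p the outcome is 0 and the squared error is p^2.
    return p * (1 - p) ** 2 + (1 - p) * p ** 2  # algebraically p * (1 - p)


for p in (0.5, 0.6, 0.7):
    print(p, round(calibrated_floor(p), 4))
# 0.5 0.25
# 0.6 0.24
# 0.7 0.21
```

The floor depends on how lopsided the games are: it is highest for true coin flips and falls as favourites get stronger, so a series-level Brier below 0.25 does not by itself imply forecasting skill beyond knowing the favourite.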
Comparison to the Polymarket PM1 findings
| Metric | Polymarket (top 100) | Kalshi (aggregate) | Kalshi (NBA) |
|---|---|---|---|
| Brier | 0.033 | 0.194 | 0.113 |
| Lift over majority | ~0.13 | +0.030 | +0.134 |
| Discipline gate | PASS | FAIL | PASS |
| Dominant category | Politics/election | Sports | Sports (NBA) |
| Dominant win rate | ~20% | ~45% | 55% |
The direct numerical comparison is apples-to-oranges because:
- Polymarket top-volume is dominated by skewed-outcome political markets (~20% win rates, inherent Brier floor very low)
- Kalshi top-volume is dominated by near-50/50 sports markets (inherent Brier floor roughly 0.20-0.25)
The fair comparison is lift, not raw Brier. On that basis:
- Polymarket top-volume: lift ~0.13
- Kalshi NBA: lift +0.134 (nearly identical)
- Kalshi aggregate: lift +0.030 (much weaker, dragged down by MLB/EPL/LaLiga)
Kalshi NBA is just as efficient as the top of Polymarket. Both venues have categories where the informed-trader population keeps the market honest. The venue isn’t the efficiency story — the category within the venue is.
The “Kalshi MLB = Polymarket Elon-tweets” parallel
The most interesting finding is that each venue has at least one identifiable sub-population where the market is measurably worse than trivial baseline:
| Venue | Sub-population | Brier | Majority baseline | Lift | N |
|---|---|---|---|---|---|
| Polymarket | Elon tweet counts | 0.246 | 0.204 | -0.042 | 21 |
| Kalshi | MLB games | 0.305 | 0.249 | -0.056 | 40 |
Both are retail-dominated markets where a structural explanation for the mispricing is plausible:
- Elon tweets: narrow buckets (20 tweets wide), retail entertainment, no one pulls the actual history
- MLB games: high-variance sport, 162-game grind, retail bettors overconfident
These are the first two places in our entire analysis where a systematic strategy has a plausible path that isn’t immediately crushed by market efficiency. Worth deeper investigation before the arbitrage work.
Caveats I’m flagging honestly
- NBA sample is suspicious. The +0.134 lift is striking, but N=40 sports markets is a moderate sample and the point estimate has real uncertainty. Worth re-running with 100+ NBA markets to confirm.
- My snapshot methodology is “market midpoint in time” not “3 days before close” because of short-lived sports markets. This means the Kalshi numbers aren’t directly comparable to the Polymarket “3 days before close” numbers: Kalshi markets in the dataset are snapshotted at different points in their lifecycle depending on duration.
- MLB’s Brier 0.3051 could partially be a snapshot timing artifact. At market midpoint (~24h before close for a 48h MLB market), there may be very little price discovery compared to 3 days before a week-long Polymarket market. We’d want to confirm by snapshotting MLB at “1 hour before close” (when the market has matured) and seeing if the calibration improves.
- NFL has only N=13, too small for strong conclusions. NFL markets on Kalshi have $28M median volume, so individual markets exist but the sample here is limited by what was in our recent settled sweep.
- Kalshi’s short-lived markets make the Brier test structurally different from Polymarket’s longer-running ones. A 48-hour market has less room for opinion to diverge from eventual truth than a week-long one, which may explain why sports Brier on Kalshi tends to cluster tighter around game-inherent variance.
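The sample-size caveats above (N=40 for NBA, N=13 for NFL) could be made quantitative with a percentile bootstrap on lift. This is a sketch of one possible check, not something that was run for this baseline; the function name, the 2,000 resamples, and the 95% level are all choices made for the example.

```python
import random


def bootstrap_lift_ci(prices, outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for lift over the majority baseline."""
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(outcomes)
    lifts = []
    for _ in range(n_boot):
        # Resample markets with replacement, keeping each price with its outcome.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        ps = [prices[i] for i in idx]
        rate = sum(ys) / n
        brier = sum((p - y) ** 2 for p, y in zip(ps, ys)) / n
        lifts.append(rate * (1 - rate) - brier)
    lifts.sort()
    lo = lifts[int((alpha / 2) * n_boot)]
    hi = lifts[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Calling it with one series' snapshot prices and 0/1 outcomes returns a (low, high) pair; an interval that straddles zero would mean the sign of that series' lift is not established at the chosen level.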
Actionable next steps
- Confirm the MLB finding on a larger sample (target N=150-200) and with a “close - 1 hour” snapshot. If the negative lift holds, MLB is a real alpha target.
- Confirm NBA’s +0.134 lift on a larger sample. If it holds, this is a surprisingly efficient market we can use as a reference for benchmarking our own forecasters.
- Measure Kalshi NBA at multiple time points (e.g., 48h, 24h, 12h, 1h before close) to build a calibration curve. If the curve converges to the right answer quickly, it tells us something about where in the market lifecycle the alpha lives.
- Start the market-matching pipeline (independent of price analysis, per the founder’s guidance). Build a fuzzy matcher that pairs Polymarket markets to Kalshi markets by title/category/date, so we can run arbitrage analysis continuously on matched pairs when prices diverge mid-window.
- Skip live sports arbitrage for now. Both venues are reasonably efficient on NBA and NFL; cross-venue pricing gaps are unlikely to open wide enough to overcome transaction costs and slippage. The more interesting arb candidates are econ/political events that both venues host (CPI, Fed decisions, election outcomes) where a brief information asymmetry between venues could produce real gaps.
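As a starting point for the market-matching step, here is a minimal sketch of a title/category/date matcher built on the standard library's `difflib.SequenceMatcher`. The record shape (`title`, `category`, `close_date` keys), the 0.6 similarity threshold, and the one-day date tolerance are all assumptions for illustration; a real matcher would need tuning against hand-labelled pairs.

```python
from datetime import date
from difflib import SequenceMatcher


def _norm(title):
    """Lowercase and collapse whitespace so formatting differences don't hurt the score."""
    return " ".join(title.lower().split())


def best_match(market, candidates, threshold=0.6, max_days_apart=1):
    """Return the most similar candidate on the other venue, or None if none qualifies."""
    best, best_score = None, threshold
    for cand in candidates:
        # Hard filters first: same category and close dates within tolerance.
        if cand["category"] != market["category"]:
            continue
        if abs((cand["close_date"] - market["close_date"]).days) > max_days_apart:
            continue
        score = SequenceMatcher(None, _norm(market["title"]), _norm(cand["title"])).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best
```

Applying the category and date filters before scoring keeps the fuzzy comparison cheap and prevents a high title similarity from pairing markets that resolve on different dates.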
Related
- pm1-polymarket-baseline — the Polymarket side of this comparison (partially superseded)
- pm1b-polymarket-long-tail-correction
- pm1cd-category-and-stability
- pm1c3-other-breakdown — the Elon tweet finding that parallels the MLB finding here
- pm1e-elon-forecast — the forward test currently running
- ../autoinv/kalshi — the client used for this analysis
- ../architecture-vision — target 5-agent shape