PM1 (Kalshi) — Calibration Baseline

⚠ Correction (2026-04-10): The MLB verdict in this document (“biggest red flag, real mispricing target”) was wrong. The baseline used an adaptive snapshot at market midlife, which for short-lived (~48h) MLB markets lands around 24h before close — before price discovery matures. A follow-up deep dive at 1h before close shows MLB actually has positive lift of +0.058. The “Kalshi MLB” parallel to the Polymarket Elon tweet finding was an artifact, not a real alpha lead. See pm1-kalshi-mlb-deepdive for the corrected analysis. The rest of this document (NBA as surprisingly efficient, CPI as hyperefficient, overall aggregate) still holds; only the MLB/EPL/LaLiga/MLB-Total negative-lift claims are suspect.

Mirror of the Polymarket PM1 analysis, run against Kalshi. Same question: how well does Kalshi’s own midpoint predict outcomes at ~3 days before close, and where is the mispricing?

Cost: $0. Anonymous Kalshi read endpoints only.

Methodology

Sample: 11 target series ticker prefixes covering sports (KXNBAGAME, KXNFLGAME, KXMLBGAME, KXNHLGAME, KXEPLGAME, KXLALIGAGAME), totals markets (KXNBATOTAL, KXNFLTOTAL, KXMLBTOTAL), and economic indicators (KXECONSTATCPIYOY, KXECONSTATCORECPIYOY). Final total: 346 settled markets with Brier-scoreable snapshots.

Why series-ticker targeting instead of pagination: Kalshi’s default status=settled sort returns a flood of multi-variate parlay markets (KXMVE*) that aren’t clean binary events. Filtering by specific series gives us the sports and econ markets we actually want to compare against Polymarket.

Adaptive snapshot methodology: NBA/MLB/NHL markets open only ~48 hours before close, so “3 days before close” is before the market exists. I scale the snapshot offset to the market’s lifetime:

Markets living ≥6 days → snapshot at close - 3 days (original Polymarket methodology)
Shorter markets → snapshot at approximately the market’s midpoint (lifetime/2)

This keeps the comparison fair across different market duration regimes — we’re asking “what did the market price when it was about halfway through its life?” rather than insisting on a fixed offset that doesn’t exist for short-lived markets.
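
The two-branch rule above can be sketched as a small function. This is a minimal illustration, not the code used for the analysis: the function name, the constants, and the use of timezone-aware datetimes are assumptions; only the 6-day threshold, the 3-day offset, and the lifetime/2 midpoint come from the text.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the rule above; the names are hypothetical.
LONG_MARKET_MIN_LIFETIME = timedelta(days=6)
FIXED_OFFSET = timedelta(days=3)


def snapshot_time(open_time: datetime, close_time: datetime) -> datetime:
    """Return the moment at which a market's price should be sampled."""
    lifetime = close_time - open_time
    if lifetime <= timedelta(0):
        raise ValueError("close_time must be after open_time")
    if lifetime >= LONG_MARKET_MIN_LIFETIME:
        # Long-lived market: fixed offset, as in the Polymarket methodology.
        return close_time - FIXED_OFFSET
    # Short-lived market: sample at the midpoint of its life.
    return open_time + lifetime / 2


opened = datetime(2026, 4, 1, tzinfo=timezone.utc)
mlb_like = snapshot_time(opened, opened + timedelta(hours=48))
week_long = snapshot_time(opened, opened + timedelta(days=7))
```

For a 48-hour market this lands 24 hours before close, which is the timing the correction note at the top identifies as too early for MLB.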

Resolution source: Kalshi’s result field (‘yes’/‘no’). The snapshot price is price.close_dollars from the hourly candlestick closest to the target time, falling back to the midpoint of yes_bid and yes_ask.
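
A hedged sketch of that price selection, operating on already-fetched data rather than calling any API. The field names price.close_dollars, yes_bid, and yes_ask come from the text above; the candle timestamp key end_period_ts, the dict shapes, and the assumption that bid and ask are in the same dollar units as the candle close are illustrative guesses, not confirmed details of the client.

```python
from typing import Optional


def snapshot_price(candles: list, target_ts: int, market: dict) -> Optional[float]:
    """Pick the YES price nearest target_ts, or fall back to the bid/ask midpoint.

    candles: list of dicts, each assumed to carry "end_period_ts" (epoch seconds)
             and a nested "price" dict with "close_dollars".
    market:  dict assumed to carry "yes_bid" and "yes_ask" in dollars.
    """
    # Keep only candles that actually traded (non-null close).
    priced = [
        c for c in candles
        if (c.get("price") or {}).get("close_dollars") is not None
    ]
    if priced:
        nearest = min(priced, key=lambda c: abs(c["end_period_ts"] - target_ts))
        return float(nearest["price"]["close_dollars"])

    # Fallback: midpoint of the quoted spread.
    bid, ask = market.get("yes_bid"), market.get("yes_ask")
    if bid is not None and ask is not None:
        return (float(bid) + float(ask)) / 2
    return None  # nothing usable; the market is not Brier-scoreable
```

Returning None rather than a default price keeps unscoreable markets out of the sample instead of biasing the Brier score toward an arbitrary value.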

Results

Ranked by Brier (worst calibration = biggest opportunity):

Label            Series                     N    Median vol      Brier    Gate     Win%    Lift
MLB              KXMLBGAME                 40  $ 1,239,290    0.3051    FAIL    47.5%    -0.056
EPL              KXEPLGAME                 38  $   383,954    0.2477    FAIL    39.5%    -0.009
MLB-Total        KXMLBTOTAL                28  $    46,782    0.2400    FAIL    35.7%    -0.010
LaLiga           KXLALIGAGAME              43  $   267,242    0.2289    FAIL    34.9%    -0.002
NFL              KXNFLGAME                 13  $28,045,224    0.1942    FAIL    46.2%    +0.054
NHL              KXNHLGAME                 40  $   271,104    0.1828    FAIL    45.0%    +0.065
NFL-Total        KXNFLTOTAL                29  $   512,179    0.1787    FAIL    34.5%    +0.047
NBA-Total        KXNBATOTAL                42  $    95,260    0.1609    FAIL    33.3%    +0.061
Core-CPI-YoY     KXECONSTATCORECPIYOY      11  $     2,742    0.1472    FAIL    18.2%    +0.002
NBA              KXNBAGAME                 40  $ 3,097,706    0.1132    PASS    55.0%    +0.134
CPI-YoY          KXECONSTATCPIYOY          22  $    15,674    0.0495    PASS     9.1%    +0.033

Aggregate (N=346): Brier 0.1942 vs majority baseline 0.2243. Overall lift of +0.030.
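
For reference, the metrics in the table can be reproduced with a few lines. The Lift column is consistent with lift = w(1-w) - Brier, where w is the observed win rate: for MLB, 0.475 × 0.525 ≈ 0.249, and 0.249 - 0.3051 ≈ -0.056. The sketch below assumes that definition. The function names are hypothetical, and whether the 0.12 gate is strict or inclusive is an assumption.

```python
def brier_score(prices, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not prices or len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(prices, outcomes)) / len(prices)


def base_rate_brier(outcomes):
    """Brier score of always forecasting the observed win rate w: w * (1 - w)."""
    rate = sum(outcomes) / len(outcomes)
    return rate * (1 - rate)


def lift(prices, outcomes):
    """Positive when the market beats the base-rate forecast."""
    return base_rate_brier(outcomes) - brier_score(prices, outcomes)


def passes_gate(brier, threshold=0.12):
    """Discipline gate from the text; strict inequality is an assumption."""
    return brier < threshold


# Toy check: 19 wins in 40 markets gives the MLB win rate of 47.5%.
toy_outcomes = [1] * 19 + [0] * 21
toy_baseline = base_rate_brier(toy_outcomes)
```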

The headline findings

1. Kalshi’s NBA game markets are remarkably well-calibrated. Brier 0.1132 with +0.134 lift over majority baseline, the only sports series that passes the 0.12 discipline gate. At median volume $3.1M per market (N=40), this is a substantial, liquid, accurately priced market. If you use Kalshi’s own NBA prices as your forecast, you beat the baseline by 0.134 Brier points (a squared-error improvement, not 13 percentage points of accuracy).

2. CPI-YoY is hyperefficient. Brier 0.0495 on a 9.1% base rate (the Win% in the table above). This mirrors the Polymarket finding that political/economic markets are the best-calibrated across both venues. Informational edge in these markets is extremely hard to get.

3. MLB is the biggest red flag. Brier 0.3051 with negative lift of -0.056 — the market is actively worse than always predicting the base rate. Similar structural shape to the Elon tweet-count finding on Polymarket: the market is doing worse than a trivial baseline. Two explanations:

Baseball has high inherent variance (162 games, pitcher rotation, bullpen management, etc.) and the market is overconfident relative to that variance
The snapshot timing is wrong for MLB (opens ~48h before close, snapshot lands in the early market life before meaningful price discovery)

The first explanation makes MLB an actual alpha target worth investigating further. The second is a methodology bug we should rule out by probing MLB snapshots specifically.

4. EPL, MLB-Total, and LaLiga show mild negative lift (-0.009, -0.010, and -0.002), but these are small enough that they’re likely sampling noise rather than real signal.

5. NBA-Total, NFL, NHL, NFL-Total sit in the “inherently coin-flippy” middle: Brier 0.16-0.20 with positive lift of +0.05 to +0.07. The market beats the baseline by a small margin. Inherent variance of individual games limits how low Brier can go: a perfectly calibrated forecaster scores p(1-p) in expectation on an event with true probability p, which is 0.25 at 50/50, 0.21 at 70/30, and 0.16 at 80/20. These numbers are therefore roughly what an efficient market should look like if typical games are priced somewhere between 70/30 and 80/20.
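
The variance floor can be made concrete. A forecaster who quotes the true probability p scores (1-p)² with probability p and p² with probability 1-p, which simplifies to p(1-p). The short sketch below tabulates that floor. The function name is hypothetical, and the chosen probabilities are illustrative, not measured from the Kalshi data.

```python
def calibrated_brier_floor(p: float) -> float:
    """Expected Brier score of forecasting p on an event whose true probability is p.

    E[(p - Y)^2] = p * (1 - p)^2 + (1 - p) * p^2 = p * (1 - p)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p * (1.0 - p)


for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"true probability {p:.1f} -> Brier floor {calibrated_brier_floor(p):.2f}")
```

The floor is highest for even matchups and falls as favorites become clearer, so a series-level Brier depends on the mix of matchups as well as on market skill.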

Comparison to the Polymarket PM1 findings

Metric	Polymarket (top 100)	Kalshi (aggregate)	Kalshi (NBA)
Brier	0.033	0.194	0.113
Lift over majority	~0.13	+0.030	+0.134
Discipline gate	PASS	FAIL	PASS
Dominant category	Politics/election	Sports	Sports (NBA)
Dominant win rate	~20%	~45%	55%

The direct numerical comparison is apples-to-oranges because:

Polymarket top-volume is dominated by skewed-outcome political markets (~20% win rates, inherent Brier floor very low)
Kalshi top-volume is dominated by near-50/50 sports markets (inherent Brier floor high, roughly 0.20 to 0.25)

The fair comparison is lift, not raw Brier. On that basis:

Polymarket top-volume: lift ~0.13
Kalshi NBA: lift +0.134 (nearly identical)
Kalshi aggregate: lift +0.030 (much weaker, dragged down by MLB/EPL/LaLiga)

Kalshi NBA is just as efficient as the top of Polymarket. Both venues have categories where the informed-trader population keeps the market honest. The venue isn’t the efficiency story — the category within the venue is.

The “Kalshi MLB = Polymarket Elon-tweets” parallel

The most interesting finding is that each venue has at least one identifiable sub-population where the market is measurably worse than trivial baseline:

Venue	Sub-population	Brier	Majority baseline	Lift	N
Polymarket	Elon tweet counts	0.246	0.204	-0.042	21
Kalshi	MLB games	0.305	0.249	-0.056	40

Both are retail-dominated markets where a structural explanation for the mispricing is plausible:

Elon tweets: narrow buckets (20 tweets wide), retail entertainment, no one pulls the actual history
MLB games: high-variance sport, 162-game grind, retail bettors overconfident

These are the first two places in our entire analysis where a systematic strategy has a plausible path that isn’t immediately crushed by market efficiency. Worth deeper investigation before the arbitrage work.

Caveats I’m flagging honestly

NBA sample is suspicious. The +0.134 lift is striking, but N=40 sports markets is a moderate sample and the point estimate has real uncertainty. Worth re-running with 100+ NBA markets to confirm.
My snapshot methodology is “market midpoint in time” not “3 days before close” because of short-lived sports markets. This means the Kalshi numbers aren’t directly comparable to the Polymarket “3 days before close” numbers — Kalshi markets in the dataset are snapshotted at different points in their lifecycle depending on duration.
MLB’s Brier 0.3051 could partially be a snapshot timing artifact. At market midpoint (~24h before close for a 48h MLB market), there may be very little price discovery compared to 3 days before a week-long Polymarket market. We’d want to confirm by snapshotting MLB at “1 hour before close” (when the market has matured) and seeing if the calibration improves.
NFL has only N=13, too small for strong conclusions. NFL markets on Kalshi have $28M median volume, so liquidity is not the constraint; the sample here is limited by what was in our recent settled sweep.
Kalshi’s short-lived markets make the Brier test structurally different from Polymarket’s longer-running ones. A 48-hour market has less room for opinion to diverge from eventual truth than a week-long one, which may explain why sports Brier on Kalshi tends to cluster tighter around game-inherent variance.
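
One way to put a number on the sample-size caveats above is a percentile bootstrap on the lift. This is a sketch under assumptions: it resamples markets with replacement, treats them as independent (which games on the same slate may not be), and uses the w(1-w) baseline that matches the table. The function name and defaults are hypothetical, and the data at the bottom is synthetic, not the Kalshi sample.

```python
import random


def bootstrap_lift_ci(prices, outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for lift = w(1-w) - Brier."""
    if not prices or len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must be non-empty and equal length")
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(prices)
    lifts = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample markets with replacement
        ys = [outcomes[i] for i in idx]
        ps = [prices[i] for i in idx]
        rate = sum(ys) / n
        brier = sum((p - y) ** 2 for p, y in zip(ps, ys)) / n
        lifts.append(rate * (1 - rate) - brier)
    lifts.sort()
    lower = lifts[int((alpha / 2) * n_resamples)]
    upper = lifts[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic example only: 40 markets with sharp, correct prices.
demo_prices = [0.9, 0.1, 0.8, 0.2] * 10
demo_outcomes = [1, 0, 1, 0] * 10
demo_low, demo_high = bootstrap_lift_ci(demo_prices, demo_outcomes, n_resamples=500)
```

If the interval for a series straddles zero at N=40, the sign of its lift is not established and a larger sample is the right next step.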

Actionable next steps

Confirm the MLB finding on a larger sample (target N=150-200) and with a “close - 1 hour” snapshot. If the negative lift holds, MLB is a real alpha target.
Confirm NBA’s +0.134 lift on a larger sample. If it holds, this is a surprisingly efficient market we can use as a reference for benchmarking our own forecasters.
Measure Kalshi NBA at multiple time points (e.g., 48h, 24h, 12h, 1h before close) to build a calibration curve. If the curve converges to the right answer quickly, it tells us something about where in the market lifecycle the alpha lives.
Start the market-matching pipeline (independent of price analysis, per the founder’s guidance). Build a fuzzy matcher that pairs Polymarket markets to Kalshi markets by title/category/date, so we can run arbitrage analysis continuously on matched pairs when prices diverge mid-window.
Skip live sports arbitrage for now. Both venues are reasonably efficient on NBA and NFL; cross-venue pricing gaps are unlikely to open wide enough to overcome transaction costs and slippage. The more interesting arb candidates are econ/political events that both venues host (CPI, Fed decisions, election outcomes) where a brief information asymmetry between venues could produce real gaps.
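
As a starting point for the market-matching step, the sketch below pairs markets by category, close date, and title similarity using only the standard library. Everything here is an assumption for illustration: the Market fields, the 0.6 similarity threshold, the one-day date tolerance, and the premise that both venues' categories have already been normalized to a shared vocabulary. A production matcher would likely need venue-specific title cleanup first.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Market:  # hypothetical record shape
    venue: str
    title: str
    category: str
    close_date: date


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_markets(poly, kalshi, min_score=0.6, max_day_gap=1):
    """Return (polymarket, kalshi, score) for the best Kalshi match per Polymarket market."""
    pairs = []
    for p in poly:
        best = None
        for k in kalshi:
            if p.category != k.category:
                continue  # cheap filter before the string comparison
            if abs((p.close_date - k.close_date).days) > max_day_gap:
                continue
            score = title_similarity(p.title, k.title)
            if score >= min_score and (best is None or score > best[1]):
                best = (k, score)
        if best is not None:
            pairs.append((p, best[0], best[1]))
    return pairs
```

Filtering on category and date before scoring titles keeps the quadratic comparison cheap and avoids pairing similarly worded markets that settle on different events.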

pm1-polymarket-baseline — the Polymarket side of this comparison (partially superseded)
pm1b-polymarket-long-tail-correction
pm1cd-category-and-stability
pm1c3-other-breakdown — the Elon tweet finding that parallels the MLB finding here
pm1e-elon-forecast — the forward test currently running
../autoinv/kalshi — the client used for this analysis
../architecture-vision — target 5-agent shape