pm1 kalshi baseline

Thu Apr 09 2026 20:00:00 GMT-0400 (Eastern Daylight Time) · experiment-writeup · status: partially-superseded

PM1 (Kalshi) — Calibration Baseline

⚠ Correction (2026-04-10): The MLB verdict in this document (“biggest red flag, real mispricing target”) was wrong. The baseline used an adaptive snapshot at market midlife, which for short-lived (~48h) MLB markets lands around 24h before close — before price discovery matures. A follow-up deep dive at 1h before close shows MLB actually has positive lift of +0.058. The “Kalshi MLB” parallel to the Polymarket Elon tweet finding was an artifact, not a real alpha lead. See pm1-kalshi-mlb-deepdive for the corrected analysis. The rest of this document (NBA as surprisingly efficient, CPI as hyperefficient, overall aggregate) still holds; only the MLB/EPL/LaLiga/MLB-Total negative-lift claims are suspect.

Mirror of the Polymarket PM1 analysis, run against Kalshi. Same question: how well does Kalshi’s own midpoint predict outcomes at ~3 days before close (or at market midlife for short-lived markets, see Methodology), and where is the mispricing?

Cost: $0. Anonymous Kalshi read endpoints only.

Methodology

Sample: 11 target series ticker prefixes covering sports (KXNBAGAME, KXNFLGAME, KXMLBGAME, KXNHLGAME, KXEPLGAME, KXLALIGAGAME), totals markets (KXNBATOTAL, KXNFLTOTAL, KXMLBTOTAL), and economic indicators (KXECONSTATCPIYOY, KXECONSTATCORECPIYOY). Final total: 346 settled markets with Brier-scoreable snapshots.

Why series-ticker targeting instead of pagination: Kalshi’s default status=settled sort returns a flood of multi-variate parlay markets (KXMVE*) that aren’t clean binary events. Filtering by specific series gives us the sports and econ markets we actually want to compare against Polymarket.
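A minimal sketch of the series-targeted sweep. The query parameter name `series_ticker`, the `limit` parameter, and both helper names are assumptions for illustration; only `status=settled` and the `KXMVE` prefix come from this writeup. The code builds query strings and filters tickers without sending any request.

```python
from urllib.parse import urlencode

# The 11 series prefixes listed under Sample.
TARGET_SERIES = [
    "KXNBAGAME", "KXNFLGAME", "KXMLBGAME", "KXNHLGAME",
    "KXEPLGAME", "KXLALIGAGAME",
    "KXNBATOTAL", "KXNFLTOTAL", "KXMLBTOTAL",
    "KXECONSTATCPIYOY", "KXECONSTATCORECPIYOY",
]


def settled_query(series: str, limit: int = 100) -> str:
    """Query string for one series (parameter names are assumed, not confirmed)."""
    return urlencode({"series_ticker": series, "status": "settled", "limit": limit})


def is_parlay(ticker: str) -> bool:
    """Multi-variate parlay markets carry the KXMVE prefix."""
    return ticker.startswith("KXMVE")


# One query per target series, instead of paginating the default settled sort.
queries = [settled_query(s) for s in TARGET_SERIES]
```

Querying per series keeps the parlay markets out at the source; `is_parlay` is only a safety net in case any slip through.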

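Adaptive snapshot methodology: NBA/MLB/NHL markets open only ~48 hours before close, so “3 days before close” is before the market exists. I scale the snapshot offset to the market’s lifetime, so short-lived markets are snapshotted at roughly their midpoint (about 24h before close for a 48h market).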

This keeps the comparison fair across different market duration regimes — we’re asking “what did the market price when it was about halfway through its life?” rather than insisting on a fixed offset that doesn’t exist for short-lived markets.
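The exact scaling rule is not reproduced in this writeup. As a sketch, assume the offset is half the market's lifetime, capped at 3 days; both the cap rule and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(days=3)  # the "3 days before close" target


def snapshot_time(open_time: datetime, close_time: datetime) -> datetime:
    """Time at which to read the market price.

    Assumed rule: offset = min(3 days, lifetime / 2), so long markets are read
    3 days before close and short ones at their midpoint.
    """
    lifetime = close_time - open_time
    if lifetime <= timedelta(0):
        raise ValueError("close_time must be after open_time")
    offset = min(MAX_OFFSET, lifetime / 2)
    return close_time - offset


close = datetime(2026, 4, 9, 23, 0, tzinfo=timezone.utc)
short_market = snapshot_time(close - timedelta(hours=48), close)  # 24h before close
long_market = snapshot_time(close - timedelta(days=14), close)    # 3 days before close
```

Under this rule a 48h MLB market is read 24h before close, which is the timing the correction note identifies as too early for price discovery.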

Resolution source: Kalshi’s result field (‘yes’/‘no’). Snapshot price is price.close_dollars from the hourly candlestick closest to the target time, falling back to the yes_bid/yes_ask midpoint when no candlestick price is available.
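The price and resolution lookup might be sketched as follows. The dict shapes are assumptions: each candle is taken to look like `{"end_period_ts": int, "price": {"close_dollars": ...}}`, and the market dict is taken to carry `yes_bid`, `yes_ask`, and `result`. The timestamp field name and both function names are hypothetical.

```python
from typing import Optional


def snapshot_price(candles: list[dict], target_ts: int, market: dict) -> Optional[float]:
    """Close price of the candle nearest target_ts, else the bid/ask midpoint."""
    # Keep only candles that actually carry a close price.
    priced = [
        c for c in candles
        if (c.get("price") or {}).get("close_dollars") is not None
    ]
    if priced:
        nearest = min(priced, key=lambda c: abs(c["end_period_ts"] - target_ts))
        return float(nearest["price"]["close_dollars"])
    # Fallback: midpoint of the quoted yes bid and ask.
    bid, ask = market.get("yes_bid"), market.get("yes_ask")
    if bid is not None and ask is not None:
        return (float(bid) + float(ask)) / 2
    return None  # not Brier-scoreable


def outcome(market: dict) -> int:
    """Map the result field ('yes'/'no') to 1/0."""
    result = market["result"]
    if result not in ("yes", "no"):
        raise ValueError(f"unscoreable result: {result!r}")
    return 1 if result == "yes" else 0
```

Markets for which `snapshot_price` returns `None` would be dropped from the sample rather than scored.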

Results

Ranked by Brier (worst calibration = biggest opportunity):

Label            Series                     N    Median vol      Brier    Gate     Win%    Lift
MLB              KXMLBGAME                 40  $ 1,239,290    0.3051    FAIL    47.5%    -0.056
EPL              KXEPLGAME                 38  $   383,954    0.2477    FAIL    39.5%    -0.009
MLB-Total        KXMLBTOTAL                28  $    46,782    0.2400    FAIL    35.7%    -0.010
LaLiga           KXLALIGAGAME              43  $   267,242    0.2289    FAIL    34.9%    -0.002
NFL              KXNFLGAME                 13  $28,045,224    0.1942    FAIL    46.2%    +0.054
NHL              KXNHLGAME                 40  $   271,104    0.1828    FAIL    45.0%    +0.065
NFL-Total        KXNFLTOTAL                29  $   512,179    0.1787    FAIL    34.5%    +0.047
NBA-Total        KXNBATOTAL                42  $    95,260    0.1609    FAIL    33.3%    +0.061
Core-CPI-YoY     KXECONSTATCORECPIYOY      11  $     2,742    0.1472    FAIL    18.2%    +0.002
NBA              KXNBAGAME                 40  $ 3,097,706    0.1132    PASS    55.0%    +0.134
CPI-YoY          KXECONSTATCPIYOY          22  $    15,674    0.0495    PASS     9.1%    +0.033

Aggregate (N=346): Brier 0.1942 vs majority baseline 0.2243. Overall lift of +0.030.
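The table's columns are consistent with lift = r(1 - r) - Brier, where r is the Win% column (the Brier score of always predicting the base rate). A minimal sketch of the scoring under that reading follows; the function names are illustrative, and the strict `<` at the gate is an assumption.

```python
def brier(prices: list[float], outcomes: list[int]) -> float:
    """Mean squared error between snapshot price and the 0/1 outcome."""
    if not prices or len(prices) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(prices, outcomes)) / len(prices)


def majority_baseline(outcomes: list[int]) -> float:
    """Brier score of always predicting the sample base rate: r * (1 - r)."""
    rate = sum(outcomes) / len(outcomes)
    return rate * (1 - rate)


def lift(prices: list[float], outcomes: list[int]) -> float:
    """Positive when the market beats the base-rate forecast."""
    return majority_baseline(outcomes) - brier(prices, outcomes)


GATE = 0.12  # discipline gate; strict inequality is assumed


def passes_gate(brier_score: float) -> bool:
    return brier_score < GATE


# Consistency check against the MLB row: Win% 47.5 (19 of 40), Brier 0.3051.
mlb_baseline = 0.475 * (1 - 0.475)   # 0.249375
mlb_lift = mlb_baseline - 0.3051     # rounds to -0.056, matching the table
```

The same arithmetic reproduces the other rows, for example NBA: 0.55 × 0.45 = 0.2475, and 0.2475 - 0.1132 ≈ +0.134.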

The headline findings

1. Kalshi’s NBA game markets are remarkably well-calibrated. Brier 0.1132 with +0.134 lift over majority baseline, the only sports series that passes the 0.12 discipline gate. At median volume $3.1M per market (N=40), this is a substantial, liquid, accurately-priced market. Simply copying Kalshi’s own NBA prices beats always predicting the base rate by 0.134 in Brier score.

2. CPI-YoY is hyperefficient. Brier 0.0495 on a 9.1% base rate (N=22). This mirrors the Polymarket finding that political/economic markets are the best-calibrated across both venues. Informational edge in these markets is extremely hard to get.

3. MLB is the biggest red flag. Brier 0.3051 with negative lift of -0.056: the market is actively worse than always predicting the base rate. Similar structural shape to the Elon tweet-count finding on Polymarket, where the market also does worse than a trivial baseline. Two explanations: genuine mispricing, or a methodology artifact in how the snapshots were taken.

The first explanation makes MLB an actual alpha target worth investigating further. The second is a methodology bug we should rule out by probing MLB snapshots specifically.

4. EPL, MLB-Total, and LaLiga show mild negative lift (-0.009, -0.010, and -0.002), but these are small enough to be sampling noise rather than real signal.

5. NBA-Total, NFL, NHL, NFL-Total sit in the “inherently coin-flippy” middle: Brier 0.16-0.20 with positive lift of +0.05 to +0.07. The market beats the baseline by a small margin. Inherent variance of individual games limits how low Brier can go: a perfectly calibrated forecaster scores 0.25 on true 50/50 games and about 0.20 when the favorite wins roughly 72% of the time, so these numbers are roughly what an efficient market should look like.
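The variance floor on Brier can be made explicit with a short calculation. The symbols are introduced here for illustration: p is the true probability of the event, Y is the 0/1 outcome, and the forecaster is assumed to quote exactly p.

```latex
\mathbb{E}\big[(p - Y)^2\big]
  = p\,(1 - p)^2 + (1 - p)\,p^2
  = p\,(1 - p),
\qquad Y \sim \mathrm{Bernoulli}(p)
```

This gives 0.25 at p = 0.5, about 0.20 at p = 0.72, and 0.16 at p = 0.8. If every game in a series had the same true probability, a Brier in the 0.16-0.20 range would correspond to favorites priced at roughly 72-80%; real series mix many probabilities, so this is only a rough guide.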

Comparison to the Polymarket PM1 findings

Metric                Polymarket (top 100)    Kalshi (aggregate)    Kalshi (NBA)
Brier                 0.033                   0.194                 0.113
Lift over majority    ~0.13                   +0.030                +0.134
Discipline gate       PASS                    FAIL                  PASS
Dominant category     Politics/election       Sports                Sports (NBA)
Dominant win rate     ~20%                    ~45%                  55%

The direct numerical comparison is apples-to-oranges because:

The fair comparison is lift, not raw Brier. On that basis:

Kalshi NBA is just as efficient as the top of Polymarket. Both venues have categories where the informed-trader population keeps the market honest. The venue isn’t the efficiency story — the category within the venue is.

The “Kalshi MLB = Polymarket Elon-tweets” parallel

The most interesting finding is that each venue has at least one identifiable sub-population where the market is measurably worse than a trivial baseline:

Venue         Sub-population       Brier    Majority baseline    Lift      N
Polymarket    Elon tweet counts    0.246    0.204                -0.042    21
Kalshi        MLB games            0.305    0.249                -0.056    40

Both are retail-dominated markets where a structural explanation for the mispricing is plausible:

These are the first two places in our entire analysis where a systematic strategy has a plausible path that isn’t immediately crushed by market efficiency. Worth deeper investigation before the arbitrage work.

Caveats I’m flagging honestly

  1. The NBA result needs confirmation. The +0.134 lift is striking, but N=40 is a moderate sample and the point estimate has real uncertainty. Worth re-running with 100+ NBA markets to confirm.

  2. My snapshot methodology is “market midpoint in time” not “3 days before close” because of short-lived sports markets. This means the Kalshi numbers aren’t directly comparable to the Polymarket “3 days before close” numbers — Kalshi markets in the dataset are snapshotted at different points in their lifecycle depending on duration.

  3. MLB’s Brier 0.3051 could partially be a snapshot timing artifact. At market midpoint (~24h before close for a 48h MLB market), there may be very little price discovery compared to 3 days before a week-long Polymarket market. We’d want to confirm by snapshotting MLB at “1 hour before close” (when the market has matured) and seeing if the calibration improves.

  4. NFL has only N=13, too small for strong conclusions. NFL markets on Kalshi have $28M median volume, so liquidity is not the constraint; the sample is limited by what was in our recent settled sweep.

  5. Kalshi’s short-lived markets make the Brier test structurally different from Polymarket’s longer-running ones. A 48-hour market has less room for opinion to diverge from eventual truth than a week-long one, which may explain why sports Brier on Kalshi tends to cluster tighter around game-inherent variance.
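One way to quantify the sample-size caveats above is a percentile bootstrap on lift, resampling markets with replacement. This is a sketch, not what the baseline run did. The function names are illustrative and the demo data is synthetic, generated only to make the block runnable.

```python
import random


def lift_of(pairs: list[tuple[float, int]]) -> float:
    """Majority-baseline Brier minus market Brier for (price, outcome) pairs."""
    n = len(pairs)
    rate = sum(y for _, y in pairs) / n
    market = sum((p - y) ** 2 for p, y in pairs) / n
    return rate * (1 - rate) - market


def bootstrap_lift_ci(pairs, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap interval (lo, hi) for lift."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        lift_of([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic placeholder data (NOT the experiment's markets): 40 calibrated pairs.
demo_rng = random.Random(1)
demo = []
for _ in range(40):
    p = demo_rng.uniform(0.2, 0.8)
    demo.append((p, 1 if demo_rng.random() < p else 0))

low, high = bootstrap_lift_ci(demo)
```

If the interval for a series such as NBA (N=40) or NFL (N=13) straddles zero, the sign of its lift is not yet established, which is the case for re-running on larger samples.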

Actionable next steps

  1. Confirm the MLB finding on a larger sample (target N=150-200) and with a “close - 1 hour” snapshot. If the negative lift holds, MLB is a real alpha target.
  2. Confirm NBA’s +0.134 lift on a larger sample. If it holds, this is a surprisingly efficient market we can use as a reference for benchmarking our own forecasters.
  3. Measure Kalshi NBA at multiple time points (e.g., 48h, 24h, 12h, 1h before close) to build a calibration curve. If the curve converges to the right answer quickly, it tells us something about where in the market lifecycle the alpha lives.
  4. Start the market-matching pipeline (independent of price analysis, per the founder’s guidance). Build a fuzzy matcher that pairs Polymarket markets to Kalshi markets by title/category/date, so we can run arbitrage analysis continuously on matched pairs when prices diverge mid-window.
  5. Skip live sports arbitrage for now. Both venues are reasonably efficient on NBA and NFL; cross-venue pricing gaps are unlikely to open wide enough to overcome transaction costs and slippage. The more interesting arb candidates are econ/political events that both venues host (CPI, Fed decisions, election outcomes) where a brief information asymmetry between venues could produce real gaps.
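The matcher in step 4 could start as simply as the sketch below, using the standard library's `difflib.SequenceMatcher`. The market dict fields (`title`, `category`, `close_date`), the thresholds, and the function names are all hypothetical; a real pipeline would need to normalize categories across venues first.

```python
from datetime import date
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_markets(
    poly: list[dict],
    kalshi: list[dict],
    min_score: float = 0.6,
    max_days_apart: int = 1,
) -> list[tuple[dict, dict, float]]:
    """Pair each Polymarket market with its best Kalshi candidate.

    Candidates must share a category and close within max_days_apart days;
    among those, the highest title similarity at or above min_score wins.
    """
    pairs = []
    for pm in poly:
        best, best_score = None, min_score
        for km in kalshi:
            if pm["category"] != km["category"]:
                continue
            if abs((pm["close_date"] - km["close_date"]).days) > max_days_apart:
                continue
            score = title_similarity(pm["title"], km["title"])
            if score >= best_score:
                best, best_score = km, score
        if best is not None:
            pairs.append((pm, best, best_score))
    return pairs
```

Filtering on category and close date before scoring titles keeps false matches down and avoids comparing every title pair; matched pairs would still need a manual check that the resolution criteria are equivalent before any price comparison.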