

2026-04-09 · concept

Brier Score — the forecaster’s ruler

TL;DR

Brier score = mean squared error between your probability predictions and the realized outcomes.

The formula

For N binary events with predicted probabilities pᵢ and realized outcomes oᵢ (each 0 or 1):

Brier = (1/N) · Σ (pᵢ - oᵢ)²

For every prediction, you square the gap between what you said would happen and what did happen, then average over all your predictions.
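A minimal sketch of the computation in Python (the helper name `brier_score` and the sample data are illustrative, not from the source):

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# One confident hit, one hedged hit, one confident miss:
# ((0.9-1)^2 + (0.6-1)^2 + (0.8-0)^2) / 3 = (0.01 + 0.16 + 0.64) / 3 = 0.27
score = brier_score([0.9, 0.6, 0.8], [1, 1, 0])
```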

Why it matters

Brier punishes confident wrong predictions more than hesitant ones. If you predict 0.9 for an event that turns out to be 0 (never happened), you eat a 0.81 penalty. If you predict 0.6 for the same event, you only eat 0.36. That’s the right incentive — confident bets should cost more when they’re wrong.

But it also punishes timid right predictions. If you predict 0.55 for an event that happens, you eat 0.2025 (you got it right but you hedged too much). That’s also the right incentive — if you actually knew, you should have said so.
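The penalties in these two paragraphs are just the squared gaps; a two-line check (the helper name is illustrative):

```python
def penalty(p, outcome):
    """Per-prediction Brier contribution: squared gap to the 0/1 outcome."""
    return (p - outcome) ** 2

confident_wrong = penalty(0.9, 0)   # ~0.81
hedged_wrong = penalty(0.6, 0)      # ~0.36
timid_right = penalty(0.55, 1)      # ~0.2025
```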

The property that makes this work: the expected Brier is minimized when you report your true subjective probability. If you genuinely believe something is 70% likely, saying “70%” gives you the lowest expected Brier over many trials. Saying “50%” to hedge makes your expected score worse, not better. This is what “proper scoring rule” means.
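You can check the "proper" property directly: if an event occurs with true probability p and you report q, your expected penalty is p·(1-q)² + (1-p)·q², which is minimized at q = p. A small sketch (the candidate grid and names are illustrative):

```python
def expected_brier(q, p):
    """Expected Brier for reporting q when the event occurs with probability p.
    Outcome 1 (prob p) costs (q-1)^2; outcome 0 (prob 1-p) costs q^2."""
    return p * (q - 1) ** 2 + (1 - p) * q ** 2

p = 0.7  # your true belief
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]
best = min(candidates, key=lambda q: expected_brier(q, p))
# Honest reporting (q = 0.7) gives the lowest expected score on the grid.
```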

Reference values

These are the benchmarks that let you calibrate your intuition:

| Brier score | What it means |
| --- | --- |
| 0.00 | Perfect. Every prediction matched the outcome exactly. Impossible in practice. |
| 0.02-0.05 | Excellent. Professional forecaster with a strong track record on a tractable domain. |
| 0.06-0.12 | Very good. 538's historical range on US presidential elections. The best human election forecasters. |
| 0.10-0.15 | Good. Published weather forecasters on precipitation. |
| 0.12 | Our discipline gate. From the gemchange simulate-like-quant-desk article: "if your simulation can beat that, you have edge." |
| 0.16 | Same as always predicting the base rate (majority class). On a 20% base rate, always guessing "20%" gives Brier 0.16. This is the "I know nothing except the overall frequency" floor. |
| 0.25 | Always predicting 0.5. This is what you get if you don't know anything about individual events and just say "50/50, no idea." |
| 0.50 | Worse than random on a 50/50 event. You're systematically wrong. |
| 1.00 | Catastrophically wrong on every prediction. Always predicted the opposite of the truth. |

Important nuance: the "good" Brier number depends on how hard the prediction problem is. For near-coin-flip events (most individual sports games), even a perfectly calibrated forecaster is stuck around 0.25, because the irreducible per-event variance p·(1-p) peaks at p = 0.5. For lopsided events (a 90% favorite), a well-calibrated forecaster scores around 0.09 (= 0.9 × 0.1), and lower still as the favorite gets heavier.
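One way to see this floor: a perfectly calibrated forecaster's expected per-event Brier equals the outcome variance p·(1-p), which peaks at coin-flips and shrinks as events get lopsided. A quick check (the function name is illustrative):

```python
def irreducible_brier(p):
    """Expected Brier for a perfectly calibrated forecast of a
    true-probability-p event: p*(1-p), the outcome variance."""
    return p * (1 - p)

coin_flip_floor = irreducible_brier(0.5)  # 0.25: nothing to squeeze out
lopsided_floor = irreducible_brier(0.9)   # ~0.09: heavy favorites allow low scores
```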

Always compare your Brier to the right baseline:

  1. What does “always predict 0.5” score on this dataset? (The worst honest baseline.)
  2. What does “always predict the base rate / majority class” score? (The second-worst honest baseline — if you can’t beat this, your model knows nothing.)
  3. What do professional forecasters on similar problems score? (The practical ceiling.)
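The first two baselines can be computed directly from the outcome history; a sketch (function names and the 20%-base-rate sample are illustrative):

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

def honest_baselines(outcomes):
    """Brier for 'always predict 0.5' and 'always predict the base rate'."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    coin = brier([0.5] * n, outcomes)        # always 0.25 on 0/1 outcomes
    rate = brier([base_rate] * n, outcomes)  # equals base_rate * (1 - base_rate)
    return coin, rate

# 100 events at a 20% base rate, as in the reference table: 0.25 and 0.16.
coin, rate = honest_baselines([1, 0, 0, 0, 0] * 20)
```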

How we use it in Automated Investing

Relationship to other scoring rules

Gotchas