Calibration Metrics
How well the model's predicted probabilities match real-world outcomes
Current Model (v2.0) — Walk-Forward Calibration
OOS fights: 1145
Walk-Forward Accuracy: 66.9%
Brier Score: 0.2303
Log Loss: 0.6532
ECE: 0.1216
| Predicted Range | Fights | Avg Predicted | Actual Win Rate | Delta | Status |
|---|---|---|---|---|---|
| 50%-55% | 689 | 52.3% | 59.2% | +6.9% | Under-confident (wins MORE than predicted — extra value) |
| 55%-60% | 348 | 57.1% | 74.4% | +17.4% | Under-confident (wins MORE than predicted — extra value) |
| 60%-65% | 95 | 61.8% | 90.5% | +28.7% | Under-confident (wins MORE than predicted — extra value) |
| 65%-70% | 11 | 67.1% | 100.0% | +32.9% | Under-confident (wins MORE than predicted — extra value) |
| 70%-80% | 2 | 71.7% | 100.0% | +28.3% | Under-confident (wins MORE than predicted — extra value) |
Walk-forward = trained on pre-2024 data, tested on 2024+ fights (1145 fights). True out-of-sample, no data leakage.
The model is consistently under-confident — fighters the model picks win even more often than predicted, meaning the real edge is larger than shown.
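A minimal sketch of that walk-forward split, assuming a pandas DataFrame `fights` with a `date` column, a binary `win` label, and placeholder feature columns; the actual v2.0 features and estimator are not documented here, so a logistic regression stands in.

```python
# Walk-forward evaluation sketch: fit on pre-2024 fights, score 2024+ fights.
# `fights`, FEATURES, and the logistic-regression stand-in are assumptions,
# not the actual v2.0 pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

FEATURES = ["reach_diff", "age_diff", "strike_diff"]  # hypothetical feature columns

def walk_forward_eval(fights: pd.DataFrame, cutoff: str = "2024-01-01"):
    train = fights[fights["date"] < cutoff]    # everything the model is allowed to see
    test = fights[fights["date"] >= cutoff]    # true out-of-sample fights

    model = LogisticRegression(max_iter=1000)
    model.fit(train[FEATURES], train["win"])

    probs = model.predict_proba(test[FEATURES])[:, 1]   # P(win) for the scored fighter
    acc = accuracy_score(test["win"], probs >= 0.5)
    return probs, test["win"].to_numpy(), acc
```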
How to Read These Metrics
Brier Score measures probability calibration; 0.25 is a coin flip and lower is better. Current value 0.2303: good.
Log Loss measures information-theoretic prediction quality; 0.693 is a coin flip and lower is better. Current value 0.6532: good.
ECE (Expected Calibration Error) is the average gap between predicted and actual win rates across confidence buckets; lower means better calibrated. Current value 0.1216: reasonably calibrated.
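For reference, a minimal sketch of how these three numbers can be computed from arrays of predicted probabilities and 0/1 outcomes; the 10-bin ECE layout is an assumption and may differ from the binning used for the dashboard.

```python
# Compute Brier score, log loss, and ECE from predicted win probabilities
# (`probs`) and binary outcomes (`wins`), both NumPy arrays.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

def calibration_metrics(probs: np.ndarray, wins: np.ndarray, n_bins: int = 10):
    brier = brier_score_loss(wins, probs)   # 0.25 = coin flip, lower is better
    ll = log_loss(wins, probs)              # 0.693 = coin flip, lower is better

    # ECE: size-weighted average |predicted - actual| across probability bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - wins[mask].mean())
            ece += mask.mean() * gap        # weight by fraction of fights in the bin
    return brier, ll, ece
```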
Live Tracking Data
Predictions scored: 13
Accuracy: 62%
Brier Score: 0.24
Calibration by Confidence Bucket
| Predicted Range | Count | Avg Predicted | Actual Win Rate | Delta | Status |
|---|---|---|---|---|---|
| 50%-55% | 7 | 53.0% | 57.1% | +4.1% | Well calibrated |
| 55%-60% | 2 | 56.4% | 50.0% | -6.4% | Over-confident |
| 60%-65% | 2 | 63.0% | 100.0% | +37.0% | Under-confident |
| 65%-70% | 1 | 67.8% | 0.0% | -67.8% | Over-confident |
| 75%-80% | 1 | 75.7% | 100.0% | +24.3% | Under-confident |
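The bucket tables above can be rebuilt with a grouping like the sketch below; the uniform 5% bins, the 5-point threshold for "Well calibrated", and the column labels are assumptions read off the tables, not the dashboard's confirmed logic.

```python
# Build a calibration-by-bucket table from predicted probabilities and outcomes.
import numpy as np
import pandas as pd

def calibration_table(probs: np.ndarray, wins: np.ndarray) -> pd.DataFrame:
    edges = np.arange(0.50, 0.801, 0.05)   # 50%-55%, 55%-60%, ..., 75%-80%
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if not mask.any():
            continue
        avg_pred, actual = probs[mask].mean(), wins[mask].mean()
        delta = actual - avg_pred
        if abs(delta) < 0.05:              # assumed "well calibrated" threshold
            status = "Well calibrated"
        elif delta > 0:
            status = "Under-confident"
        else:
            status = "Over-confident"
        rows.append({
            "Predicted Range": f"{lo:.0%}-{hi:.0%}",
            "Fights": int(mask.sum()),
            "Avg Predicted": f"{avg_pred:.1%}",
            "Actual Win Rate": f"{actual:.1%}",
            "Delta": f"{delta:+.1%}",
            "Status": status,
        })
    return pd.DataFrame(rows)
```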
Rolling Accuracy (10-fight window)
| Through Fight | Rolling Accuracy |
|---|---|
| 10 | 70% |
| 11 | 70% |
| 12 | 60% |
| 13 | 60% |
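The rolling figures above are a simple trailing-window mean; a sketch, assuming `correct` is the chronological sequence of 0/1 prediction outcomes and a 10-fight window.

```python
# Rolling accuracy over the most recent `window` scored predictions.
# With 13 scored predictions this yields values at fights 10 through 13,
# as in the table above.
import numpy as np

def rolling_accuracy(correct, window: int = 10) -> dict[int, float]:
    correct = np.asarray(correct, dtype=float)
    return {
        i: correct[i - window:i].mean()    # mean over the `window` fights ending at fight i
        for i in range(window, len(correct) + 1)
    }
```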