Calibration Metrics
How well model probabilities match real-world outcomes — verified bucket-by-bucket.
How to Read These Metrics
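Brier Score: mean squared error between the predicted probability and the 0/1 outcome. Always predicting 50% scores 0.25; lower is better. Status: Good
Log Loss: average negative log-likelihood of the outcomes under the predicted probabilities. Always predicting 50% scores 0.693 (ln 2); lower is better. Status: Good
ECE (Expected Calibration Error): average gap between the predicted probability and the actual win rate across confidence buckets, typically weighted by bucket size. Lower is better calibrated. Status: Reasonably calibrated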
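
The coin-flip baselines quoted above (0.25 and 0.693) can be reproduced with a minimal sketch. The function names and the `outcomes` list are hypothetical and chosen for illustration; this is not the site's scoring code, and the sample outcomes are not the tracked data.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical outcomes (1 = the predicted side won); not the tracked data.
outcomes = [1, 0, 1, 1, 0, 1]
coin_flip = [0.5] * len(outcomes)

baseline_brier = brier_score(coin_flip, outcomes)   # 0.25 for any outcomes
baseline_log_loss = log_loss(coin_flip, outcomes)   # ln 2, about 0.693
```

A constant 50% forecast gets the same score whatever the outcomes are, which is why both numbers serve as a fixed reference point: a model only adds value if it scores below them.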
Live Tracking Data (87 predictions scored)
| Scored | Accuracy | Brier |
|---|---|---|
| 87 | 63% | 0.229 |
Calibration by Confidence Bucket
| Predicted Range | Count | Avg Predicted | Actual Win Rate | Delta (Actual - Predicted, pct. points) | Status |
|---|---|---|---|---|---|
| 50%-55% | 52 | 51.8% | 59.6% | +7.9% | Under-confident |
| 55%-60% | 12 | 57.6% | 58.3% | +0.7% | Well calibrated |
| 60%-65% | 7 | 62.2% | 57.1% | -5.1% | Over-confident |
| 65%-70% | 3 | 65.9% | 66.7% | +0.7% | Well calibrated |
| 70%-75% | 2 | 73.0% | 50.0% | -23.0% | Over-confident |
| 75%-80% | 4 | 75.5% | 75.0% | -0.5% | Well calibrated |
| 80%-85% | 1 | 80.4% | 100.0% | +19.6% | Under-confident |
| 85%-90% | 6 | 85.0% | 100.0% | +15.0% | Under-confident |
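
As a rough cross-check, the bucket table can be collapsed into a single count-weighted calibration gap. The sketch below assumes ECE is the count-weighted mean of the absolute gap per bucket (the common definition; the page does not say exactly how its ECE is computed) and works from the rounded percentages shown in the table, so the result is approximate.

```python
# (count, avg predicted %, actual win rate %) copied from the bucket table.
buckets = [
    (52, 51.8, 59.6),
    (12, 57.6, 58.3),
    (7, 62.2, 57.1),
    (3, 65.9, 66.7),
    (2, 73.0, 50.0),
    (4, 75.5, 75.0),
    (1, 80.4, 100.0),
    (6, 85.0, 100.0),
]

total = sum(count for count, _, _ in buckets)

# Count-weighted mean absolute gap, in percentage points.
ece = sum(count * abs(actual - predicted)
          for count, predicted, actual in buckets) / total

# Wins implied by each bucket's win rate, assuming "actual win rate" is the
# share of fights in which the predicted side won.
wins = sum(round(count * actual / 100) for count, _, actual in buckets)
accuracy = wins / total
```

With these inputs the counts sum to 87, the weighted gap comes out at about 7 percentage points, and the implied accuracy is 55 of 87, in line with the 63% headline figure. Most of the gap comes from the 50%-55% bucket, which holds 52 of the 87 predictions; buckets with only one or two fights contribute large deltas but little weight.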
Rolling Accuracy (10-fight window)
| Fight | Rolling Accuracy |
|---|---|
| 80 | 60% |
| 81 | 60% |
| 82 | 60% |
| 83 | 60% |
| 84 | 70% |
| 85 | 80% |
| 86 | 80% |
| 87 | 70% |
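
A 10-fight rolling accuracy of this kind can be computed as sketched below. The `rolling_accuracy` helper and the `results` sequence are hypothetical; the per-fight hit/miss data behind the figures above is not shown on the page.

```python
def rolling_accuracy(results, window=10):
    """Accuracy over the trailing `window` results, one value per fight
    once the window is full."""
    out = []
    for end in range(window, len(results) + 1):
        recent = results[end - window:end]
        out.append(sum(recent) / window)
    return out

# Hypothetical hit/miss sequence (1 = correct pick); not the tracked data.
results = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
series = rolling_accuracy(results)
```

Because each step adds one result and drops one, a 10-fight window can move by at most 10 percentage points per fight, which matches the step pattern in the figures above.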