Calibration Metrics
How well the model's predicted probabilities match real-world outcomes
Current Model (v2.0) — Walk-Forward Calibration
OOS fights: 1145
Walk-Forward Accuracy: 66.9%
Brier Score: 0.2303
Log Loss: 0.6532
ECE: 0.1216
| Predicted Range | Fights | Avg Predicted | Actual Win Rate | Delta | Status |
|---|---|---|---|---|---|
| 50%-55% | 689 | 52.3% | 59.2% | +6.9% | Under-confident (wins MORE than predicted — extra value) |
| 55%-60% | 348 | 57.1% | 74.4% | +17.4% | Under-confident (wins MORE than predicted — extra value) |
| 60%-65% | 95 | 61.8% | 90.5% | +28.7% | Under-confident (wins MORE than predicted — extra value) |
| 65%-70% | 11 | 67.1% | 100.0% | +32.9% | Under-confident (wins MORE than predicted — extra value) |
| 70%-80% | 2 | 71.7% | 100.0% | +28.3% | Under-confident (wins MORE than predicted — extra value) |
Walk-forward = trained on pre-2024 data, tested on 2024+ fights (1145 fights). True out-of-sample, no data leakage.
The model is consistently under-confident — fighters the model picks win even more often than predicted, meaning the real edge is larger than shown.
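A minimal sketch of that walk-forward split, assuming a pandas DataFrame `fights` with a `date` column, a binary `win` label, and placeholder feature columns; the actual v2.0 features and estimator are not documented here, so a logistic regression stands in.

```python
# Walk-forward evaluation sketch: fit on pre-2024 fights, score 2024+ fights.
# `fights`, FEATURES, and the logistic-regression stand-in are assumptions,
# not the actual v2.0 pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

FEATURES = ["reach_diff", "age_diff", "strike_diff"]  # hypothetical feature columns

def walk_forward_eval(fights: pd.DataFrame, cutoff: str = "2024-01-01"):
    train = fights[fights["date"] < cutoff]    # everything the model is allowed to see
    test = fights[fights["date"] >= cutoff]    # true out-of-sample fights

    model = LogisticRegression(max_iter=1000)
    model.fit(train[FEATURES], train["win"])

    probs = model.predict_proba(test[FEATURES])[:, 1]   # P(win) for the scored fighter
    acc = accuracy_score(test["win"], probs >= 0.5)
    return probs, test["win"].to_numpy(), acc
```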
How to Read These Metrics
Brier Score measures probability calibration; 0.25 is a coin flip and lower is better. Current value 0.2303: good.
Log Loss measures information-theoretic prediction quality; 0.693 is a coin flip and lower is better. Current value 0.6532: good.
ECE (Expected Calibration Error) is the average gap between predicted and actual win rates across confidence buckets; lower means better calibrated. Current value 0.1216: reasonably calibrated.
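For reference, a minimal sketch of how these three numbers can be computed from arrays of predicted probabilities and 0/1 outcomes; the 10-bin ECE layout is an assumption and may differ from the binning used for the dashboard.

```python
# Compute Brier score, log loss, and ECE from predicted win probabilities
# (`probs`) and binary outcomes (`wins`), both NumPy arrays.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

def calibration_metrics(probs: np.ndarray, wins: np.ndarray, n_bins: int = 10):
    brier = brier_score_loss(wins, probs)   # 0.25 = coin flip, lower is better
    ll = log_loss(wins, probs)              # 0.693 = coin flip, lower is better

    # ECE: size-weighted average |predicted - actual| across probability bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - wins[mask].mean())
            ece += mask.mean() * gap        # weight by fraction of fights in the bin
    return brier, ll, ece
```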
Live Tracking Data
Predictions scored: 13
Accuracy: 62%
Brier Score: 0.24
Calibration by Confidence Bucket
| Predicted Range | Count | Avg Predicted | Actual Win Rate | Delta | Status |
|---|---|---|---|---|---|
| 50%-55% | 7 | 53.0% | 57.1% | +4.1% | Well calibrated |
| 55%-60% | 2 | 56.4% | 50.0% | -6.4% | Over-confident |
| 60%-65% | 2 | 63.0% | 100.0% | +37.0% | Under-confident |
| 65%-70% | 1 | 67.8% | 0.0% | -67.8% | Over-confident |
| 75%-80% | 1 | 75.7% | 100.0% | +24.3% | Under-confident |
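The bucket tables above can be rebuilt with a grouping like the sketch below; the uniform 5% bins, the 5-point threshold for "Well calibrated", and the column labels are assumptions read off the tables, not the dashboard's confirmed logic.

```python
# Build a calibration-by-bucket table from predicted probabilities and outcomes.
import numpy as np
import pandas as pd

def calibration_table(probs: np.ndarray, wins: np.ndarray) -> pd.DataFrame:
    edges = np.arange(0.50, 0.801, 0.05)   # 50%-55%, 55%-60%, ..., 75%-80%
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if not mask.any():
            continue
        avg_pred, actual = probs[mask].mean(), wins[mask].mean()
        delta = actual - avg_pred
        if abs(delta) < 0.05:              # assumed "well calibrated" threshold
            status = "Well calibrated"
        elif delta > 0:
            status = "Under-confident"
        else:
            status = "Over-confident"
        rows.append({
            "Predicted Range": f"{lo:.0%}-{hi:.0%}",
            "Fights": int(mask.sum()),
            "Avg Predicted": f"{avg_pred:.1%}",
            "Actual Win Rate": f"{actual:.1%}",
            "Delta": f"{delta:+.1%}",
            "Status": status,
        })
    return pd.DataFrame(rows)
```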
Rolling Accuracy (10-fight window)
| Through Fight | Rolling Accuracy |
|---|---|
| 10 | 70% |
| 11 | 70% |
| 12 | 60% |
| 13 | 60% |
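The rolling figures above are a simple trailing-window mean; a sketch, assuming `correct` is the chronological sequence of 0/1 prediction outcomes and a 10-fight window.

```python
# Rolling accuracy over the most recent `window` scored predictions.
# With 13 scored predictions this yields values at fights 10 through 13,
# as in the table above.
import numpy as np

def rolling_accuracy(correct, window: int = 10) -> dict[int, float]:
    correct = np.asarray(correct, dtype=float)
    return {
        i: correct[i - window:i].mean()    # mean over the `window` fights ending at fight i
        for i in range(window, len(correct) + 1)
    }
```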