Backtest

Challenger model backtest

A comparison of alternative model shapes after the first bad longshot day, covering 1X2, totals and Asian handicap (AH) movement.

Updated 2026-05-11 21:31 UTC. Paper-trading research only.

Generated: 2026-05-11T21:31:22+00:00
Train before: 2022-07-01
Test from: 2022-07-01

This is a challenger run after the 2026-05-11 longshot failure. The purpose is not to find a magic model. It is to decide which market expressions deserve more live paper trading.

Diagnostic strategies use closing prices or closing movement, so they are CLV/research checks rather than fully live betting rules.
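To make the CLV point concrete, here is a minimal sketch of one common way to express closing line value: the taken price relative to the closing price. The function name `clv` and the example odds are illustrative assumptions, not part of the report or its code.

```python
def clv(taken_odds: float, closing_odds: float) -> float:
    """Closing line value as a fraction: positive means the taken price beat the close.

    Assumes decimal odds. This is one common definition, not necessarily
    the one used to produce the table in this report.
    """
    if taken_odds <= 1.0 or closing_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return taken_odds / closing_odds - 1.0


# Hypothetical prices: 2.10 taken, 2.00 at the close.
beat_close = clv(2.10, 2.00)
# Hypothetical prices: 1.90 taken, 2.00 at the close.
lost_to_close = clv(1.90, 2.00)
```

The closing price is only known at kickoff, which is why a rule that selects bets on closing information can check a signal but cannot be traded as written.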

Strategy	Bets	Profit (u)	ROI	Wins	Losses	Pushes	Avg odds
naive 1x2 price gap	21	17.27	82.2%	13	8	0	3.076
guarded 1x2 no draw longshots	6	-0.25	-4.2%	3	3	0	2.14
strong favourite price shop	18	-0.27	-1.5%	10	8	0	1.699
over 2 5 bucket 170 190	4867	24.79	0.5%	2728	2139	0	1.794
calibrated positive buckets	5360	6.23	0.1%	3029	2331	0	1.816
totals price move hindsight (diagnostic)	8538	91.16	1.1%	4541	3997	0	1.926
ah line move hindsight (diagnostic)	8525	367.69	4.3%	4487	3451	587	1.84
ah close best price (diagnostic)	169	34.28	20.3%	89	67	13	2.169
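The ROI column appears to be profit divided by the number of bets, which implies a flat 1-unit stake with pushes counted in the denominator. That is an inference from the figures, not something the report states. A small sketch that reproduces three of the rows under that assumption (`roi_percent` is a hypothetical helper):

```python
def roi_percent(profit_units: float, bets: int) -> float:
    """ROI in percent, assuming a flat 1-unit stake on every bet (pushes included)."""
    if bets <= 0:
        raise ValueError("bets must be positive")
    return round(100.0 * profit_units / bets, 1)


# (bets, profit in units) copied from the table above.
rows = {
    "naive 1x2 price gap": (21, 17.27),
    "ah line move hindsight (diagnostic)": (8525, 367.69),
    "ah close best price (diagnostic)": (169, 34.28),
}

roi = {name: roi_percent(profit, bets) for name, (bets, profit) in rows.items()}
```

Under this reading, the 82.2% on the naive 1X2 row comes from only 21 staked units, while the 4.3% on the AH line-move row comes from 8,525.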

Read

The naive 1X2 price-gap model is not trustworthy. Even when it shows a positive historical result, as in the 21-bet sample here, the sample is tiny and it is exactly the failure mode that produced the bad day.
Strong favourites are not automatically good. They are price-sensitive and should usually be expressed through handicap or draw no bet (DNB) when the outright is awkward.
Totals are cleaner than 1X2, but simple bucket rules are not enough on their own.
Asian handicap line movement is the strongest research signal in this run, but it is diagnostic until the bot stores openers and current lines live.
The better backbone is not a single model. It is a market router: longshot catalyst gate, then choose AH/DNB/totals before 1X2.
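The Asian handicap point above depends on the bot keeping the opener alongside the latest line. A minimal in-memory sketch of that bookkeeping follows; `LineHistory`, `LineStore` and the match identifier are hypothetical names, and a real bot would presumably persist this rather than hold it in a dict.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LineHistory:
    """Opener plus the most recently observed line for one (match, market) pair."""
    opener: float
    opened_at: datetime
    current: float
    updated_at: datetime

    @property
    def move(self) -> float:
        # Movement since the opener, in line units (e.g. handicap quarters).
        return self.current - self.opener


class LineStore:
    def __init__(self) -> None:
        self._lines: dict[tuple[str, str], LineHistory] = {}

    def record(self, match_id: str, market: str, line: float, at: datetime) -> LineHistory:
        """First observation becomes the opener; later ones only update the current line."""
        key = (match_id, market)
        hist = self._lines.get(key)
        if hist is None:
            hist = LineHistory(opener=line, opened_at=at, current=line, updated_at=at)
            self._lines[key] = hist
        else:
            hist.current = line
            hist.updated_at = at
        return hist


store = LineStore()
t0 = datetime(2026, 5, 12, 9, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 5, 13, 9, 0, tzinfo=timezone.utc)
store.record("match-001", "ah", -0.5, t0)          # opener
hist = store.record("match-001", "ah", -0.75, t1)  # daily refresh
```

Because the opener is never overwritten, `hist.move` is available before kickoff, which is what would turn the hindsight diagnostic into something testable live.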

Proposed Backbone V2

1. Store opener immediately for 1X2, totals and Asian handicap.
2. Refresh current odds at least daily, then more often near kickoff.
3. Block 1X2 longshots unless there is news or systemic evidence.
4. If the view is favourite vulnerability, test handicap/DNB first.
5. Use totals only when the price bucket and team/news profile agree.
6. Publish rarely. No selection without a market expression that matches the thesis.
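Steps 3 to 6 can be read as a small routing function. The sketch below is one possible shape, not the bot's actual code: the `Thesis` fields, the thesis kinds and the `LONGSHOT_ODDS` cut-off of 4.0 are all assumptions made for illustration, since the report does not define them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Market(Enum):
    ASIAN_HANDICAP = "ah"
    DRAW_NO_BET = "dnb"   # alternative expression to AH; not chosen by this sketch
    TOTALS = "totals"
    ONE_X_TWO = "1x2"


@dataclass
class Thesis:
    kind: str                     # "favourite_vulnerability", "goals" or "outright"
    odds: float                   # decimal odds of the proposed selection
    has_catalyst: bool = False    # news or systemic evidence behind the view
    bucket_agrees: bool = False   # totals price bucket supports the view
    profile_agrees: bool = False  # team/news profile supports the view


LONGSHOT_ODDS = 4.0  # assumed cut-off; the report does not define "longshot"


def route(thesis: Thesis) -> Optional[Market]:
    """Return the market that expresses the thesis, or None to publish nothing."""
    # Step 3: the longshot catalyst gate comes first.
    if thesis.kind == "outright" and thesis.odds >= LONGSHOT_ODDS and not thesis.has_catalyst:
        return None
    # Step 4: favourite vulnerability goes to the handicap before 1X2.
    if thesis.kind == "favourite_vulnerability":
        return Market.ASIAN_HANDICAP
    # Step 5: totals need both the price bucket and the profile to agree.
    if thesis.kind == "goals":
        if thesis.bucket_agrees and thesis.profile_agrees:
            return Market.TOTALS
        return None
    # 1X2 is the last resort for an outright view that passed the gate.
    if thesis.kind == "outright":
        return Market.ONE_X_TWO
    # Step 6: no matching market expression, no selection.
    return None
```

For example, `route(Thesis("outright", 6.5))` returns `None`, while the same thesis with `has_catalyst=True` is routed to `Market.ONE_X_TWO`. Returning `None` by default keeps "publish rarely" as the fall-through behaviour.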

Raw report: challenger-model-backtest.md. Machine-readable data: challenger-model-backtest.json.