A deep dive into the ensemble modelling system powering WagerBase's football predictions — xG-weighted Dixon-Coles meets Elo ratings, backtested across 269 Premier League matches.
The Problem With Single Models
Most football prediction sites rely on a single statistical model. Some use Elo ratings. Some use Poisson-based frameworks. A few throw machine learning at the wall and hope something sticks.
The issue is that every model has blind spots. A Poisson model captures the low-scoring, probabilistic nature of football beautifully — but it's rigid. An Elo system adapts quickly to form changes — but it lacks the structural elegance of modelling goal distributions directly.
At WagerBase, we asked a simple question: what if we combined both?
The result is an ensemble prediction engine that outperforms either model alone across every statistical measure we tested. Here's exactly how it works.
Model 1: xG-Weighted Dixon-Coles
The Foundation
The Dixon-Coles model, first published by Mark Dixon and Stuart Coles in 1997, remains the gold standard for football match prediction. It models each team's goal-scoring as a Poisson process — a probability distribution perfectly suited to rare, discrete events like goals.
For any given match between team $i$ (home) and team $j$ (away), the expected goals are:

$$\lambda_h = \exp(\mu + \alpha_i + \beta_j + \gamma) \qquad \lambda_a = \exp(\mu + \alpha_j + \beta_i)$$

Where:
- $\mu$ = league scoring intercept (log of league average xG)
- $\alpha_i$ = team $i$'s attack strength
- $\beta_j$ = team $j$'s defensive weakness
- $\gamma$ = home advantage parameter

The probability of any specific scoreline $x$-$y$ follows the bivariate Poisson distribution:

$$P(X = x, Y = y) = \tau_{\lambda_h,\lambda_a}(x, y)\,\frac{\lambda_h^x e^{-\lambda_h}}{x!}\cdot\frac{\lambda_a^y e^{-\lambda_a}}{y!}$$
The $\tau$ function is Dixon and Coles' key innovation — a correction factor that accounts for the dependency between low-scoring outcomes (0-0, 1-0, 0-1, 1-1), which the independent Poisson model systematically misprices. The correlation parameter $\rho$ typically sits around 0.03-0.05, a small but statistically significant adjustment.
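As a concrete sketch, the scoreline matrix and match-outcome probabilities can be computed as below. This is a minimal illustration, not WagerBase's production code: the $\rho$ value of 0.04 is picked from the typical range quoted above, and the truncation at 10 goals per side is an arbitrary implementation choice.

```python
import math

def tau(x, y, lam_h, lam_a, rho):
    """Dixon-Coles correction factor for the four low-scoring outcomes."""
    if x == 0 and y == 0:
        return 1 - lam_h * lam_a * rho
    if x == 0 and y == 1:
        return 1 + lam_h * rho
    if x == 1 and y == 0:
        return 1 + lam_a * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0  # all other scorelines are left independent

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def score_matrix(lam_h, lam_a, rho=0.04, max_goals=10):
    """P(home = x, away = y) for every scoreline up to max_goals."""
    return {
        (x, y): tau(x, y, lam_h, lam_a, rho)
                * poisson_pmf(x, lam_h) * poisson_pmf(y, lam_a)
        for x in range(max_goals + 1)
        for y in range(max_goals + 1)
    }

def outcome_probs(lam_h, lam_a, rho=0.04):
    """Collapse the scoreline matrix into home/draw/away probabilities."""
    m = score_matrix(lam_h, lam_a, rho)
    home = sum(p for (x, y), p in m.items() if x > y)
    draw = sum(p for (x, y), p in m.items() if x == y)
    away = sum(p for (x, y), p in m.items() if x < y)
    return home, draw, away
```

Summing the matrix above, on, and below the diagonal yields the three outcome probabilities the ensemble consumes.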
The xG Enhancement
Traditional Dixon-Coles uses actual goals scored to estimate team strengths. We replace this with expected goals (xG) — a metric that measures the quality of chances created rather than the goals that happened to go in.
Why does this matter? Consider a match where Liverpool create chances worth 3.2 xG but only score once due to poor finishing and good goalkeeping. Using actual goals, the model thinks Liverpool had a bad day. Using xG, the model correctly identifies Liverpool as dominant.
Team attack and defence ratings are derived as log-ratios of each team's average xG (created and conceded) to the league average:

$$\alpha_i = \log\frac{\overline{xG}^{\,\text{for}}_i}{\overline{xG}_{\text{league}}} \qquad \beta_i = \log\frac{\overline{xG}^{\,\text{against}}_i}{\overline{xG}_{\text{league}}}$$
The xG values are sourced from granular match-level data across the Premier League, La Liga, Bundesliga, Serie A, and Champions League — covering individual player expected goals aggregated per fixture.
Time Decay
Football teams evolve. Managers change tactics, players gain or lose form, injuries disrupt systems. A result from August shouldn't carry the same weight as last weekend's performance.
We apply exponential decay with a half-life of approximately 23 weeks:

$$w(\Delta t) = 2^{-\Delta t / 23}$$

where $\Delta t$ is the age of the match in weeks.
A match from one month ago contributes 89% of a recent match's weight. Three months ago: 67%. Six months: 45%. This ensures the model captures current form while retaining enough historical data for statistical stability.
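The decay schedule is essentially one line of code. A sketch:

```python
HALF_LIFE_WEEKS = 23  # a match's weight halves every ~23 weeks

def decay_weight(weeks_ago: float) -> float:
    """Exponential time-decay weight: 1.0 for today, 0.5 at the half-life."""
    return 0.5 ** (weeks_ago / HALF_LIFE_WEEKS)
```

With this schedule a four-week-old match keeps roughly 89% of full weight and a 26-week-old match roughly 46%, matching the figures above.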
Injury Adjustments
Raw xG doesn't account for who's available. If a team's top scorer (generating 0.6 xG per 90 minutes) is ruled out, the team's attacking output should be adjusted downward.
We apply injury adjustments to the predicted xG values based on the expected contribution of unavailable players, weighted by their historical xG output and typical minutes played.
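A minimal sketch of such an adjustment, assuming per-player records with hypothetical fields `xg_per_90` and `typical_minutes` (the 0.1 floor is likewise illustrative, not WagerBase's actual schema):

```python
def injury_adjusted_xg(team_xg: float, absentees: list[dict]) -> float:
    """
    Subtract each unavailable player's expected xG contribution, scaled by
    the share of a full match they typically play. Field names are
    illustrative, not the production schema.
    """
    loss = sum(p["xg_per_90"] * p["typical_minutes"] / 90 for p in absentees)
    return max(team_xg - loss, 0.1)  # floor keeps the scoring rate positive
```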
Model 2: xG-Elo Rating System
A Different Lens on the Same Data
While Dixon-Coles models the generative process of goals (Poisson distributions, attack vs defence), Elo takes a fundamentally different approach. It maintains a single rating number per team that rises after strong performances and falls after weak ones.
Every team starts at a baseline rating of 1500. The expected win probability for the home team is:

$$E_h = \frac{1}{1 + 10^{-(R_h + H - R_a)/400}}$$

Where:
- $R_h$, $R_a$ = current Elo ratings of the home and away sides
- $H$ = home advantage parameter (optimised to 60 points)

After each match, ratings update based on performance versus expectation:

$$R_h' = R_h + K(S - E_h) \qquad R_a' = R_a - K(S - E_h)$$

Where:
- $K$ = adaptation rate (optimised to 40)
- $S$ = xG-adjusted match score
- $E_h$ = expected score from pre-match Elo
The xG Twist
Standard Elo updates on the win/draw/loss result alone — a coarse, three-valued signal. Our xG-Elo blends the actual result (60% weight) with xG performance (40% weight):

$$S = 0.6\,S_{\text{result}} + 0.4\,S_{xG}$$
The performance score $S_{xG}$ maps the xG margin into $[0, 1]$ with diminishing returns, preventing outlier matches from distorting ratings.
A team that generates 4.0 xG versus 0.3 xG conceded doesn't get four times the rating boost of a team with 1.5 xG versus 0.8. The logarithmic compression ensures proportional but bounded updates.
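Putting the update together in code, here is a sketch of one xG-Elo step. The exact compression function is an assumption on our part (a `log1p` of the margin squashed through a logistic), chosen only to exhibit the bounded, diminishing-returns behaviour described above; the 60/40 blend, K = 40, and 60-point home advantage come from the text.

```python
import math

K = 40          # adaptation rate
HOME_ADV = 60   # home advantage in Elo points

def expected_score(r_home: float, r_away: float) -> float:
    """Pre-match expectation for the home side, including home advantage."""
    return 1 / (1 + 10 ** (-(r_home + HOME_ADV - r_away) / 400))

def result_score(goals_h: int, goals_a: int) -> float:
    """Classic Elo score: win = 1, draw = 0.5, loss = 0."""
    return 1.0 if goals_h > goals_a else 0.5 if goals_h == goals_a else 0.0

def xg_score(xg_h: float, xg_a: float) -> float:
    """Map the xG margin to [0, 1] with logarithmic compression
    (illustrative form, not necessarily the production function)."""
    margin = xg_h - xg_a
    compressed = math.copysign(math.log1p(abs(margin)), margin)
    return 1 / (1 + math.exp(-compressed))

def update(r_home, r_away, goals_h, goals_a, xg_h, xg_a):
    """One zero-sum rating update: 60% result, 40% xG performance."""
    s = 0.6 * result_score(goals_h, goals_a) + 0.4 * xg_score(xg_h, xg_a)
    delta = K * (s - expected_score(r_home, r_away))
    return r_home + delta, r_away - delta
```

Note how the compression behaves: a 3.7-xG margin earns roughly 2.5 times the rating boost of a 0.7-xG margin, not the 5x a linear score would give.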
Draw Probability
Elo naturally produces a two-outcome probability (home win vs away win). Football has three outcomes. We model draws with an ordinal approach that allocates draw probability as a function of how evenly matched the sides are.
This produces draw probabilities of 25-30% for evenly matched teams (consistent with real-world Premier League rates) and lower draw probabilities for lopsided matchups.
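One ordinal-style construction with these properties is a Davidson-type draw term, where draw mass is proportional to the geometric mean of the two win strengths. The form and the `nu` value below are illustrative, tuned only to land in the quoted 25-30% range for even matchups, and are not necessarily the production implementation:

```python
import math

def three_way_probs(e_home: float, nu: float = 0.75) -> tuple[float, float, float]:
    """
    Split a two-outcome Elo expectation into (home, draw, away).
    Draw mass peaks when the sides are evenly matched; nu sets the
    overall draw rate (value here is illustrative).
    """
    p_h, p_a = e_home, 1 - e_home
    d = nu * math.sqrt(p_h * p_a)      # largest when p_h == p_a
    z = p_h + p_a + d                  # renormalise to sum to 1
    return p_h / z, d / z, p_a / z
```

At `e_home = 0.5` this yields a draw probability of about 27%, falling away as the matchup becomes lopsided.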
Season Reset
At the start of each season, all ratings regress one-third toward the baseline:

$$R_{\text{new}} = \tfrac{2}{3}\,R_{\text{old}} + \tfrac{1}{3} \times 1500$$
This prevents ratings from drifting too far from reality across seasons as squads change, managers move, and promoted teams enter.
The Ensemble: 1 + 1 = 3
Why Blending Works
Dixon-Coles and Elo attack the same problem from different mathematical foundations:
| Aspect | Dixon-Coles | Elo |
|---|---|---|
| Core model | Bivariate Poisson with correction | Pairwise comparison |
| Captures goals distribution | ✅ Natively | ❌ Not modelled |
| Adapts to form quickly | ⚠️ Decay-weighted | ✅ Every match |
| Draw modelling | ✅ Falls out of Poisson matrix | ⚠️ Approximated |
| Computational complexity | Higher | Lower |
| Sensitivity to outliers | Lower (decay smoothing) | Higher (single K-factor) |
When Dixon-Coles slightly mispredicts (because a team's scoring pattern has shifted faster than the decay-weighted average captures), Elo often catches the correction — and vice versa. The two models' errors are largely uncorrelated, which is the prerequisite for ensemble improvement.
The Blend
The ensemble output is a simple arithmetic mean:

$$P_{\text{ensemble}}(o) = \frac{P_{\text{DC}}(o) + P_{\text{Elo}}(o)}{2}, \qquad o \in \{\text{home},\ \text{draw},\ \text{away}\}$$
For each of the three match outcomes (home, draw, away), we average the two model probabilities and renormalise to ensure they sum to 100%.
No fancy weighting. No meta-model. Just a clean 50/50 split. In our testing, more complex weighting schemes didn't reliably outperform the simple average — a finding consistent with the broader ensemble learning literature.
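The blend is genuinely a few lines. A sketch, representing each model's output as a dict of outcome probabilities:

```python
def blend(dc: dict[str, float], elo: dict[str, float]) -> dict[str, float]:
    """50/50 average of the two models' outcome probabilities, renormalised
    so the three outcomes sum to exactly 1."""
    raw = {o: (dc[o] + elo[o]) / 2 for o in ("home", "draw", "away")}
    total = sum(raw.values())
    return {o: p / total for o, p in raw.items()}
```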
Backtest Results
We ran a strict walk-forward backtest across the 2025/26 Premier League season — 269 matches from August to February. For each match, both models predicted using only data available before kickoff. No lookahead bias. No cherry-picking.
The ensemble outperformed both individual models on every probabilistic measure we tested: log loss (the gold standard for probability calibration), Brier score (a complementary accuracy metric), and raw prediction accuracy.
The critical finding wasn't just that it was better — it was better in the way that matters most. The ensemble produces well-calibrated probabilities. When it says a team has a 60% chance, that outcome occurs approximately 60% of the time. This calibration is what separates a useful prediction model from noise — and it's what allows meaningful comparison against bookmaker prices.
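For reference, both probabilistic measures are simple to compute from settled matches. A sketch, representing each prediction as a dict of outcome probabilities and each result as the outcome key that occurred:

```python
import math

def log_loss(probs: list[dict], outcomes: list[str]) -> float:
    """Mean negative log-likelihood of the observed outcome (lower is better)."""
    return -sum(math.log(p[o]) for p, o in zip(probs, outcomes)) / len(outcomes)

def brier(probs: list[dict], outcomes: list[str]) -> float:
    """Mean squared error over the three-outcome probability vector
    (lower is better)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        total += sum((p[k] - (1.0 if k == o else 0.0)) ** 2 for k in p)
    return total / len(outcomes)
```

A model predicting the uniform (1/3, 1/3, 1/3) every time scores a log loss of ln 3 ≈ 1.10; anything meaningfully below that is extracting signal.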
Current Elo Ratings (Feb 2026)
| Rank | Team | Rating |
|---|---|---|
| 1 | Arsenal | 1,649 |
| 2 | Manchester City | 1,615 |
| 3 | Manchester United | 1,568 |
| 4 | Liverpool | 1,554 |
| 5 | Aston Villa | 1,553 |
| 6 | Chelsea | 1,550 |
| ... | ... | ... |
| 19 | Burnley | 1,389 |
| 20 | Wolves | 1,378 |
From Probabilities to Picks
Having accurate probabilities is necessary but not sufficient. The WagerBase Pick system identifies matches where our ensemble model has high confidence — a model probability exceeding 55% on a single outcome.
This is deliberately selective. Across 63 upcoming matches in a typical week, only 7-10 qualify (roughly 11-16%). We don't pick every favourite. We pick the favourites our model independently confirms as strong.
The distinction matters. When bet365 prices Manchester City at 65% but our ensemble says 48%, we don't pick them — despite being the bookmaker's favourite. When Liverpool sits at 60% with our model and 72% with the bookmaker, we pick Liverpool — the model independently confirms the favourite is deserved, even if it doesn't agree on the exact margin.
Selectivity is the edge. Anyone can back favourites. The skill is knowing which favourites to skip.
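The selection rule itself reduces to a threshold filter. A sketch (the match-record shape here is illustrative, not the production schema):

```python
PICK_THRESHOLD = 0.55  # minimum ensemble probability on a single outcome

def select_picks(matches: list[dict]) -> list[dict]:
    """Keep only matches where the most likely outcome clears the threshold."""
    picks = []
    for m in matches:
        outcome, p = max(m["probs"].items(), key=lambda kv: kv[1])
        if p > PICK_THRESHOLD:
            picks.append({"match": m["name"], "pick": outcome, "prob": p})
    return picks
```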
What This Means for Prediction Markets
WagerBase sits at the intersection of traditional sports betting (bet365) and prediction markets (Polymarket, Kalshi). Our MarketGap system identifies when these two pricing mechanisms disagree on the same event.
The ensemble model adds a third voice. When bookmakers, prediction markets, AND our model all agree — that's a high-conviction signal. When they diverge, we track which source was right over hundreds of settled markets.
Across 245+ settled MarketGap trades, when bet365 and Polymarket disagreed by 12% or more, bet365's price was correct 59.3% of the time at a yield of +67.4%. The data compounds daily.
Technical Note
The ensemble engine runs across five European leagues (Premier League, La Liga, Bundesliga, Serie A, Champions League) using xG data sourced from professional-grade football statistics providers. For leagues without xG coverage, the system falls back to a goals-based Dixon-Coles implementation.
Predictions regenerate daily at 06:00 UTC with a rolling 7-day match window, ensuring continuous coverage. All picks are tracked publicly on our performance page with full settlement transparency.
The model is a tool, not an oracle. Football is inherently unpredictable — that's what makes it beautiful. But with the right mathematical framework, you can be wrong less often than the market expects.
WagerBase provides real-time analytics where bookmakers and prediction markets disagree. Track our predictions, MarketGap data, and whale activity at wagerbase.io.