From Match Data to Probabilities
We model each football match as a combination of measurable factors and historical patterns. The output is a set of outcome probabilities, not a claim about what will happen.
What We Calculate
For every match, the system produces three outputs:
- Win / Draw / Loss probability distribution
- Over/Under goal estimates
- Model confidence level for each match
Data Collection
We use structured, verifiable match data. No insider tips, no rumors, no subjective opinions — only historical facts that can be checked.
- Past match results across the top 5 leagues
- Team and league statistics
- Form trends over time
- Home and away performance patterns
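As a rough illustration of how form trends and home/away patterns can be derived from nothing but past results, here is a minimal sketch. The `Match` record, its field names, the team labels, and the scores are all hypothetical. They stand in for whatever schema the real data uses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """One historical result. Field names are illustrative, not the real schema."""
    played_on: date
    home: str
    away: str
    home_goals: int
    away_goals: int


def points(goals_for: int, goals_against: int) -> int:
    """League points for one side: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def team_points(match: Match, team: str) -> int:
    """Points `team` earned in `match`, whichever side it played on."""
    if match.home == team:
        return points(match.home_goals, match.away_goals)
    return points(match.away_goals, match.home_goals)


def venue_points_per_game(matches, team):
    """Return (home_ppg, away_ppg); an entry is None if no games were played there."""
    home = [team_points(m, team) for m in matches if m.home == team]
    away = [team_points(m, team) for m in matches if m.away == team]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(home), mean(away)


def recent_form(matches, team, last_n=5):
    """Points from the team's last `last_n` matches, oldest first."""
    played = sorted(
        (m for m in matches if team in (m.home, m.away)),
        key=lambda m: m.played_on,
    )
    return [team_points(m, team) for m in played[-last_n:]]


# Hypothetical results for a made-up team "A".
MATCHES = [
    Match(date(2024, 1, 1), "A", "B", 2, 1),   # home win
    Match(date(2024, 1, 8), "B", "A", 0, 0),   # away draw
    Match(date(2024, 1, 15), "A", "C", 0, 1),  # home loss
    Match(date(2024, 1, 22), "C", "A", 1, 3),  # away win
]

home_ppg, away_ppg = venue_points_per_game(MATCHES, "A")  # 1.5 and 2.0
form = recent_form(MATCHES, "A", last_n=3)                # [1, 0, 3]
```

Every value here can be recomputed from the published scorelines, which is the sense in which the inputs are verifiable.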
Feature Engineering
Raw data is transformed into analytical features that capture the real context of each match. This step is where the analysis happens; it is more than feeding a CSV to a model.
- Team strength ratings
- Form momentum and dynamics
- Home vs away behavioral differences
- Relative opponent advantage metrics
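One common way to maintain a team strength rating is an Elo-style update applied after every match. The sketch below uses the standard Elo expected-score formula. The K-factor, the home-advantage offset, and the starting rating of 1500 are illustrative assumptions, not the values used in production.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of side A against side B (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(home_rating, away_rating, home_goals, away_goals,
               k=20.0, home_advantage=0.0):
    """Return the new (home, away) ratings after one match.

    k and home_advantage are illustrative defaults, not production values.
    """
    if home_goals > away_goals:
        actual = 1.0
    elif home_goals == away_goals:
        actual = 0.5
    else:
        actual = 0.0
    expected = expected_score(home_rating + home_advantage, away_rating)
    delta = k * (actual - expected)
    # Zero-sum update: what one side gains, the other loses.
    return home_rating + delta, away_rating - delta


# Two evenly rated hypothetical teams; the home side wins 2-0.
new_home, new_away = update_elo(1500.0, 1500.0, 2, 0)  # 1510.0, 1490.0

# The rating gap is one possible "relative opponent advantage" feature.
rating_gap = new_home - new_away  # 20.0
```

Because the update is zero-sum, the average rating across all teams stays constant, so a rating gap means the same thing at any point in the season.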
XGBoost v2 Model
Phase 2 upgrade: XGBoost replaces the legacy Random Forest. The model is trained on 6,536 real matches with 58 features, including Dynamic Elo ratings. Post-model adjustments for injuries, fatigue, and xG then fine-tune every output.
- XGBoost v2 — 6,536 real matches, 58 features, ~50% accuracy
- Dynamic Elo — updated after every match (95 teams)
- H2H history built directly into model features
- Post-model: injuries (±10%), fatigue (±5%), xG (±3%)
- Probability cap: 10%–70% per outcome (realistic football range)
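The probability cap needs some care: clipping each value to 10%–70% and then rescaling to sum to 1 can push a value back outside the range. The sketch below shows one possible way to satisfy both constraints at once, by shifting all three probabilities by a common offset found through bisection and clipping the result. This is an assumed technique for illustration; the source does not state how the cap is actually enforced, and `cap_probabilities` is a hypothetical name.

```python
def cap_probabilities(probs, low=0.10, high=0.70, iterations=60):
    """Return probabilities within [low, high] that still sum to 1.

    Finds a common shift such that sum(clip(p - shift, low, high)) == 1.
    The clipped sum never increases as the shift grows, so bisection works.
    """
    n = len(probs)
    if n * low > 1.0 or n * high < 1.0:
        raise ValueError("bounds cannot be met by a distribution of this size")

    def clip(value):
        return min(high, max(low, value))

    def clipped_sum(shift):
        return sum(clip(p - shift) for p in probs)

    lo = min(probs) - high  # here every entry clips to `high`: sum >= 1
    hi = max(probs) - low   # here every entry clips to `low`:  sum <= 1
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if clipped_sum(mid) > 1.0:
            lo = mid  # total too large, shift further down
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [clip(p - shift) for p in probs]


# Hypothetical over-confident output (home win, draw, away win).
capped = cap_probabilities([0.90, 0.05, 0.05])  # approx. [0.70, 0.15, 0.15]

# A distribution already inside the range is left (numerically) unchanged.
unchanged = cap_probabilities([0.50, 0.30, 0.20])
```

With three outcomes the bounds are always satisfiable, since 3 × 10% ≤ 100% ≤ 3 × 70%.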
Probability & Confidence Estimation
For every match, the system calculates not only the probabilities but also how certain those probabilities are. This lets you distinguish strong signals from noise.
- Outcome probabilities for every match
- Model confidence level (Low / Medium / High)
- Degree of uncertainty in the estimate
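The source does not say how the confidence level is derived. One plausible approach, sketched here purely as an assumption, measures how spread out the three probabilities are using normalized Shannon entropy and maps that onto Low / Medium / High. The threshold values are hypothetical.

```python
import math


def uncertainty(probs):
    """Normalized Shannon entropy: 0 means certain, 1 means all outcomes equal."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def confidence_level(probs, high_below=0.85, medium_below=0.95):
    """Map uncertainty onto a label. Both thresholds are illustrative."""
    u = uncertainty(probs)
    if u < high_below:
        return "High"
    if u < medium_below:
        return "Medium"
    return "Low"


# Hypothetical (home win, draw, away win) distributions.
clear_favourite = confidence_level([0.70, 0.20, 0.10])  # "High"
slight_edge = confidence_level([0.50, 0.30, 0.20])      # "Medium"
coin_flip = confidence_level([1 / 3, 1 / 3, 1 / 3])     # "Low"
```

An entropy-based measure looks at the whole distribution, so a match with a 50% favourite and two similar alternatives scores as less certain than the top probability alone would suggest.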
Continuous Evaluation
After matches are played, predictions are compared to actual outcomes. The model is regularly retrained and the full prediction history is preserved.
- Predictions compared to real results
- Model retrained on fresh data periodically
- Trained on 6,536 real matches, reaching ~50% accuracy on real data
- Accuracy statistics tracked and stored
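Comparing predictions to results can be sketched in a few lines. Accuracy counts how often the most likely outcome actually occurred. The Brier score is added here as an assumption; the source mentions only accuracy. It also checks how well the probabilities themselves match reality. The outcome labels and sample data are hypothetical.

```python
OUTCOMES = ("home", "draw", "away")


def accuracy(predictions, results):
    """Share of matches where the most probable outcome was the real one."""
    hits = sum(
        max(OUTCOMES, key=lambda o: probs[o]) == actual
        for probs, actual in zip(predictions, results)
    )
    return hits / len(results)


def brier_score(predictions, results):
    """Mean squared error between probabilities and outcomes (lower is better)."""
    total = 0.0
    for probs, actual in zip(predictions, results):
        total += sum(
            (probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES
        )
    return total / len(results)


# Two hypothetical stored predictions and what actually happened.
predictions = [
    {"home": 0.5, "draw": 0.3, "away": 0.2},
    {"home": 0.2, "draw": 0.3, "away": 0.5},
]
results = ["home", "draw"]

acc = accuracy(predictions, results)       # 0.5: first pick right, second wrong
brier = brier_score(predictions, results)  # approx. 0.58
```

For reference, always predicting one third for each outcome gives a Brier score of about 0.667 under this definition, so lower values indicate probabilities that carry real information.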