Joseph Buchdahl explores prediction models and odds efficiency during the 2022 World Cup and reveals just how close Pinnacle's prediction models are to true probability.
Read on to learn how bookmakers predict outcomes and see how Pinnacle stacked up against competitors.
Using the World Cup betting market to compare the odds efficiency of bookmakers
In the first part of this two-part series looking at the surprise factor of the 2022 World Cup, I showed how we could use a Monte Carlo simulation to reveal the range of possible World Cup match outcomes, their 64-match Multiple bet probabilities and their likelihood.
I concluded by arguing that according to Pinnacle’s view, the match outcomes (collectively speaking), whilst less probable than expected, were not hugely surprising.
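As a rough illustration of the approach (not the author's actual code), the simulation can be sketched in Python. The match probabilities and results below are made up purely for demonstration; the function returns the share of simulated tournaments whose 64-match Multiple probability is smaller, i.e. more surprising, than the one actually observed.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_surprise(match_probs, observed_outcomes, n_sims=100_000):
    """Share of simulated tournaments whose Multiple probability is
    smaller (more surprising) than the observed one.

    match_probs: (n_matches, 3) array of [home, draw, away] probabilities.
    observed_outcomes: length n_matches array of outcome indices 0/1/2.
    """
    n_matches = match_probs.shape[0]

    # Log-probability of the Multiple formed by the actual results.
    observed_logp = np.log(
        match_probs[np.arange(n_matches), observed_outcomes]).sum()

    # Inverse-CDF sampling: one outcome per match per simulated tournament.
    cum = match_probs.cumsum(axis=1)                  # (n_matches, 3)
    u = rng.random((n_sims, n_matches, 1))
    sims = np.clip((u > cum).sum(axis=2), 0, 2)       # (n_sims, n_matches)

    sim_logp = np.log(match_probs[np.arange(n_matches), sims]).sum(axis=1)
    return (sim_logp < observed_logp).mean()

# Made-up probabilities and results purely for illustration.
probs = rng.dirichlet(np.ones(3), size=64)
results = np.array([rng.choice(3, p=p) for p in probs])
print(simulate_surprise(probs, results, n_sims=20_000))
```

With Pinnacle's real implied probabilities and the actual 64 results plugged in, this is the calculation that produced the roughly 20% figure discussed below.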
Surprising results or an invalid prediction model?
Pinnacle, however, represents just one view of what one might expect to occur at a World Cup. Every bettor will have their own view, as indeed will other data forecasting companies and other bookmakers.
Randomness can deliver big surprises
What they consider to be expected will then influence whether what unfolds is regarded as surprising or otherwise. Furthermore, whether we are surprised (or not) can then help determine how much faith we should place in any underlying prediction model.
The more surprised we are about what happens, the more we should begin to question our prediction model.
Yes, it’s true, aleatory uncertainty (randomness) can deliver big surprises with no underlying cause, but the bigger the surprise, the greater the probability that there is something causal going on, and the most obvious one is an invalid prediction model.
This kind of statistical testing is essentially the same as the method I’ve previously discussed for analysing whether a bettor or tipster is skilled or merely displaying luck.
What is a good prediction model?
A good prediction model should be able to reflect actual outcomes. Clearly, on a match-by-match basis that is impossible, since outcomes are binary (they happen, 1, or they don’t, 0), whereas predictions are probabilistic.
Yet over a sample of matches, if the percentage of actual wins, broadly speaking, matches the expected number of wins, we can say that the prediction model is a good one. Pinnacle has proved itself to be very good at this task.
Their betting odds, and their implied probabilities, closely match actual win percentages. Their odds are what we would call efficient, or more simply, accurate.
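One simple way to check this kind of efficiency is a calibration table: bucket the margin-free implied probabilities and compare each bucket's average with the actual win rate inside it. This is a generic sketch, not Pinnacle's methodology; the toy data below is generated from the implied probabilities themselves, i.e. a perfectly efficient book.

```python
import numpy as np

def calibration_table(implied_probs, outcomes, n_bins=10):
    """Compare implied probabilities with actual win rates per bin.

    implied_probs: margin-free probabilities for each selection.
    outcomes: 1 if the selection won, 0 otherwise.
    Efficient odds should give actual win rates close to the bin averages.
    """
    implied_probs = np.asarray(implied_probs, float)
    outcomes = np.asarray(outcomes, float)
    bins = np.linspace(0, 1, n_bins + 1)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (implied_probs >= lo) & (implied_probs < hi)
        if mask.any():
            rows.append((lo, hi, implied_probs[mask].mean(),
                         outcomes[mask].mean(), int(mask.sum())))
    return rows  # (bin low, bin high, mean implied, actual win rate, count)

# Toy data from a perfectly efficient book.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=20_000)
won = (rng.random(p.size) < p).astype(int)
for lo, hi, imp, act, n in calibration_table(p, won):
    print(f"{lo:.1f}-{hi:.1f}: implied {imp:.3f}, actual {act:.3f} (n={n})")
```

For an efficient bookmaker, the implied and actual columns should agree to within random noise; systematic gaps point to model error.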
The larger the sample we have, the easier it becomes to separate epistemic uncertainty (or model error) from aleatory uncertainty (or random luck).
Pinnacle versus the rest
64 matches do not offer the biggest of samples, but let’s use them to compare Pinnacle’s view of the World Cup matches to the views of a handful of other bookmakers. To do this, I ran the same Monte Carlo simulation as described in my first article for 23 additional online bookmakers plus the market average.
For each, I counted the percentage of Multiple bet probabilities that were smaller (more surprising) than the one observed, where these probabilities were calculated using the implied match probabilities as defined by each bookmaker’s odds (having removed the margin).
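The article does not specify which margin-removal method was used; the simplest common choice is proportional normalisation, in which each inverse odds value is divided by the overbound. A minimal sketch:

```python
def implied_probabilities(odds):
    """Convert decimal odds to margin-free probabilities by
    proportional normalisation. (Alternative methods weight the
    margin by favourite-longshot bias; this is the simplest option.)"""
    raw = [1 / o for o in odds]
    total = sum(raw)          # exceeds 1 by the bookmaker's margin
    return [r / total for r in raw]

# Hypothetical home/draw/away odds of 2.10, 3.40, 3.60.
probs = implied_probabilities([2.10, 3.40, 3.60])
margin = sum(1 / o for o in [2.10, 3.40, 3.60]) - 1
print([round(p, 4) for p in probs], "margin:", round(margin, 4))
```

The resulting three probabilities sum to exactly 1 and can be fed straight into the Monte Carlo simulation.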
Recall that for Pinnacle, about 20% of the possible Multiple probabilities were smaller than the one seen in the actual World Cup. This means that whilst it was somewhat surprising, the surprise was not a big one and not statistically significant.
Compare Pinnacle’s figure to the other bookmakers’. It’s the largest. What does that tell us? It says that for every other bookmaker, judged by its own view of what the match probabilities should have been, the actual results were more surprising than they were under Pinnacle’s view.
The p-value and confidence in a bookmaker’s prediction model
Effectively, we can treat this percentage figure as a statistical p-value. The p-value in statistical significance testing tells us the probability of seeing a particular set of data purely by chance (aleatory uncertainty) given that a particular hypothesis is true.
When the p-value is small enough, say 1% or smaller, we might then have enough confidence to say that this is too small for something to realistically happen by chance alone.
This then implies that our initial hypothesis should be discarded in favour of a different one. In the context here, our initial hypothesis would be that the bookmaker’s view of match probabilities is valid, that is, that it accurately represents the true outcome probabilities.
If the p-value is small enough, we might then have to review our confidence in the bookmaker’s prediction model. Yes, unlikely things can still happen by chance, but the smaller the probability, the more likely it is that the bookmaker’s model is wrong.
A typical p-value threshold for statistical significance is 1%. None of the bookmakers’ p-values here fell that low. The World Cup match results were most surprising for BetVictor, with a p-value of about 7%.
Thus, even for them, we should probably not be concluding that their prediction model is invalid.
Nonetheless, the chart above does present an opportunity to create a rudimentary ranking scale of confidence, or degrees of belief, in the quality or efficiency of a bookmaker’s World Cup match betting odds.
As stated, a sample of 64 matches is arguably too small to draw very firm conclusions here, but qualitatively at least, it does confirm what I’ve already argued in previous articles (when analysing much larger data samples) – namely that Pinnacle has the most accurate or efficient soccer match betting odds.
In philosophical terms, these p-values tell us the probability of seeing a particular set of data by chance, given that a particular hypothesis is true.
Pinnacle has the lowest log-loss score among many bookmakers
We can instead reverse this thinking. Rather than testing the probability of the data occurring given that the hypothesis is true, we can ask which hypothesis (which set of forecast probabilities) would make the data we actually witnessed the most likely. I’ve discussed the difference between these two approaches previously.
The first is called frequentist statistical testing, on the grounds that it counts frequencies of data. The second is closer in spirit to Bayesian inference, and more specifically in this case uses a method known as likelihood estimation.
Given some set of observed data – in this case, the 64 World Cup match outcomes – what forecast probabilities would be required if the observed results were actually the ones most likely to occur?
This statistical best-fit methodology is formally known as maximum likelihood estimation.
I won’t bore you with the mathematics of maximum likelihood estimation; it’s sufficient to say that I effectively performed it in part one of this series when calculating the negative natural logarithm of the Multiple probability for the actual 64 match outcomes.
The figure of 63.5 for Pinnacle, the sum of those negative natural logarithms, is what is called a negative log-likelihood. If we then divide it by the sample size, 64, the figure is known as a log-loss.
For Pinnacle, the World Cup log-loss score is then 0.992. The smaller the figure, the better the forecast probabilities. A perfect score would be 0. A completely imperfect score would be infinity.
Log-loss is a type of scoring rule, not dissimilar to the Brier score. The log-loss score can be interpreted as a measure of how wrong or far away your forecast probabilities are from the actual outcomes. In one sense, we might regard ‘loss’ to mean a loss of certainty.
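The calculation itself is short. A minimal sketch (the 64 probabilities are illustrative stand-ins, not the real World Cup figures):

```python
import math

def log_loss(probs_of_actual_outcomes):
    """Average negative natural log of the probability each forecast
    assigned to the outcome that actually occurred."""
    return (sum(-math.log(p) for p in probs_of_actual_outcomes)
            / len(probs_of_actual_outcomes))

# Toy check: if every one of 64 forecasts had assigned the actual
# result a probability of exp(-0.992), the log-loss would be exactly
# 0.992, matching the Pinnacle World Cup figure quoted above.
toy = [math.exp(-0.992)] * 64
print(round(log_loss(toy), 3))   # 0.992
```

A forecast that assigned probability 1 to every actual result would score the perfect 0; a forecast that assigned probability 0 to any actual result would score infinity.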
How do the log-loss scores for the other bookmakers compare to Pinnacle’s score? Let’s take a look. The next histogram shows the data:
Pinnacle has the lowest log-loss score. This implies their model is the best one at capturing the true probabilities of outcomes. In other words, they have the most accurate or efficient match betting odds.
You can see that bookmakers with lower log-loss scores correlate very well with those that have higher p-values, and vice versa. This reinforces the idea that the better the prediction model, the less surprised we should be by the results that occur.
Analysis of the 2022 World Cup matches has offered an interesting way to investigate and test this concept, one which I think has both practical and philosophical value. If we are surprised by outcomes, it may simply mean that randomness delivered something genuinely improbable.
Equally, however, it may mean the way we tried to forecast those outcomes was wrong.
Of course, we should remind ourselves that a sample of 64 matches is small. Perhaps Pinnacle got lucky on this occasion with its accuracy. Then again, perhaps not.
I’ve analysed a bigger sample of data (the 2019/20 Premier League season) with 380 matches in my book Monte Carlo or Bust: Simple Simulations for Aspiring Sports Bettors, calculating log-loss scores, and the findings are the same: Pinnacle had the lowest log-loss score. I have little doubt this would be confirmed with yet larger samples of match data.
Odds efficiency and the Winners Welcome Policy
Having the most efficient or accurate forecast model means that Pinnacle has the most efficient or accurate betting odds.
This has a couple of important implications. Firstly, it means their odds are potentially the hardest to beat, because a bettor can only make a long-term profit where Pinnacle makes mistakes.
However, no bookmaker can set perfectly efficient odds, and indeed we know that Pinnacle uses those few customers who can beat them to help make their odds even stronger.
In addition to investing significant resources in data analysis, incorporating the wisdom of such sharp customers into their forecast models is one of the reasons why Pinnacle has the sharpest odds.
Secondly, however, this means that being a sharp customer of Pinnacle, unlike for other bookmakers, will not see your account restricted or closed. The odds of more recreational bookmakers may be easier to beat, but if you’re not permitted to play if you beat them consistently, this advantage is completely illusory.
My analysis of the 2022 World Cup matches has provided an interesting insight into the meaning of surprise in sports and has added further weight to the accepted wisdom that Pinnacle offers the most accurate football match betting odds.
A quarter of a century after the company was founded with the rationale of providing sharp odds and challenging customers to try to beat them, they continue to do what they do best, and do it better than everyone else.
Sign up to Pinnacle for great soccer odds across a wide range of markets. Be sure to check out other insightful articles from Joseph Buchdahl at Betting Resources.