May 11, 2020

Another look at randomness and efficiency in soccer betting

Joseph Buchdahl has published numerous educational materials on measuring randomness in soccer betting, as well as how efficient odds are and why they are so difficult to beat. In his latest article, he delves into that subject once again to take another look at randomness and efficiency in soccer betting.

I imagine readers are probably tired of the message that soccer match betting is a hard nut to crack because it’s mostly random and efficient. I thought that too, but in this time of lockdown and no soccer, with not much else to do other than procrastinate about writing another book, I started looking again at some old ideas.

This article bears the fruit of that work. There’s nothing particularly new, just a re-presentation of those ideas in a different light. I hope you find it useful.

What does efficiency look like?

Over the years I’ve debated with people who tell me soccer isn’t random. How can it be if Manchester United is way more likely to beat Cambridge United? They’re right; but we’re not talking about soccer, we’re talking about betting on soccer.

Essentially, odds are handicapped to take into account the differential ability of teams. Better teams have shorter odds. Once enough people have expressed their opinions with money about the chances of teams winning, the odds tend to be pretty close to what the ‘true’ ones would be, if those could be known, via a process known as price discovery. Whether this happens via a wisdom of the crowd or a wisdom of the sharps doesn’t really matter.

The job of the bookmaker is to get as close as possible to the true odds, to ensure they face the lowest possible risk in earning their market-making commission over the long run. The job of the bettor is to find their mistakes.

One way we can investigate whether the bookmaker is getting close, on average, to the true odds is to see whether betting all of them would leave us breaking even before their margin is applied. Furthermore, if the returns from small samples of bets are distributed just as they would be from tossing fair coins, and if those returns regress to the mean, these are further signs that the odds are efficient and that performance variation is noise rather than signal.

Let’s look at some data. Taking the past three completed seasons (2016/17 to 2018/19) of professional English league soccer, I’ve used Pinnacle’s closing odds with their margin removed (the ‘fair’ odds) to calculate a team score for every match. The simple scoring rule, which I’ve used previously, is defined as follows.

If a team wins, award a score of 1 – 1/odds

If a team fails to win, award a score of –1/odds

Thus, team scores for individual matches will be between theoretical maxima and minima of +1 and -1 respectively.
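
The rule above can be sketched in a few lines of Python. The function names `match_score` and `expected_score` are illustrative and not from the original analysis. The second function shows why the population average should sit at 0 when the fair odds are right: if a team's true win probability equals 1/odds, the positive and negative scores cancel in the long run.

```python
def match_score(fair_odds: float, won: bool) -> float:
    """Score for one team in one match: 1 - 1/odds for a win, -1/odds otherwise."""
    implied_prob = 1.0 / fair_odds
    return (1.0 if won else 0.0) - implied_prob


def expected_score(fair_odds: float, true_prob: float) -> float:
    """Long-run average score if the team really wins with probability true_prob."""
    return (true_prob * match_score(fair_odds, True)
            + (1.0 - true_prob) * match_score(fair_odds, False))
```

For example, `expected_score(2.5, 0.40)` is 0 to floating-point precision, because fair odds of 2.5 imply a 40% chance. If the true chance were 45%, `expected_score(2.5, 0.45)` would be about +0.05: the odds would be too long and the team undervalued.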

For the four divisions over the three seasons (6,108 matches and 12,216 scores), the average score was 0.0030 with a standard deviation of 0.4557. That looks pretty close to an expected average score of 0 if the fair odds, on average, were perfect.

A random scores distribution?

What about the distribution of score samples? Ordering the data by team and match date, I calculated a series of six-match running average scores for every team. Obviously, there are no six-match averages for a team's first five matches of a season.
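
A minimal sketch of the running-average step, assuming the scores are held as `(team, season, match_date, score)` tuples. That record layout and the name `running_averages` are assumptions for illustration, not the format used in the original analysis. Windows restart each season, so every team-season loses its first five matches. With 92 clubs across the four divisions over three seasons, that is 276 team-seasons, and 12,216 - 276 × 5 = 10,836 six-game samples.

```python
from collections import defaultdict, deque


def running_averages(records, window=6):
    """records: iterable of (team, season, match_date, score) tuples.

    Returns a list of (team, season, match_date, average), one entry per match
    from the window-th match of each team-season onward.
    """
    by_team_season = defaultdict(list)
    for team, season, match_date, score in records:
        by_team_season[(team, season)].append((match_date, score))

    out = []
    for (team, season), matches in by_team_season.items():
        matches.sort()                  # chronological order within the season
        recent = deque(maxlen=window)   # keeps only the last `window` scores
        for match_date, score in matches:
            recent.append(score)
            if len(recent) == window:
                out.append((team, season, match_date, sum(recent) / window))
    return out
```

Passing `window=12` or `window=24` gives the longer running averages in the same way.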

The average six-game score was 0.0032 with a standard deviation of 0.1866. Their distribution is shown by the blue line in the chart below. The orange line shows the theoretical normal distribution of scores, if they had been generated randomly. It’s a case of spot the difference between it and the distribution for the scores based on actual results.

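[Figure: distribution of six-game average scores from actual results (blue) against the theoretical normal distribution for randomly generated scores (orange)]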

The standard deviation, furthermore, is almost exactly the same as that predicted from first principles using the standard error of the mean,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where σ is the standard deviation in the scores for the whole population of individual matches and n is the sample size, in this case 6. So,

$$\frac{0.4557}{\sqrt{6}} = 0.1860$$

Is this significantly different from the observed figure of 0.1866? We can use the standard error formula again to calculate the expected standard deviation in this standard error figure, in other words the standard error of the standard error. Knowing there are 10,836 six-game samples, this is calculated as follows.

$$\frac{0.1860}{\sqrt{10{,}836}} = 0.0018$$

Hence, 0.1860 is only a third of a standard deviation away from 0.1866, well short of statistical significance. The difference between observation and expected randomness can be put down to chance.
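
The arithmetic in this comparison can be reproduced with a short Python sketch. All four inputs are figures quoted in the article; the variable names are mine. It follows the calculation exactly as described (the expected figure divided by the square root of the number of samples) and makes no adjustment for the overlap between consecutive running windows.

```python
import math

sigma = 0.4557        # SD of individual match scores (from the article)
n = 6                 # matches per running average
n_samples = 10_836    # number of six-game samples (from the article)
observed_sd = 0.1866  # observed SD of six-game averages (from the article)

# Standard error of the mean: expected SD of six-game averages under pure chance.
expected_sd = sigma / math.sqrt(n)

# Same formula applied again, as in the article, to gauge how far the
# observed SD could drift from the expected one by chance.
se_of_sd = expected_sd / math.sqrt(n_samples)

# Gap between observed and expected, measured in those standard errors.
z = (observed_sd - expected_sd) / se_of_sd

print(f"expected SD of six-game averages: {expected_sd:.4f}")
print(f"standard error of that figure:    {se_of_sd:.4f}")
print(f"gap in standard errors:           {z:.2f}")
```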

The implication is that Pinnacle’s odds provide a really excellent measure of truth, and furthermore that the vast majority of returns that bettors will experience betting on them, over six matches at least, will simply be a matter of good and bad luck.

This is a very unpalatable message and I get criticised for it all the time. The key criticism hinges on the phrase “on average”. Pinnacle’s soccer odds might well be efficient on average, but bettors aren’t betting on average. That’s certainly true, but the hard part for the bettor comes in knowing how to find the bookmaker’s specific errors systematically. The evidence suggests the vast majority are ‘finding’ them randomly.

I’ve repeated the exercise for 12- and 24-game samples. Their score distributions are shown together below. They follow the randomly generated normal distributions even more closely than for the six-game scores.

[Figure: distributions of 12-game and 24-game average scores against their randomly generated normal distributions]

Average 12- and 24-game scores were 0.0037 and 0.0049 respectively (the slight differences between all three are likely due to chance and different-sized populations of contributing matches). Standard deviations were 0.1301 and 0.0916 respectively, compared to 0.1315 and 0.0930 calculated using the standard error of the mean. These differences from expectation are each about 1 standard deviation, again suggesting they happened because of chance and nothing else.
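
For reference, the two expected figures follow from the same standard error formula applied to the population standard deviation of 0.4557, with n set to 12 and 24:

$$\frac{0.4557}{\sqrt{12}} \approx 0.1315, \qquad \frac{0.4557}{\sqrt{24}} \approx 0.0930$$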

Regression to the mean

If deviations in six-game average scores were systematic, then you might be able to predict what would happen. For example, teams with good form over six games showing a positive average score might be predicted to show another positive average score over the next six games. Sadly, this is not the case. There is almost perfect regression to the mean over six-game samples.

Remember, however, I’m not arguing that stronger teams over six games will tend to become less strong over the next six games. On the contrary, stronger teams tend to remain stronger teams. Just look at Liverpool this season. I’m arguing that the scoring rewards they receive for doing so, which are handicapped in a betting market to take into account their underlying abilities, are mean regressing.

The chart below shows it’s practically impossible, on average, to predict what average six-game score a team will get in games 7 to 12 based on their score in games one to six. There might be winning streaks for soccer teams, but not for bettors betting on the handicapped rewards.

[Figure: average score in games 7 to 12 plotted against average score in games one to six]
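
To see what perfect regression to the mean looks like, here is a small simulation sketch. It does not use the real match data: the win probabilities are drawn from a hypothetical range, the odds are assumed to be perfectly fair, and the names `simulate_pairs` and `pearson` are mine. Under those assumptions, the correlation between a team's average score in games one to six and its average in games 7 to 12 should hover around zero.

```python
import random


def simulate_pairs(n_teams=2000, window=6, seed=1):
    """Return (first_window_avg, next_window_avg) pairs for simulated teams
    whose results are drawn at random from perfectly fair odds."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_teams):
        scores = []
        for _ in range(2 * window):
            p = rng.uniform(0.15, 0.65)               # hypothetical true win probability
            won = rng.random() < p
            scores.append((1.0 if won else 0.0) - p)  # fair odds = 1 / p
        pairs.append((sum(scores[:window]) / window,
                      sum(scores[window:]) / window))
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5


r = pearson(simulate_pairs())
print(f"correlation between consecutive six-game averages: {r:+.3f}")
```

A real inefficiency, such as systematically mispriced form, would show up as a correlation that stays away from zero as the sample grows.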

Bookmakers’ skill in setting odds, and bettors’ skill in exploiting away any errors if they exist, means that by market closing almost all the variation you see in betting scores is the result of aleatory uncertainty: chance.

A scoring rule?

Last month I discussed the use of the rank probability score (RPS) as a scoring rule to help measure the efficiency of Pinnacle’s soccer betting market. We could actually think of the scoring rule I’ve used in this article in a similar way.

If Pinnacle’s odds were perfectly efficient, the population average score would be exactly 0. Of course, as for the RPS, we’ll never know how much of the deviation from 0 is a result of aleatory uncertainty (randomness in the results) and how much is a result of epistemic uncertainty (error in the bookmaker’s odds-setting model). I’ve potentially introduced a secondary epistemic error in the way I’ve removed the margin from Pinnacle’s odds. Since I don’t know exactly how they apply it, I’ve had to estimate how to remove it.
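
The article does not say which margin-removal method was used, so the sketch below should not be read as the author's approach. It shows the simplest common option, proportional normalisation, which assumes the margin is spread evenly across all outcomes. That assumption is mine; other methods load more of the margin onto longshots and would give slightly different fair odds. The function name is illustrative.

```python
def remove_margin_proportional(odds):
    """Rescale implied probabilities so they sum to 1, then convert back to odds.

    odds: decimal odds for every outcome of one market (e.g. home, draw, away).
    Assumes the bookmaker's margin is applied in equal proportion to each outcome.
    """
    implied = [1.0 / o for o in odds]
    overround = sum(implied)               # greater than 1 when a margin is present
    # fair probability = implied / overround, so fair odds = odds * overround
    return [o * overround for o in odds]
```

For hypothetical odds of 2.00, 3.50 and 4.00, the implied probabilities sum to about 1.036, and the returned fair odds are each about 3.6% longer than the quoted ones.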

Nevertheless, the closeness with which the population average score approaches 0, and the closeness with which the distributions of match sample average scores approach a random distribution, are strong evidence of the efficiency of Pinnacle’s soccer match betting odds.

What about the hot hand fallacy?

All of this discussion leaves me with an outstanding problem. Two years ago I presented a betting system that attempted to exploit inefficiency in Pinnacle soccer match odds that may arise because of the hot hand fallacy.

The hypothesis was that bettors may believe in winning streaks. Consequently, they may overbet teams on winning runs, shortening their odds relative to true outcome probabilities. By contrast, cold teams would be underbet, lengthening their odds and potentially creating value opportunities.

The difference between betting relatively colder teams versus relatively hotter teams over six-match sequences was weakly significant (p-value = 0.02 for the match population analysed). For the coldest versus hottest teams it was much stronger (p-value = 0.001) and a real profit of 2.7% from a sample of over 5,000 wagers (average odds 3.9) could have been made from Pinnacle’s closing odds. But if this analysis presents the market as almost perfectly efficient, was this all just a lucky illusion?

It’s possible. That 2.7% profit could happen 7 in 100 times by chance, so it’s hardly guaranteed in a statistical sense. Nevertheless, if you look closely at the 6-game score averages distribution again, you can see that there are fewer large negative scores than would be expected by chance. 438 of them are less than -0.3, compared to 563 for those randomly generated.

A possible explanation for this is that with bettors underbetting sequentially losing teams, their odds lengthen more than they should, meaning each time they lose they pick up a relatively smaller negative score. Had you bet on those 438 teams with a six-game average score of less than -0.3 in their seventh game, you’d have made a profit of 11.6% (average odds 3.22, p-value = 0.06).

In fact, only 438 scores of less than -0.3 was quite a lucky result. With these odds, randomising the results based on their implied probabilities revealed an expected figure of 513, with only 2.5% showing fewer than 438 in a Monte Carlo simulation. We can then compare this expected figure of 513 to the expected figure assuming a perfect normal distribution and a population average score of 0. This happens to be 584. Only 3.5% of simulation runs had more than 584 six-game average scores less than -0.3. That’s what statisticians would describe as weakly significant. Perhaps the distribution of actual six-game scores is not random after all.
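
The normal-distribution benchmark can be checked with Python's standard library. The sketch assumes the "perfect normal distribution" uses the observed six-game standard deviation of 0.1866, which is my reading rather than something the article states. On that reading, the mean-0 count lands close to the 584 quoted above. Centring the curve on the observed six-game average of 0.0032 instead gives a figure close to the 563 quoted earlier for randomly generated scores.

```python
from statistics import NormalDist

n_samples = 10_836   # six-game samples (from the article)
threshold = -0.3     # cut-off for a "large negative" six-game average
sd = 0.1866          # observed SD of six-game averages (assumed to be the one used)

# Perfect normal distribution centred on a population average score of 0.
expected_zero_mean = n_samples * NormalDist(mu=0.0, sigma=sd).cdf(threshold)

# Same spread, centred on the observed six-game average of 0.0032 instead.
expected_observed_mean = n_samples * NormalDist(mu=0.0032, sigma=sd).cdf(threshold)

print(f"mean 0:      about {expected_zero_mean:.0f} averages below {threshold}")
print(f"mean 0.0032: about {expected_observed_mean:.0f} averages below {threshold}")
```

The Monte Carlo figure of 513 cannot be reproduced this way, because it depends on the actual odds of each match.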

Of course, the same reasoning should apply to ‘hot’ teams. Sequential winning should see their odds shorten relative to true outcome probabilities, meaning there should be fewer large positive scores than expected as well. But that is not what we see in this data sample.

Still more to learn about randomness and efficiency in soccer betting

If there is anything non-random to be found in the distribution of soccer betting scores, this analysis confirms that it is hard to find. There is indeed a fine line between random noise and a potentially exploitable systematic signal, which will only reveal itself over long and repeated play to the most talented and hard-working of bettors.

In general, it is very clear that Pinnacle’s odds are very efficient (even if on average only), and systematic inefficiencies – the hot hand fallacy possibly being one of them – are elusive. Furthermore, for the most part Pinnacle do a very good job of ensuring that if those inefficiencies exist, they remain largely within the boundaries of their margins. Beating Pinnacle’s odds, well, in English league soccer at least, is certainly not for the faint-hearted.
