Jan 10, 2020

Part two: Using Bayes Factor to assess betting skill

In part one of this article, Joseph Buchdahl introduced the idea of using the Bayes Factor to test betting skill. Now, he uses real betting histories to show how we can test whether results are down to luck or skill. Read on to find out more. 

In part one of this two-part article I introduced the Bayes Factor as a metric that can be used to compare the relative strengths of two competing statistical hypotheses. Unsurprisingly, this will have its uses in betting analysis.

Now, in part two, I will examine three examples of how the Bayes Factor can be used in this context, and in particular how we can assess if a bettor possesses any measure of skill in returning a profit.

Using the Bayes Factor to analyse a betting record

Perhaps the most obvious way of determining whether a bettor has any skill is to compare what they think they should achieve given their forecasting methodology to what the bookmaker expects them to achieve. By default, a bookmaker will expect, or at least hope, that every bettor will lose a percentage of revenue defined by their margin. For Pinnacle’s popular markets that is typically around -2.5%.

We can use the Bayes Factor to assess whether a bettor who believes they can do better than this is demonstrating a measure of skill.

Using Excel’s NORMDIST function as described in part one, the chart below plots the likelihood ratios (LR) and Bayes Factors (BF) for an Asian Handicap or point spread bettor placing 1,000 bets at odds of 1.95, with an expectation of +5% (H1). With a margin of -2.5%, the implied fair odds and win probability of each bet are 2.00 and 50% respectively (H0). 

The chart shows how LR and BF vary with observed yield.


Should the bettor achieve a yield of +5% having expected that, BF = 13.7 and LR = 19.3. According to Jeffreys this would imply strong, but not decisive, evidence that they were achieving this through skill rather than luck. Compare this to a p-value of 0.75% (or 1 in 133).
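The figures above can be approximated with a short script. This is only a minimal sketch, not necessarily the spreadsheet method from part one: it assumes level stakes, a normal approximation to the distribution of yield with a binomial standard deviation, and (for the Bayes Factor) a normal prior on H1 centred on the expected yield with a spread equal to the standard error. The function names and that choice of prior are my assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def yield_sd(odds: float, expected_yield: float) -> float:
    """Per-bet standard deviation of return for a 1-unit bet at fixed odds."""
    win_prob = (1 + expected_yield) / odds
    return odds * sqrt(win_prob * (1 - win_prob))


def evidence(n_bets: int, odds: float, h0: float, h1: float, observed: float):
    """Return (likelihood ratio, Bayes Factor, one-tailed p-value)."""
    se0 = yield_sd(odds, h0) / sqrt(n_bets)  # standard error of yield under H0
    se1 = yield_sd(odds, h1) / sqrt(n_bets)  # standard error of yield under H1

    like_h0 = NormalDist(h0, se0).pdf(observed)
    like_h1 = NormalDist(h1, se1).pdf(observed)
    lr = like_h1 / like_h0

    # ASSUMPTION: H1 is a normal distribution around h1 with spread se1,
    # so the observed yield under H1 has a combined spread of sqrt(2) * se1.
    marginal_h1 = NormalDist(h1, sqrt(2) * se1).pdf(observed)
    bf = marginal_h1 / like_h0

    # Probability of a yield at least this good arising under H0 alone.
    p_value = 1 - NormalDist(h0, se0).cdf(observed)
    return lr, bf, p_value


lr, bf, p_value = evidence(n_bets=1000, odds=1.95, h0=-0.025, h1=0.05, observed=0.05)
print(f"LR = {lr:.1f}, BF = {bf:.1f}, p-value = {p_value:.2%}")
```

Under these assumptions the output lands close to the LR, BF and p-value quoted above, although a different prior for H1 would give a different Bayes Factor.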

Clearly, a Bayes Factor analysis will draw more conservative conclusions than a p-value equivalent, and rightly so. Too often, bettors can be fooled by low p-values into believing that they imply evidence of skill when in fact they simply tell you the probability of something happening by chance assuming no skill.

For decisive evidence of +5% skill, a bettor would need a yield of about +7.4% after 1,000 bets. If they had achieved such a performance, however, we might prefer a different version of H1 (for example H1 = +7.4%), and we could test it against the original H1 = +5% or indeed against H0 = -2.5%. Remember, a Bayes Factor analysis only compares the relative likelihoods of two hypotheses; it does not compare either of them to the ‘truth’.

To achieve a decisive level of evidence (BF = 100) when the observed yield matches an expected advantage of +5%, given a bookmaker’s margin of -2.5%, would require about 1,675 bets. For such a record, the p-value would be 0.08% or 1-in-1,250. Some statisticians now recommend a much tougher p-value threshold before declaring statistical significance. Nassim Taleb, author of Fooled by Randomness and The Black Swan, for example, has advocated a threshold p-value of 0.1%. In this example that would correspond closely to a Bayes Factor of about 100.

The chart below illustrates how LR and BF vary with the sample size of bets for this scenario where H0 = -2.5%, H1 = +5%, and the observed performance matches H1 exactly. BF is typically smaller than LR where H1 and observation are close, because a probability distribution is used to describe H1; this increases uncertainty and decreases confidence relative to the single specific value of H1 used in a pure likelihood ratio test.

When H1 is further away from observation, BF may be greater than LR as the chart above illustrates clearly, and as was the case for the coin toss example in part one.


Changing the odds of course dramatically changes the numbers. With odds of 3.00, a +5% observed performance with H1 = +5% and H0 = -2.5% over 1,000 bets has a Bayes Factor of just 2.89. Bigger odds mean bigger variance and bigger uncertainty.

Now it would be impossible to rule out luck, although with a p-value of 4.57% some observers might choose to do just that. We need about 3,500 bets to achieve a BF = 100. The equivalent p-value is again about 0.08% or 1-in-1,250. For odds of 7, we need 10,400 bets for decisive H1 evidence here, and again the p-value is 0.08% (1-in-1,250). Taleb and Jeffreys are evidently on the same page.
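To see how the required sample size grows with the odds, we can search for the smallest number of bets at which the Bayes Factor reaches 100 when the observed yield matches H1 exactly. This is a rough sketch under the same assumptions as before (level stakes, normal approximation, and an assumed normal prior on H1 with a spread equal to the standard error); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bayes_factor(n_bets: int, odds: float, h0: float, h1: float, observed: float) -> float:
    """Bayes Factor for H1 over H0 under a normal approximation to yield."""
    def standard_error(expected_yield: float) -> float:
        win_prob = (1 + expected_yield) / odds
        return odds * sqrt(win_prob * (1 - win_prob) / n_bets)

    like_h0 = NormalDist(h0, standard_error(h0)).pdf(observed)
    # ASSUMPTION: H1 is spread normally around h1 with the same standard error,
    # which widens the predictive distribution by a factor of sqrt(2).
    marginal_h1 = NormalDist(h1, sqrt(2) * standard_error(h1)).pdf(observed)
    return marginal_h1 / like_h0


def bets_for_threshold(odds: float, h0: float = -0.025, h1: float = 0.05,
                       threshold: float = 100.0, max_bets: int = 100_000):
    """Smallest sample size at which BF >= threshold when observed yield equals H1."""
    for n_bets in range(1, max_bets + 1):
        if bayes_factor(n_bets, odds, h0, h1, observed=h1) >= threshold:
            return n_bets
    return None  # threshold not reached within max_bets


for odds in (1.95, 7.0):
    print(f"odds {odds}: about {bets_for_threshold(odds)} bets for BF = 100")
```

With these assumptions the search returns sample sizes close to the 1,675 and 10,400 bets quoted for odds of 1.95 and 7.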

Using the Bayes Factor to confirm goodness of fit

We can also use the Bayes Factor as a quasi-goodness-of-fit test. In such a test, when actual outcomes closely match those which were expected (predicted) a priori, this is an indication that our model is doing what it is supposed to be doing.

Since August 2015, I have been publishing value betting picks based on a Wisdom of Crowd methodology that uses the wisdom (efficiency) of Pinnacle’s soccer match betting odds as the basis of determining ‘true’ outcome probabilities.

The methodology’s hypothesis is that the ratio of another bookmaker’s odds to Pinnacle’s odds with their margin removed, minus one, provides your expected value. For example, if bet365 offer odds of 2.5 for Liverpool to beat Manchester City, and Pinnacle have a fair price of 2.4 once their margin has been removed, your expected value for such a bet is (2.5/2.4) – 1 = 4.17%. Aggregated over a sample of bets, your expected value is simply the average expected value over those bets.
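As a small illustration of that arithmetic, the sketch below computes the expected value of a single bet and the average over a sample. It assumes the fair (margin-free) Pinnacle price is already known; how the margin is removed is not covered here, and the function names and the second sample bet are hypothetical.

```python
def expected_value(offered_odds: float, fair_odds: float) -> float:
    """Expected value per unit staked: offered odds relative to the fair odds."""
    return offered_odds / fair_odds - 1


def average_expected_value(bets) -> float:
    """Average expected value over (offered_odds, fair_odds) pairs, level stakes."""
    return sum(expected_value(offered, fair) for offered, fair in bets) / len(bets)


print(f"{expected_value(2.5, 2.4):.2%}")  # the Liverpool example from the text

# Hypothetical two-bet sample: the second bet has no edge at all.
print(f"{average_expected_value([(2.5, 2.4), (2.0, 2.0)]):.2%}")
```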

Knowing the expected value of the betting history (H0) allows us to compare it directly with the actual yield (H1) after each bet. The closer the expected and actual yields are, the more likely it is that the methodology is working as predicted. The Bayes Factor allows us to make such a goodness-of-fit comparison. The closer the value is to one, the better the fit between expectation and performance.

The time series plot below shows the evolution of the likelihood ratios and Bayes Factors after each bet in the time series.

The below-par performance during the first 1,000 bets meant that a Bayes Factor analysis could not rule out that there was something wrong with my model, since there was moderate evidence that expected performance (H0) was significantly different to actual performance (H1). Thereafter, performance regressed towards the predicted mean, and both LR and BF rarely strayed much from a figure of one. After 9,681 matches, the expected yield was 4.18% whilst the actual yield was 4.02%.


Using the Bayes Factor to test the closing line value hypothesis

Readers familiar with my work will be aware of my support for the closing line value (CLV) hypothesis, the idea that the closing line or closing odds (pre-margin), particularly for 1X2 soccer markets, represents the best possible measure of win probability, and is an excellent predictor of actual betting yield.

If you beat the fair closing line by +5%, your profit expectation is +5%, and testing of large samples of soccer match odds data has revealed that you will typically make +5%.

Nevertheless, I am open to the possibility that this hypothesis might not always be accurate. Indeed, in September 2019 I explored the weak inefficiency of a tennis match betting market whilst reviewing the performance of @nishikoripicks, a tennis tipster showing an +8.6% yield despite a CLV expectation of -0.3%. Such a discrepancy is hugely indicative that something could be wrong with the CLV hypothesis, at least for tennis. We can use the Bayes Factor to find out just how indicative.

Again, following the methodology I described in part one, I have calculated the rolling likelihood ratio and Bayes Factor after every bet in @nishikoripicks’ betting record and plotted them on the chart below.

For this, evH0 is assumed to be the cumulative expected closing line value. For each bet the expected closing line value is calculated by the ratio of @nishikoripicks’ advised betting odds to the closing odds with Pinnacle’s margin removed. For example, if he had bet 2.5 and the closing price with margin removed was 2.45, then his expected closing line value would be (2.5/2.45) – 1 = 2.041%.

The cumulative expected closing line value is then the average for all preceding bets. evH1 was assumed to be equivalent to the current yield after each bet, and thus is similarly updated on a bet by bet basis as evH0 is. In other words, @nishikoripicks’ current yield after each bet was considered the best measure for his actual expected value.
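The two running series described above can be sketched as follows. This is an illustrative outline only: it assumes level stakes of one unit, that the margin-free closing odds are already available for each bet, and a made-up record format of (advised odds, fair closing odds, won) tuples. The sample data are hypothetical and not taken from any real betting record.

```python
from itertools import accumulate


def rolling_clv_and_yield(bets):
    """Return (evH0, evH1): running average CLV and running yield after each bet.

    bets: list of (advised_odds, fair_closing_odds, won) tuples, 1-unit stakes.
    """
    # Expected closing line value of each bet: advised odds / fair closing odds - 1.
    clv = [advised / closing - 1 for advised, closing, _ in bets]
    # Realised profit of each bet at level stakes.
    profit = [advised - 1 if won else -1.0 for advised, _, won in bets]

    counts = range(1, len(bets) + 1)
    ev_h0 = [total / n for total, n in zip(accumulate(clv), counts)]
    ev_h1 = [total / n for total, n in zip(accumulate(profit), counts)]
    return ev_h0, ev_h1


# Hypothetical three-bet record, purely for illustration.
sample = [(2.5, 2.45, True), (2.0, 2.0, False), (1.8, 1.75, True)]
ev_h0, ev_h1 = rolling_clv_and_yield(sample)
for n, (h0, h1) in enumerate(zip(ev_h0, ev_h1), start=1):
    print(f"after bet {n}: evH0 = {h0:.2%}, evH1 = {h1:.2%}")
```

Each pair of values could then be fed, together with the bet count and average odds, into a likelihood ratio or Bayes Factor calculation to produce the rolling series.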


Whilst the Bayes Factor is typically more conservative than the likelihood ratio (as we’ve previously noted when the observed data matches H1), they are broadly similar. After about 2,000 bets there is a sustained decisive difference between the two models for what we should expect @nishikoripicks’ yield to be. 

By the end of his history, BF = 1,912 and LR = 2,704. If it is assumed that @nishikoripicks’ actual yield is an accurate measure of his expected yield, then this would imply that the closing line value hypothesis is very likely to be incorrect in this case. 

Of course, this Bayes Factor analysis doesn’t tell us that @nishikoripicks’ actual yield is an accurate measure of his expected yield; we’ve just assumed this to be the case. It simply tells us that if it is, it’s decisively better than the closing line value hypothesis.

However, it could be that he’s been luckier than expected; perhaps his true expected yield is +5%. If this were the case, then a comparison of the two models H1 = +5% and H0 = -0.3% would show a BF of only 11.8.

Furthermore, there remains the possibility that Pinnacle are not responding to @nishikoripicks’ market activity for reasons other than market inefficiency. As long as we don’t know what volumes he bets, what volumes his customers bet, and even whether those customers bet at Pinnacle at all, the possibility remains that lines don’t respond in a way we’d expect them to (i.e. by 8.6% + margin) simply because their activity is not sufficient to move them by that much. 

In contrast to @nishikoripicks’ failure to move lines in accordance with the closing line value hypothesis, another bettor whose record I analysed for my article on using the closing line to test your skill in betting shows odds movements broadly in line with the hypothesis.

This individual’s 2019 tennis betting record consists of 2,223 bets showing an expected closing line value of 2.96% and an actual level stakes yield of 4.37% (although his actual return was a bit less on account of variable stake sizes). If H1 = 4.37% and H0 = 2.96%, then LR = 1.22 and BF = 0.86, implying that neither model is superior to the other.

Including his other sports as well, his overall record for 2019 to date is as follows: bets = 14,333; expected yield = 2.92%, actual level stakes yield = 3.51%, LR = 1.25, BF = 0.88.

Given that odds movements of the sort this bettor witnesses are very unlikely to arise by chance, such numbers would be consistent with the closing line value hypothesis (CLVH) being a valid one. Although @nishikoripicks does see some odds shortening (about 3%), why it is not a lot more, and in line with his actual returns, must remain an open question.

The shortcomings with Bayes Factor

To my mind, a significant shortcoming of the Bayes Factor is that it is still very much like a frequentist p-value, inasmuch as it is based on probabilities of the data occurring given that the hypothesis or model is true. The real success of Bayesian statistics is that it identifies the reverse: the probability that a hypothesis is true given the data we see.

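Perhaps this is selling the Bayes Factor a little short. In fact, a fuller expression, based on the probabilities of the hypotheses being true, is shown below:

$$BF = \frac{P(D \mid H_1)}{P(D \mid H_0)} = \frac{P(H_1 \mid D)}{P(H_0 \mid D)} \times \frac{P(H_0)}{P(H_1)}$$
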
P(H0) and P(H1) are the prior probabilities of the two competing hypotheses being true, whilst P(H1|D) and P(H0|D) are the posterior probabilities of H1 and H0 being true given some observed data.

When P(H1) = P(H0) then the Bayes Factor is specifically the ratio of the posterior probabilities, and the likelihood that one model is truer than the other.
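To make the role of the priors concrete, the sketch below converts a Bayes Factor into a posterior probability for H1 using the standard relationship that posterior odds equal the Bayes Factor multiplied by the prior odds. It assumes, purely for illustration, that H0 and H1 are the only two candidates, so that P(H0) = 1 - P(H1); the function name is hypothetical.

```python
def posterior_prob_h1(bayes_factor: float, prior_h1: float = 0.5) -> float:
    """Posterior probability of H1, assuming H0 and H1 are the only hypotheses."""
    prior_odds = prior_h1 / (1 - prior_h1)       # P(H1) / P(H0)
    posterior_odds = bayes_factor * prior_odds   # P(H1|D) / P(H0|D)
    return posterior_odds / (1 + posterior_odds)


# BF = 13.7 is the figure from the 1,000-bet example earlier in the article.
for prior in (0.5, 0.1):
    posterior = posterior_prob_h1(13.7, prior_h1=prior)
    print(f"prior P(H1) = {prior:.0%} -> posterior P(H1|D) = {posterior:.1%}")
```

With equal priors a Bayes Factor of 13.7 corresponds to a posterior probability of about 93% for H1, but a more sceptical prior of 10% brings that down to about 60%, which is why the choice of prior matters so much.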

The problem with Bayesian statistics, however, is that so often we don’t know what the prior probabilities of the models being true are. What is the prior probability that the closing line value hypothesis is true? Is it the same as the prior probability that @nishikoripicks’ actual performance is a valid measure of his expectation?

Whilst doubt about prior probabilities remains, Bayesian analysis is always limited. Doubt and uncertainty, however, are very much what Bayesian statistics is all about, which regards ‘truth’ not as absolute but rather as probabilistic, always updatable with new data. The more data we get, the closer we get to the ‘truth’.

What have we learnt about the Bayes Factor and betting skill?

This pair of articles has shown how the Bayes Factor can be used to test competing hypotheses a bettor might have about their performance: for example, whether it is down to skill or luck, why it is happening, and whether it reflects expectation. It provides another tool in the bettor’s armoury to help identify whether they are a skilled bettor or otherwise.

For most betting performances the likelihood ratio will be quite adequate as a substitute for the computationally more complex Bayes Factor.
