How do we know when betting results are down to luck and when they are down to skill? There are various suggested ways to try to answer this question, but they still often lead to debate. In his latest article, Joseph Buchdahl explains how the Bayes Factor can be used to test betting skill. Read on to find out more.
I’m often asked how large a sample of bets needs to be before we can be sure that what we are seeing genuinely reflects the bettor’s underlying ability as a forecaster rather than simply luck. In many respects this is like asking how long a piece of string is. However, over the last few years I’ve looked for ways to answer this question.
A common approach is to calculate the probability that the profitability we see in a sample of bets could arise by chance, assuming the bettor has no skill: the so-called frequentist or p-value approach. When this probability is small (typically less than 5%, or 1% if we’re being more demanding), we subjectively make the case that something other than chance, for example skill, must be at work.
The drawback with the aforementioned approach is that it doesn’t tell us the probability that we are skilled. It simply calculates the probability of the data we see given the hypothesis that we are not skilled. Given a large enough number of bettors, we will always find some with very low p-values that can give the illusion of skill.
An alternative method uses Bayes theorem to estimate the probability of the hypothesis that we are skilled given the data that we observe. Furthermore, with each new bit of data (another betting outcome for example), we can update our prior probability to make a new one (the posterior probability) in an iterative chain of belief-updating.
A significant drawback with Bayes theorem, however, is that conclusions are sensitive to the choice of the initial prior probability: in this case, how strongly we believed we were a skilled bettor before embarking on a betting ‘career’.
What is the Bayes Factor?
I’ve previously contrasted these frequentist and Bayesian approaches in testing for bettor skill. In this article I’d like to revisit these ideas by introducing the Bayes Factor. To my mind, the Bayes Factor offers a blend of both approaches, by calculating the likelihood ratio of two competing hypotheses or models (for example I am skilled versus I am not), by comparing the probabilities of the data given each hypothesis.
The aim of the Bayes Factor is to quantify the support for one hypothesis over the other, regardless of whether either hypothesis is correct. Mathematically, the Bayes Factor is commonly expressed as follows:

BF = P(D|H1) / P(D|H0)
where P = probability, D = data, H1 is the model hypothesis, e.g. I hold an expected value of +5% through my skill, and H0 is the null hypothesis, e.g. I have no skill and my expectation is equivalent to the bookmaker’s margin of -2.5%. P(D|H) is a mathematical way of expressing “the probability of observing the data given that the hypothesis is true.”
Understanding the Bayes Factor: A simple example
Suppose we have a coin. We think it might be biased, but we don’t know. We previously tossed it 10 times and got seven heads, so we hypothesise (H1) that the coin is biased towards heads by a factor of 70% to 30%. With an unbiased coin (H0) the relative weights of heads and tails are 50% and 50%. We now toss it 100 times and get 60 heads. Which hypothesis is correct?
A frequentist p-value approach would calculate that the probability of getting more than 60 heads with a 50:50 expectation is only 1.76%, sufficiently small for academics to publish a paper about a biased coin. The problem, however, is that we could publish another paper about the coin being fair, since the probability of 60 or fewer heads with a 70:30 bias is just 2.10%. Both figures are statistically significant at the 95% confidence level.
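Both tail probabilities can be verified directly from the binomial distribution; here is a minimal sketch in Python (note that the 1.76% figure is the probability of strictly more than 60 heads):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(at most k successes) for a Binomial(n, p) variable."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Tail probability under H0 (fair coin): more than 60 heads in 100 tosses
p_tail_fair = 1 - binom_cdf(60, 100, 0.5)    # ≈ 0.0176

# Tail probability under H1 (70:30 coin): 60 or fewer heads in 100 tosses
p_tail_biased = binom_cdf(60, 100, 0.7)      # ≈ 0.021
```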
The Bayes Factor, by contrast, compares the relative merits of each hypothesis without saying anything about their merits relative to the true heads expectation. Following the formula above, the Bayes Factor can be calculated from the ratio of the probabilities of obtaining exactly 60 heads under each hypothesis. In Excel, =BINOM.DIST(60,100,0.7,FALSE)/BINOM.DIST(60,100,0.5,FALSE) gives approximately 0.783.
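The same ratio can be reproduced in Python with the binomial probability mass function:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of the data (60 heads in 100 tosses) under each hypothesis
p_d_h1 = binom_pmf(60, 100, 0.7)   # biased coin, 70:30
p_d_h0 = binom_pmf(60, 100, 0.5)   # fair coin, 50:50

bayes_factor = p_d_h1 / p_d_h0     # ≈ 0.783
```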
How to interpret a Bayes Factor
What does a figure of 0.783 actually mean? Comparing the two probabilities above it should be intuitively obvious that getting 60 heads is roughly as likely for either hypothesis. A Bayes Factor close to one implies there is little or no evidence to favour one hypothesis over the other. In this case, because it is less than 1, we might marginally favour H0 (the unbiased coin) over H1 (the biased coin).
Harold Jeffreys, the 20th-century polymath, proposed an interpretation scale for the Bayes Factor. Values between one and three imply anecdotal evidence favouring H1 over H0 (or 1 to 1/3 for H0 over H1). Three to 10 implies moderate evidence for H1 over H0 (or 1/3 to 1/10 for H0 over H1). Ten to 30 (and 1/10 to 1/30) implies strong evidence, 30 to 100 (and 1/30 to 1/100) very strong evidence, and over 100 (under 1/100) decisive evidence.
Suppose instead we had seen 65 heads. How does this change the Bayes Factor? Recalculating the ratio above gives a Bayes Factor of approximately 54.
This would be much stronger evidence for believing the coin to be biased.
In this case we should remind ourselves that the Bayes Factor does not actually tell us how likely it is that H1 is true, just that it is better supported by the data than H0. Consider seeing 90 heads. The Bayes Factor would be 85.7 billion, but given that seeing 90 or more heads even with a 70:30 bias would itself have less than a 1 in 600,000 chance, it’s not likely that our belief in the bias being 70:30 would be correct.
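How quickly the Bayes Factor escalates with the head count can be checked with the same likelihood ratio; a minimal sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def bayes_factor(heads, n=100, p1=0.7, p0=0.5):
    """Likelihood ratio P(D|H1) / P(D|H0) for a given number of heads."""
    return binom_pmf(heads, n, p1) / binom_pmf(heads, n, p0)

for heads in (60, 65, 90):
    print(heads, bayes_factor(heads))
# 60 heads gives roughly 0.783; 90 heads gives roughly 85.7 billion
```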
The chart below shows how the Bayes Factor varies (logarithmically) with the number of observed heads.
What if hypotheses expectations are uncertain?
In my coin toss example, I assumed a fixed probability for the expectation of heads for both H0 (50%) and H1 (70%). That is reasonable for an unbiased coin, since it’s a well-established rule of probability that an unbiased coin has an equal chance of landing heads or tails. [In fact, it’s a bit more complicated than that, and for those interested, here’s the paper.] But is this really the case for H1, the biased coin?
If we don’t know exactly what the heads expectation is, wouldn’t it be reasonable to assume it could fall within a range rather than to assign a precise value to it?
In fact, this is indeed how true Bayes Factors are calculated. Thus far, I’ve really been talking about a likelihood-ratio test, where H1(70%) and H0(50%) are maximum likelihood estimates.
Where hypothesis expectations are uncertain, the Bayes Factor is in fact the ratio of the integrals of the probabilities over the full distribution of possible heads expectations. For example, we might assume that whilst a heads expectation of 70% is the most likely for H1, the range of possible expectations is normally distributed around it with a standard deviation of 5%.
Excel is not equipped to deal with this sort of integral calculation, but it is possible to estimate it by calculating a weighted average over possible values for the heads expectation.
Assuming a normally distributed expectation with maximum likelihood 70% and standard deviation 5%, the original Bayes Factor value of 0.783 is increased to 1.26. Now H1 (the biased coin) is marginally the more favoured hypothesis. The reason for this is that with a distribution of possible heads expectations, some below 70% and closer to 60% (the observed percentage of heads) are being counted in the weighting calculation.
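The weighted-average estimate can be sketched in Python by averaging the binomial likelihood over a normal prior for the heads expectation. The exact figure depends on how the weighting is discretised, so treat the output as illustrative; the qualitative effect (the averaged Bayes Factor rising above the point-estimate value of 0.783) is robust:

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n, p):
    """P(exactly k successes) for a Binomial(n, p) variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Normal probability density function."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def averaged_likelihood(heads, n, mu, sigma, steps=1001):
    """Weighted average of the binomial likelihood over a normal prior
    for the heads expectation, taken on a grid spanning +/- 4 sigma."""
    lo, hi = mu - 4 * sigma, mu + 4 * sigma
    dp = (hi - lo) / (steps - 1)
    num, den = 0.0, 0.0
    for i in range(steps):
        p = lo + i * dp
        w = normal_pdf(p, mu, sigma)
        num += w * binom_pmf(heads, n, p)
        den += w
    return num / den

# Point-estimate Bayes Factor (H1: heads expectation exactly 70%) versus the
# averaged version (H1: normally distributed around 70% with a 5% sd)
bf_point = binom_pmf(60, 100, 0.7) / binom_pmf(60, 100, 0.5)
bf_averaged = averaged_likelihood(60, 100, 0.7, 0.05) / binom_pmf(60, 100, 0.5)
```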
Using the Bayes Factors for more complex betting scenarios
The coin toss example above presents a very simple scenario where every toss has the same probability. In betting, however, this is almost never the case, even for point spread and Asian Handicap bettors. Where odds vary, it becomes impractical to use the binomial distribution to calculate the probability of a particular set of betting outcomes.
Fortunately, for samples larger than about 30 bets, the normal distribution provides an adequate replacement. Furthermore, provided stakes are level, using the sample’s average odds offers a robust estimate of those probabilities, even when individual betting odds vary quite considerably.
In Excel, the equation for the simple version of the Bayes Factor (really the likelihood ratio) then becomes:

=NORMDIST(y, evH1, σH1, FALSE) / NORMDIST(y, evH0, σH0, FALSE)
where y is your actual yield (or profit over turnover), evH1 is your expected value under your forecast model or betting system (the yield you expect to achieve), evH0 is the expected value under the null hypothesis (for example, the bookmaker’s margin), σH1 is the standard deviation in evH1 and σH0 is the standard deviation in evH0.
In February 2019, I showed how we can model a range of possible betting returns by means of the standard deviation. Specifically, I showed that we can use the following expression:
σ = o√(p(1 − p)/n)

where p is the ‘true’ bet win probability, o is the betting odds and n is the number of bets.
With a little bit of rearranging (substituting p = r/o, since the expected return r = po), we can show that:

σ = √(r(o − r)/n)

where r is your return on investment (or y + 1).
If rH1 = evH1 + 1 and rH0 = evH0 + 1, then:

σH1 = √(rH1(o − rH1)/n) and σH0 = √(rH0(o − rH0)/n)
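Putting these pieces together, here is a minimal Python sketch of the normal-ratio version of the Bayes Factor. The odds, number of bets, yield and expected values are illustrative assumptions, not figures from this article:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal probability density, i.e. Excel's NORMDIST(x, mu, sigma, FALSE)."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def yield_sd(r, o, n):
    """Standard deviation of the yield: sqrt(r * (o - r) / n)."""
    return sqrt(r * (o - r) / n)

# Illustrative assumptions
o = 2.0         # average decimal odds
n = 500         # number of bets
y = 0.03        # observed yield (+3%)
ev_h1 = 0.05    # H1: skill worth a +5% expected value
ev_h0 = -0.025  # H0: no skill, expectation equals the -2.5% margin

sigma_h1 = yield_sd(ev_h1 + 1, o, n)
sigma_h0 = yield_sd(ev_h0 + 1, o, n)

bayes_factor = normal_pdf(y, ev_h1, sigma_h1) / normal_pdf(y, ev_h0, sigma_h0)
```

With these inputs the ratio comes out a little below two: on Jeffreys’ scale, merely anecdotal evidence for skill over no skill.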
Anyone familiar with the NORMDIST Excel function may be aware that when used with the argument FALSE, the output is in fact a probability density rather than a simple probability. More confusingly, probability densities can, in certain circumstances, have values greater than 1, whilst of course probabilities cannot.
In fact, probability densities are equivalent to probabilities per unit (in this example, probability per unit of yield); more specifically, the probability per unit over an infinitesimally small interval (the derivative of the cumulative distribution function).
Such issues need not concern us here. Fortunately, because we are dividing one probability density by another, the ‘per units’ cancel out, leaving a dimensionless ratio, which is exactly what the Bayes Factor is.
The first part of this article should have provided a sufficient introduction to the Bayes Factor and how it can be used in a betting context. In part two, I will examine how we can use the Bayes Factor to test for evidence of skill in betting, as well as other model scenarios.