Nov 10, 2017

# An analysis of different expected goals models

## What kind of expected goals model is the most accurate?

Previously confined to a small sports data community, the expected goals metric now features among other common soccer statistics such as possession, shots on target and number of fouls committed. However, there are numerous approaches to expected goals. This article examines the different models used and how they produce different outputs.

The aim in soccer is to score against your opponent without conceding a goal. It sounds simple but because of things like randomness and luck, teams don’t always get the results they “deserve.”

This is why data analysis and metrics such as expected goals are useful in sports betting - we can analyse performances from a more analytical standpoint and give substance to claims such as “they were unlucky not to win.”


Expected goals (often abbreviated to xG) is one form of data analysis that soccer teams use and is something that is becoming increasingly popular amongst bettors. Expected goals stats are widely available online but they aren’t always the same because different models are used to calculate them.

Models range from the simple to the complex. Below is an explanation of how different expected goals models work: what are the mechanics behind them, and how different are the outputs they produce?

### Using basic shot data

Andrew Beasley has previously explained how to calculate expected goals using a basic shot data model. Because a shot is the defining action of a goal, shot data is key to any expected goals model - there are countless events within a soccer match that contribute to a goal being scored but when trying to predict this particular outcome, shots are undoubtedly the most important.

This is a simple approach that uses what Opta defines as a “big chance” - a situation where a player should reasonably be expected to score - as well as shots taken from both inside and outside the box.

Conversion rates from the past five Premier League seasons mean a big chance has an xG value of 0.387 (38.7% chance of scoring), shots inside the box have a value of 0.070 and those from outside the box have a value of 0.036.
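Under this basic model, a team's xG is simply a weighted sum of its shot counts in each category. A minimal sketch in Python, using the conversion rates quoted above (the match figures in the example are hypothetical):

```python
# Per-shot xG values from the basic model described above.
XG_VALUES = {
    "big_chance": 0.387,   # Opta-defined "big chance"
    "inside_box": 0.070,   # other shots inside the box
    "outside_box": 0.036,  # shots from outside the box
}

def basic_xg(shot_counts):
    """Sum the xG contribution of each shot category."""
    return sum(XG_VALUES[category] * count
               for category, count in shot_counts.items())

# Hypothetical team performance: 2 big chances, 8 other shots
# inside the box, 5 shots from outside the box.
team_xg = basic_xg({"big_chance": 2, "inside_box": 8, "outside_box": 5})
print(round(team_xg, 3))  # 0.774 + 0.560 + 0.180 = 1.514
```

Despite its simplicity, this weighted sum is the core of every xG model; the more advanced approaches below mainly differ in how finely they estimate the per-shot probability.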

### Detailed analysis of shot data

Given the size of the pitch in soccer, the various angles from which a shot can be taken and the impact this has on the likelihood of scoring, whether or not a model analyses shot location in more detail will influence its expected goals output.

Although similar to Andrew Beasley’s basic expected goals model, this kind of approach uses a more in-depth analysis of the location a shot is taken from to assign its xG value. The easiest way to do this is to divide the shooting range to goal into a grid and plot each shot.
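One way to sketch this grid approach: bucket a set of historical shots into cells and use each cell's conversion rate as the xG value for future shots from that cell. The grid size, coordinate convention and sample shots below are illustrative assumptions, not taken from any published model.

```python
def cell_for(x, y, cell_size=6):
    """Map shot coordinates (metres from the goal line / from centre) to a grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def fit_grid(shots):
    """shots: list of (x, y, scored). Returns a conversion rate per grid cell."""
    totals, goals = {}, {}
    for x, y, scored in shots:
        cell = cell_for(x, y)
        totals[cell] = totals.get(cell, 0) + 1
        goals[cell] = goals.get(cell, 0) + int(scored)
    return {cell: goals[cell] / totals[cell] for cell in totals}

# Tiny illustrative sample: (distance from goal line, lateral offset, scored)
sample = [(4, 0, True), (4, 1, False), (5, 2, True),
          (20, 3, False), (22, 1, False), (19, 0, True)]
rates = fit_grid(sample)
print(rates[cell_for(4, 0)])   # close-range cell: 2 goals from 3 shots
print(rates[cell_for(20, 3)])  # long-range cell: 1 goal from 3 shots
```

A real model would use many thousands of shots so that every cell has a stable sample, and might smooth rates across neighbouring cells.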

The benefit of using this kind of model is that it accounts for the difference in a player shooting from directly in front of the goal (very likely to score) and a player shooting from an acute angle (much less likely to score) as well as whether the shot came from a player’s head (harder to score) or foot (easier to score).

Paul Riley’s model is a good example of taking a slightly more advanced approach to analysing shot location data when building an xG model.

### Considering the attacking process

Of course, it isn’t just where a shot is taken from and what body part is used that will determine how likely an attempt is to be converted. The passage of play that precedes a shot will have a bearing on the quality of that chance.

Instead of simply assigning an xG value to a shot based on where it was taken from, some models will look at how the shooting opportunity was created (a cross, a through ball, a counter attack etc.) and analyse how the shot was taken in more detail (a shot following a successful dribble, a rebound after a save etc.).

Obviously, this kind of model requires a lot more data and resources to both build and maintain - 11tegen11’s xG model is one example of an expected goals model that takes the wider attacking process into consideration when assigning xG values to shots.
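As a rough illustration of how such a model might combine these extra features, here is a logistic-regression-style sketch. The feature names and coefficients are invented for illustration only; a real model would fit them to a large historical shot dataset.

```python
import math

# Hypothetical coefficients combining location with how the chance arose.
COEFS = {
    "intercept": -1.0,
    "distance_m": -0.11,       # further from goal -> lower chance
    "headed": -0.7,            # headers convert less often than shots with the foot
    "from_cross": -0.4,
    "from_through_ball": 0.8,
    "after_dribble": 0.5,
    "rebound": 0.6,
}

def shot_xg(features):
    """Logistic transform of a weighted feature sum -> probability of a goal."""
    z = COEFS["intercept"] + sum(COEFS[name] * value
                                 for name, value in features.items())
    return 1 / (1 + math.exp(-z))

# A shot with the foot from 10 metres out, created by a through ball:
print(round(shot_xg({"distance_m": 10, "from_through_ball": 1}), 3))
```

The logistic transform keeps every output between 0 and 1, so each shot's value can still be read directly as a scoring probability.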

### The impact defence has on xG

The three previous ways of modelling expected goals all do a good job of providing an estimation for how many goals a team should expect to score in a game or over an entire season. However, there are other variables that contribute to a potential goal scoring opportunity.


Soccer isn’t just about attacking. Defensive positioning and reducing your opponent’s chance of scoring are just as important - defenders can force a player to shoot a different way or make last-minute adjustments that make it harder to score.

In addition to analysing the entire attacking process - from how a chance is created to where the final action takes place - using the proximity of opposition defenders and how that affects the quality of a shot adds another level of detail to expected goals modelling.

This means that looking at where the goalkeeper and defenders are positioned in relation to where a shot is taken from could produce the most accurate expected goals output of all.

### What kind of expected goals model is the most accurate?

Now that we know how different expected goals models work, we can begin to analyse which method produces the most accurate results. The table below compares the actual goal difference for each team from the 2016/17 Premier League season and the expected goal difference output using the different expected goals models mentioned above.

## Actual goal difference vs. Expected goal difference

| Team | Actual GD | Model 1 xGD | Difference | Model 2 xGD | Difference | Model 3 xGD | Difference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenal | +33 | +12.5 | -20.5 | +17 | -16 | +15.39 | -17.61 |
| Bournemouth | -12 | -6.80 | +5.20 | -15 | -3 | -13.76 | -1.76 |
| Hull City | -43 | -33.80 | +9.20 | -35 | +8 | -38.88 | +4.12 |
| Burnley | -16 | -19.20 | -3.20 | -26 | -10 | -21.06 | -5.06 |
| Chelsea | +52 | +25.90 | -26.10 | +31 | -21 | +31.91 | -20.09 |
| Crystal Palace | -13 | -1.50 | +11.50 | -5 | +8 | -6.05 | +6.95 |
| Everton | +18 | +5 | -13 | +1 | -17 | +1.82 | -16.18 |
| Sunderland | -40 | -27.40 | +12.60 | -26 | +14 | -30.56 | +9.44 |
| Leicester City | -15 | -7.60 | +7.40 | -7 | +8 | -6.65 | +8.35 |
| Liverpool | +36 | +25.30 | -10.7 | +33 | -3 | +31.87 | -4.13 |
| Manchester City | +41 | +41.80 | +0.80 | +44 | +3 | +51.13 | +10.13 |
| Manchester United | +25 | +25 | 0 | +24 | -1 | +29.48 | +4.48 |
| Middlesbrough | -26 | -21 | +5 | -25 | +1 | -22.46 | +3.54 |
| Southampton | -7 | +6.60 | +13.60 | +8 | +15 | +8.15 | +15.15 |
| Stoke City | -15 | -0.60 | +14.40 | -2 | +13 | +0.45 | +15.45 |
| Swansea City | -25 | -21.70 | +3.30 | -20 | +5 | -27.34 | -2.34 |
| Tottenham Hotspur | +60 | +32.50 | -27.50 | +30 | -30 | +31.04 | -28.96 |
| Watford | -28 | -12.20 | +15.80 | -13 | +15 | -16.14 | +11.86 |
| WBA | -8 | -11.80 | -3.80 | -7 | +1 | -8.52 | -0.52 |
| West Ham United | -17 | -11.10 | +5.90 | -7 | +10 | -9.83 | +7.17 |

The best way to assess the accuracy of each of these approaches is to find the root-mean-square deviation (RMSD) - sometimes referred to as root-mean-square error (RMSE). This is done by squaring the difference in actual goal difference and expected goal difference for each team, calculating the average and then finding the square root of that average.
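That calculation can be sketched directly. The example below applies it to a hypothetical three-team subset of the data rather than the full league table:

```python
import math

def rmsd(actual, expected):
    """Root-mean-square deviation between two equal-length sequences."""
    squared_errors = [(a - e) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative subset: three teams' actual GD vs. one model's xGD.
actual_gd = [33, -12, 52]
model_xgd = [12.5, -6.8, 25.9]
print(round(rmsd(actual_gd, model_xgd), 2))
```

A lower RMSD means the model's expected goal difference sat closer to each team's actual goal difference across the season.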

## Expected goal model accuracy

| | Model 1 xGD | Model 2 xGD | Model 3 xGD |
| --- | --- | --- | --- |
| RMSD | 12.92 | 12.55 | 12.01 |

As you can see, the three different approaches are incredibly similar in the output they produced in terms of expected goal difference in the 2016/17 Premier League season - only 0.91 RMSD separates all three despite the varying levels of data used.

However, one season (380 games) isn’t a big enough sample size to declare with any certainty that one approach is better than the others. Additionally, calculating the RMSD on a game-by-game basis would provide more insight into each model’s accuracy and how close it comes to predicting the number of goals scored in a match.