Nov 10, 2017

An analysis of different expected goals models

How do you calculate expected goals?

What are the different approaches to expected goals modelling?

What kind of expected goals model is the most accurate?

Previously confined to a small sports data community, the expected goals metric now features among other common soccer statistics such as possession, shots on target and number of fouls committed. However, there are numerous approaches to expected goals. This article examines the different models used and how they produce different outputs.

The aim in soccer is to score against your opponent without conceding a goal. It sounds simple, but because of randomness and luck, teams don’t always get the results they “deserve.”

This is why data analysis and metrics such as expected goals are useful in sports betting - we can assess performances more objectively and give substance to claims such as “they were unlucky not to win.”

Expected goals (often abbreviated to xG) is one form of data analysis that soccer teams use, and it is becoming increasingly popular among bettors. Expected goals stats are widely available online but they aren’t always the same because different models are used to calculate them.

Models range from the simple to the complex. So what are the mechanics behind these different models, and how much do the outputs they produce differ?

Using basic shot data

Andrew Beasley has previously explained how to calculate expected goals using a basic shot data model. Because a shot is the defining action of a goal, shot data is key to any expected goals model - there are countless events within a soccer match that contribute to a goal being scored but when trying to predict this particular outcome, shots are undoubtedly the most important.

This is a simple approach that uses what Opta defines as a “big chance” - a situation where a player should reasonably be expected to score - as well as shots taken from both inside and outside the box.

Based on conversion rates from the past five Premier League seasons, a big chance has an xG value of 0.387 (a 38.7% chance of scoring), a shot inside the box has a value of 0.070 and a shot from outside the box has a value of 0.036.
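As a rough illustration of how these three values turn into a team's expected goals for a match, the sketch below multiplies each shot count by its xG value and adds the results. The function name, the dictionary keys and the example match are assumptions made for illustration, and it assumes the three categories are mutually exclusive (a big chance is not also counted as an ordinary shot inside the box).

```python
# xG value per shot type, using the conversion rates quoted in the article.
XG_VALUES = {
    "big_chance": 0.387,
    "inside_box": 0.070,
    "outside_box": 0.036,
}


def basic_xg(shot_counts):
    """Sum expected goals over a dict of {shot type: number of shots}."""
    return sum(XG_VALUES[shot_type] * count
               for shot_type, count in shot_counts.items())


# Hypothetical match: 2 big chances, 8 other shots inside the box, 5 from outside.
example_match = {"big_chance": 2, "inside_box": 8, "outside_box": 5}
match_xg = basic_xg(example_match)
print(round(match_xg, 3))
```

In this made-up match the two big chances contribute more xG (0.774) than the other thirteen shots combined (0.740), which shows how heavily this kind of model leans on the big chance definition.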

Detailed analysis of shot data

Given the size of a soccer pitch, the various angles from which a shot can be taken and the impact this has on the likelihood of scoring, how closely a model analyses shot location will influence its expected goals output.

Although similar to Andrew Beasley’s basic expected goals model, this kind of approach uses a more in-depth analysis of the location a shot is taken from to assign its xG value. The easiest way to do this is to divide the shooting range to goal into a grid and plot each shot.

The benefit of using this kind of model is that it accounts for the difference between a player shooting from directly in front of the goal (very likely to score) and a player shooting from an acute angle (much less likely to score), as well as whether the shot came from a player’s head (harder to score) or foot (easier to score).
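The grid idea can be sketched in a few lines: place every historical shot in a grid cell, record the body part used, and take the conversion rate of each (cell, body part) pair as its xG value. This is a minimal sketch under stated assumptions, not Paul Riley's actual method. The cell size, the coordinate system, the function names and the toy shot history are all made up for illustration.

```python
from collections import defaultdict

CELL_SIZE = 6.0  # hypothetical grid cell size, in metres


def cell_for(x, y):
    """Map pitch coordinates (metres) to a (column, row) grid cell."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


def build_xg_grid(shots):
    """Return {(cell, body_part): conversion rate} from historical shots.

    Each shot is a tuple (x, y, body_part, scored).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [goals, shots]
    for x, y, body_part, scored in shots:
        key = (cell_for(x, y), body_part)
        totals[key][0] += int(scored)
        totals[key][1] += 1
    return {key: goals / n for key, (goals, n) in totals.items()}


# Made-up toy history, far too small for real use.
toy_shots = [
    (3.0, 2.0, "foot", True), (4.0, 1.0, "foot", False),
    (2.0, 5.0, "foot", True), (5.0, 3.0, "foot", False),
    (3.0, 4.0, "head", False), (1.0, 2.0, "head", True),
    (2.0, 2.0, "head", False), (4.0, 4.0, "head", False),
    (20.0, 2.0, "foot", False), (21.0, 3.0, "foot", False),
]
xg_grid = build_xg_grid(toy_shots)


def shot_xg(x, y, body_part):
    """Look up the xG of a new shot; cells with no history fall back to 0.0."""
    return xg_grid.get((cell_for(x, y), body_part), 0.0)


print(shot_xg(2.5, 3.5, "foot"), shot_xg(2.5, 3.5, "head"))
```

Falling back to 0.0 for cells with no history is a simplification. A real model would need far more shots per cell, or some smoothing across neighbouring cells, before the rates could be trusted.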

Paul Riley’s model is a good example of taking a slightly more advanced approach to analysing shot location data when building an xG model.

Considering the attacking process

Of course, it isn’t just where a shot is taken from and what body part is used that will determine how likely an attempt is to be converted. The passage of play that precedes a shot will have a bearing on the quality of that chance.

Instead of simply assigning an xG value to a shot based on where it was taken from, some models will look at how the shooting opportunity was created (a cross, a through ball, a counter attack etc.) and analyse how the shot was taken in more detail (a shot following a successful dribble, a rebound after a save etc.).

Obviously, this kind of model requires a lot more data and resources to both build and maintain - 11tegen11’s xG model is one example of an expected goals model that takes the wider attacking process into account when assigning its xG value to shots.
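One common way to combine several factors like these into a single xG value is a logistic model, where each feature of the shot nudges the probability up or down. The sketch below is not 11tegen11's model. The feature names are assumptions and the coefficients are hand-picked purely for illustration (their signs and sizes are not fitted to any data); a real model would estimate them from thousands of historical shots.

```python
import math

# Hypothetical, hand-picked coefficients for illustration only (not fitted).
COEFFICIENTS = {
    "intercept": -1.0,
    "distance_m": -0.10,        # each extra metre from goal lowers xG
    "is_header": -0.8,
    "from_cross": -0.4,
    "from_through_ball": 0.6,
    "on_counter_attack": 0.3,
    "is_rebound": 0.5,
}


def contextual_xg(shot):
    """Logistic model: turn a weighted sum of shot features into a probability."""
    z = COEFFICIENTS["intercept"]
    for name, weight in COEFFICIENTS.items():
        if name != "intercept":
            z += weight * float(shot.get(name, 0))
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up shots from the same distance but created in different ways.
through_ball_shot = {"distance_m": 12, "from_through_ball": 1}
crossed_header = {"distance_m": 12, "is_header": 1, "from_cross": 1}

print(round(contextual_xg(through_ball_shot), 3))
print(round(contextual_xg(crossed_header), 3))
```

With these illustrative weights, the two shots receive different xG values even though they are taken from the same spot, which is exactly the distinction a purely location-based model cannot make.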

The impact defence has on xG

The three previous ways of modelling expected goals all do a good job of estimating how many goals a team should expect to score in a game or over an entire season. However, there are other variables that contribute to a potential goal scoring opportunity.

Soccer isn’t just about attacking. Defensive positioning and reducing your opponent’s chance of scoring are just as important - defenders can force a player to shoot a different way or make last-minute adjustments that make it harder to score.

In addition to analysing the entire attacking process - from how a chance is created to where the final action takes place - using the proximity of opposition defenders and how that affects the quality of a shot adds another level of detail to expected goals modelling.

This means that looking at where the goalkeeper and defenders are positioned in relation to where a shot is taken from could produce the most accurate expected goals output of all.

What kind of expected goals model is the most accurate?

Now that we know how different expected goals models work, we can begin to analyse which method produces the most accurate results. The table below compares the actual goal difference for each team from the 2016/17 Premier League season with the expected goal difference (xGD) produced by three different expected goals models. Each Difference column is the model’s xGD minus the actual goal difference.

Actual goal difference vs. Expected goal difference

| Team | Actual GD | Model 1 xGD | Difference | Model 2 xGD | Difference | Model 3 xGD | Difference |
|---|---|---|---|---|---|---|---|
| Arsenal | +33 | +12.5 | -20.5 | +17 | -16 | +15.39 | -17.61 |
| Bournemouth | -12 | -6.80 | +5.20 | -15 | -3 | -13.76 | -1.76 |
| Hull City | -43 | -33.80 | +9.20 | -35 | +8 | -38.88 | +4.12 |
| Burnley | -16 | -19.20 | -3.20 | -26 | -10 | -21.06 | -5.06 |
| Chelsea | +52 | +25.90 | -26.10 | +31 | -21 | +31.91 | -20.09 |
| Crystal Palace | -13 | -1.50 | +11.50 | -5 | +8 | -6.05 | +6.95 |
| Everton | +18 | +5 | -13 | +1 | -17 | +1.82 | -16.18 |
| Sunderland | -40 | -27.40 | +12.60 | -26 | +14 | -30.56 | +9.44 |
| Leicester City | -15 | -7.60 | +7.40 | -7 | +8 | -6.65 | +8.35 |
| Liverpool | +36 | +25.30 | -10.7 | +33 | -3 | +31.87 | -4.13 |
| Manchester City | +41 | +41.80 | +0.80 | +44 | +3 | +51.13 | +10.13 |
| Manchester United | +25 | +25 | 0 | +24 | -1 | +29.48 | +4.48 |
| Middlesbrough | -26 | -21 | +5 | -25 | +1 | -22.46 | +3.54 |
| Southampton | -7 | +6.60 | +13.60 | +8 | +15 | +8.15 | +15.15 |
| Stoke City | -15 | -0.60 | +14.40 | -2 | +13 | +0.45 | +15.45 |
| Swansea City | -25 | -21.70 | +3.30 | -20 | +5 | -27.34 | -2.34 |
| Tottenham Hotspur | +60 | +32.50 | -27.50 | +30 | -30 | +31.04 | -28.96 |
| Watford | -28 | -12.20 | +15.80 | -13 | +15 | -16.14 | +11.86 |
| WBA | -8 | -11.80 | -3.80 | -7 | +1 | -8.52 | -0.52 |
| West Ham United | -17 | -11.10 | +5.90 | -7 | +10 | -9.83 | +7.17 |

The best way to assess the accuracy of each of these approaches is to find the root-mean-square deviation (RMSD) - sometimes referred to as root-mean-square error (RMSE). This is done by squaring the difference between actual goal difference and expected goal difference for each team, calculating the average of those squared differences and then finding the square root of that average.
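The calculation described above can be reproduced in a few lines. The sketch below applies it to the Actual GD and Model 2 xGD columns from the table, listed in the table's team order. The function and variable names are chosen here for illustration.

```python
import math


def rmsd(actual, expected):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    # Square each team's difference, average the squares, then take the root.
    squared = [(e - a) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Actual GD and Model 2 xGD for the 20 teams, in the table's order.
actual_gd = [33, -12, -43, -16, 52, -13, 18, -40, -15, 36,
             41, 25, -26, -7, -15, -25, 60, -28, -8, -17]
model_2_xgd = [17, -15, -35, -26, 31, -5, 1, -26, -7, 33,
               44, 24, -25, 8, -2, -20, 30, -13, -7, -7]

print(round(rmsd(actual_gd, model_2_xgd), 2))
```

Rounded to two decimal places this gives 12.55, matching the figure reported for Model 2. Because the differences are squared before averaging, large misses such as Tottenham Hotspur's (-30) weigh far more heavily on the score than small ones.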

Expected goal model accuracy

| | Model 1 xGD | Model 2 xGD | Model 3 xGD |
|---|---|---|---|
| RMSD | 12.92 | 12.55 | 12.01 |

As you can see, the three different approaches are incredibly similar in the output they produced in terms of expected goal difference in the 2016/17 Premier League season - only 0.91 RMSD separates all three despite the varying levels of data used.

However, one season (380 games) isn’t a big enough sample size to declare that one approach is better than the others with any kind of certainty. Additionally, calculating the RMSD on a game-by-game basis is more likely to provide insight into each model’s accuracy and how close each comes to predicting the number of goals scored in a match.

Want to learn more about expected goals?

If you want to know more about expected goals and apply this knowledge to betting, Andrew Beasley has written about how this metric can be applied to Premier League betting.

You can also follow Paul Riley and 11tegen11 on Twitter. For a more visual representation of expected goals, UnderStat provides useful graphics using xG statistics from the top five European leagues.
