Cows, chihuahuas and normalising football data

Twenty3 - Normalising football data

A town in Wales made the news recently when a graphic designer painted alternative measurements for social distancing on the pavement. Seven chihuahuas, 85.36 pound coins, 22 scones and 33.7 carrot cakes for the cafe. In other rural areas, the length of a cow has been used as a reminder of how much space two metres takes up.

All are ways of putting a figure into a context that people can understand and use. Without context, some information approaches meaninglessness. (In a similar vein, the news that COVID-19 seemed most dangerous to those with underlying health conditions was a reassurance for some, until it was estimated that 40 per cent of Americans have one.)

Football, with its growing mass of statistics, is the same. We learn some benchmarks like nursery rhymes (‘20 goals a season striker’, ‘a goal every two games’, ‘40 points’), but many are still new. And confusing the matter is the fact that players all spend different amounts of time on the pitch.

That’s why stats are often presented in ‘per 90-minute’ form. This Premier League season, Son Heung-Min, Teemu Pukki, and Riyad Mahrez all scored 11 goals, but the numbers of minutes they played were (respectively) 2485, 2892, and 1940. The stats need to be ‘normalised’ in some way, and because a match is 90 minutes long, we divide each player’s goals by their minutes and multiply by 90. These 11 goals become 0.40, 0.34, and 0.51 goals per 90.

2019/20 Premier League goals comparison between Son Heung-Min, Teemu Pukki, and Riyad Mahrez
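
The per-90 figures for these three players can be reproduced in a few lines of Python. This is a minimal sketch using only the goals and minutes quoted in this article; the `per_90` helper is a name chosen here for illustration, not part of any library.

```python
def per_90(total, minutes):
    """Scale a season total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes * 90


# (goals, minutes played) as quoted in the article
players = {
    "Son Heung-Min": (11, 2485),
    "Teemu Pukki": (11, 2892),
    "Riyad Mahrez": (11, 1940),
}

goals_per_90 = {name: per_90(goals, mins) for name, (goals, mins) in players.items()}

for name, rate in goals_per_90.items():
    print(f"{name}: {rate:.2f} goals per 90")
# Son Heung-Min: 0.40 goals per 90
# Teemu Pukki: 0.34 goals per 90
# Riyad Mahrez: 0.51 goals per 90
```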

This method isn’t perfect, but it’s pretty good, both in its results and in how simple it is to understand. You could describe it as “if the player played a full game, this is roughly what they’d do”.

Serial substitutes throw a spanner in the works a little bit, though. When players come on at the end of matches, often against tired opponents or when one of the teams is chasing the game, they’re not playing in ‘normal’ game time, and so their stats aren’t going to be normal. It’s usually wise to keep in mind that ‘sub effects’ might be at play.

There are other problems that people try to solve by normalising stats in different ways, although none of these methods has reached the consensus or widespread use of ‘per 90-ing’.

One is that, between injury time being added on and stoppages taking precious seconds away, matches are rarely exactly 90 minutes long. Some people, then, choose to divide the statistics by a number other than 90.

An alternative to normalising by minutes is to use the number of times a player or their team actually gets the ball. There are some, then, who use ‘per 100 touches’ or ‘per 100 possessions’ as their method of normalising football data.
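
All of these variants share one shape: divide a total by some measure of opportunity, then scale it to a round number. A rough sketch of that idea follows. The `normalise` function is a hypothetical helper, and the touch count is invented purely for illustration; it is not real data for any player.

```python
def normalise(total, exposure, per):
    """Express `total` as a rate per `per` units of `exposure`.

    `exposure` can be minutes, touches, possessions or any other
    measure of how much opportunity the player had.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return total / exposure * per


# Hypothetical season line: these numbers are made up for the example.
goals = 11
minutes = 1940
touches = 1500  # invented figure, not real data

goals_per_90 = normalise(goals, minutes, per=90)
goals_per_100_touches = normalise(goals, touches, per=100)

print(f"{goals_per_90:.2f} goals per 90")
print(f"{goals_per_100_touches:.2f} goals per 100 touches")
```

Swapping 90 for a different match length is just a change to the `per` argument, which is part of why these alternatives are easy to compute but harder to agree on.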

And then we have defensive adjustments.

Defensive stats have long been a source of pain for the spreadsheet-minded, far harder to find meaning in than the shot-based metrics of attackers. More shots usually, roughly, means a better striker; more tackles, more clearances, or more interceptions don’t necessarily mean a better defender.

An obvious reason put forward for this is that players on bad teams will do more defending! If they don’t have the ball (the theory goes) then they have more opportunity to be making defensive actions. Defenders on these teams might have higher tackle numbers, for example, simply because their side has less possession.

To combat that, some people have tried ‘possession adjusting’ defensive stats to eke more value out of them. When possession adjusting, a player on a team that averages more than 50% possession will have their figures boosted slightly, and a player on a team that averages less than 50% possession will have theirs reduced.
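
There is no single agreed formula for possession adjusting. One simple version, shown here as an assumption rather than as the standard method, rescales a player’s figure as if the opposition had the ball exactly half the time. The `possession_adjust` function and the example numbers are hypothetical.

```python
def possession_adjust(raw_per_90, team_possession_pct):
    """Rescale a defensive stat to a 50% possession baseline.

    Assumption: defensive opportunities scale linearly with the share
    of the ball the opposition has (100 - team possession). Other
    published adjustments use different curves.
    """
    if not 0 <= team_possession_pct < 100:
        raise ValueError("possession must be at least 0 and below 100")
    opponent_share = 100 - team_possession_pct
    return raw_per_90 * 50 / opponent_share


# Illustrative only: the same 2.0 tackles per 90 on three different teams.
high_possession = possession_adjust(2.0, 60)  # boosted to 2.5
even_possession = possession_adjust(2.0, 50)  # unchanged at 2.0
low_possession = possession_adjust(2.0, 40)   # reduced to about 1.67
```

A linear scaling like this moves the numbers more than ‘slightly’ at the extremes, which is one reason some analysts prefer a gentler curve.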

Having done some brief research on this subject, I’ve found that how well this logic holds varies both by the position of the player and by the type of defensive action.

At this point, one has to ask oneself the same question as with all of these types of normalising football stats: how much does this fix my problem?

Although it seems like a simple question, it really has two aspects: the main one, of normalising the statistics, but also the communication. Is a five per cent increase in accuracy worth a 20 per cent drop in how easy the method is to understand?

For some cases and for some people, the answer will be yes; for others, it’ll be no. Like measuring two metres in cows or chihuahuas, you need to put football stats into the context that works for you.
