The Data Game: Mark Carey’s approach to data-driven analysis


The Data Game series sees Twenty3 speak to prominent writers and figures from the world of football to get their opinion on the use of data and analytics in the beautiful game. In the third instalment, we spoke to Mark Carey, who co-hosts the Football Fanalytics podcast and is the Data Analyst at The Athletic.

Here’s what Mark had to say…

Why is context so important when looking at data?

This is something we discuss a lot on the podcast – context is key. The best and worst thing about using data is that you can shape it to tell the story you want it to tell. So with great power comes great responsibility!

One way to highlight the importance of context is with passing data. For example, Pierre-Emile Højbjerg has a pass completion rate of 90% in the Premier League this season – but is that good compared with other midfielders in the league? What type of passes is he playing? How does that relate to the team style he is asked to play in? You can build a picture of a player’s performance by combining data with the wider context, but rarely from the data on its own.

Pierre-Emile Højbjerg stats for the 2020/21 Premier League season.
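
One way to put a single number into context is to rank it against a peer group. The sketch below is only an illustration of that idea: the `percentile_rank` helper and the peer values are invented for this example and are not real league data. Only the 90% figure comes from the interview.

```python
from bisect import bisect_left


def percentile_rank(value, peers):
    """Return the percentage of peer values strictly below `value`."""
    if not peers:
        raise ValueError("peers must not be empty")
    ordered = sorted(peers)
    below = bisect_left(ordered, value)  # count of peers strictly below value
    return 100.0 * below / len(ordered)


# Hypothetical pass completion rates (%) for a group of midfielders.
# These numbers are made up for illustration, not taken from any dataset.
peer_completion = [78.0, 81.5, 84.0, 86.5, 88.0, 89.0, 91.0, 92.5]

# 90% is the completion rate quoted in the interview.
rank = percentile_rank(90.0, peer_completion)
print(f"90% completion is higher than {rank:.0f}% of this peer group")
```

Even then, the ranking only answers the first of the questions above. The type of pass and the team's style still need to be checked separately, for example by repeating the comparison for forward passes only.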

Without giving away too many secrets, what’s your approach to profiling players?

As many would agree, there is no set approach, but I think this links with the answer above – start with context. It’s fairly easy to see which players top the list for certain metrics but that doesn’t explain why a player may be producing that performance.

Understanding the tactical set-up of the team can add more information, and then of course it’s crucial to combine what you find in the data with video analysis and real-life scouting. Again, we say it frequently on the podcast: data is there to support traditional scouting and recruitment methods but never to replace them. It can provide a useful filter to direct your approach, but nothing replaces profiling a player in real life.

What’s the most underrated or overlooked statistic when profiling a player?

It might not be a statistic per se, but when looking at creative players in particular I think it is crucial to separate what they do in open play vs. set pieces. Players who take free-kicks and corners can easily inflate their creative metrics as they have greater chances to get the ball into a dangerous area without any pressure.

Many can overlook the importance of separating out what players do in open play. It’s more common practice to look at non-penalty goals now for forwards for the same reason – and I think it’s important to ensure you do the same when profiling a creative player.
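
As a rough sketch of that separation, the snippet below counts chances created per player, split into open play and set pieces. The event format, the `situation` labels and the players are all hypothetical and invented for this example; real event data will use its own field names.

```python
from collections import defaultdict

# Assumed labels for dead-ball situations; adjust to match your data source.
SET_PIECE_SITUATIONS = {"corner", "free_kick"}


def chances_by_phase(events):
    """Count chances created per player, split into open play and set pieces."""
    totals = defaultdict(lambda: {"open_play": 0, "set_piece": 0})
    for event in events:
        is_set_piece = event["situation"] in SET_PIECE_SITUATIONS
        phase = "set_piece" if is_set_piece else "open_play"
        totals[event["player"]][phase] += 1
    return dict(totals)


# Invented example events, not real match data.
events = [
    {"player": "Player A", "situation": "corner"},
    {"player": "Player A", "situation": "corner"},
    {"player": "Player A", "situation": "free_kick"},
    {"player": "Player A", "situation": "open_play"},
    {"player": "Player B", "situation": "open_play"},
    {"player": "Player B", "situation": "open_play"},
    {"player": "Player B", "situation": "open_play"},
]

for player, counts in chances_by_phase(events).items():
    print(player, counts)
```

In this made-up sample, Player A creates more chances overall, but Player B creates more from open play, which is the kind of difference a combined total would hide.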

How has the Twenty3 Toolbox helped the way you analyse player or team performance for the Football Fanalytics podcast?

The Toolbox has definitely allowed us to take the quality of the podcast to the next level, because we now have the ability to analyse players and teams right at our fingertips. Another part which has helped hugely is our ability to dovetail what we say on the podcast with the Twenty3 visualisations, which has brought the content to life. We have loved sharing the visualisations on social media and they have got a really positive response from our followers. 

Do you have any tips for anybody looking to get into the world of football analytics?

Don’t be afraid to showcase yourself and ask questions. The football analytics community is known to be a friendly bunch (on Twitter and in real life!), and the best way that people can learn is by sharing their own work and getting feedback from others.

Another tip would be to ensure you are persistent and resilient. I started my football analytics blog as a hobby in 2017, and I only got my first job in football this year. If you really want to work in football analytics, it may take time but with the right attitude you can get there. You don’t have to know everything, but having a positive mindset and willingness to learn will take you a long way.

How important is sample size and why?

For me, this relates back to the importance of context when looking at data. There are always likely to be fluctuations in a player’s or team’s performance within a season, so the more data you have to work from, the more confident you can be in your conclusions. As with anything in life, you want to make sure that the inferences you make about something are reliable and based on solid foundations. The same goes for appraising a player or team with data.

As we often discuss on the podcast, if you are working with a small sample size, it is then important to acknowledge it in your thought process – identifying that this conclusion may change with access to more information.
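
One common way to acknowledge a small sample is to report a range alongside a rate. The sketch below uses the Wilson score interval, a standard statistical method for proportions; it is offered as an illustration and is not something described in the interview. The shot counts are invented, and `wilson_interval` is a helper written for this example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


# Two hypothetical players with the same 30% shot conversion rate,
# but very different sample sizes (invented numbers).
small_sample = wilson_interval(successes=3, trials=10)
large_sample = wilson_interval(successes=30, trials=100)

print(f"3 goals from 10 shots:   {small_sample[0]:.2f} to {small_sample[1]:.2f}")
print(f"30 goals from 100 shots: {large_sample[0]:.2f} to {large_sample[1]:.2f}")
```

Both players convert 30% of their shots, but the range for the ten-shot sample is much wider, which is the statistical version of saying that the conclusion may change with more information.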

What are the three most important things to consider when writing a data-driven piece of analysis?

That’s a good question. For me it comes down to the following:

  • Make sure you are using the right metrics – it sounds like an obvious one, but understanding what each metric can reveal will help in knowing what to look out for in your analysis. A simple example would be that assists are not the best proxy for creativity – expected assists are better, then through balls can add further information, and you can build a reliable picture from there.
  • Talk football language – it is very useful to be driven by the data for a piece of analysis, but if you can’t translate that into football language then it loses its message. This goes for analysis in the media or in a professional football club. Don’t assume that people know what each metric or statistic means; always bring it back to the language of football.
  • Provide context – I know it has been a thread throughout, but it is so crucial. Conveying the ‘why’ in a data-driven piece of analysis is essential, and the two should always work in tandem. For example, “X player has improved in X metric, but what this means is…”

Why do you think data is so important when analysing players or teams?

The use of data has definitely become more accepted in recent years, and more clubs are appreciating the importance of it – not just in terms of getting the edge but simply keeping up with their competitors.

I think data is important in analysis simply because it removes bias. We all have a favourite player, or we ‘feel’ that someone isn’t performing well, but the data takes the emotion out of it and provides an objective picture – which is essential. 

It is also a fast, effective tool in shortlisting players from a recruitment perspective. You might overrate players who look good in a weaker league, or miss out on a player who might appear average in a strong league. But using data can be a cost-efficient method for clubs to use within their recruitment policy. Unfortunately, those who don’t use data will realise they are falling behind.