The past, present and future of tracking data

Michael D'Auria & Dominic Jordan

June 3, 2025

Tracking data has transformed the way that top teams approach and analyse the game.

But what exactly is it? And how is it used? These are the questions we set out to answer, in some depth, in Episode #68 of the TGG Podcast.

To do so, we engaged two real experts in this field: Michael D’Auria, the Executive Vice President of Sports and Technology at Genius Sports, the official tracking data provider for the Premier League; and Dominic Jordan, the Chief Data Officer for Twelve Football and former Director of Data at Manchester United. 

An edited transcript of the episode follows.

What is tracking data?

Michael D’Auria: Tracking data is a class of technology that’s trying to better capture the live action that’s going on on a football pitch. When we talk about it now, we refer to optical tracking data. So that’s a set of cameras installed in a stadium that are using computer vision to translate the live action into as robust a real data representation of the game as possible. 

This state-of-the-art technology allows you to essentially create a real-time digital twin or digital replica of what’s going on in the game.

That opens up all types of exciting downstream possibilities for products, services and just having a better understanding of the game. Now we can track thousands of data points on the surface of every player and the ball hundreds of times a second. Over the course of a 90-minute match, you’re getting billions of data points to represent what’s going on.

What’s the origin of tracking data?

Some of this computer vision technology was from things like missile defence systems. Those are some of the early applications of it.

It’s the same type of technology that you see in self-driving cars and in lots of other applications now. It’s using cameras and different kinds of sensors to try to get the best representation of the real world. 

Our specific application is sport, which was actually a bit late to adopt some of this technology. That’s because it’s hard, it’s live, it’s dynamic. This is the reason Second Spectrum was founded. 

When we started, we were just trying to track the centre of mass of a player projected down onto the pitch in two dimensions. But as we’ve got better and AI has continued to evolve, we’ve been able to get a much more granular representation.

Back then it was a single data point per player, 25 times a second. And the data was delivered to you the next morning, a number of hours after the game. Over time, the systems have got much more mature and they can capture a huge amount more data, but also do it in real time, in less than a second.

A much earlier version of this system was first introduced in the NBA. It wasn’t league wide, but existed at a couple of different venues. Folks in the NBA ecosystem started to really get excited about the possibilities this could bring and how it could improve tactical analysis and how the game is coached and played.

The challenge at that time was that it was a huge amount of data and most NBA teams didn’t staff up teams of engineers. So we brought this combination of athletes and cutting-edge engineers into the space to say, ‘We can help you take this tracking data and turn it into something that’s going to be valuable for you as a coach, as a General Manager, as a fan.’

One of the early pieces of value we added was classifying all the pick and rolls that happened in a basketball game and all the types of defences that a team could play against them. And it could be things that go beyond what a human can measure, like shot probability or expected pass completion.

Even post acquisition by Genius, [our focus] is to create the best tracking data and then use it to create more value in the sports ecosystem, whether it’s a coach, a fan, a broadcaster, a player.

In football now, you can take that tracking data and instantly have hundreds of new football metrics that fans and coaches have always wanted to talk about. Pressing and pressure, between-the-lines passes and overlapping runs, things that are happening on the ball, things that are happening off the ball.

And you can deliver this much more robust and complicated language of football in real time. 

This has unlocked a new era of how data could be used to inform sports. You really saw this in the NBA over the past decade – for all 30 NBA teams, their workflows now are totally reliant on this data.

That runs from decisions about how to prepare for a game and how to defend the team they’re playing, to how to value a player when they’re making a trade or acquiring somebody in free agency. This level of information just didn’t exist in sport before.

Entry into English football

You started to see individual clubs dabble with this about 10 years ago, a bit later than the NBA.

As a company, as a business, we really made the transition from being just a basketball company to working in football via our partnership with the Premier League (in 2019).

Now, every Premier League ground –  and actually every Championship ground too – has our optical tracking system installed. So at a Premier League ground, you’re going to have about 28 or 30 cameras deployed throughout the stadium.

We actually use a smartphone as the core unit of our system. So there are 28 or 30 iPhones at every Premier League ground that are capturing data in real time for what’s going on in a game.

I think the Premier League has been a great example of how this data you can capture can touch every part of the football ecosystem. All of the clubs use this data on a day-in and day-out basis to evaluate players, figure out strategy, help their Manager, be informed about what’s going on in the game.

The media uses it as well. Once a week, Premier League Productions puts out something called Datazone, that uses data and video augmentation to have a much different presentation of the game. Sky Sports are huge users of our data from an editorial perspective to tell deeper stories about what’s going on.

And then this year, you may have seen that we’ve launched semi-automated offside technology, which is powered by the exact same data. And so you’re now moving into the world of impacting officiating of the game as well. 

This is where we think there’s a lot of power when you can capture really high-quality data in real time and use AI to translate it into football language that someone’s going to care about.

Whether you’re a club or an official, you can really bring a lot of improvement to the game with that. That’s what we’re after. 

How do you collect tracking data?

We actually use the camera on the smartphone. It sounds a little bit interesting, but one of the things about phones is that over the last decade or so they’ve had billions and billions of dollars of investment put into them.

It’s a very competitive market and the cameras just keep getting better and better with every evolution of the phone. We can capture up to 200 frames per second on the iPhone and, because it’s a small unit that you can point wherever you want, you can really have a lot of focus on certain areas of the pitch where you might need more coverage.

And because they’re relatively cost effective and easy to mount, you can really scale up or scale down the number of phones you need. And so we think it’s going to be the right system for the future. Even if new use-cases come online, it’s not so hard for us to go back to a stadium and add another 4, 5, 6, 7 phones.

We tend to put them up pretty high and try to tuck them away in the rafters or in the higher parts of the stadium. Every ground is a little bit different and we’re pretty creative about where we can fix them.

Again, they’re small units, they just come in a bit of a weatherproofed housing, and can be affixed to just about any part of stadium infrastructure. But we generally try to just get as good a coverage as we can, all the way around the pitch. 

For us, it’s really about having as many high-quality angles of the pitch as possible to make sure that in any given moment you have multiple cameras pointed at every part of the pitch.

How is the tracking data delivered to a team?

The real value is that we can deliver this tracking data FOR you. Then we can apply another layer of real time AI, to basically translate that tracking data. 

We’ll process that video into tracking data and provide that as an API to clubs. Then we’ll have another layer of processing that turns it into that language of football.

Not just, ‘where was my right ankle for this 100th of a second,’ but, ‘this was a particular football action.’ That’s also delivered as a separate data API that clubs can consume.

This all goes into a software tool that we produce for them that can further help them process this data in a live environment. So it will have a live fitness tracker to help them manage substitutions and understand if players are getting a bit fatigued, it will allow them to chart data they might want to be looking at over the course of a game, it will allow them to query the data and precisely index it to video.

So if you want to look at every time your left back made an overlapping run and joined the attack, you can very quickly pull up the video playlist of all those moments. 
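As a rough illustration of that kind of query, the sketch below filters tagged actions for one player and turns each hit into a video clip window, merging windows that overlap. The TaggedAction record, the action labels and the padding values are all hypothetical; this is not the Genius Sports data model or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedAction:
    """Hypothetical record: one football action already detected from tracking data."""
    player: str
    action: str
    match_seconds: float   # time of the action on the match clock

def build_playlist(actions, player, action, pre=5.0, post=8.0):
    """Return merged (start, end) clip windows, in seconds, for one player's action."""
    windows = sorted(
        (max(0.0, a.match_seconds - pre), a.match_seconds + post)
        for a in actions
        if a.player == player and a.action == action
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip, so extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

actions = [
    TaggedAction("LB", "overlapping_run", 120.0),
    TaggedAction("LB", "overlapping_run", 126.0),
    TaggedAction("RB", "overlapping_run", 300.0),
    TaggedAction("LB", "cross", 500.0),
    TaggedAction("LB", "overlapping_run", 900.0),
]
playlist = build_playlist(actions, player="LB", action="overlapping_run")
print(playlist)
```

The two runs six seconds apart collapse into a single clip, which is the sort of tidy-up an analyst would otherwise do by hand.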

It’s really using it to try to help video analysts and data analysts – whether it’s data, video or visualisations – in the most bespoke way possible.

Most clubs now have somebody who works in data science or a similar profession and their job is to connect to the live streams of this data and video, power it through whatever internal systems the club has and use it as a way to evaluate how they’re performing in the match or get ready for some reports to give to the Manager at half-time or as a post-game analysis.

We’re pretty tightly integrated with all those clubs, trying to make sure all that’s happening as smoothly as possible. They’re generally taking in a data API that might power an internal report that they’ve constructed for their club or their Manager.

A lot of the work does happen through our software tool, where we can help them process it and they can preset some things. You have a dashboard and you’re getting alerts about whatever things you might particularly care about during the match. And a huge part of football coaching and analysis is video analysis.

It’s a lot of the ways that the actual message finally gets translated to players. So a lot of this just helps them really quickly and really efficiently or fully automatically cut up video into a playlist, so a Manager can know at half-time they’re going to be able to walk in and have the seven video clips of their team not performing the way they wanted to, or show something their opponent’s doing that they need to respond to.

It’s a way we can really help clubs do that faster and more automatically.

Very often, our data will be powering some sort of report that they want to be looking at, or some sort of tracking of what was going on during the action. The greatest Managers have a really good eye test.

They can watch a game and just get a handle of what’s going on everywhere on the pitch intuitively, because they’ve watched so many games before. 

What we can really do is give them a superpower to be able to do that with AI – so something they might be missing, or something that maybe allows them to focus on a different part of the game.

Liverpool: Early adopters

At this stage now, we see all 20 Premier League clubs as heavy users of tracking data. When technology is a bit more nascent, there are always clubs that are a bit more eager to lean into that and that’s sometimes dictated by who the staff are or what their resources are.

Liverpool were one of the early ones to invest in this, they were early adopters.

And it’s also one of the ways that technology has really evolved. When Will (Spearman) started at Liverpool, there was probably a number of seconds or even minutes of latency in the data feed.

Now we’re delivering it in less than a second. If you’re looking at the game and then down at an iPad or a device, you can’t discern the difference.

(Spearman has previously spoken about using tracking data to assess pitch control).

It’s one of the great things about football – it’s such a geometric game where space matters so much. It’s one of these great things that we can do. 

We can very precisely measure exactly where every player is, exactly where the ball is, and so, by default, how much space they are occupying, or how much pitch control they might have, depending on how people are moving.
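To make the idea of measuring space concrete, here is a minimal sketch that assigns each square metre of the pitch to whichever team has the nearest player. This nearest-player rule is a deliberate simplification chosen for illustration; it is not the model used by Liverpool or Genius Sports, and fuller pitch-control models generally also account for how players are moving. The pitch dimensions and function name are assumptions.

```python
from math import hypot

# Assumption: a 105 m x 68 m pitch, with x along the length and y across the width.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0

Point = tuple[float, float]

def space_share(home: list[Point], away: list[Point], cell: float = 1.0) -> float:
    """Fraction of the pitch whose nearest player belongs to the home team."""
    nx, ny = int(PITCH_LENGTH / cell), int(PITCH_WIDTH / cell)
    home_cells = 0
    for i in range(nx):
        for j in range(ny):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell   # centre of this grid cell
            d_home = min(hypot(cx - x, cy - y) for x, y in home)
            d_away = min(hypot(cx - x, cy - y) for x, y in away)
            if d_home < d_away:
                home_cells += 1
    return home_cells / (nx * ny)

# One player per side, purely to keep the example easy to check by hand.
share = space_share(home=[(10.0, 34.0)], away=[(100.0, 34.0)])
print(f"home share of space: {share:.1%}")
```

Running the same function on every frame would give the kind of evolving view of space over a half that is described here.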

And we can represent that to an analyst. That’s something that, again, a great Manager might get a feel for, but we can actually mathematically measure it now and give you a real-time view of it and a report of how that has changed and evolved over the course of a half or 90 minutes of football.

I always enjoyed visiting Liverpool. I had an engineering background – I went to MIT many years ago – and they had the most physicists I’d ever seen employed by a football team!

It was really a good fit, because I think the way that a physicist models 3D interactions in the real world actually has a lot of parallels with the type of data we’re trying to capture in a football match. Now we refer to it as spatiotemporal data – it’s data about how physical objects move through space and time.

There are quite a few parallels in that core level of tracking data and the way that physicists model interactions in the real world. 

We look at a club and say, ‘You are the absolute world-leading experts on the game’. It’s our job to show them what’s possible with the technology and they are often the ones that come to us and have some really great ideas about the practical applications they would like. We will help bring them to life in our set of products.

That’s been a really fruitful cycle for us. The data is now able to touch more and more parts of the club and there’s more and more you can do with it.

The Sports Scientists use it to measure physical performance. The Video Analysts use it to automate their video workflows. You have these data science teams now that are managing different parts. The recruiting group, Head Coaches and Assistant Managers, are using this now as well.

Recruitment 

The great part about the system is that once it’s installed, there’s really no limit to the kind of ways you can evaluate a player or a team. You can get precise information about their physical attributes: how fast are they running, their acceleration, their deceleration, how much they fatigue at what points in the match, how they perform under fatigue.
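The physical measures mentioned here can, in principle, be derived from positions alone. The sketch below estimates speed and acceleration from consecutive pitch coordinates using simple differences. The 10 Hz sample rate and the coordinates are made up for the example, and a production pipeline would very likely smooth the positions first, since raw differences amplify noise.

```python
from math import hypot

def speeds(xy, hz):
    """Speed in m/s between consecutive positions sampled `hz` times per second."""
    return [hypot(x1 - x0, y1 - y0) * hz
            for (x0, y0), (x1, y1) in zip(xy, xy[1:])]

def accelerations(v, hz):
    """Change in speed (m/s^2) between consecutive speed samples."""
    return [(v1 - v0) * hz for v0, v1 in zip(v, v[1:])]

# Hypothetical positions in metres for one player running along the touchline.
track = [(0.0, 0.0), (0.5, 0.0), (1.05, 0.0), (1.65, 0.0)]
v = speeds(track, hz=10.0)
a = accelerations(v, hz=10.0)
top_speed = max(v)
print(f"top speed {top_speed:.1f} m/s, peak acceleration {max(a):.1f} m/s^2")
```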

Because you have the full 3D body modelled, you can really understand how their leg swings when they are trying to bend a cross in, how high they can get up when they’re trying to head a ball into the back of the net and really understand that dynamic way their body moves.

A lot of this stuff wasn’t possible before. For years, people would do an analysis on how many times a midfielder might have a head swivel or be checking for space as he’s receiving a pass and getting ready to distribute. These become things you can now mathematically model.

You can look at the likelihood that a player is going to complete a pass and how they actually perform against that. I think recruitment is probably the next big frontier here. 
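One simple way to express performance against expectation is to compare the passes a player actually completed with the sum of the completion probabilities a model gave those passes. The sketch below assumes such probabilities already exist; the numbers are invented and the function name is hypothetical.

```python
def passes_above_expected(passes):
    """Completed passes minus the sum of the model's completion probabilities.

    `passes` is a list of (completion_probability, was_completed) pairs.
    """
    expected = sum(prob for prob, _ in passes)
    completed = sum(1 for _, ok in passes if ok)
    return completed - expected

# Hypothetical sample: three completed passes, including a difficult one, and one miss.
sample = [(0.9, True), (0.6, True), (0.3, True), (0.8, False)]
delta = passes_above_expected(sample)
print(f"passes completed above expectation: {delta:+.1f}")
```

A positive value suggests the player completes more than the model expects from those situations, which is the comparison a recruitment team would want across many matches rather than four passes.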

Sharing tracking data between countries and leagues

We have really great coverage across English football, in the Premier League and the Championship.

One of the big things we hope to do is expand the number of leagues where this type of tracking system exists. It was one of the real design principles behind our newest tracking system, where we can go to 28, 38, 50 phones in a Premier League ground, but you can also scale that down.

So if you get to smaller leagues that are a bit more resource constrained, we can still have that same core tracking system deployed. So a lot of what we’re spending our time on now is trying to promote getting this system out to more places where significant football is played.

We think it will democratise a lot of these tools and generally help the whole industry if there’s that level of data sharing. When we started in the Premier League, data wasn’t even shared across all 380 matches.

You would only get access to data for games that you played in. That’s changed now. If you’re a Premier League club, you get access to all 380 matches. And so we’d love to see a similar thing happen between leagues as well, between England and France and Belgium and Denmark (we work in all those leagues), to get a data exchange.

And then, as a club, you can really look across all the different leagues as you’re going through your recruiting process. 

Data is becoming more and more portable and we’re becoming more and more comfortable sharing some (maybe not all!) of this information. And so we certainly support more cross-league sharing of data in the future.

We already know some leagues have frameworks for this. Ultimately, of course, that’s going to be a league-by-league decision, but we think there’s kind of just more value for everybody if that starts to become more of the norm.

I think we will start to see that pretty soon. It won’t be all at once, but I think we’ll start to see that happening a bit more.

Broadcast tracking is an opportunity as a stopgap but it is never going to be as robust and detailed as having a fixed set of cameras in a stadium.

That’s just based on the kind of physics of occlusion and how many cameras you have, but that is still a better data source in some instances than traditional manual events.

Future of tracking data: mesh tracking

Mesh tracking is fully deployed in the Premier League and allows you to do a couple of really important things.

The first one is for officiating, to really make a call on offsides. You don’t want to approximate where a player’s centre of mass is, or their shoulder joint is; you need the full surface of a human body mapped out with thousands and thousands of data points per frame.

On a really tight call that’s coming down to one centimetre, you need to know with certainty the moment the foot struck the ball, the precise kick point, and then exactly where the curvature of the shoulder or thigh, or whatever part of the body is being evaluated, is in relation to the offside line.
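The geometry of that comparison can be sketched in a few lines once surface points are available. The example below checks only whether an attacker is in an offside position at the kick point, working in one dimension along the pitch. The coordinate convention, the 105 m pitch and the sample values are assumptions for illustration, and this is not the semi-automated offside system itself.

```python
HALFWAY_X = 52.5   # assumption: 105 m pitch, attack runs towards increasing x

def in_offside_position(attacker_xs, opponents_xs, ball_x):
    """True if the attacker's most advanced point is beyond both the ball and
    the second-last opponent at the moment the ball is played.

    attacker_xs: x-coordinates of the attacker's surface points (arms excluded).
    opponents_xs: one list of x-coordinates per opponent, goalkeeper included.
    """
    attacker_front = max(attacker_xs)
    # Each opponent is represented by the point nearest their own goal line.
    deepest_first = sorted((max(xs) for xs in opponents_xs), reverse=True)
    second_last = deepest_first[1]
    return (attacker_front > HALFWAY_X
            and attacker_front > ball_x
            and attacker_front > second_last)

# Hypothetical frame: goalkeeper, one defender on the line, one defender higher up.
opponents = [[103.0, 103.4], [88.05, 88.41], [80.0, 80.3]]
tight = in_offside_position([88.10, 88.42], opponents, ball_x=75.0)
level = in_offside_position([88.10, 88.41], opponents, ball_x=75.0)
print(tight, level)
```

In this made-up frame the two calls differ by a single centimetre on the attacker's leading point, which is why an approximate joint position would not be enough.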

Premier League players are really impressive athletes, but they have different body shapes, and so being able to get that level of resolution is very important. It has allowed us to really push into the officiating space where we can have the most accurate calls and again, do it in real time, so you’re not interrupting the game for a fan.

The second thing I expect is this world of 3D recreations of football for fans and for coaches. If a phenomenal goal is scored, we’re used to having a replay before the next kick-off, a couple of different angles from whatever coverage you have in the stadium.

With these really realistic 3D recreations, you could instantly go into a first-person view to see what it felt like to be Hojlund as he was scoring a goal, or you could go from the keeper’s view or travel the path of the ball. You can now watch and experience a game in this 3D environment that’s hyper realistic.

We’re already seeing it with football clubs: if you’re trying to coach an athlete on what they were doing wrong or a different decision they could be making, it’s often much more effective to go into the 3D world and show them what it felt like to be on the pitch.

While we all watch the game from that midfield camera-one view, that’s not the way a player experiences the game at all. So if you can actually show them and speak to them in their own language – it’s a really potent tool.

It’s easy to critique from that overhead view, but that’s not what the players see. The better you can show a player what they were actually experiencing on the pitch, the better you’re going to be able to coach them. And the same thing goes for a referee too, right?

Sometimes it’s really instructive to put yourself in the position of a linesman who’s trying to call a close offside on the other side of the pitch and give a bit of perspective of how challenging it is to make that call in real time. 

You think about how tracking evolved – one data point on each player 25 times a second, and then maybe 25 or 30 skeletal points, maybe 50 times a second.

That’s a pretty big leap. Now you’re going up to 10,000 data points per player, hundreds of times a second. And so the volume of data is just scaling up. And frankly, not every club needs that level of data or is going to instantly get value out of it.
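A back-of-envelope calculation puts those three generations side by side. It assumes 22 players, a 90-minute match with no stoppage time, 30 skeletal points and 200 samples per second for the mesh era (the transcript only says "hundreds"); these are illustrative assumptions rather than vendor specifications.

```python
# Rough data-volume estimate for three generations of tracking.
# All inputs are illustrative assumptions, not vendor specifications.
PLAYERS = 22                 # assumption: both teams at full strength, ball ignored
MATCH_SECONDS = 90 * 60      # assumption: no stoppage time

def points_per_match(points_per_player: int, samples_per_second: int) -> int:
    """Total player data points captured over one match."""
    return points_per_player * PLAYERS * samples_per_second * MATCH_SECONDS

generations = {
    "centre of mass": points_per_match(1, 25),
    "skeletal": points_per_match(30, 50),
    "mesh": points_per_match(10_000, 200),
}

for name, total in generations.items():
    print(f"{name:>14}: {total:,} points per match")
```

Under these assumptions the totals run from roughly three million points per match, through about 178 million, to well over 200 billion for mesh tracking.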

Different clubs, different approaches

A club has more data to consume and play with if they want to go down that route. But we’re still going to package that data up into products that are easy to consume, so you, as the Data Analyst, get more value, more options, but don’t necessarily need to process all that data.

The 3D recreation is a great example. We produce those in our software tool. And so an analyst can just go around and navigate the 3D world. They don’t have to worry necessarily about the billions of data points that went into creating that. 

We can also give you tactical data points about how the physical bodies are moving in kind of human language rather than having to parse through, you know, all these representations of again, an ankle rotating through space.

And so, for the clubs that want to go down and do it on their own, great, they can have at it. And for the clubs that maybe that’s a bit overwhelming for, we can help them turn it into kind of products and services that are more powerful than they were before. 

They both give you more precision, they give you a better representation of what’s going on in the football world and then that leads to the possibility to do more with it. 

If I go from knowing I’m a dot in two-dimensional space to actually knowing the full context of how my body’s moving, there’s just more you can analyse.

I could maybe look at space when I was in a 2D world. But you can now get down to the technique of how you’re planting your foot and how you’re running and how you’re rotating your body – all these things that matter to the game.

Potential for competitive advantage

Then it creates opportunity for bigger staffs. We definitely saw that happen in the NBA. When we started in the NBA, two or three clubs had one or two people who were working on this in the background.

Now you have, across all 30 NBA teams, groups of 10, 15 people who are doing this every day. The other way you can think about this is it kind of creates a new area for competitive advantage.

If you are a club that is not at the top of the table or doesn’t have the biggest budget, but you want to get really smart about how you deploy technology across your organisation, it gives you a new place where you can try to find a competitive advantage.

And I think, particularly in football, there’s still a lot of opportunity there. It is such a dynamic and flowing game that I think there’s a lot more potential of how data can have an impact and evaluate what’s going on. 

It’s one of the reasons we’re really excited about the future value we can hopefully bring to the ecosystem. 

We love to chat about this stuff and are really trying to promote the deployment of technology and this exciting future all across sport. Folks can always reach out to me directly at michael.dauria@geniussports.com

Dominic Jordan: Chief Data Officer at Twelve Football and former Director of Data at Manchester United

Difference between event and tracking data

Dominic Jordan: Event data is probably the simplest form of data you can collect about a football match, beyond the scores, scorers and who got red cards. You can think of it as being the kind of information you would hear if you were listening to a football match on the radio: who passed the ball to who, whereabouts it is on the pitch, who tackled who and so on.

So you can build up this picture of the ball moving, progressing to the other end of the pitch. Event data describes the motion of the ball and the players who interact with it.

Tracking data is a step beyond that. The current version of tracking data that most clubs would typically have access to is like the version of Football Manager where you could see the teams and there was a top view of the pitch and you saw the blobs moving around on the pitch and the two colours representing the two teams.

The best form of tracking data will have the position of every player on the pitch, typically for every frame of broadcast or of a camera capturing – so 40 to 50 times per second. You can translate that into a top-down visual of the pitch, which allows you to look at the spatial and temporal movement, so the movement across space, as time moves forward throughout the match, of the players, the ball, the referee and sometimes even the assistant referees as well.

I think there’s still a lot of value to be extracted from event data. I don’t think that by any means all the value has been extracted. But the reason why tracking data is more valuable to clubs is that you can extract a lot more granular information.

There’s only about 3% of a match when an individual player is touching the ball, so it’s what you do in the other 97% that matters. With tracking data, you can get a more rounded view of what each individual player is doing when they’re not on the ball and how they’re contributing to the team, but also how the team is interacting.

So creating space, how the opposition is trying to limit the amount of penetration that one team can make or exploit the weaknesses in the opposition team. So you can get a lot richer information, but it comes at a cost, which is a very much larger amount of information per match.

So event data is somewhere between 2,000 and 3,000 rows of information per match, depending on the provider, but with tracking data, you’ve got every camera frame of every second of the match.

So that’s 40 to 50 frames per second. So for every player you have a significantly higher amount of information and therefore if you’re going to collect and process that within your club, you need a different set of skills which typically don’t exist in many football clubs.

If you’re going to do this within a club, you would need a data engineering capability.

Data engineering is the skillset which interacts with third party data providers, like Genius, building the systems that ingest the data into a local place where analysts can utilise it.

And you will need people who are skilled in storing that data, who make sure it is stored correctly, clean and available to be used. And then you have data scientists on top of those who actually manipulate the raw data, the X,Y tracking data itself.

If you take event data, you can read it as a story: player A passes to player B, player C passes to player D and so on. With tracking data, you have X,Y co-ordinates on the pitch.

That’s just a long list per player per frame and that’s not really interpretable on its own. There’s a significant amount of work to be done to turn that into something which a coach or even an analyst can actually make a decision from.
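A small example of that translation work: the sketch below takes raw frames of co-ordinates and collapses them into a readable sequence of who was closest to the ball and when. Nearest-player is only a crude stand-in for possession, and the Frame structure, player ids and positions are hypothetical rather than any provider's format.

```python
from dataclasses import dataclass
from itertools import groupby
from math import hypot

@dataclass(frozen=True)
class Frame:
    """One tracking frame: ball position plus player positions in metres."""
    t: float                                  # seconds since kick-off
    ball: tuple[float, float]
    players: dict[str, tuple[float, float]]   # hypothetical player ids

def nearest_player(frame: Frame) -> str:
    """Id of the player closest to the ball in this frame."""
    bx, by = frame.ball
    return min(frame.players,
               key=lambda p: hypot(frame.players[p][0] - bx,
                                   frame.players[p][1] - by))

def possession_spells(frames):
    """Collapse consecutive frames with the same nearest player into (player, start, end)."""
    spells = []
    for player, group in groupby(frames, key=nearest_player):
        group = list(group)
        spells.append((player, group[0].t, group[-1].t))
    return spells

frames = [
    Frame(0.00, (10.0, 30.0), {"A": (10.5, 30.0), "B": (25.0, 32.0)}),
    Frame(0.04, (14.0, 30.5), {"A": (10.8, 30.0), "B": (24.5, 32.0)}),
    Frame(0.08, (22.0, 31.5), {"A": (11.0, 30.0), "B": (24.0, 32.0)}),
    Frame(0.12, (23.8, 32.0), {"A": (11.2, 30.0), "B": (24.0, 32.0)}),
]
spells = possession_spells(frames)
print(spells)
```

Four frames become two lines of story, with the ball moving from A to B, which is much closer to the event-data style a coach can read.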

The different types of tracking data

We’re focusing on optical tracking in this podcast but there are other kinds as well. 

The smallest is wearable technology strapped onto each individual player that collects not just X,Y positions but other information about the player as well.

For that you would only get the data for your own team. That’s used for performance load monitoring. 

The next level is the optical or in-stadium collection that Genius do, with multiple cameras or other kinds of sensors installed in the stadium.

So you get a 360-degree view of the pitch. For that, you would get both teams. Often, but not always, depending on the league you’re working with, there is a sharing agreement, with one company responsible for collecting and distributing the tracking information for the whole league.

That is fairly common in the bigger leagues but not in other leagues. That information, that data, is very high quality, because it has these multiple cameras. And there are very good techniques for distinguishing between players and the location of players on the pitch.

But you wouldn’t necessarily get data on teams from other competitions. And if you were scouting players from other competitions, you wouldn’t necessarily have that same level of information.

And so there’s a third level, which is very broad, which is data that’s derived from broadcast technology.

There are computer systems that are capable of extracting not only who the individual players are on the pitch but where they are on the pitch. There are very significant problems with deriving that from one camera in lots of different stadiums under lots of different conditions.

So what you get is a lot more information from a lot more leagues, but the quality is variable because of the camera quality. The camera positions are difficult, the lighting conditions can be difficult and you don’t really have any kind of quality control on it.

And also the cameras pan back and forth. They’re not trying to capture every player on the pitch. So while you get a much broader view of the football pyramid, it’s not the same quality you have from a company like Genius, whose technology is designed to have the highest accuracy within a fixed environment.

How do clubs use tracking data?

If we’re talking about optical tracking data specifically, there are three main ways in which clubs are using this data.

The first – and the one that is probably most common within clubs – is within tactical analysis. This can be both pre-match and post-match analysis. Examples of what you can get from tracking data would be identifying the opposition shape in different phases of the game.

So how the team sets up in build-up situations and how they try and progress the ball forward. How they move their players around the pitch and manipulate the opposition to try and progress the ball. 

Obviously that’s possible to do from an opposition analyst watching games, but typically opposition analysts will watch maybe three games, maybe a handful more, of a team and they will use that to derive their tactical plan.

Having access to ALL of the games that a manager has played with different set-ups gives you a much richer dataset that you can search for examples of a particular pattern. You can look at shape, you can look at build-up, you can look at repeatable patterns of play.

Sometimes, if clubs are sophisticated, they will use it to try and identify how the opposition might try and exploit their weaknesses or manipulate them into weaknesses.

Post-match you can derive better metrics. So, typically, teams would have post-match analysis that would involve things like xG or xT – mathematical models that describe how the game played out. With tracking data, you can have slightly more sophisticated metrics.

You can look at things like the team shape in different build-up phases and whether or not players stuck to the plan you’d asked them to stick to. You can look at the distance between the back and the front line for example, or the horizontal width across the pitch that a team employs, so you can help to teach players by giving them examples after the match of places in which they didn’t follow the tactical instructions.
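Both of those shape measures are straightforward to compute from a single frame. The sketch below uses the deepest and most advanced outfield players as a stand-in for the back and front lines, which is a simplifying assumption; the co-ordinates are invented, with x along the pitch and y across it.

```python
def team_shape(outfield):
    """Depth and width, in metres, of a team's outfield players in one frame.

    `outfield` is a list of (x, y) positions with the goalkeeper left out.
    """
    xs = [x for x, _ in outfield]
    ys = [y for _, y in outfield]
    return {
        "depth": max(xs) - min(xs),   # deepest to most advanced player
        "width": max(ys) - min(ys),   # widest player on each side
    }

# Hypothetical build-up frame: three at the back, two in midfield, one forward.
frame = [(20.0, 10.0), (22.0, 30.0), (21.0, 50.0),
         (40.0, 20.0), (41.0, 45.0), (58.0, 34.0)]
shape = team_shape(frame)
print(shape)
```

Comparing these values frame by frame against what the coach asked for is one way to find the moments worth showing players after the match.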

And the third way, from a tactical point of view, is in set pieces. This is actually fairly common, to look at how teams set up set pieces and how they try and manipulate those situations. So set pieces can be corners of course, and free kicks.

Increasingly, the definition of a set piece is extended to any situation that is under the complete control of one team. Not how the opposition is set up, but how YOU choose to set up. Obviously that’s a corner or free kick, but throw-ins are the same. Even kick-offs I would consider a set piece. 

And you see some clubs who are more adept at exploiting this kind of thing, figuring out ways to potentially create chances very quickly from those situations.

How does tracking data come through to a team?

Not every club will have access to in-match tracking data. But most of the companies that produce the data are now capable of turning the camera feed into digital format fairly quickly, in less than a minute.

It really depends on the kind of company you’re working with. But typically, what the tracking data provider would do is to provide a message queue. You can subscribe as a club to that message queue and then receive the data in a stream.

This is the same as any kind of streaming information that is more common in other businesses. There are plenty of external businesses that apply this kind of thinking to collect information more or less in real time. Then that data is stored in your own internal systems.
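The subscribe-and-store pattern described here can be sketched with Python's standard library alone. This is only a stand-in: an in-process `queue.Queue` plays the role of the provider's message queue, a list plays the role of the club's internal storage, and the JSON frame layout (`t`, `ball`) is invented for illustration. A real provider's feed, transport and schema will differ.

```python
import json
import queue
import threading

END_OF_STREAM = None  # sentinel the stand-in provider sends when the feed ends


def publish_frames(feed, frames):
    """Stand-in for the provider: push each frame onto the queue as a JSON message."""
    for frame in frames:
        feed.put(json.dumps(frame))
    feed.put(END_OF_STREAM)


def consume_frames(feed, store):
    """Stand-in for the club: read messages as they arrive and store the decoded frames."""
    while True:
        message = feed.get()  # blocks until the next message is available
        if message is END_OF_STREAM:
            break
        store.append(json.loads(message))


def run_demo():
    # Hypothetical frames: timestamp in seconds and ball position in metres.
    frames = [
        {"t": 0.0, "ball": [52.5, 34.0]},
        {"t": 0.04, "ball": [53.1, 34.2]},
    ]
    feed = queue.Queue()
    store = []  # stands in for the club's internal systems
    producer = threading.Thread(target=publish_frames, args=(feed, frames))
    producer.start()
    consume_frames(feed, store)
    producer.join()
    return store


stored = run_demo()
```

The consumer never asks for data; it waits for whatever arrives next, which is what makes modelling in more or less real time possible further downstream.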

If you have the capability, you could do modelling on that data in more or less real time and potentially in matches. There are practical challenges with doing that.

That whole process requires some fairly sophisticated data engineering technology. If you’re lucky enough to be working at Liverpool, then you’ll have mature integration of data science into your workflows.

That just doesn’t exist at lots of clubs, and there might not even be a real distinction between an analyst and a coach. In fact, there was a job offered by Burnley recently which was very explicit that they wanted somebody who could not only manipulate and process data, but also contribute to the coaching side.

I think those lines will be blurred significantly over time as younger coaches come through having had more experience of this kind of technology. Somebody has to set up the system to collect and manipulate the third-party data that’s created.

The data scientists, they’re not working in real time. They will write code that manipulates that data into information that the analyst can interpret or the coaches can interpret. And so the job of a data scientist in that context is to understand both parts of the equation.

So what the data is and what it means and what it can be used for, but also what decisions the analyst or coach are going to make from having intelligence that’s derived from that information and bridging the gap between those two.

Lots of tracking companies are producing not just the XY data but also quite a bit of that data science. You might have heard the term ‘pitch control model’, for example.

This essentially divides the pitch up into zones, each under the control of one player. Every part of the pitch can be reached quickest by one particular player, depending on where they are on the pitch and which way they’re running.

So you can split up the pitch into different zones that would tell you which parts of the pitch are under control of individual players. And those parts of the pitch typically then are targets for passes. 

Progression of the ball into those places obviously makes sense, because your player is more likely to reach the ball than the opposition player.
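To make the idea concrete, here is a deliberately simplified pitch control sketch: each point on the pitch is assigned to whichever player could reach it soonest, given their position and current velocity. The `Player` type, the reaction time, the single top speed shared by all players and the grid size are assumptions for illustration. Published pitch control models are considerably more sophisticated than this.

```python
import math
from dataclasses import dataclass

REACTION_TIME = 0.7  # seconds a player carries on at current velocity (assumed)
MAX_SPEED = 5.0      # metres per second, same for every player (assumed)


@dataclass(frozen=True)
class Player:
    player_id: str
    team: str
    x: float
    y: float
    vx: float = 0.0  # velocity in metres per second
    vy: float = 0.0


def time_to_reach(player, tx, ty):
    """Estimated time for a player to reach (tx, ty).

    The player drifts at their current velocity for REACTION_TIME,
    then runs in a straight line at MAX_SPEED.
    """
    start_x = player.x + player.vx * REACTION_TIME
    start_y = player.y + player.vy * REACTION_TIME
    return REACTION_TIME + math.hypot(tx - start_x, ty - start_y) / MAX_SPEED


def controlling_player(players, tx, ty):
    """The player who would reach (tx, ty) soonest."""
    return min(players, key=lambda p: time_to_reach(p, tx, ty))


def control_by_team(players, length=105.0, width=68.0, nx=21, ny=14):
    """Count how many grid cells each team controls, judged at the cell centres."""
    counts = {}
    for i in range(nx):
        for j in range(ny):
            cx = (i + 0.5) * length / nx
            cy = (j + 0.5) * width / ny
            team = controlling_player(players, cx, cy).team
            counts[team] = counts.get(team, 0) + 1
    return counts


standing = [Player("A", "home", 40.0, 30.0), Player("B", "away", 60.0, 30.0)]
running = [Player("A", "home", 40.0, 30.0), Player("B", "away", 60.0, 30.0, vx=-4.0)]
```

With both players standing still, the point (50, 30) is equidistant from each. Once player B is running towards it, B gets there first, which is the "which way they're running" part of the description above.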

So data science would work out that kind of model, then would need to produce something that’s useful, because the model on its own isn’t necessarily useful. Another example would be the xPass model.

This is where tracking data is capable of giving you what in data science terms we call the counterfactual. So the thing that didn’t happen. If you imagine a situation where a player on the ball has two or three options for passes and chooses the wrong one.

With tracking data, you can derive models that say, ‘had that player been capable of making that pass, how dangerous would that have been in terms of the team’s ability to go on and score a goal?’ 

And so you can create models that say how difficult is each of the passes that was available to the player?

And how dangerous would the situation have been had they been able to make it? You can use that information to find better players, but also potentially for player improvement. Those models are quite sophisticated and require quite a lot of data science work.
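The counterfactual comparison can be sketched as follows, assuming the two model outputs already exist for every available pass: how likely it was to be completed, and how dangerous the situation would have been if it had been. The `PassOption` type and all the numbers are hypothetical. In practice both values would come from models trained on tracking data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassOption:
    receiver: str
    completion_prob: float      # how difficult: modelled chance the pass is completed
    threat_if_completed: float  # how dangerous: modelled chance of scoring afterwards

    @property
    def expected_threat(self):
        """Danger of the pass weighted by the chance of actually completing it."""
        return self.completion_prob * self.threat_if_completed


def evaluate_decision(options, chosen_receiver):
    """Compare the pass that was played with the best available alternative.

    Returns the best receiver and the expected threat given up by the choice.
    """
    best = max(options, key=lambda o: o.expected_threat)
    chosen = next(o for o in options if o.receiver == chosen_receiver)
    return best.receiver, best.expected_threat - chosen.expected_threat


# Hypothetical options available to the player on the ball.
options = [
    PassOption("LB", completion_prob=0.95, threat_if_completed=0.01),
    PassOption("CM", completion_prob=0.80, threat_if_completed=0.03),
    PassOption("ST", completion_prob=0.40, threat_if_completed=0.10),
]
best_receiver, threat_given_up = evaluate_decision(options, chosen_receiver="LB")
```

Here the safe pass to the left-back was played even though the harder pass to the striker carried more expected threat. Aggregated over many situations, gaps like this are the kind of information that could feed recruitment or player improvement.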

Even then the output isn’t necessarily going to be directly interpretable by an analyst or coach. And so, at a bigger club, maybe you have a decision science team whose job is not to collect the data and store it and clean it, not to build these sophisticated models that have to work on very large data sets, but to produce the information that is usable by coaches and analysts.

And that’s a kind of a third role. If you’re talking about a smaller club, well, good luck with that! You’ll probably get one person who’s doing all of those things and they might also be the analyst as well. So I think there are going to be a lot more opportunities for people with data skillsets in the future who have that domain knowledge and who can straddle both those kinds of roles.

That kind of process is exactly what you’d find in a business. Getting data from outside the building into the hands of the decision-makers in a usable format is pretty mature there; in football clubs, much less so.

Experience at Manchester United and coaches who are comfortable with data

I can’t really talk about what happened at Manchester United too much, but Erik (ten Hag) in particular was very, very open (to data) and he’s just got a new job at a club that has a very mature data function. He was very open to using data and to the benefits it could bring.

There are very clear examples where the information which we were able to provide to him as a data science team actually led to very significant victories. So yeah, I think it’s very clear that data contributed significantly to the successes that we had there.

If you don’t use data, then you will be exploitable by people who ARE using it. I have only been working with coaches who have wanted to learn more and to understand what the benefit is.

I have been lucky to work in football, but I’ve also been lucky to work in other businesses where I encountered similar problems of companies that had done things one way. And you have to work hard as a data scientist to show the value and you have to show the issues with not paying attention to this sort of thing.

I can’t really say what it would be like trying to work with a coach who didn’t use data, but I can say that those coaches will be far less involved in the game in the future because it’s just coming, this is happening, it HAS happened. You know, the accessibility of data down to, well, essentially all leagues, to almost every professional men’s league, to almost every professional women’s league, starting to cover almost every aspect of youth football.

Technology is only going to get more sophisticated, as is the ability to collect data. Genius are now providing full body tracking, which you’re going to be able to derive more kinds of information from. Ignore that at your peril, because that’s a revolution that’s already happened. You absolutely should not be scared of it, because it’s not coming to take people’s jobs, it’s not coming to take an analyst’s job.

The best use of data, in any business, does the heavy lifting to allow the experts to do their jobs more efficiently. It can find examples of exploitable patterns for you, it can find where players need to understand that they are not contributing to the team effort.

It can show you where players are running out of steam and you need to make your changes earlier. It can tell you how you might want to change your tactical settings. But it’s not going to make those decisions for you, in the near term at least.

It’s not taking anyone’s job. It’s just doing a lot of work, fairly dull, tedious work at scale that is useful to you if you embrace it in the right way. That’s a great place to finish.