TGG Podcast #46 - David Sumpter: Curiosity-based approach to analytics
Written by Training Ground Guru — February 13, 2023
PROFESSOR David Sumpter is a pioneer of the application of data science in football and has worked with leading teams including Ajax, Barcelona, Hammarby and England.
In Episode #46 of the Training Ground Guru Podcast, in association with Hudl, he gave insights into this work and predictions on the future of football analytics.
You can listen to the podcast via the player below and read an edited transcript after that.
STUDYING FISH AND ALLYING MATHS WITH FOOTBALL
Professor David Sumpter: I spent 10 years of my life studying the mathematics of fish.
The idea really is this: fish move in groups of tens of thousands, but each one only looks at the two or three individuals around it. As long as they know the positions of those around them, they can move in unison in a very effective way.
It’s the same thing in football: you can’t know where every player is, but if you know the rough formation and where your position is on the pitch then you can start to build up those relationships and understand the game.
Footballers and fish have a lot in common, which means the same mathematical models allow you to understand the collective motion of the players.
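To make the fish idea concrete, here is a minimal sketch of a topological alignment model of the general kind used in collective-motion research. Each individual looks only at its k nearest neighbours and adopts their average heading. The choice of k = 3, the noise-free update and the function names are illustrative assumptions, not Sumpter's published models.

```python
import numpy as np


def align_step(positions, headings, k=3, speed=1.0):
    """One update of a simple topological alignment model.

    positions: (n, 2) array of x, y coordinates.
    headings:  (n,) array of directions of travel in radians.
    Each individual adopts the mean heading of itself and its k nearest
    neighbours, then moves `speed` units in that direction.
    """
    positions = np.asarray(positions, dtype=float)
    headings = np.asarray(headings, dtype=float)
    # Pairwise distances between all individuals
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    new_headings = np.empty(len(positions))
    for i in range(len(positions)):
        # Sorting by distance puts the individual itself first (distance 0)
        nearest = np.argsort(dists[i])[: k + 1]
        # Average the headings as unit vectors so that angles wrap correctly
        new_headings[i] = np.arctan2(np.sin(headings[nearest]).mean(),
                                     np.cos(headings[nearest]).mean())
    steps = speed * np.column_stack([np.cos(new_headings), np.sin(new_headings)])
    return positions + steps, new_headings


def polarisation(headings):
    """1.0 when all individuals move in the same direction, near 0 when random."""
    headings = np.asarray(headings, dtype=float)
    return float(np.hypot(np.cos(headings).mean(), np.sin(headings).mean()))
```

You can start from random positions and headings and call `align_step` repeatedly. `polarisation` then tracks how far the group has come to move as one, even though no individual ever sees more than a handful of others.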
I suppose the first real coupling between maths and football came with Matthew Benham’s work on gambling - he is a bit of a hero of mine. He started looking at the factors that predict the outcome of a match: how you model the odds and so on. That definitely developed at Brentford, Brighton and FC Midtjylland, where they started to employ mathematicians to do that.
After my book (Soccermatics) came out in 2016, a couple of Premier League clubs flew over to Uppsala and said, ‘We want to know more about this.’ I was training my son’s team at the time and I could take these Premier League people to go and watch them play.
Lee Mooney, who was working with Manchester City at the time, came and visited me and I took him to one of my son’s training sessions. He actually came and played with the boys and asked them, ‘What teams do you support?’ City wasn’t one of them and he asked why. My son put his hand up and said, ‘Because they’re a bunch of sell-outs,’ which I was a bit embarrassed about!
FULL-TIME AT HAMMARBY
To do analytics properly, you have to be really involved with the team. So what Hammarby did was to give me access to ALL parts of the club. We sat down with the manager, chairman and the Sporting Director, who said, ‘Here is David. Allow him to go everywhere and do what he wants and be involved with things.’
Stefan Billborn, the manager, was very welcoming and allowed me to go to training. I could collect the balls and get a feeling and sound out different people. I talked to the assistant a lot and we started to get an idea about what might be interesting for them to know from mathematics.
I think you have to do it that way. I worked there half-time for two years and found out how it actually works on the ground. I really picked up the language (of football), the feeling. After a while we were doing certain data analyses, and then I would show them to the players and they would give me feedback. There would be a back and forth, and that was when I could really get going.
That’s when I could say I properly understood and was part of the football world.
The thing I am most proud of is that we changed some of our tactics based on suggestions I made. Billborn wanted to play an attacking style of football. He wanted to be up in the final third all the time putting pressure on and he also wanted to win the ball back quickly whenever they lost it.
Together, using a technique called pitch control, we came up with an idea we called ‘the cross’. Basically, you have five players up in the final third attacking, and the other five form a cross behind them that occupies as much space as possible.
It’s important to have your full-backs come in and take their place in the cross. You want this compact shape in the middle, you don’t want your wingers or full-backs too wide out.
We could plot that out and he (Billborn) could show it to the players and we could use tracking data to see how they were positioned. That worked very well and we had an extremely successful season where we scored three or four goals in a few matches, while also keeping this shape and winning the ball back.
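As a hypothetical illustration of checking that shape in tracking data, the sketch below takes the lateral positions of the five deeper players in each frame. It measures how wide they are spread and how often all of them stay within a set distance of the group's centre. The input layout and the 20 m threshold are assumptions for illustration, not Hammarby's actual definitions.

```python
import numpy as np


def cross_compactness(rear_five_y, max_offset=20.0):
    """Summarise how compact the rear five are across the width of the pitch.

    rear_five_y: (frames, 5) array of lateral (y) positions in metres for the
        five players forming the 'cross' behind the attack (hypothetical input).
    max_offset: hypothetical limit, in metres, on how far any of them may sit
        from the group's mean lateral position before the shape is 'too wide'.
    Returns (mean width in metres, share of frames in which the shape is compact).
    """
    y = np.asarray(rear_five_y, dtype=float)
    # Width of the group in each frame: widest player minus narrowest player
    width = y.max(axis=1) - y.min(axis=1)
    # Compact if every player is within max_offset of the group's centre
    centre = y.mean(axis=1, keepdims=True)
    compact = (np.abs(y - centre) <= max_offset).all(axis=1)
    return float(width.mean()), float(compact.mean())
```

Because full-backs drifting wide are what break the shape, the per-frame check looks at each player's distance from the centre. Averaging the width alone would hide a single player who is too far out.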
David Sumpter is a Professor at the University of Uppsala in Sweden and the co-founder of the Twelve Football data science consultancy. He has worked with leading football clubs and federations, authored a number of books, including Soccermatics (2016), Outnumbered (2018) and The Ten Equations that Rule the World (2020), and published more than 100 scientific articles on subjects as diverse as social psychology, machine learning and artificial intelligence. His talks at Google, TEDx and the Royal Institution are available online.
We were using code written by Javier Fernandez at Barcelona to do this, because he had written a programme that calculated pitch control. The basic idea is in Soccermatics, where I used a Voronoi diagram, which you could say is the first version of pitch control.
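The Voronoi idea is easy to sketch. Divide the pitch into a grid, give each cell to whichever team has the nearest player, and count how much of the pitch each team controls. This is only the simple nearest-player version. It is not Fernandez's pitch control model, which also accounts for things like player velocities and the time the ball takes to travel. The 105 x 68 m pitch, the 1 m grid and the function name are assumptions.

```python
import numpy as np


def voronoi_control(home, away, length=105.0, width=68.0, cell=1.0):
    """Nearest-player (Voronoi) pitch control on a regular grid.

    home, away: (n, 2) arrays of player x, y positions in metres.
    Returns (home_share, away_share, home_owns), where home_owns is a boolean
    grid (rows = y, columns = x) that is True where a home player is nearest.
    Exact ties are given to the away team.
    """
    xs = np.arange(cell / 2, length, cell)   # cell centres along the pitch
    ys = np.arange(cell / 2, width, cell)    # cell centres across the pitch
    gx, gy = np.meshgrid(xs, ys)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (cells, 2)

    def nearest_distance(players):
        players = np.asarray(players, dtype=float)
        # Distance from every cell to every player, then keep the closest
        d = np.linalg.norm(grid[:, None, :] - players[None, :, :], axis=-1)
        return d.min(axis=1)

    home_owns = nearest_distance(home) < nearest_distance(away)
    home_share = float(home_owns.mean())
    return home_share, 1.0 - home_share, home_owns.reshape(gy.shape)
```

For a shape like "the cross", you would pass in the ten outfield positions for each team in a frame. You would then look at which regions of `home_owns` the shape covers once possession is lost.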
A lot of these ideas are mathematically natural. Then they get reinvented by lots of different people as they put them into practice.
For example, Sarah Rudd did an analysis of Jordan Henderson in 2011 that showed he ranked very highly on expected threat, and Liverpool made a very smart signing when they bought him at that time. I think Ian Graham calls expected threat ‘goals added’. When they are thinking of buying players, they ask, ‘How much is this player going to add in terms of goals added during the season?’
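One widely used formulation of expected threat, popularised by Karun Singh, splits the pitch into zones. It asks how likely a possession in each zone is to lead to a goal, either by shooting from there or by moving the ball to a more dangerous zone. Below is a minimal value-iteration sketch of that formulation, not the specific model used in the Henderson analysis. The three-zone probabilities in the check are made up, and real models estimate a much finer grid from event data. The value a player "adds" with a successful pass or carry is then the threat at the end zone minus the threat at the start zone.

```python
import numpy as np


def expected_threat(shoot_prob, goal_prob, move_prob, transitions, iterations=100):
    """Value iteration for an expected threat (xT) grid, zones flattened to indices.

    shoot_prob[z]     - probability that the action taken in zone z is a shot
    goal_prob[z]      - probability that a shot from zone z is scored
    move_prob[z]      - probability that the action is a pass or carry
    transitions[z, w] - probability that a move from z ends successfully in w
                        (rows may sum to less than 1: the rest is losing the ball)
    Solves xT(z) = shoot*goal + move * sum_w transitions[z, w] * xT(w).
    """
    shoot_prob, goal_prob, move_prob = map(np.asarray, (shoot_prob, goal_prob, move_prob))
    transitions = np.asarray(transitions, dtype=float)
    xt = np.zeros(len(shoot_prob))
    for _ in range(iterations):
        # Brackets matter: '*' and '@' share precedence in Python
        xt = shoot_prob * goal_prob + move_prob * (transitions @ xt)
    return xt


def goals_added(xt, start_zone, end_zone):
    """Value of a successful pass or carry: threat at the end minus at the start."""
    return float(xt[end_zone] - xt[start_zone])
```

Summing `goals_added` over all of a player's successful actions in a season gives the kind of "how much will he add?" number described above.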
When I was working every week with Hammarby, I was also writing the blog for their website. This was great, because my idea is that you should communicate openly about analytics at every level - so the board are seeing things and so too are the players, fans, the Sporting Director. They might be seeing slightly different things, but it’s the same story.
We’re not going to tell the fans who we’re scouting, but we can tell them about the performance of the players: who is creating space, who is creating x-threat. Those types of things you can communicate to everybody.
Unfortunately, I eventually had to go back to my real job, so I’m not out on the pitch every day any more. I go there (Hammarby) about three times a year and we also have a lot of meetings, mainly on WhatsApp. I have a very good relationship with the women’s coach and we have a lot of chats, about tactical things, about their performance and also scouting players.
Through my company Twelve we have an ongoing collaboration and I love the way that it’s every part of Hammarby we are involved in. I have relationships with the chairman, the Sporting Director, the coaches of the different teams. There is a very good analyst for the men’s team called Abel (Lorincz), who I’ve learned a lot from, and I even get to go on Hammarby Fan TV - they love the stats!
We packaged up what I learned during those years with Hammarby and turned it into an approach to football that we think we can apply at different clubs, built on the philosophy that every level of the club can be interested in it. We work on fan experience, performance and scouting.
We work with a big Premier League club and one in Spain, and we are also starting a relationship with a club in the Bundesliga. We also work with some smaller clubs. You can think of it as a consultancy that also delivers a platform where clubs can analyse their own data. That has been an amazing journey.
It seems that most clubs are organised in the same way - from a small club, like Hammarby, to the biggest clubs in the world. There are the same sorts of tensions going on.
When we work with clubs, we start by thinking about what style of football the team wants to play. The next thing is the KPIs, the metrics you want to measure for that style. For that, we try to get clubs to write down, in words, five or six things that are really important. Then we turn each of those into a number and make it into a visualisation.
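To illustrate the step from words to a number, suppose a coach writes down "when we lose the ball, we win it back within ten seconds". The sketch below shows one way that sentence could become a KPI. It uses a hypothetical list of possession changes, and the data format, the ten-second window and the names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PossessionChange:
    time: float   # seconds since kick-off
    team: str     # team that gains possession at this moment


def quick_regain_rate(changes, team, window=10.0):
    """Share of ball losses by `team` that are won back within `window` seconds.

    `changes` must be sorted by time. The ten-second window is a hypothetical
    threshold a coaching staff might write down, not a standard definition.
    """
    losses = 0
    quick = 0
    for prev, cur in zip(changes, changes[1:]):
        if prev.team == team and cur.team != team:
            losses += 1
            # Find the next moment the team gets the ball back, if ever
            regain = next((c for c in changes if c.time > cur.time and c.team == team), None)
            if regain is not None and regain.time - cur.time <= window:
                quick += 1
    return quick / losses if losses else 0.0
```

The resulting percentage per match is then the number that goes into the visualisation, for example as a trend line across a season.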
Everybody in the club should be building into that. And you can, I think, put that together with analytics.
At Hammarby, we have a consistent idea of how we want to play the game. We want to play possession-based football, attacking football. Not only because we want to entertain, but also because we are never going to be the richest club in the world and we want to play football which develops the players.
AJAX AND BUILDING A MODEL OF FOOTBALL
When you have a business relationship and they’re paying you to help them out with things it’s a bit more tricky to name them, but also I continue to do research projects and there are two big collaborators currently.
The first is with Mirjam Bruinsma at Ajax and her boss Vosse de Boode. We have several research projects together. What’s been really good has been looking at the rules of motion of the players, which goes back to where we started: the fish.
So what cues does each player use when they do a run? How do they open up space? How do they co-ordinate and move as a group together? All of this is inspired by the (Johan) Cruyff, Ajax way of thinking.
There is a lot of hype about artificial intelligence, and there is a paper by Google where they predict the movement of players using a very advanced AI model. They have used this kind of approach to solve chess, where you find the next move. We were interested in this idea. Is football like chess? Can the methods they have used to solve chess also be used in football?
And the answer we have found is no. A footballer’s movement is reasonably simple. Most of the time, a footballer is following the mean position of their team and following the ball around. So we could actually find a very simple model that described the baseline movement of the players as following the average position of their team-mates and the position of the ball.
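A minimal version of such a baseline can be written as a linear model. A player's velocity points partly towards the team's mean position and partly towards the ball, with two weights fitted by least squares from tracking data. This exact functional form, the 10 Hz sampling and the function names are assumptions for illustration, not the published Ajax model.

```python
import numpy as np


def fit_baseline(player_pos, team_mean, ball_pos, dt=0.1):
    """Fit v ~ a * (team_mean - pos) + b * (ball - pos) for one player.

    All inputs are (frames, 2) arrays of x, y positions in metres sampled
    every `dt` seconds (0.1 s, i.e. 10 Hz, is an assumed tracking rate).
    Returns the fitted weights (a, b).
    """
    pos = np.asarray(player_pos, float)
    velocity = np.diff(pos, axis=0) / dt                # (frames - 1, 2)
    to_team = (np.asarray(team_mean, float) - pos)[:-1]
    to_ball = (np.asarray(ball_pos, float) - pos)[:-1]
    # Stack x and y components so both directions share the same two weights
    X = np.column_stack([to_team.ravel(), to_ball.ravel()])
    (a, b), *_ = np.linalg.lstsq(X, velocity.ravel(), rcond=None)
    return float(a), float(b)


def predict_velocity(pos, team_mean, ball_pos, a, b):
    """Baseline velocity for given positions under the fitted weights."""
    pos = np.asarray(pos, float)
    return a * (np.asarray(team_mean, float) - pos) + b * (np.asarray(ball_pos, float) - pos)
```

Keeping the model this small is the point. Two numbers per player describe the "following around" behaviour, and anything the model does not explain is left over as a deviation to look at.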
It says: here is the baseline, and then we can start to build variations on that. One of the things we have been thinking about at Ajax is, if you do see a deviation from this pattern, what is this player doing? This can be a very useful tool for the team, because you can take the tracking data for the whole match and look to see if a player is regularly deviating from this following-around pattern. That is the player you want to watch.
You can see whether they have been given a special role and start to identify individuals who might have a particular role in the team.
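Flagging the players who break from the baseline can be as simple as measuring, frame by frame, how far each player's actual velocity departs from the baseline prediction. Players can then be ranked by how often that gap exceeds a threshold. The 2 m/s threshold, the dictionary input format and the function name are hypothetical.

```python
import numpy as np


def deviation_ranking(observed, predicted, threshold=2.0):
    """Rank players by how often they deviate from a baseline movement model.

    observed, predicted: dicts mapping player name -> (frames, 2) array of
        velocities in m/s (actual movement vs the baseline's prediction).
    threshold: hypothetical gap, in m/s, that counts as 'off-pattern'.
    Returns a list of (player, share_of_frames_off_pattern), highest first.
    """
    scores = {}
    for player, v in observed.items():
        gap = np.asarray(v, float) - np.asarray(predicted[player], float)
        residual = np.linalg.norm(gap, axis=1)
        scores[player] = float((residual > threshold).mean())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The players at the top of this list are the candidates for a special role. The model only says that they move differently, and someone still has to watch the video to see why.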
What is amazing at Ajax - and a lot of this is down to Vosse, who runs the research group there - is they are doing basic research. They do a lot on free kicks, penalties, how you kick the ball, different things on training. We want to build up a full model of football that allows us to do the more complicated things.
What we have found is that in some ways analytics has skipped over some of those basics and we are trying to put them into place.
ENGLAND: BRINGING ANALYTICS INTO EVERYTHING THEY DO
At the English FA, there is a PhD student whom I am supervising. He is a big part of their team in terms of delivering things during the World Cup, but we are also doing a research project together covering many different things: scanning behaviour, x-threat models and how you should play in different types of situations.
For me as a researcher it’s fascinating, because you have a genuinely difficult mathematical and computing problem plus something which is going to be applied, something England might use at the World Cup.
The Football Association fund that PhD and are doing lots of things in the right direction towards bringing data analysis into everything that they do. I think we will see more and more of that. They have a big group of researchers, people working with the men’s and women’s teams, doing these types of analyses. I have met and talked to a few of these and am very impressed by the level.
Everything is very integrated and very down to earth. The coaches at all levels, in both the men’s and women’s set-ups, always take on the input of the data scientists and analysts, and the analysts are of a very high standard.
I’ve had contact with Emily Angwin (England Women's Data Analyst) and looked a little bit at how she works, and Mark Carter (Women's Player Insights Lead) as well - they work extremely analytically and I think they are very appreciated by the coaches too.
UNTAPPED POTENTIAL OF TRACKING DATA
The stuff we did at Hammarby is still a lot more advanced than what is being done at some very big clubs now. There is still so much that they can use this tracking data for; I don’t think it’s been exploited yet.
Just to take some examples. A Sporting Director once put it to us that you can’t measure attitude with analytics. We came up with the idea of off-the-ball runs: if you keep making these runs throughout a match, then in a sense that shows very good attitude. We made a metric to measure whether players continue to make these runs during a match, and we found that Firmino was topping it.
We ended up calling this the Firmino metric. You can also think about tracking back.
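One way to turn "do players keep making these runs?" into a number is to count each player's off-the-ball runs in each period of the match and compare the last period with the first. How a run is detected from tracking data is left out here. The 15-minute periods, the late-versus-early ratio and the function name are assumptions, not the metric Twelve actually uses.

```python
from collections import Counter


def run_persistence(run_minutes, period=15, match_length=90):
    """Off-the-ball runs per period and a late-versus-early persistence ratio.

    run_minutes: match minutes (0-90) at which one player started a run.
    Returns (runs per period, runs in the last period / runs in the first).
    The ratio is NaN if the player made no runs in the first period.
    """
    n_periods = match_length // period
    # Runs in the final minute(s) are folded into the last period
    counts = Counter(min(int(m // period), n_periods - 1) for m in run_minutes)
    per_period = [counts[i] for i in range(n_periods)]
    first, last = per_period[0], per_period[-1]
    ratio = last / first if first else float("nan")
    return per_period, ratio
```

A ratio near 1 means the player is still making runs at the end as often as at the start. Averaging the ratio over many matches would be needed before reading much into it.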
We made another one. Gary Neville said you can’t measure the look on players’ faces when they go a goal down. OK, that’s probably true. But you can measure how they play after they go a goal down. Does a player perform better or worse? We called this the Gary Neville metric.
And we could see that certain players really lifted their game when their team went a goal down and others didn’t.
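In the same spirit, the Gary Neville metric can be sketched as a comparison of a player's output per 90 minutes when his team is trailing by one goal with his output in all other game states. The segment format and the choice of "value" (for example summed expected threat) are assumptions for illustration.

```python
def gary_neville_metric(segments):
    """Per-90 output when one goal down minus per-90 output otherwise.

    segments: list of (minutes, goal_difference, value) tuples for one player.
        goal_difference is from the player's team's point of view, and value is
        any output summed over the segment (e.g. expected threat, an assumption).
    Returns None if the player has no minutes in either game state.
    """
    down_min = down_val = other_min = other_val = 0.0
    for minutes, goal_diff, value in segments:
        if goal_diff == -1:
            down_min += minutes
            down_val += value
        else:
            other_min += minutes
            other_val += value
    if down_min == 0 or other_min == 0:
        return None
    return 90 * down_val / down_min - 90 * other_val / other_min
```

A positive number means the player's output rose while his team was a goal down. Because players spend relatively few minutes trailing by exactly one goal, the comparison only becomes meaningful over many matches.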
I don’t really believe so much in using these things live. You do your work during the week. You would do a tracking data analysis of your opposition. I don’t think you can do that big tactical manipulation during the match to actually change things a lot. It’s before you start that you have to get things right.
WHY ANALYTICS WILL NEVER REPLACE HUMAN EXPERTISE
When Google DeepMind said they were going to start working on football, that’s when we thought we really needed to take this back to basics and work out what the basic structure of football is. I don’t think you can have a computerised manager; it’s never going to be that way. That’s the AI bullshit part of it.
It’s very important to stress the role of the data scientist. Clubs need to have that role, but in the same way that you need a physio, a doctor and the person with the kit. You need this support around you, and one of these people, or a group of them, is the data scientist.
It’s not that the data scientist is going to write an AI that is going to replace the manager, it’s one part of that important team that is built up around the players. It’s the players who decide what the outcome is and it’s those people you work with and support and pretty much nothing else matters.
It’s getting those people out on the pitch and performing at the top level.
We are also thinking about fan experience and wrote a football chatbot a few years ago. There are a few scenarios here. Imagine you’ve got a chat with your mates on WhatsApp and invite this bot in and it can enrich the conversation a bit, make a comment about, 'This happened in this year,' and so on.
I think those types of applications will become more and more common. You can tell it to shut up if you don’t want to hear from it any more. I think it will affect the game itself a little less.
Injury prevention in particular is one of the things that can annoy me most. At Hammarby, every time one of these companies came in and wanted to sell something, it was ‘send them over to David,’ and I had to deal with their product.
Often clubs don’t have the competency to evaluate what these products actually do. The injury prevention one, in particular, really annoyed me, because they said they could take the data from our 22 players - just them, no-one else’s, which, quite rightly, they can’t use - and predict whether a player was going to get injured, based on some questionnaires the players had filled in and various other things. A lot of it just isn’t true.
You can’t make those types of predictions on such a small sample size, and you can’t out-perform the physio either. When it’s AI versus human contact, the input of the experts is so important.
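The small-sample problem is easy to demonstrate. Suppose there are 22 players and more questionnaire answers per player than there are players. An ordinary least-squares model can then fit injury labels that are pure coin flips perfectly, which says nothing about who will be injured next. The sketch below uses random numbers in place of real data. The 30 features and the 0.5 cut-off are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_players, n_features = 22, 30

# Random "questionnaire" answers and random injured / not injured labels:
# by construction there is no real relationship to find.
X_train = rng.normal(size=(n_players, n_features))
y_train = rng.integers(0, 2, size=n_players).astype(float)

# With more features than players, least squares fits the labels exactly
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_error = np.abs(X_train @ weights - y_train).max()

# Applied to a fresh squad with new random labels, any accuracy is pure luck
X_new = rng.normal(size=(n_players, n_features))
y_new = rng.integers(0, 2, size=n_players).astype(float)
new_accuracy = ((X_new @ weights > 0.5) == (y_new == 1)).mean()

print(f"largest training error: {train_error:.2e}")
print(f"accuracy on new squad:  {new_accuracy:.2f}")
```

A perfect in-sample fit on 22 players is therefore not evidence of predictive power. Any vendor's claim needs to be tested on data the model has never seen.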
Of course it’s important to measure training load and all those types of things, but when it comes to the idea that you can make predictions, you have to be really careful. And when talking to someone like me about expected threat or whatever, it’s also important to be critical about what you can and can’t do with those methods.
A lot of that starts with common sense. Does it sound likely that a piece of software can predict if Kevin De Bruyne is going to be injured next week? No. He might get injured, it’s football, but it's difficult to predict. That common sense leads you to the conclusion that your AI product probably isn’t going to do that.
One brilliant thing about being me is that I’m in the academic world, not so much part of the industry, so I can be reasonably honest about this. There are so many problems inside the world of football: a lot of money floating around, a lot of corporate entertainment, people knowing people inside the club and buddies giving each other jobs. So there isn’t always much control over getting the best person for a particular job, especially in an area like AI, where nobody really knows that much about it.
You bring in something that sounds convincing and two years later you find, ‘Well, no, that didn’t really work out, did it?’ There is a bit of a problem with that in football. Whether it’s worse than in other industries I’m not sure, but I think maybe it is.
BESPOKE ANALYTICS: THE NEXT REVOLUTION?
The first revolution is Python, which we covered on the xG course. What Streamlit allows you to do is not just output figures in Python but build a web app. So if you’re interested in shot maps of players, first I write my code in Python, then I turn it into a Streamlit app, and then I can share it online so you can play around and look at different players and their xG values in different situations.
Instead of delivering a report or interactive web page, you have this whole interactive programme you can deliver to the coaching staff. You can put that together in a day or even half an hour.
That cycle of your style of play generating a number and a visualisation, using Streamlit as your tool and Python as your programming language, can be completed in a day.
This is where an expert puts it to use and can start to answer questions really quickly.
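Here is a minimal sketch of the kind of shot-map app described. It assumes a small shots table with hypothetical columns `player`, `x`, `y` and `xg`, filled with made-up placeholder data. A Streamlit script like this, saved as a file such as `app.py` (the name is an assumption), is launched with `streamlit run app.py`. The filtering helper is kept separate from the interface so it can be checked without Streamlit installed.

```python
import pandas as pd

# Made-up placeholder data; a real app would load the club's own shot data
SHOTS = pd.DataFrame({
    "player": ["Player A", "Player A", "Player B", "Player B", "Player B"],
    "x": [95.0, 88.0, 99.0, 80.0, 92.0],   # metres from own goal line
    "y": [34.0, 40.0, 30.0, 20.0, 36.0],   # metres from touchline
    "xg": [0.30, 0.08, 0.45, 0.03, 0.12],
})


def player_summary(shots, player):
    """Return one player's shots and their total xG."""
    own = shots[shots["player"] == player]
    return own, float(own["xg"].sum())


def main():
    # Imported here so player_summary can be used without these installed
    import matplotlib.pyplot as plt
    import streamlit as st

    st.title("Shot map explorer")
    player = st.selectbox("Player", sorted(SHOTS["player"].unique()))
    own, total_xg = player_summary(SHOTS, player)
    st.metric("Total xG", f"{total_xg:.2f}")

    fig, ax = plt.subplots()
    # Marker size grows with the xG of each shot
    ax.scatter(own["x"], own["y"], s=own["xg"] * 1000, alpha=0.6)
    ax.set_xlim(0, 105)
    ax.set_ylim(0, 68)
    ax.set_xlabel("x (m)")
    ax.set_ylabel("y (m)")
    st.pyplot(fig)
    st.dataframe(own)


if __name__ == "__main__":
    try:
        import matplotlib  # noqa: F401
        import streamlit  # noqa: F401
    except ImportError:
        print("This app needs streamlit and matplotlib; launch it with: streamlit run app.py")
    else:
        main()
```

Each time the user picks a different player, Streamlit reruns the script from the top. That is why a whole interactive tool can stay this short.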
Everybody wants things delivered in a different way. One manager wants to see directly after the match, within five minutes, a PDF explaining what the performance was.
Some want it for the fans, so we are building a fan entertainment app, and some also for the scouts. Every delivery is different, which is why you have to be flexible, and this is the revolution.