Jeremy Doku and using AI to predict the success of transfers
Written by Jens Melvang, Joe Gallagher and Daniel Dinsdale - November 27, 2021
RENNES have recruited extremely well in recent years - not least in the right wing position.
Ismaila Sarr joined the Ligue 1 club from Metz for a reported €17m in July 2017 and two years later moved to Watford for £30m. Sarr has made a big impact with the Hornets and is now being linked with moves to elite Champions League sides.
His replacement was Brazilian Raphinha, who was signed from Sporting CP in Portugal and went on to join Leeds United a year later. Again, he has made a major impression in the Premier League.
Next in line was Belgian Jeremy Doku, who could turn out to be the best of the three, which is really saying something. Doku joined from Anderlecht for a reported €26m in October 2020 and is now being linked with big-money moves to sides like Barcelona, Liverpool and Tottenham.
Signing players from foreign leagues is notoriously difficult, but the more context you can add to describe a player’s environment, the better your prediction will be.
This is what we’ve been working on at Stats Perform for the last three years: using Artificial Intelligence (AI) to add context to try and predict the success of transfers.
Could these AI models have predicted the success of Doku’s move from Anderlecht, as Rennes’ recruitment team were able to do? Let's see.
We've seen a dramatic increase in both the value of transfer fees and the number of people hired to try and minimise the risk of these transfers.
However, a lot of signings are still not successful. When we speak to Sporting Directors and recruitment staff about the reasons for this, they often tell us they are able to accurately evaluate tangible things such as physical and technical attributes.
What they often struggle to evaluate is context, namely identifying the strength and adversity of the league in which a player is competing and the style of play there.
The more context you are able to add to describe a player’s environment, the better your prediction about the success of a potential transfer will be. You need to identify:
- Role: The skillset of the player and whether it's needed in your team. We talked to more than 50 clubs around the world and they told us they needed a more nuanced way of segmenting players than just their position. This means their characteristics, skillset and qualities.
- Style: The style of play of your team. Will the new player be able to fit into it? What is the gap between the selling and buying club in terms of style of play? Is the style explicitly described, are the scouts aware of it and can it be measured? Obviously you want to spend money on the roles that are particularly important to your playing style. Most Sporting Directors don’t use numbers to describe players, they use similarity to other players. So what is important is that we link players together and use them as search points. Then we can look at similarities.
- Adaptation: Will the player be able to adapt to their new team and league? What is the gap between where they are playing currently and where they are going? The bigger the gap, offensively and defensively, the more the player will have to adapt and the more risk there will be to the transfer. We work with importance filters, because certain KPIs are more important to some teams than others. We can look at players from similar clubs who have moved to the same league to try and identify risk.
- Team: How will the incoming player impact the performance of the other players and the team as a whole?
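The idea of using players as search points, combined with importance filters, can be sketched as a weighted distance over KPIs. This is only an illustration and not the Stats Perform method: the KPI names, the values, the weights and the choice of a weighted Euclidean distance are all assumptions.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two KPI dictionaries."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))

def most_similar(reference, candidates, weights):
    """Candidate names sorted from most to least similar to the reference."""
    return sorted(
        candidates,
        key=lambda name: weighted_distance(reference, candidates[name], weights),
    )

# Hypothetical KPI values already scaled to 0-1; not real player data.
reference = {"take_ons": 0.9, "crosses": 0.7, "pressures": 0.4}
candidates = {
    "winger_a": {"take_ons": 0.8, "crosses": 0.6, "pressures": 0.1},
    "winger_b": {"take_ons": 0.5, "crosses": 0.7, "pressures": 0.4},
}

# Two hypothetical importance filters: one team values take-ons most,
# the other values pressing most.
dribbling_team = {"take_ons": 3.0, "crosses": 1.0, "pressures": 0.5}
pressing_team = {"take_ons": 0.5, "crosses": 1.0, "pressures": 3.0}

ranking_dribbling = most_similar(reference, candidates, dribbling_team)
ranking_pressing = most_similar(reference, candidates, pressing_team)
```

With these made-up numbers the same reference player returns a different closest match depending on which KPIs the buying club treats as important.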
Once these points have been identified we are able to give a good prediction about the success of the transfer, because we have a good description of the context and with it the expected performance.
Role Discovery is a way for us to assess a player’s role away from traditional positional labels. Players can have really different qualities and be asked to do very different things, even if they have the same positional label. Think, for example, of Trent Alexander-Arnold compared to Matty Cash; they are both right-backs but are very different players.
We use metrics from several different Stats Perform models for this role discovery: Possession Value, which models the probability of the team scoring in the next 10 seconds and how this increases or decreases depending on each player action; Expected Pass Completion, which is a measure of pass risk; Movement Chains, which are sequences of three or more team passes; Playing Styles, describing the style of each team possession; Heat Maps, which are spatial descriptions of passes made and received to show where players operate on the pitch.
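To make one of these inputs concrete, here is a rough sketch of how sequences of three or more team passes could be pulled out of an event stream. The event format (a team name and an action label) is hypothetical and the real Movement Chains model is likely to be more involved than this.

```python
def pass_chains(events, min_passes=3):
    """Return (team, pass_count) for each unbroken run of passes by one team."""
    chains = []
    team, count = None, 0
    for event_team, action in events:
        if action == "pass" and event_team == team:
            count += 1
            continue
        # The run is broken: keep it only if it was long enough.
        if count >= min_passes:
            chains.append((team, count))
        if action == "pass":
            team, count = event_team, 1
        else:
            team, count = None, 0
    if count >= min_passes:
        chains.append((team, count))
    return chains

# Hypothetical event stream, not real match data.
events = [
    ("Rennes", "pass"), ("Rennes", "pass"), ("Rennes", "pass"), ("Rennes", "shot"),
    ("Opponent", "pass"), ("Opponent", "pass"),
    ("Rennes", "pass"), ("Rennes", "pass"), ("Rennes", "pass"), ("Rennes", "pass"),
]
chains = pass_chains(events)
```

Here the two-pass spell by the opponent is dropped because it falls short of the three-pass threshold.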
We take more than 70 different features and use a Gaussian Mixture model to learn 25 unique player roles. We can then reduce this into a 2D representation. We have quantitative and qualitative descriptions of all these roles that enable us to compare potential targets in a more meaningful way.
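A useful property of a mixture model is that each player gets a probability for every role rather than a single hard label. The sketch below shows that assignment step only, assuming the mixture has already been fitted; fitting is omitted, and in practice a library such as scikit-learn's GaussianMixture could do it. The two features, the component parameters and the diagonal covariance are made up for illustration; only the role names come from this article.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def role_probabilities(x, components):
    """Posterior probability of each role given a feature vector x."""
    log_joint = {
        name: math.log(c["weight"]) + log_gaussian_diag(x, c["mean"], c["var"])
        for name, c in components.items()
    }
    top = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(v - top) for name, v in log_joint.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}

# Hypothetical fitted components over two standardised features
# (take-ons, crosses). The real model uses 70+ features and 25 roles.
components = {
    "attacking wide dribbler": {"weight": 0.5, "mean": [1.0, -0.5], "var": [0.5, 0.5]},
    "attacking wide threat": {"weight": 0.5, "mean": [0.8, 1.0], "var": [0.5, 0.5]},
}

# A hypothetical winger who dribbles a lot and also crosses a lot.
probs = role_probabilities([0.9, 1.2], components)
best_role = max(probs, key=probs.get)
```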
We can filter by specific roles, by competitions (the 10 big European leagues) and by any number of KPIs, like attacking threat. We can look into playing styles and compare to the average player in his role.
Raphinha crosses a lot less than the average player in his role of attacking wide dribbler. Doku is in the role of attacking wide threat and crosses a lot more than the average player in this role - and more than Raphinha.
This is the playing style radar for Rennes and shows they have a lot of fast tempo sequences. We can also build an out-of-possession similarity value. Then we can compare the two specific teams and overlay them on top of each other (image above).
We need to look at the change in ability involved in any given transfer.
Our Stats Perform Power Rankings help a lot with this. They give a single ability rating score for any team across the world. We run the algorithm from 1990 to the present day across 195 countries and 423 leagues.
There are almost three million games in total, allowing us to assign an ability score on any given day to any team around the world. For example, on October 22nd, Bayern Munich were top with a ranking of 100.
This wide net provides us with a lot of data to train our models internally across any league we can get event data for; we are also hoping to use it as a product in itself.
Using the Power Rankings, we can see that Anderlecht were ranked higher than Rennes from 2015 to mid-2018, but that after that the French side were in the ascendancy. At the date of Doku’s transfer, Rennes were considerably the better team.
We can also apply the Power Rankings to leagues, showing that Ligue 1, overall, is stronger than the Jupiler Pro League.
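The Power Rankings algorithm is not described in this article, so the sketch below swaps in a plain Elo-style update purely to show the general shape of such a system: ratings move after each result, are rescaled so the top team sits at 100, and can be averaged to compare leagues. The K-factor, the 0-to-100 rescaling, the league average and the team names are all assumptions.

```python
def expected_score(r_home, r_away):
    """Elo expected score for the home team (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))

def update(ratings, home, away, result, k=20.0):
    """Update ratings in place. result: 1.0 home win, 0.5 draw, 0.0 away win."""
    delta = k * (result - expected_score(ratings[home], ratings[away]))
    ratings[home] += delta
    ratings[away] -= delta

def scale_to_100(ratings):
    """Rescale so the best team is 100 and the worst is 0."""
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {team: 100.0 for team in ratings}
    return {team: 100.0 * (r - lo) / (hi - lo) for team, r in ratings.items()}

def league_strength(scaled, teams):
    """Average scaled rating of the teams in one league."""
    return sum(scaled[t] for t in teams) / len(teams)

# Hypothetical teams that all start level; team_a then beats team_b.
ratings = {"team_a": 1500.0, "team_b": 1500.0, "team_c": 1500.0}
update(ratings, "team_a", "team_b", result=1.0)
scaled = scale_to_100(ratings)
strength = league_strength(scaled, ["team_a", "team_c"])
```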
Now we need to bring everything together, the style and the substance, to predict future performance. We’re going to do this in our Transfer Portal Model.
To do this we use a multi-head neural network model that predicts 19 metrics that are aggregated per 90 minutes - shots per 90, xG per 90, passes in the final third per 90 and so on.
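As a rough picture of what "multi-head" and "per 90" mean here, the sketch below converts a total into a per-90 rate and runs one forward pass through a tiny network with a shared hidden layer and one output head per metric. The layer sizes, weights, input features and example totals are invented, and the real model predicts 19 metrics rather than two.

```python
def per_90(total, minutes):
    """Convert a total into a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90.0 / minutes

def dense(x, weights, bias):
    """One fully connected layer; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def forward(x, trunk, heads):
    """Shared hidden layer (ReLU) followed by one small head per metric."""
    hidden = [max(0.0, h) for h in dense(x, trunk["weights"], trunk["bias"])]
    return {
        name: dense(hidden, [head["weights"]], [head["bias"]])[0]
        for name, head in heads.items()
    }

# Hypothetical parameters; a trained model would learn these from data.
trunk = {"weights": [[0.5, 0.0], [0.0, -1.0]], "bias": [0.0, 0.0]}
heads = {
    "shots_per_90": {"weights": [2.0, 1.0], "bias": 0.1},
    "xg_per_90": {"weights": [0.4, 0.0], "bias": 0.0},
}

features = [1.0, 2.0]  # hypothetical player and team context features
predictions = forward(features, trunk, heads)

# Hypothetical totals: 12 shots in 540 minutes played.
shots_rate = per_90(12, 540)
```

Sharing the hidden layer means every metric is predicted from the same learned description of the player and the destination team, while each head specialises in one output.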
We can see that the average winger at Anderlecht has around 17 passes per 90; at Rennes it’s 14.7. The xG per 90 was 25% higher at Rennes at the point of the transfer and Anderlecht also relied far more heavily on their wingers for creating chances per 90 than the Ligue 1 team.
But we need context to tell us whether these values are actually good or bad, which is why we use swarm plots.
First of all, expected assists (xA). We can see that Doku is in the 96th percentile for xA and is actually the best in the entire league for take-ons per 90. At Rennes, a stronger team, we predicted he would still be in the 81st percentile for xA.
In his first season at Rennes, he actually went on to get 0.18 xA per 90, which we predicted pretty accurately.
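The percentile figures quoted above place a player's value within a comparison group. A minimal sketch of that calculation follows; the comparison values are invented, and defining the percentile as the share of players strictly below the value is an assumption about the convention used.

```python
def percentile_rank(value, population):
    """Percentage of the population with a value strictly below `value`."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)

# Hypothetical xA per 90 values for ten players in the same role.
role_xa = [0.03, 0.05, 0.07, 0.08, 0.10, 0.11, 0.12, 0.14, 0.16, 0.21]
rank = percentile_rank(0.18, role_xa)
```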
So we managed to predict that it would be a step up to a better team and league with Rennes, but that it was still a good fit for Doku and that he would be able to perform to a high standard. That’s what’s happened in reality and he is now destined for a move to one of the biggest clubs in Europe, whoever that may be.
However, we haven’t always got these predictions right. We did, for example, underestimate Raphinha’s performance at Leeds United last season, and this is why.
His xG and xA were very, very high in Ligue 1. We then used a feature called adjustment models (above) to predict how Leeds, who had just been promoted, would play in the Premier League and how they would use their wingers.
We predicted that Raphinha’s xA in the top flight would be 0.13, whereas in reality it was way up, at 0.18, which put him in the 90th percentile for players in his role in the Premier League.
This was because our models underestimated the performance of Leeds in the Premier League. We had them as relegation candidates, whereas in reality they ended up being a top-half team, finishing ninth.
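The size of that miss can be worked out directly from the two xA figures quoted above. The helper name is hypothetical, and expressing the gap as a share of the actual value is just one reasonable convention.

```python
def prediction_gap(predicted, actual):
    """Return the absolute gap and the gap as a share of the actual value."""
    gap = actual - predicted
    return gap, gap / actual

# Raphinha's xA per 90 at Leeds: predicted 0.13, actual 0.18.
gap, share = prediction_gap(0.13, 0.18)
```

On those numbers the prediction fell 0.05 xA per 90 short, which is a little over a quarter of what he actually produced.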
Our next steps with our AI modelling will be to integrate player role. This means that we won’t just predict how a player will perform at a new team but how they will interact with and impact all of the other players in the team.
For this we will use a neural network approach and will have more to tell you about that soon.
- Jens Melvang is a Product Manager for Stats Perform; Joe Gallagher and Daniel Dinsdale are AI Scientists for the company. This article is taken from their presentations at the 2021 Big Data Webinar. To watch all seven presentations click HERE.