Blog of Random Thoughts and Pictures

Player Ranking Framework

April 11th, 2021

As mentioned in Part I of this multipart post, Luca Pappalardo prepared a video, for the Friends of Tracking channel in 2020, to talk about some elements of a paper related to an open Wyscout data set, and advanced statistics related to passing networks, flow centrality and player ranking.

For this post (Part IV) I’m going to cover my take on the PlayeRank framework created by this team of researchers.

I’ve forked the “mapping-match-events-in-Python” repo into my mmoffoot area and created a new branch called ‘englanddata’ to cover the data set of English Premier League information for the 2017-18 season.

An exhaustive description of the PlayeRank framework is available in this paper: Pappalardo, Luca, Cintia, Paolo, Ferragina, Paolo, Massucco, Emanuele, Pedreschi, Dino & Giannotti, Fosca (2019) PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology 10(5).

This Notebook builds player rankings from match events. The following steps are required:

  • compute feature weights (learning)
  • compute roles (learning)
  • compute performance scores (rating)
  • aggregate performance scores (ranking)
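
To make the last two steps a little more concrete, here is a minimal sketch, assuming a performance score is a weighted sum of a player's feature counts in a match and that the ranking is simply the average of those scores. The feature names, weights and counts are all invented for illustration; they are not the values the Notebook learns, and the real framework also takes roles into account.

```python
from statistics import mean

# Hypothetical feature weights (in the Notebook these are learned from data).
FEATURE_WEIGHTS = {"accurate_pass": 0.02, "shot": 0.10, "foul": -0.05}


def performance_score(features, weights=FEATURE_WEIGHTS):
    """Rating step: weighted sum of one player's feature counts in one match."""
    return sum(weights.get(name, 0.0) * count for name, count in features.items())


def rank_players(match_features):
    """Ranking step: average each player's match scores, best first."""
    averages = {
        player: mean(performance_score(f) for f in matches)
        for player, matches in match_features.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up feature counts for two made-up players over two matches each.
example = {
    "Player A": [{"accurate_pass": 30, "shot": 4, "foul": 1},
                 {"accurate_pass": 25, "shot": 6, "foul": 0}],
    "Player B": [{"accurate_pass": 50, "shot": 0, "foul": 2},
                 {"accurate_pass": 45, "shot": 1, "foul": 1}],
}
ranking = rank_players(example)
```

The weights do all the work here: change them and the order changes, which is why the learning step comes first.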

It doesn’t take long to run through the [In] steps of the Notebook and for the English data you end up with Figure 1 as seen below.

Figure 1: Player Ranking English Premier League 2017-2018

The visual output from the Notebook is interactive, which is great as you can hover over the points to catch the name. For example, in the striker role H. Kane is the outlier (at the top), with S. Agüero second. There’s even a drop-down menu to do a comparison.

Figure 2: Player Ranking Comparison for H. Kane and S. Agüero English Premier League 2017-2018

The positions are an interesting element of this ranking system, which is based on a role matrix.

Team Attacking left to right, position 0 is a Striker

And the top players from English Premier League 2017-2018 for each role position

  • Position 0. = H. Kane
  • Position 1. = L. Milivojevic
  • Position 2. = N. Monreal
  • Position 3. = D. Janmaat
  • Position 4. = S. Mane
  • Position 5. = M. Salah
  • Position 6. = N. Otamendi
  • Position 7. = J. Stephens

If I look at the PFA Premier League Team of the Year for 2017-18, Otamendi, Kane and Salah were named in it and also appear here in PlayeRank, but none of the rest. I wonder how D. Silva and K. De Bruyne, both of whom are in the PFA team, missed out in this PlayeRank framework.

Overall, over 4 posts, I can say this is a great Jupyter Notebook, firstly to really learn about Jupyter Notebooks, and secondly to be able to see the structure of WyScout data and how to use it. It is so important given this data set is used by so many top clubs for the scouting, analysis and recruitment of players.

I got caught a little on the passing networks and the flow centrality, but a thread of further investigation into a measure of cohesiveness within the team would certainly be a nice continuation of this topic.

The player ranking and the full explanation of the PlayeRank Framework were fantastic and a joy to read and interact with.

LOI Matchday 4 Predictions

April 8th, 2021

As I go into match day 4, a review of match day 3 reveals that I got, or should I start to say the system got, 1 result correct: the Sligo Rovers win. While I hedged against them at the weekend, it was good to see Waterford FC pick up their first points of the season.

The running tally so far after 14 games is 2 correct predictions, a 14% hit rate so far ….. terrible. Top of a local prediction league is 8 out of 14 so a bit to go there to catch up.

For match day 4 here are the predictions

St. Patricks (47%) vs Derry City (26%)
The Probability of a Draw between St. Patricks and Derry City is 26%

So calling a St. Pats win

Dundalk (62%) vs Bohemians (15%)
The Probability of a Draw between Dundalk and Bohemians is 22%

Dundalk have had a shaky start to the season, but will go for the Dundalk win.

Finn Harps (24%) vs Waterford (49%)
The Probability of a Draw between Finn Harps and Waterford is 27%

Finn Harps are flying it but the system is saying a Waterford win, so will go for the Waterford to win scenario and make up for last week.

Longford (36%) vs Drogheda (38%)
The Probability of a Draw between Longford and Drogheda is 25%

This is very close; it looks like it could be a draw.

Sligo Rovers (28%) vs Shamrock Rovers (43%)
The Probability of a Draw between Sligo Rovers and Shamrock Rovers is 29%

Dare I say a clash at the top of the table, and will go for a Rovers win …… Shamrock Rovers.

There’s an extra game this week

Derry City (29%) vs Shamrock Rovers (43%)
The Probability of a Draw between Derry City and Shamrock Rovers is 28%

And looks to be another Shamrock Rovers win to really put them top of the table. Now in truth I called this already as a draw a week or so back, but I’m going for Rovers win now.

Passing networks and Flow centrality

April 5th, 2021

As mentioned in Part I of this multipart post, Luca Pappalardo prepared a video, for the Friends of Tracking channel in 2020, to talk about some elements of a paper related to an open Wyscout data set, and advanced statistics related to passing networks, flow centrality and player ranking.

For this post (Part III) I’m going to cover my take on Passing networks and Flow centrality.

I’ve forked the “mapping-match-events-in-Python” repo into my mmoffoot area and created a new branch called ‘englanddata’ to cover the data set of English Premier League information for the 2017-18 season.

Passing networks

This Notebook creates a player passing network for any of the matches covered in the data set. The passing network is a weighted network where nodes are players and weighted edges represent movements of the ball between players. The size of an edge is proportional to the number of passes between the players.
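
As a minimal sketch of that idea (not the Notebook's own code), the edge weights can be built by counting how often each passer and receiver pair occurs. The player labels and pass sequence below are invented.

```python
from collections import Counter


def build_passing_network(passes):
    """Return {(passer, receiver): count} for a list of (passer, receiver) pairs."""
    return Counter(
        (passer, receiver) for passer, receiver in passes if passer != receiver
    )


# Invented pass sequence for one team, as (passer, receiver) pairs.
passes = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C"), ("A", "B")]
network = build_passing_network(passes)

# The heaviest edge is the pair that combined most often.
heaviest_edge, weight = network.most_common(1)[0]
```

The pairs are ordered, so A to B and B to A are kept as separate edges, which is what makes the network directed.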

Some finer details are covered in a research paper: Cintia et al., The harsh rule of the goals: data-driven performance indicators for football teams, In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA’2015), 2015.

I wanted to continue with a passing network from the Tottenham Hotspur – Leicester City match (2018), given this was the match used for the events up to this point. However, the Notebook just produced a blank image for both teams. I tried another match and only got one team, and finally the Arsenal – Burnley match produced what I was looking for, so I moved over to this match.

Arsenal Passing Network, Arsenal – Burnley, May 13, 2018
Burnley Passing Network, Arsenal – Burnley May 13, 2018

The produced images need some time for study; even a quick look leaves me wondering what I can take from them. I also tried to gain some insight from a related paper by the authors, P. Cintia, S. Rinzivillo, and L. Pappalardo, “A network-based approach to evaluate the performance of football teams,” in Proceedings of the Machine Learning and Data Mining for Sports Analytics workshop, ECML/PKDD 2015, where it mentions again that nodes are players, directed edges represent passes between players and the size of an edge is proportional to the number of passes between the players. Node 0 indicates the opponent’s goal, and edges ending in node 0 represent goal attempts. However, there is no Node 0 in these passing networks.

The related video at minute 15:00 has a section that describes the passing networks too, but unfortunately there’s still not enough in there to give a hint as to what the take away is.

Flow centrality

Next up is flow centrality, which is a feature that can be computed on the passing network and in this Notebook is described as a way to capture the fraction of times that a player intervenes in those paths that result in a shot. They take into account defensive efficiency by letting each player start a number of paths proportional to the number of balls that he recovers during the match.

This concept is only lightly explained in the paper Duch et al., Quantifying the Performance of Individual Players in a Team Activity, PLoS ONE 5(6): e10937, as referenced in the Notebook, and I must admit I still didn’t get it. A read of a source paper by Freeman LC, A set of measures of centrality based upon betweenness, Sociometry 40: 35–41, 1977 clears it all up: “when a particular person in a group is strategically located on the shortest communication path connecting pairs of others, that person is in a central position”. That same paper goes on to show measures that define centrality in terms of the degree to which a point falls on the shortest path between others and therefore has a potential for control of communication.
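
To see the shortest-path idea in action, here is a minimal sketch of Freeman-style betweenness on a tiny, invented passing graph. It uses Brandes' algorithm on an unweighted directed graph, so it is only the underlying betweenness measure and not the flow centrality computed in the Notebook, which works on paths that end in a shot.

```python
from collections import deque


def betweenness(graph):
    """Betweenness for an unweighted directed graph.

    graph maps each node to a list of the distinct nodes it passes to. A
    node's score is the sum, over ordered pairs of other nodes, of the
    fraction of shortest paths between them that run through it.
    """
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search, counting shortest paths from source.
        order = []
        preds = {n: [] for n in nodes}
        paths = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out path credit.
        credit = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    return score


# Invented example: everything from the keeper goes through the midfielder.
graph = {"GK": ["CM"], "CM": ["LW", "RW"], "LW": ["ST"], "RW": ["ST"], "ST": []}
score = betweenness(graph)
```

In this toy graph the central midfielder gets the highest score because every route from the goalkeeper to the forwards passes through that node, which is the "potential for control of communication" in Freeman's terms.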

Flow Centrality Burnley 2018, Arsenal – Burnley May 13, 2018

Dare I say Westwood (Burnley) is the betweenness man in control for Burnley.

Might be a follow up idea to classify based on how they measure cohesiveness within the team.

Advanced football visualisations duels on the pitch Italy compared to England

March 27th, 2021

As mentioned in Part I of this multipart post, Luca Pappalardo prepared a video, for the Friends of Tracking channel in 2020, to talk about some elements of a paper related to an open Wyscout data set, and advanced statistics related to passing networks, flow centrality and player ranking.

For this post (Part II) I’m going to cover my take on the match evolution and spatial stats.

I’ve forked the “mapping-match-events-in-Python” repo into my mmoffoot area and created a new branch called ‘englanddata’ to cover the data set of English Premier League information for the 2017-18 season.

Spatial distribution of events

There are tons of events collated in the WyScout API Events from duels to fouls to interruptions, as explained in the WyScout API document. For passes there are 6 types:

  • Pass = Hand pass,
  • Pass = Head pass,
  • Pass = High pass,
  • Pass = Launch,
  • Pass = Simple pass,
  • Pass = Smart pass,

In this Notebook there’s an interesting set of images created which show the distribution of positions per event type. These kernel density plots show the distribution of the events’ positions during the match, with a darker green representing a higher number of events in a specific zone of the field.
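
A kernel density plot is a smoothed version of a much simpler idea: count the events that fall in each zone of the pitch. The sketch below shows only that cruder zone count, assuming positions are given as percentages (0 to 100) of the pitch length and width. The grid size and the duel positions are invented for illustration.

```python
def zone_counts(positions, cols=4, rows=3):
    """Count events per pitch zone; positions are (x, y) pairs in 0-100."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        col = min(int(x / 100 * cols), cols - 1)  # clamp x == 100 into last column
        row = min(int(y / 100 * rows), rows - 1)  # clamp y == 100 into last row
        grid[row][col] += 1
    return grid


# Invented duel positions, purely for illustration.
duels = [(10, 5), (12, 8), (55, 50), (90, 95), (100, 100), (88, 90)]
grid = zone_counts(duels)
```

Comparing two such grids, one per league, gives a rough numeric version of what the side-by-side plots show visually.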

Figure 1: Move the slider to compare Italian Serie A Duels and English Premier League Duels 2017-18

The first image is of duels, and in the WyScout world “Duel” has a specific meaning,

A challenge between two players to gain control of the ball, progress with the ball or change its direction.

With a number of subtypes to consider too: Defensive duel, Offensive duel, Aerial duel, Loose ball duel, and Sliding tackle

For the moment I’m not going into the whys and wherefores of these subtypes, but it’s really interesting to review and compare the images and to see the difference of where the Italian league and the English league host their duels. Dare I say right-backs, left-backs and right wingers, left wingers should look closer if they are moving between the leagues.

A big summer move from England to Italy was Emre Can from Liverpool to Juventus, with Stephan Lichtsteiner coming in from Juventus to Arsenal. Maybe a view of these plots before the new 2018-19 season got underway might have been handy.

Of note there’s a 10,000 event sample size in here by default, so for the Italian & English league this represents about 6 matches worth of events, and so a larger sample size would be nice to see and compare against. Would also be nice to identify specific players (RB,LB and RW, LW) that were strong in those main duel locations, however that will have to be for another day.

Here are where the fouls happen.

Figure 2: Move slider to compare Italian Serie A Fouls and English Premier League Fouls 2017-18

And the shots.

Figure 3: Move the slider to compare Italian Serie A shots and English Premier League shots 2017-18

Intra-match evolution of the events

Goals are the mainstay of football, and so when looking at the English and Italian leagues (season 2017-18), it’s pleasing to see the difference between the leagues, especially in the 1st half goals.

Yellow cards and red cards are covered in the data set too, and displayed in the Jupyter Notebook but I’ll be honest and say I didn’t take too much time to analyse the results here, because I was fascinated by the Duel plots.

LOI Matchday 2 Predictions

March 25th, 2021

Here’s another go at the match day predictions, using past match results

For match day 2 in the League of Ireland Premier division

Dundalk (76%) vs Finn Harps (6%)
Draw (13%)

Calling a Dundalk win here

Waterford (43%) vs Sligo Rovers (30%)
Draw (27%)

Calling a drawn match

Bohemians (62%) vs Longford Town (15%)
Draw (22%)

Calling a Bohemians win

St. Patricks (62%) vs Drogheda (15%)
Draw (21%)

Calling a St. Patricks win

Derry City (29%) vs Shamrock Rovers (43%)
Draw (28%)

Calling a drawn match here once it eventually gets played.

Advanced football visualisations and data analysis of match events

March 22nd, 2021

Luca Pappalardo an author of the paper (PCR2019) Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6, 236 (2019) prepared a video, for the Friends of Tracking channel in 2020, to talk about some elements of this paper and the related Wyscout data set, which was used for the paper.

In this video Luca covers:

  • The Wyscout data set, how it is collected, from players to events.
  • Basic statistics on events and distributions.
  • Plotting events on the field, match evolution and spatial stats.
  • Advanced statistics: passing networks, flow centrality and playerRank

For this blog post I’m going to cover my take on the player events in the Wyscout data set and the display of some basic statistics on events and distributions.

The origin of the code is available on Github under the project “mapping-match-events-in-Python” and worked through in this video.

Set up

I forked the repo into my mmoffoot area on Github and then created a new branch called ‘englanddata’ there to cover the changes I made.

The example code base uses the Italian league data, but the branch name might be a giveaway: seeing as the data set has English Premier League information for the 2017-18 season, I wanted to run the code base against that data set. So I took a copy of the original Jupyter Notebook and ran it against the English data as data_england_exploration.ipynb.

The full list of data available includes:

  • Italian first division 2017-18
  • English first division 2017-18
  • Spanish first division 2017-18
  • French first division 2017-18
  • German first division 2017-18
  • European Championship 2016
  • World Cup 2018

All the matches, events, players, and competition data sets are hosted in a figshare repository with all the data stored in a JSON format.

The way the data is collected is explained in the paper, with a nice visual representation in the Notebook so I won’t ruin that insight and will let you read it in there.

I should say a quick word on Jupyter Notebooks: it’s an interactive way of developing and presenting data science projects, and I can really see that it’s an easy way to follow the code base for this project. It’s easy enough to install Jupyter Notebook on a machine too and well worth the install.

Plotting events on the field

There are a number of nice overviews of the structure of data given in the early part of the Notebook, but it’s more interesting when it comes to the static plots.

Figure 1: All Events Tottenham Hotspur 5 – 4 Leicester City, May 13, 2018.

Of course, too much detail can overwhelm, and so the interactive plots in this Notebook are a much better mechanism to share this information: you just have to hover the mouse over the event and its details come to the fore.

Figure 2: Pass Events Tottenham Hotspur – Leicester City, May 13, 2018.
 
Figure 3: Foul Events Tottenham Hotspur – Leicester City, May 13, 2018.
 
Figure 4: Fouls by a specific player Tottenham Hotspur – Leicester City, May 13, 2018.

This is a great Jupyter Notebook, firstly to really learn about Jupyter Notebooks, and then of course to be able to see the structure of WyScout data and how to use it. It is so important given this data set is used by so many top clubs for the scouting, analysis and recruitment of players.

There’s more to come, as I plan to complete the match evolution, spatial stats in part II of this blog post and finally cover the advanced statistics: passing networks, flow centrality and playerRank in a part III of this blog post.

Football opening day predictions for League of Ireland 2021

March 19th, 2021

The new season (2021) is just about to get underway for the League of Ireland and unfortunately I’m not quite ready with the full league table predictions, but I have got an early version of match prediction analysis software in place (details to follow at a later date) and so I’m going to take a quick look at my local team Waterford FC first and see what we have in store in this first league game against Drogheda United.

What a first match to pick, Waterford and Drogheda haven’t played against each other in the LOI Premier League since July 2007, a 3-0 home win for Drogheda, and with all of my data so far going back to 2012, I’m left a little stumped with having to go on to predict this one. There’s just a model of the home / away goal scoring rates in the LOI since 2012, so in using this the result pops out as:

The Probability of a Home win for Drogheda is 30%
The Probability of an Away win for Waterford is 45%
The Probability of a Draw between Drogheda and Waterford is 25%
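
The post doesn't spell out how the scoring rates become probabilities, so the following is only a sketch of one common approach: treat home and away goals as independent Poisson variables and add up the probability of every scoreline. The Poisson assumption and the rates used here are mine for illustration; they are not the fitted LOI values.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals arrive at the given average rate."""
    return rate ** k * exp(-rate) / factorial(k)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win), summing scorelines up to max_goals each."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Invented scoring rates (goals per match) for a home and an away side.
home, draw, away = match_probabilities(home_rate=1.1, away_rate=1.4)
```

Capping the scorelines at 10 goals per side leaves out only a tiny sliver of probability at these rates, so the three numbers add up to very nearly 100%.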

Okay an away win for Waterford (maybe), let’s see what happens.

By the way it wasn’t that easy to find a dated version of LOI results from way back when, Whoscored ended up being the best source, with an honourable mention for Goal.

As for the other matches here are the predictions for the opening day

Shamrock Rovers (50%) vs St. Patricks (23%)
Draw (27%)

Finn Harps (24%) vs Bohemians (47%)
Draw (29%)

Longford Town (22%) vs Derry City (53%)
Draw (25%)

Sligo Rovers (24%) vs Dundalk (50%)
Draw (26%)

So wins for Shamrock Rovers, Bohemians, Derry City, Dundalk and Waterford FC for this opening day, let’s see how it pans out.

Top Assister Womens World Cup 2019

March 15th, 2021

As an alternative update to the first challenge, where the Mens World Cup 2018 data was looked at, this time it’s a view of the Womens World Cup 2019.

Again I’m going to look at players that assist a goal. At a glance there was one clear player with the most goal assists at this World Cup 2019, Sherida Spitse from the Netherlands, according to Wikipedia. When I head to another site there is a 2 player tie, both with 4 assists, so I’m off again to see what the data highlights.

There’s a fantastic FIFA Technical Report on the Womens World Cup 2019, and while there are fine details on loads of aspects of the games, the one thing it doesn’t cover is an easy table to see the player with the most assists.

So back to the code base, and using the same code base from the previous challenge the top three that popped out were:

  1. Megan Rapinoe (USA) : 21
  2. Amel Majri (France) : 17
  3. Sherida Spitse (Netherlands) : 16

What actions did they perform

Megan Rapinoe was top of the class with 21 passes that led to a shot on goal.

Figure 1: Megan Rapinoe passes that assisted a shot on goal

So not only was Rapinoe the Golden Boot winner with six goals, tying with Alex Morgan (6 goals) and Ellen White (6 goals), she also had three assists in the tournament, also tied with Morgan (3 assists), and finally captured the Golden Boot for top scorer on the second tiebreaker, doing it with fewer minutes played than Morgan. Her six goals and three assists also saw her win the Golden Ball as the best player in the tournament.

Figure 2: Expected goals from Megan Rapinoe passes

Clearly Rapinoe was involved in so much of the attacking action of the USA team, and deserved the accolades. It is only lightly worth noting that a number of Rapinoe’s actions were from set plays (corners and the like).

But this specific write up is about the top assister, and the next player to look at in the category is Amel Majri of France.

Figure 3: Expected goals from Amel Majri passes

Majri was a left back in the French squad, but wore the number 10 shirt and certainly represented the creative flair of that shirt with 17 passes that led to a shot on goal. It is also quite noticeable that Majri was very consistent as to where a ball into the box would be placed, a dream for the attacking 3 of France to predict.

However, the award for top assister in the Women’s World Cup 2019 rightly goes to Sherida Spitse of the Netherlands, with 16 passes that led to a shot on goal, of which 4 were goals.

Figure 4: Expected goals from Sherida Spitse passes

Spitse played as a right-sided defensive (pivot) midfield player for the Netherlands, and here are video clips of all of Spitse’s assists. This first one is from a special tactic camera.
Of note there are a number of the Womens World Cup matches covered by tactic cameras.

Spitse (Netherlands) assist for a goal in the match Netherlands vs. Canada – Thursday June 20, 2019

Spitse (Netherlands) assist for a goal in the match Netherlands vs. Japan – Tuesday June 25, 2019

First assist for Spitse (Netherlands) in the match Italy vs. Netherlands – Saturday June 29, 2019

Second assist for Spitse (Netherlands) in the match Italy vs. Netherlands – Saturday June 29, 2019

FWWC 2019 – Set-piece specialist World Cup 2019 Sherida Spitse

Mini-challenge: Plotting Actions of an Assister

March 1st, 2021
  1. Think of a player who you enjoyed watching at the recent Men’s or Women’s World Cups.
  2. What actions did they perform that were important and why?
  3. Plot the actions and describe how the data supports or contradicts your own analysis.
  4. Write a short text using at most two figures that illustrate your point.

I’ll address this challenge with a view on a player (or set of players as it turns out), and later in the piece I’ll also share some code snippets on how the information was extracted from the StatsBomb data.

Think of a player

The first challenge and I must admit I’m floundering a little already as I can barely remember the Mens 2018 World Cup, bar the eventual winners, the golden boot winner and the best player award.

However the one thing I do like in football is the player assisting goals.
So for this challenge I thought I would look at the one player that came out on top in this regard at the 2018 World Cup and to state the important actions they performed.

However on first glance it is the case that there is a 16 way tie for the most goal assists at the World Cup 2018, well according to Wikipedia, when I head to another site there is a 19 player tie.

So the first thing I’ve done is take the Team names for all the players listed as the top assist players, have a look at all the passes, and especially all the passes that led to a goal (or more so a shot on goal), and try to find out the top three players that assisted shots on goal.

This led to a surprise, for me anyway, because the top three that popped out from this exercise were:

  1. Kieran Trippier (England) : 25
  2. Neymar da Silva Santos Junior (Brazil) : 24
  3. Philippe Coutinho Correia (Brazil) : 13

What actions did they perform

Kieran Trippier had 25 passes that led to a shot on goal.

Figure 1: Kieran Trippier passes that assisted a shot on goal

 
However, Kieran Trippier does not appear on the chart of top players to assist goals, because of all these passes only one led to an assisted goal (for John Stones).

Now it’s time to look at the Expected Goal plot of the shots that happened, straight after Kieran Trippier gave in the pass for that shot. As far as I can tell Kieran Trippier could have had at least 2, if not 3, other assisted goals, however there are many of those passes that are into areas where the Expected Goal of that next shot is low.

 

Figure 2: Expected goals from Kieran Trippier passes

It is also worth noting that a large number of Trippier’s actions are from set plays (corners and the like).

So who was the top assistor at the 2018 World Cup, is it Neymar of Brazil ?

Neymar comes in second on the list gathered earlier, however he only assisted 1 goal also.

 

Figure 3: Expected goals from Neymar’s passes

 

Therefore it must be Philippe Coutinho, with 13 passes that assisted a shot on goal and 2 actual goal assists he is the top player in this category.

 

Figure 4: Expected goals from Philippe Coutinho passes

 

Or is he? A little addendum: when I look at the positions that Neymar’s passes went into, and the Expected Goal value for each of the shots taken after a Neymar pass, for me he should be the player of note.

 

FWC 2018 – Group E – SRB v BRA Neymar Jr

 

Figure 5: Scatter plot of expected goals from Top 3 player passes

 

A large number of Neymar’s assists for a shot are close to the goal (120 is the goal line) and central to the goal (40 is the centre of the goal).
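
Using those two numbers, "close to the goal and central" can be turned into a single figure: the straight-line distance from where the pass arrived to the centre of the goal. This is a small sketch with invented coordinates, not a calculation taken from the plots.

```python
from math import hypot

GOAL_X, GOAL_Y = 120, 40  # goal line and centre of the goal, as in the text


def distance_to_goal(x, y):
    """Straight-line distance from a pitch location to the centre of the goal."""
    return hypot(GOAL_X - x, GOAL_Y - y)


# Two invented locations the same distance from the goal line:
central = distance_to_goal(108, 40)  # in line with the centre of the goal
wide = distance_to_goal(108, 75)     # out near the touchline
```

Both points are 12 units from the goal line, but the wide one is roughly three times further from the goal itself, which is the kind of difference that shows up in the Expected Goal value of the following shot.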

Code snippets

I also want to share some of the code snippets that helped gather the data from Statsbomb in regards to this challenge.

Extracting data from competitions.json

I found it handy to extract the season_id from the competitions.json file and using this to find all the matches with teams I was interested in. This also helped to identify which matches in the events folder had to be picked up.
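
A minimal sketch of that lookup is below. To keep it self-contained it parses a small stand-in for competitions.json rather than the real file; the key names follow the StatsBomb open data as I understand it, and the id values shown should be treated as placeholders to check against the real file. The helper name find_season is my own.

```python
import json

# Stand-in for the contents of competitions.json: same keys, trimmed records.
SAMPLE = json.loads("""
[
  {"competition_id": 43, "season_id": 3,
   "competition_name": "FIFA World Cup", "season_name": "2018"},
  {"competition_id": 72, "season_id": 30,
   "competition_name": "Women's World Cup", "season_name": "2019"}
]
""")


def find_season(competitions, competition_name, season_name):
    """Return (competition_id, season_id) for the first matching entry, else None."""
    for entry in competitions:
        if (entry["competition_name"] == competition_name
                and entry["season_name"] == season_name):
            return entry["competition_id"], entry["season_id"]
    return None


ids = find_season(SAMPLE, "FIFA World Cup", "2018")
```

With the real data, the list would come from json.load on the opened competitions.json file, and the two ids then point at the file of matches to read next.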

The Assisted Shot Id

When iterating through the passes, finding the pass_assisted_shot_id was very handy. One bit stumped me for a while: when a pass didn’t turn into an assisted shot, that id is set as NaN (not a number), which is a little off-putting at first.
There are also times when there are duplicate entries for the pass_assisted_shot_id for related events which in one iteration of this code had Kieran Trippier (England) with 37 actions.

        # Keep only passes with pass_assisted_shot_id set. A real id is a
        # string (usually some hex value); a missing id is NaN, which is a
        # float, so the isinstance check filters those passes out.
        if isinstance(passasid, str):
            playersAsisstingShotsOnGoal.append(tpassplayerdir)
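
Putting the two gotchas together, here is a standalone sketch of how the counting could be done: skip the NaN ids, and count each shot id only once so that duplicates on related events don't inflate a player's total. The function name, the row layout and the ids are invented for illustration and are not the code used for the figures above.

```python
from collections import Counter

NAN = float("nan")  # what a missing pass_assisted_shot_id looks like


def count_assisted_shots(passes):
    """Count distinct assisted shots per player from (player, shot_id) pairs."""
    seen = set()
    counts = Counter()
    for player, shot_id in passes:
        if not isinstance(shot_id, str):
            continue  # NaN is a float, so passes without a shot are skipped
        if shot_id in seen:
            continue  # the same shot id on a related event is counted once
        seen.add(shot_id)
        counts[player] += 1
    return counts


# Invented rows; the ids are placeholders, not real StatsBomb event ids.
rows = [("Player A", "id-1"), ("Player A", NAN), ("Player A", "id-1"),
        ("Player B", "id-2"), ("Player A", "id-3")]
counts = count_assisted_shots(rows)
```

Player A appears four times in the rows but ends up with 2, because one row has no shot id and one repeats an id already counted.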

Given that the dataframe of shots for the matches sets the index name as the Id of the event, it took me a while to figure out how to re-access that index name when doing a compare later on in the code. Of course it’s simple when you see how, with the .name attribute.

    for shotOnGoal in matchesShotsOnGoal:
        if (shotOnGoal.name in assistedShotId):
            x=shotOnGoal['location'][0]
            y=shotOnGoal['location'][1]

Acknowledgements

Thanks to feedback from Eoin O’Brien, Eoin Slattery, Michael Kerley, Oliver Critchfield and David Sumpter, as this has led to a revision of the text and images.

Plotting actions on a pitch

February 14th, 2021

Purpose

The second element to work on via this course is Plotting actions on a pitch, and it covers:

  • Loading match data and finding all the shots
  • Plotting shots on the pitch and highlighting goals.
  • Plotting expected goals
  • Plotting passes

The code needed for this lecture is also available at the Github SoccermaticsForPython repo.

Set up

Under the organisation on Github called mmoffoot, I used my fork of the SoccermaticsForPython repo.

Then re-used the branch called ‘week1’ covering the changes I had made as of week1 of the course.

As mentioned before I placed the StatsBomb data in a directory higher and then just add a soft link to the source data within the ‘Statsbomb’ folder of this project.

What was coded

This second exercise concentrated on the England vs Sweden Womens World Cup 2019 match and the exercise asked that I :

  1. Create a dataframe of passes which contains all the passes in the match.
  2. Plot the start point of every Sweden pass. Attacking left to right.
  3. Plot only passes made by Caroline Seger (she is Sara Caroline Seger in the database)
  4. Plot arrows to show where the passes went

I made the code changes to 2PlotShotsAndPasses.py and ran the code as

python3 2PlotShotsAndPasses.py

The output from this exercise was mainly images, which was nice.

The code provided from the original repo had shots and goals defined for display, and also a nice feature where the size of the circle represented the expected goal (xG) rating of the shot, as calculated by StatsBomb. This xG thing is covered in more depth a little later in the course, but in general terms my understanding is that xG is the probability of that shot being a goal: the higher the xG, the more likely it is that the shot should have been a goal.

Therefore in the plot, a solid colour with a name is a goal and the less visible circles are shots that were taken but did not end up as a goal.

Figure 1: Shots and Goals England vs Sweden Womens World Cup 2019.

 

I have a thing about the pitch being green, with markings as white, so that was the one code change I made to this original code. This caused a slight issue with the goals scored by England as they played in white, I made their shots in white and the penalty spot is white. All of this means that it looks like a goal was scored by England from the penalty spot, but the goal scorer is not named. For now I think I’ll leave it as is.

As for the football analysis, there are 6 shots on goal from within the box for Sweden, with 2 of those chances having a high xG but not ending as goals, and one of the goals, scored by Eva Jakobsson, looks like it was a tough chance to convert, but she managed it. England only had 4 shots on goal from within the box, and even had a chance with a larger xG but didn’t score from it.

Originating Passes

Next up, the creation of a dataframe of passes was relatively straight forward as the StatsBomb data provides a ‘type_name’ of Pass within the data set so they were easy enough to extract. So the first real plot I have ever created is all the originating position of passes of the match for both Sweden and England, with Sweden playing from left to right. Also added a little text at the bottom to highlight the data came from StatsBomb.
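
The filtering itself is short. The sketch below shows the same idea on a plain list of dictionaries so that it runs without any extra libraries; with a pandas dataframe the equivalent is a boolean filter on the ‘type_name’ column. The events and the exact team name strings are invented stand-ins, so treat them as assumptions.

```python
def passes_only(events):
    """Keep the events whose type_name is 'Pass'."""
    return [event for event in events if event.get("type_name") == "Pass"]


# Invented, flattened events; only the keys used here are shown.
events = [
    {"type_name": "Pass", "team_name": "Sweden Women's", "location": [30, 40]},
    {"type_name": "Shot", "team_name": "England Women's", "location": [105, 38]},
    {"type_name": "Pass", "team_name": "England Women's", "location": [60, 20]},
]
passes = passes_only(events)

# Start points of one team's passes, ready to be plotted as dots on the pitch.
sweden_starts = [p["location"] for p in passes if p["team_name"] == "Sweden Women's"]
```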

Figure 2: Originating Position of Passes in the England vs Sweden Womens World Cup 2019.

 

That image is a bit of a mess, as in hard to offer any football analysis, so here’s just the Swedish passes.

Figure 3: Originating Position of Sweden Passes in the England vs Sweden Womens World Cup 2019.

 

Now this is a little bit more interesting: no passes by Sweden within the opposition box.

Passes made by Caroline Seger

Next up, passes made by Caroline Seger (she is Sara Caroline Seger in the database), and for this filtering out by ‘player_name’ is not too hard with the StatsBomb data.

Figure 4: Originating Position of Passes by Caroline Seger (SWE) in the England vs Sweden Womens World Cup 2019.

 

Given the plot, there is a lot of midfield play by Caroline Seger and of course the next step of plotting arrows to show where the passes went would really add context to this play.
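
The arithmetic behind an arrow is just the end point minus the start point. This is a minimal sketch with invented coordinates, assuming a 120 by 80 pitch; the two helper names are my own, and the second one is an optional extra for the case where one team needs mirroring so that it also appears to attack left to right.

```python
def pass_arrow(start, end):
    """Return (x, y, dx, dy) for drawing an arrow from start to end."""
    x, y = start
    return x, y, end[0] - x, end[1] - y


def flip(location, length=120, width=80):
    """Mirror a location through the centre of the pitch."""
    return length - location[0], width - location[1]


# Invented pass: from (40, 30) to (65, 50) on a 120 x 80 pitch.
x, y, dx, dy = pass_arrow((40, 30), (65, 50))
```

The four values are in the order that matplotlib's Axes.arrow expects (x, y, dx, dy), so they can be handed straight to it when drawing on the pitch.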

Now to verify that the code I have written for the directional arrows is correct, I went searching for video footage of the match, and lo and behold, on YouTube there is a set of full match videos for the Womens World Cup 2019 taken from the tactical camera behind one of the goals. Thankfully the England vs Sweden match is up there too. I found it really hard to pick out Caroline Seger, but I did find it easier to identify Rut Hedvig Lindahl, the Swedish goalkeeper, and therefore I picked minute 11 (at random) to see if I could correlate the pass on the video and the pass on my plotted pitch.

Figure 5: Match footage of pass by Rut Hedvig Lindahl (SWE) in the England vs Sweden Womens World Cup 2019.

 

At this very period, this action was a throw out pass by Rut Hedvig Lindahl, so quite distinctive.

Figure 6: Plot of pass by Rut Hedvig Lindahl (SWE) in the England vs Sweden Womens World Cup 2019.

 

And from my code. Well, there was a little gnashing of teeth. The tutorial video that goes with this session indicated a different result, with code that was slightly different, but thankfully this was corrected by the lecturer at a later date, and therefore I was on the right track. It just goes to show that obtaining some sort of footage from a match can help with these things.

So finally we have the plot I was looking for in this whole session.

Figure 7: Passes by Caroline Seger (SWE) in the England vs Sweden Womens World Cup 2019.

 

From a football analysis view point Caroline Seger was mentioned in the Swedish line up as taking up the Left Defensive Midfield position, and the passes she made show she did indeed play that role, with a preference to progress the ball towards the opposition goal, with one incisive ball into the opposition box.

What was learned

The main takeaway: it was great to learn how important it is to plot the direction of a player’s passes with arrows, along with verifying a few of the passes via match footage, if at all possible. I know not all matches will have the tactical camera, but more often than not for the higher tier matches there’s footage of a goal or two to be found.

Post Update:

Thanks to Mike for reviewing the content and for rightly pointing out that I had the wrong image in place for the directional passes of Caroline Seger, and that the coordinates for England passes were slightly off. Images and associated code have been updated.