Blog of Random Thoughts and Pictures

Advanced football visualisations and data analysis of match events

March 22nd, 2021

Luca Pappalardo, an author of the paper (PCR2019) Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6, 236 (2019), prepared a video for the Friends of Tracking channel in 2020. In it he talks through some elements of the paper and the Wyscout data set that was used for it.

In this video Luca covers:

  • The Wyscout data set, how it is collected, from players to events.
  • Basic statistics on events and distributions.
  • Plotting events on the field, match evolution and spatial stats.
  • Advanced statistics: passing networks, flow centrality and playerRank

For this blog post I’m going to cover my take on the player events in the Wyscout data set and the display of some basic statistics on events and distributions.

The original code is available on GitHub under the project “mapping-match-events-in-Python” and is worked through in this video.

Set up

I created a new branch called ‘englanddata’ to cover the changes I made.

The example code base uses the Italian league data but, as the branch name gives away, I wanted to run it against the English Premier League data for the 2017-18 season, which is also in the data set. So I took a copy of the original Jupyter Notebook and ran it against the English data as data_england_exploration.ipynb.

The full list of data available includes:

  • Italian first division 2017-18
  • English first division 2017-18
  • Spanish first division 2017-18
  • French first division 2017-18
  • German first division 2017-18
  • European Championship 2016
  • World Cup 2018

All the matches, events, players, and competition data sets are hosted in a figshare repository with all the data stored in a JSON format.
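
As a minimal sketch of what reading one of these JSON files and producing a basic event count might look like: the file path below is my own assumption about where the download was unpacked, and the tiny sample list is made up so the snippet runs without the real data. The "eventName" key follows the event structure described in the paper.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Read one of the data set's JSON files into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_event_types(events):
    """Count how often each event type occurs in a list of event dicts."""
    return Counter(event["eventName"] for event in events)


# Made-up sample so the sketch runs without the download.
sample_events = [
    {"eventName": "Pass", "matchId": 1, "playerId": 10},
    {"eventName": "Pass", "matchId": 1, "playerId": 11},
    {"eventName": "Foul", "matchId": 1, "playerId": 12},
]

# Assumed location of the downloaded file; adjust to your own layout.
events_path = Path("data/events/events_England.json")
events = load_json(events_path) if events_path.exists() else sample_events

# Print event types from most to least frequent.
for name, total in count_event_types(events).most_common():
    print(f"{name}: {total}")
```

Using a Counter keeps the sketch short; the Notebook itself presents these distributions in its own way.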

The way the data is collected is explained in the paper, with a nice visual representation in the Notebook, so I won’t spoil that insight and will let you read it there.

I should say a quick word on Jupyter Notebooks: they are an interactive way of developing and presenting data science projects, and I can really see that a Notebook makes it easy to follow the code base for this project. It’s easy enough to install Jupyter Notebook on a machine too, and well worth the install.

Plotting events on the field

There are a number of nice overviews of the structure of data given in the early part of the Notebook, but it’s more interesting when it comes to the static plots.

Figure 1: All Events Tottenham Hotspur 5 – 4 Leicester City, May 13, 2018.

Of course, too much detail can overwhelm, so the interactive plots in this Notebook are a much better mechanism for sharing this information: you just hover the mouse over an event and its details come to the fore.

Figure 2: Pass Events Tottenham Hotspur – Leicester City, May 13, 2018.
 
Figure 3: Foul Events Tottenham Hotspur – Leicester City, May 13, 2018.
 
Figure 4: Fouls by a specific player Tottenham Hotspur – Leicester City, May 13, 2018.
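
The filtering behind figures like these can be sketched roughly as follows. This is not the Notebook's own code: the helper names, the made-up sample events and the 105 x 68 metre pitch size are my assumptions. It relies on the Wyscout convention that event positions are given as percentages (0 to 100) of the pitch length and width.

```python
PITCH_LENGTH_M = 105  # assumed pitch size, not taken from the data set
PITCH_WIDTH_M = 68


def filter_events(events, event_name, match_id=None, player_id=None):
    """Keep events of one type, optionally for one match and/or one player."""
    selected = []
    for event in events:
        if event["eventName"] != event_name:
            continue
        if match_id is not None and event["matchId"] != match_id:
            continue
        if player_id is not None and event["playerId"] != player_id:
            continue
        selected.append(event)
    return selected


def start_position_in_metres(event):
    """Convert an event's first position from percentages to metres."""
    start = event["positions"][0]
    return (start["x"] * PITCH_LENGTH_M / 100, start["y"] * PITCH_WIDTH_M / 100)


# Made-up events in the shape used by the data set.
sample_events = [
    {"eventName": "Pass", "matchId": 1, "playerId": 10,
     "positions": [{"x": 50, "y": 50}, {"x": 60, "y": 40}]},
    {"eventName": "Foul", "matchId": 1, "playerId": 12,
     "positions": [{"x": 20, "y": 100}]},
    {"eventName": "Foul", "matchId": 1, "playerId": 13,
     "positions": [{"x": 80, "y": 0}]},
    {"eventName": "Foul", "matchId": 2, "playerId": 12,
     "positions": [{"x": 10, "y": 10}]},
]

fouls_in_match = filter_events(sample_events, "Foul", match_id=1)
fouls_by_player = filter_events(sample_events, "Foul", match_id=1, player_id=12)

for foul in fouls_by_player:
    print(start_position_in_metres(foul))
```

The resulting (x, y) pairs are what you would hand to a plotting library to place each event on a drawn pitch.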

This is a great Jupyter Notebook, firstly for really learning about Jupyter Notebooks, and then of course for seeing the structure of the Wyscout data and how to use it. That is so important given this data is used by so many top clubs for the scouting, analysis and recruitment of players.

There’s more to come: I plan to cover match evolution and spatial stats in part II of this blog post, and finally the advanced statistics (passing networks, flow centrality and playerRank) in part III.

Handling StatsBomb Event Data

February 7th, 2021

Yes, I did have a blog post back in September 2020 highlighting that I was undertaking the Uppsala University course “Mathematical Modelling of Football”, which took over my time and life in Q4 2020. Having completed the course, I’m finally getting my head up in 2021 to record all my notes and code, and to share those notes as I go through each section and sub-section of the course.

So here’s the first of hopefully many notes from the course, which were originally written in Asciidoc.

 

Purpose of Handling Event Data

The first element to work on in this course is Handling Event Data, and the purpose is to learn how to:

  • Download code and data
  • Organise a working folder
  • Load in data from a JSON file
  • Use ‘for’ loops and ‘if’ statements
  • Identify specific matches in StatsBomb data

The code needed for this lecture is available at the Github SoccermaticsForPython repo.

Set up

I created a new organisation on GitHub called mmoffoot, standing for Mathematical Modelling of Football. The purpose is to fork the GitHub projects used in the course so that I can track my own changes to those repos.

First up is a fork of the SoccermaticsForPython repo into the mmoffoot area.

I then created a branch called ‘week1’ covering the changes I had made as of week 1 of the course.

The next little hurdle is loading the StatsBomb data. It lives in another repo on GitHub called statsbomb / open-data, and in order to always have access to the StatsBomb data within this repo I was going to set up a git submodule for it. This means that any time in the future this repo is cloned afresh, it has to be done with the recursive switch (git clone --recursive).

However, I then noted that the StatsBomb data is over 3 GB in size and that it doesn’t really make sense to have a couple of copies of this data on the one machine, so I just placed it in a directory higher up. I then added a soft link to the source data within the ‘Statsbomb’ folder.

ln -s ../../statsbomb-opendata/data .

Also modified the README file to point this out.

What was coded

The first exercise is to

  1. Edit the code to print out the result list for the Men’s World Cup
  2. Edit the code to find the ID for England vs. Sweden
  3. Write new code to write out a list of just Sweden’s results in the tournament.

I made the code changes to 1LoadInData.py and ran the code as

python3 1LoadInData.py

The output reads

The match between Croatia and Denmark finished 1 : 1
The match between Australia and Peru finished 0 : 2
.........
.........
The match between Spain and Russia finished 1 : 1
The match between Croatia and England finished 2 : 1
The Sweden match between Mexico and Sweden finished 0 : 3
The Sweden match between Sweden and South Korea finished 1 : 0
The Sweden match between Sweden and Switzerland finished 1 : 0
Sweden vs England has id:8651
The Sweden match between Sweden and England finished 0 : 2
The Sweden match between Germany and Sweden finished 2 : 1

I think the exercise is complete.
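
For reference, the three parts of the exercise can be sketched along these lines. This is my own minimal version, not the course's solution: the helper names are hypothetical, the file path assumes the soft link described above, and the sample matches only mimic the shape of the StatsBomb match files (the ids other than 8651 are placeholders).

```python
import json
from pathlib import Path


def result_line(match, prefix="The match"):
    """Format one match dict as a single result sentence."""
    home = match["home_team"]["home_team_name"]
    away = match["away_team"]["away_team_name"]
    return (f"{prefix} between {home} and {away} finished "
            f"{match['home_score']} : {match['away_score']}")


def matches_for_team(matches, team):
    """Return the matches in which the team played, home or away."""
    return [m for m in matches
            if team in (m["home_team"]["home_team_name"],
                        m["away_team"]["away_team_name"])]


def find_match_id(matches, home, away):
    """Return the match_id of the home v away fixture, or None if absent."""
    for m in matches:
        if (m["home_team"]["home_team_name"] == home
                and m["away_team"]["away_team_name"] == away):
            return m["match_id"]
    return None


# Made-up sample in the shape of the match files; ids 1 and 2 are placeholders.
sample_matches = [
    {"match_id": 1, "home_score": 2, "away_score": 1,
     "home_team": {"home_team_name": "Croatia"},
     "away_team": {"away_team_name": "England"}},
    {"match_id": 8651, "home_score": 0, "away_score": 2,
     "home_team": {"home_team_name": "Sweden"},
     "away_team": {"away_team_name": "England"}},
    {"match_id": 2, "home_score": 0, "away_score": 3,
     "home_team": {"home_team_name": "Mexico"},
     "away_team": {"away_team_name": "Sweden"}},
]

# Assumed path: competition 43 (Men's World Cup), reached via the soft link.
matches_path = Path("Statsbomb/data/matches/43/3.json")
if matches_path.exists():
    with open(matches_path, encoding="utf-8") as f:
        matches = json.load(f)
else:
    matches = sample_matches

for match in matches:
    print(result_line(match))
print(f"Sweden vs England has id:{find_match_id(matches, 'Sweden', 'England')}")
for match in matches_for_team(matches, "Sweden"):
    print(result_line(match, prefix="The Sweden match"))
```

Checking both the home and away team names is what picks up Sweden's away fixtures, such as the Mexico match in the output above.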

What was learned

Learning how to extract match results from the StatsBomb open data is important, and being able to read that data in is great because, at the time of writing, it covers a number of competitions:

  • International Men’s FIFA World Cup 2018 (competition_id=43)
  • Europe Champions League 2018/2019
  • Europe Champions League 2017/2018
  • Europe Champions League 2016/2017
  • Europe Champions League 2015/2016
  • Europe Champions League 2014/2015
  • Europe Champions League 2013/2014
  • Europe Champions League 2012/2013
  • Europe Champions League 2011/2012
  • Europe Champions League 2010/2011
  • Europe Champions League 2009/2010
  • Europe Champions League 2008/2009
  • Europe Champions League 2006/2007
  • Europe Champions League 2004/2005
  • Europe Champions League 2003/2004
  • Europe Champions League 1999/2000
  • Spain La Liga 2018/2019
  • Spain La Liga 2017/2018
  • Spain La Liga 2016/2017
  • Spain La Liga 2015/2016
  • Spain La Liga 2014/2015
  • Spain La Liga 2013/2014
  • Spain La Liga 2012/2013
  • Spain La Liga 2011/2012
  • Spain La Liga 2010/2011
  • Spain La Liga 2009/2010
  • Spain La Liga 2008/2009
  • Spain La Liga 2007/2008
  • Spain La Liga 2006/2007
  • Spain La Liga 2005/2006
  • Spain La Liga 2004/2005
  • England Premier League 2003/2004
  • International Women’s World Cup 2019 (competition_id=72)
  • United States of America NWSL (Female) 2018
  • England FA Women’s Super League 2019/2020
  • England FA Women’s Super League 2018/2019
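
A list like this can be produced from the competitions file in the open data. The sketch below is a rough illustration under my own assumptions: the path relies on the soft link set up earlier, describe_competition is a hypothetical helper, and the two sample entries are hand-written to mimic the shape of the real file.

```python
import json
from pathlib import Path


def describe_competition(competition):
    """Build a one-line description of a competition season."""
    return (f"{competition['country_name']} {competition['competition_name']} "
            f"{competition['season_name']} "
            f"(competition_id={competition['competition_id']})")


# Hand-written entries in the shape of the competitions file.
sample_competitions = [
    {"competition_id": 43, "country_name": "International",
     "competition_name": "FIFA World Cup", "season_name": "2018"},
    {"competition_id": 72, "country_name": "International",
     "competition_name": "Women's World Cup", "season_name": "2019"},
]

# Assumed path to the competitions file via the soft link.
competitions_path = Path("Statsbomb/data/competitions.json")
if competitions_path.exists():
    with open(competitions_path, encoding="utf-8") as f:
        competitions = json.load(f)
else:
    competitions = sample_competitions

for competition in competitions:
    print(describe_competition(competition))
```

The competition_id printed here is the number used in the folder name when loading the matches for a given competition.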

Maths, Modelling, Software and Football: what a match.

September 2nd, 2020

Back in 2018 I took to reading “Soccermatics” and really enjoyed what I would say is a different view on the sport. I say different because it’s less of the blah blah and gives strength to looking at how football problems are being solved on the pitch, with the data to back it up.

More recently, reading “Zonal Marking” has really excited the mind, and watching the “Friends of Tracking” video sessions has been brilliant for taking the little software skills I have, applying them to football and bringing the content of both books into even sharper focus.

I recently spotted the chance to participate in the Uppsala University course “Mathematical Modelling of Football”, which to me is an opportunity not to be missed. With some help from Google Translate and some very late nights, I’m going to take on Mathematical Modelling of Football.

Here’s to going back to school!