Luca Pappalardo, an author of the paper (PCR2019) Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6, 236 (2019), prepared a video for the Friends of Tracking channel in 2020 to talk about some elements of this paper and the related Wyscout data set on which the paper is based.
In this video Luca covers:
- The Wyscout data set, how it is collected, from players to events.
- Basic statistics on events and distributions.
- Plotting events on the field, match evolution and spatial stats.
- Advanced statistics: passing networks, flow centrality and playerRank.
For this blog post I’m going to cover my take on the player events in the Wyscout data set and the display of some basic statistics on events and distributions.
The original code is available on GitHub under the project “mapping-match-events-in-Python” and is worked through in the video.
Set up
On GitHub I forked the “mapping-match-events-in-Python” repo into my mmoffoot organisation.
Then I created a new branch called ‘englanddata’ in the fork to hold the changes I made.
The example code base uses the Italian league data but, as the branch name gives away, I wanted to run it against the English Premier League data for the 2017-18 season, which is also part of the data set. So I took a copy of the original Jupyter Notebook and ran it against the English data as data_england_exploration.ipynb.
The full list of data available includes:
- Italian first division 2017-18
- English first division 2017-18
- Spanish first division 2017-18
- French first division 2017-18
- German first division 2017-18
- European Championship 2016
- World Cup 2018
All the matches, events, players and competitions data sets are hosted in a figshare repository, with all the data stored in JSON format.
The way the data is collected is explained in the paper, with a nice visual representation in the Notebook, so I won’t spoil that insight and will let you read it there.
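As a minimal sketch of what reading one of these JSON files and counting events by type could look like, consider the following. The file name `events_England.json` and the `eventName` field are assumptions here, and the two helper functions are illustrative rather than taken from the Notebook, so check them against the files you actually download.

```python
import json
from collections import Counter
from pathlib import Path


def load_events(path):
    """Load a list of event dictionaries from a JSON file."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def count_event_types(events, key="eventName"):
    """Count how often each event type appears (the key name is an assumption)."""
    return Counter(event.get(key, "unknown") for event in events)


if __name__ == "__main__":
    # Hypothetical file name; point it at wherever the figshare download lives.
    path = Path("events_England.json")
    if path.exists():
        for name, total in count_event_types(load_events(path)).most_common():
            print(f"{name}: {total}")
```

Using a `Counter` keeps the sketch short, and `most_common()` lists the event types from most to least frequent, which is a quick first look at the distribution of events.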
I should say a quick word on Jupyter Notebooks: they’re an interactive way of developing and presenting data science projects, and I can really see that a Notebook makes it easy to follow the code base for this project. It’s easy enough to install Jupyter Notebook on a machine too, and well worth doing.
Plotting events on the field
There are a number of nice overviews of the structure of the data in the early part of the Notebook, but things get more interesting when it comes to the static plots.
Of course, too much detail in a static plot can overwhelm, so the interactive plots in this Notebook are a much better mechanism for sharing this information: you just hover the mouse over an event and its details come to the fore.
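To give a feel for what goes into such a plot, here is a minimal sketch that turns an event’s position into pitch coordinates and builds the text shown on hover. It assumes each event carries a `positions` list whose first entry holds `x` and `y` as percentages of the pitch length and width, plus `eventName` and `eventSec` fields, and it assumes a 105 m by 68 m pitch. The helper names are hypothetical and this is not the Notebook’s own code.

```python
PITCH_LENGTH_M = 105.0  # assumed pitch length in metres
PITCH_WIDTH_M = 68.0    # assumed pitch width in metres


def to_pitch_coords(event):
    """Return the event's start position in metres, or None if it has no position."""
    positions = event.get("positions") or []
    if not positions:
        return None
    start = positions[0]
    # Positions are assumed to be percentages (0-100) of the pitch dimensions.
    x = start["x"] / 100.0 * PITCH_LENGTH_M
    y = start["y"] / 100.0 * PITCH_WIDTH_M
    return x, y


def hover_label(event):
    """Build a short label of the kind an interactive plot could show on hover."""
    name = event.get("eventName", "unknown")
    seconds = event.get("eventSec", 0.0)
    return f"{name} at {seconds:.1f}s"
```

The resulting coordinates and labels can then be handed to whichever plotting library you prefer, static or interactive.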
This is a great Jupyter Notebook, firstly for learning about Jupyter Notebooks, and then of course for seeing the structure of the Wyscout data and how to use it. That matters because this data is used by so many top clubs for the scouting, analysis and recruitment of players.
There’s more to come: I plan to cover match evolution and spatial stats in part II of this blog post, and finally the advanced statistics (passing networks, flow centrality and playerRank) in part III.