Blog of Random Thoughts and Pictures

Mini-challenge: Plotting Actions of an Assister

March 1st, 2021
  1. Think of a player who you enjoyed watching at the recent Men’s or Women’s World Cups.
  2. What actions did they perform that were important and why?
  3. Plot the actions and describe how the data supports or contradicts your own analysis.
  4. Write a short text using at most two figures that illustrate your point.

I’ll address this challenge with a view on a player (or set of players as it turns out), and later in the piece I’ll also share some code snippets on how the information was extracted from the StatsBomb data.

Think of a player

The first challenge, and I must admit I’m floundering a little already, as I can barely remember the Men’s 2018 World Cup bar the eventual winners, the Golden Boot winner and the best player award.

However, the one thing I do like in football is a player assisting goals.
So for this challenge I thought I would look at the one player who came out on top in this regard at the 2018 World Cup and state the important actions they performed.

However, at first glance there is a 16-way tie for the most goal assists at the 2018 World Cup, at least according to Wikipedia; when I head to another site there is a 19-player tie.

So the first thing I’ve done is take the team names for all the players listed as top assisters, look at all their passes, especially the passes that led to a goal (or more so a shot on goal), and try to find the top three players who assisted shots on goal.

This led to a surprise, for me anyway, because the top three that popped out from this exercise were:

  1. Kieran Trippier (England) : 25
  2. Neymar da Silva Santos Junior (Brazil) : 24
  3. Philippe Coutinho Correia (Brazil) : 13

What actions did they perform

Kieran Trippier had 25 passes that led to a shot on goal.

Figure 1: Kieran Trippier passes that assisted a shot on goal

 
However, Kieran Trippier does not appear on the chart of top goal assisters, because of all these passes only one led to an assisted goal (for John Stones).

Now it’s time to look at the Expected Goal plot of the shots that happened straight after Kieran Trippier played the pass for that shot. As far as I can tell Kieran Trippier could have had at least 2, if not 3, other assisted goals; however, many of those passes are into areas where the Expected Goal of the next shot is low.

 

Figure 2: Expected goals from Kieran Trippier passes

It is also worth noting that a large number of Trippier’s actions are from set plays (corners and the like).

So who was the top assister at the 2018 World Cup? Is it Neymar of Brazil?

Neymar comes in second on the list gathered earlier; however, he also assisted only 1 goal.

 

Figure 3: Expected goals from Neymar’s passes

 

Therefore it must be Philippe Coutinho: with 13 passes that assisted a shot on goal and 2 actual goal assists, he is the top player in this category.

 

Figure 4: Expected goals from Philippe Coutinho passes

 

Or is he? A little addendum: when I look at the positions that Neymar’s passes went into and the Expected Goal value of each shot taken after a Neymar pass, for me he should be the player of note.

 

FWC 2018 – Group E – SRB v BRA Neymar Jr

 

Figure 5: Scatter plot of expected goals from Top 3 player passes

 

A large number of Neymar’s shot assists are close to the goal (120 is the goal line) and central to the goal (40 is the centre of the goal).

Code snippets

I also want to share some of the code snippets that helped gather the data from StatsBomb for this challenge.

Extracting data from competitions.json

I found it handy to extract the season_id from the competitions.json file and use it to find all the matches with teams I was interested in. This also helped to identify which matches in the events folder had to be picked up.
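As a rough illustration of that lookup, here is a minimal sketch. It assumes the open-data layout where competitions.json is a list of entries with competition_id, season_id and competition_name, and where each match entry nests its team names under home_team and away_team. The helper names and the in-memory sample values are hypothetical stand-ins, not the code used for this post.

```python
import json


def load_json(path):
    """Read one StatsBomb JSON file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_season_ids(competitions, competition_name):
    """Return (competition_id, season_id) pairs for a named competition."""
    return [
        (entry["competition_id"], entry["season_id"])
        for entry in competitions
        if entry["competition_name"] == competition_name
    ]


def matches_for_teams(matches, team_names):
    """Return the match_id of every match where either side is in team_names."""
    wanted = set(team_names)
    return [
        match["match_id"]
        for match in matches
        if match["home_team"]["home_team_name"] in wanted
        or match["away_team"]["away_team_name"] in wanted
    ]


# Illustrative stand-ins so the sketch runs without the open-data checkout.
# The season_id values and the 1001/1002 match ids are made up for the example.
sample_competitions = [
    {"competition_id": 43, "season_id": 3, "competition_name": "FIFA World Cup"},
    {"competition_id": 72, "season_id": 30, "competition_name": "Women's World Cup"},
]
sample_matches = [
    {"match_id": 8651,
     "home_team": {"home_team_name": "Sweden"},
     "away_team": {"away_team_name": "England"}},
    {"match_id": 1001,
     "home_team": {"home_team_name": "Brazil"},
     "away_team": {"away_team_name": "Mexico"}},
    {"match_id": 1002,
     "home_team": {"home_team_name": "Spain"},
     "away_team": {"away_team_name": "Russia"}},
]

season_pairs = find_season_ids(sample_competitions, "FIFA World Cup")
match_ids = matches_for_teams(sample_matches, ["England", "Brazil"])
print(season_pairs, match_ids)
```

With the real data, load_json would be pointed at competitions.json and then at the matches file for the chosen competition and season; each returned match_id names one file in the events folder.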

The Assisted Shot Id

When iterating through the passes, finding the pass_assisted_shot_id was very handy, but one bit stumped me for a while: when a pass didn’t turn into an assisted shot, that id is set to NaN (not a number), which is a little off-putting at first.
There are also times when there are duplicate entries for the pass_assisted_shot_id across related events, which in one iteration of this code had Kieran Trippier (England) with 37 actions.

        # Keep only passes with pass_assisted_shot_id set. A real id is a
        # string (usually some hex value); a missing one is NaN (Not a
        # Number), which is a float, so the isinstance check filters it out.
        if isinstance(passasid, str):
            playersAsisstingShotsOnGoal.append(tpassplayerdir)
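To make the NaN and duplicate handling concrete, here is a small standard-library sketch. It assumes each pass is a flat dict with player_name and pass_assisted_shot_id keys; the function names and the shot ids are hypothetical and only illustrate the idea of counting each assisted shot once.

```python
from collections import Counter


def has_assisted_shot(value):
    """True when pass_assisted_shot_id holds a real id rather than NaN or None."""
    return isinstance(value, str) and value != ""


def count_shot_assists(passes):
    """Count passes per player that led to a shot, counting each shot id once."""
    seen_shot_ids = set()
    counts = Counter()
    for event in passes:
        shot_id = event.get("pass_assisted_shot_id")
        if not has_assisted_shot(shot_id) or shot_id in seen_shot_ids:
            continue  # no assisted shot, or a duplicate of one already counted
        seen_shot_ids.add(shot_id)
        counts[event["player_name"]] += 1
    return counts


# Made-up sample: one NaN entry and one duplicated shot id.
sample_passes = [
    {"player_name": "Kieran Trippier", "pass_assisted_shot_id": "shot-a"},
    {"player_name": "Kieran Trippier", "pass_assisted_shot_id": float("nan")},
    {"player_name": "Kieran Trippier", "pass_assisted_shot_id": "shot-a"},
    {"player_name": "Kieran Trippier", "pass_assisted_shot_id": "shot-b"},
    {"player_name": "Philippe Coutinho Correia", "pass_assisted_shot_id": "shot-c"},
]

shot_assists = count_shot_assists(sample_passes)
print(shot_assists.most_common(3))
```

Tracking the shot ids already seen is one simple way to stop related events inflating a player's total.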

Given that the dataframe of shots for the matches sets the index name as the Id of the event, it took me a while to figure out how to re-access that index name when doing a comparison later on in the code. Of course it’s simple when you see how, with the .name attribute.

    for shotOnGoal in matchesShotsOnGoal:
        # .name holds the row's index label, which here is the event Id
        if shotOnGoal.name in assistedShotId:
            # StatsBomb locations are [x, y]
            x = shotOnGoal['location'][0]
            y = shotOnGoal['location'][1]

Acknowledgements

Thanks to feedback from Eoin O’Brien, Eoin Slattery, Michael Kerley, Oliver Critchfield and David Sumpter, as this has led to a revision of the text and images.

Plotting actions on a pitch

February 14th, 2021

Purpose

The second element to work on in this course is plotting actions on a pitch, and the purpose is to learn how to:

  • Load match data and find all the shots
  • Plot shots on the pitch and highlight goals
  • Plot expected goals
  • Plot passes

The code needed for this lecture is also available at the Github SoccermaticsForPython repo.

Set up

Under the organisation on Github called mmoffoot I used my fork of the SoccermaticsForPython repo.

I then re-used the branch called ‘week1’, which covers the changes I had made as of week 1 of the course.

As mentioned before, I placed the StatsBomb data in a directory higher and then just added a soft link to the source data within the ‘Statsbomb’ folder of this project.

What was coded

This second exercise concentrated on the England vs Sweden Women’s World Cup 2019 match and asked that I:

  1. Create a dataframe of passes which contains all the passes in the match.
  2. Plot the start point of every Sweden pass. Attacking left to right.
  3. Plot only passes made by Caroline Seger (she is Sara Caroline Seger in the database)
  4. Plot arrows to show where the passes went

I made the code changes to 2PlotShotsAndPasses.py and ran the code as

python3 2PlotShotsAndPasses.py

The output from this exercise was mainly images, which was nice.

The code provided from the original repo had shots and goals defined for display, and also a nice feature where the size of the circle represented the expected goal (xG) rating of the shot, as calculated by StatsBomb. This xG thing is covered in more depth a little later in the course, but in general terms my understanding is that xG is the probability of that shot being a goal: the higher the xG, the more likely that shot should have been a goal.

Therefore in the plot, a solid colour with a name is a goal and the less visible circles are shots that were taken but did not end up as a goal.
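One way to picture the circle sizing is the sketch below. It is not the formula from the course repo; it simply assumes that a circle's area should grow in proportion to xG, so the radius is scaled by the square root. The function name and the scale factor are hypothetical.

```python
import math


def xg_to_radius(xg, scale=3.0):
    """Circle radius whose area grows in proportion to xG (scale is arbitrary)."""
    if not 0.0 <= xg <= 1.0:
        raise ValueError("xG is a probability and should lie in [0, 1]")
    return scale * math.sqrt(xg)


# A shot with four times the xG gets twice the radius, so four times the area.
low_chance_radius = xg_to_radius(0.25)
certain_goal_radius = xg_to_radius(1.0)
print(low_chance_radius, certain_goal_radius)
```

Scaling the radius directly by xG would also work, but it exaggerates the visual difference between small and large chances.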

Figure 1: Shots and Goals England vs Sweden Women’s World Cup 2019.

 

I have a thing about the pitch being green with white markings, so that was the one code change I made to the original code. This caused a slight issue with the goals scored by England: they played in white, so I made their shots white, and the penalty spot is white too. All of this means that it looks like a goal was scored by England from the penalty spot, but the goal scorer is not named. For now I think I’ll leave it as is.

As for the football analysis, there are 6 shots on goal from within the box for Sweden; 2 of those chances had a high xG but were not goals, and one of the goals, scored by Eva Jakobsson, looks like it was a tough chance to convert, but she managed it. England only had 4 shots on goal from within the box, and even had a chance with a larger xG but didn’t score from it.

Originating Passes

Next up, the creation of a dataframe of passes was relatively straightforward, as the StatsBomb data provides a ‘type_name’ of Pass within the data set, so they were easy enough to extract. So the first real plot I have ever created is the originating position of all passes in the match for both Sweden and England, with Sweden playing from left to right. I also added a little text at the bottom to highlight that the data came from StatsBomb.
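The filtering step can be sketched without pandas as below. It assumes flattened events with type_name, team_name and location keys, and a 120 by 80 pitch in StatsBomb units. The function names, the plain team names and the mirror option (for drawing the second team attacking the other way) are hypothetical choices for the illustration.

```python
PITCH_LENGTH_X = 120  # pitch length in StatsBomb units
PITCH_WIDTH_Y = 80    # pitch width in StatsBomb units


def passes_only(events):
    """Keep only the events whose type_name is 'Pass'."""
    return [event for event in events if event.get("type_name") == "Pass"]


def pass_origins(passes, team_name, mirror=False):
    """Start (x, y) of each pass by team_name; mirror flips to the other end."""
    origins = []
    for event in passes:
        if event["team_name"] != team_name:
            continue
        x, y = event["location"][0], event["location"][1]
        if mirror:
            # Rotate the point so this team attacks right to left on the plot.
            x, y = PITCH_LENGTH_X - x, PITCH_WIDTH_Y - y
        origins.append((x, y))
    return origins


# Made-up events: two passes and one shot.
sample_events = [
    {"type_name": "Pass", "team_name": "Sweden", "location": [30.0, 40.0]},
    {"type_name": "Shot", "team_name": "Sweden", "location": [100.0, 38.0]},
    {"type_name": "Pass", "team_name": "England", "location": [20.0, 10.0]},
]

all_passes = passes_only(sample_events)
sweden_origins = pass_origins(all_passes, "Sweden")
england_origins = pass_origins(all_passes, "England", mirror=True)
print(sweden_origins, england_origins)
```

Each (x, y) pair can then be handed to whatever plotting call draws the start points on the pitch.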

Figure 2: Originating Position of Passes in the England vs Sweden Women’s World Cup 2019.

 

That image is a bit of a mess, as in it is hard to offer any football analysis, so here are just the Swedish passes.

Figure 3: Originating Position of Sweden Passes in the England vs Sweden Women’s World Cup 2019.

 

Now this is a little bit more interesting: no passes by Sweden from within the opposition box.

Passes made by Caroline Seger

Next up, passes made by Caroline Seger (she is Sara Caroline Seger in the database), and for this filtering out by ‘player_name’ is not too hard with the StatsBomb data.

Figure 4: Originating Position of Passes by Caroline Seger (SWE) in the England vs Sweden Women’s World Cup 2019.

 

Given the plot, there is a lot of midfield play by Caroline Seger and of course the next step of plotting arrows to show where the passes went would really add context to this play.
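A minimal sketch of the arrow preparation follows. It assumes flattened pass events carrying player_name, location and pass_end_location; the function name and the sample coordinates are hypothetical. The output is a start point plus deltas, which is the form matplotlib's arrow function expects.

```python
def pass_arrows(passes, player_name):
    """Return (x, y, dx, dy) for each pass made by player_name."""
    arrows = []
    for event in passes:
        if event["player_name"] != player_name:
            continue
        x, y = event["location"][:2]
        end_x, end_y = event["pass_end_location"][:2]
        # dx, dy give the direction and length of the arrow from the start point.
        arrows.append((x, y, end_x - x, end_y - y))
    return arrows


# Made-up passes for illustration only.
sample_passes = [
    {"player_name": "Sara Caroline Seger",
     "location": [50.0, 30.0], "pass_end_location": [70.0, 25.0]},
    {"player_name": "Rut Hedvig Lindahl",
     "location": [6.0, 40.0], "pass_end_location": [30.0, 60.0]},
]

seger_arrows = pass_arrows(sample_passes, "Sara Caroline Seger")
print(seger_arrows)
```

A positive dx means the ball moved towards the goal being attacked, so forward passes are easy to pick out from the sign alone.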

Now, to verify that the code I have written for the directional arrows is correct, I went searching for video footage of the match and, lo and behold, on YouTube there is a set of full match videos for the Women’s World Cup 2019 taken from the tactical camera behind one of the goals. Thankfully the England vs Sweden match is up there too. I found it really hard to pick out Caroline Seger, but I did find it easier to identify Rut Hedvig Lindahl, the Swedish goalkeeper, and therefore I picked minute 11 (at random) to see if I could correlate the pass on the video with the pass on my plotted pitch.

Figure 5: Match footage of a pass by Rut Hedvig Lindahl (SWE) in the England vs Sweden Women’s World Cup 2019.

 

At this point in the match, the action was a throw-out pass by Rut Hedvig Lindahl, so quite distinctive.

Figure 6: Plot of a pass by Rut Hedvig Lindahl (SWE) in the England vs Sweden Women’s World Cup 2019.

 

And from my code. Well, there was a little gnashing of teeth. The tutorial video that goes with this session indicated a different result, with code that was slightly different, but thankfully this was corrected by the lecturer at a later date, and therefore I was on the right track. It just goes to show that obtaining some sort of footage from a match can help with these things.

So finally we have the plot I was looking for in this whole session.

Figure 7: Passes by Caroline Seger (SWE) in the England vs Sweden Women’s World Cup 2019.

 

From a football analysis viewpoint, Caroline Seger was listed in the Swedish line-up in the Left Defensive Midfield position, and the passes she made show she did indeed play that role, with a preference for progressing the ball towards the opposition goal, including one incisive ball into the opposition box.

What was learned

The main takeaways: it was great to learn how important it is to plot the direction of a player’s passes with arrows, along with verifying a few of the passes via match footage, if at all possible. I know not all matches will have the tactical camera, but more often than not for the higher tier matches there’s footage of a goal or two to be found.

Post Update:

Thanks to Mike for reviewing the content and for rightly pointing out that I had the wrong image in place for the directional passes of Caroline Seger, and that the coordinates for England passes were slightly off. Images and associated code have been updated.

 

Handling StatsBomb Event Data

February 7th, 2021

Yes, I did have a blog post back in September 2020 highlighting that I was undertaking the Uppsala University course “Mathematical Modelling of Football”, which overtook my time and life in Q4 2020. Having completed the course, I’m finally getting my head up in 2021 to record all my notes and code, and to share those notes as I go through each section and sub-section of the course.

So here’s the first of hopefully many notes from the course, which were originally written in Asciidoc.

 

Purpose of Handling Event Data

The first element to work on in this course is Handling Event Data, and the purpose is to learn how to:

  • Download code and data
  • Organise a working folder
  • Load in data from a json file
  • Use ‘for’ loops and ‘if’ statements
  • Identify specific matches in StatsBomb data

The code needed for this lecture is available at the Github SoccermaticsForPython repo.

Set up

Created a new organisation on Github called mmoffoot standing for Mathematical Modelling of Football. The purpose is to fork the Github projects used in the course to track my own changes to those repos.

First up is a fork of the SoccermaticsForPython repo into the mmoffoot area.

Then created a branch called ‘week1’ covering the changes I had made as of week1 of the course.

The next little hurdle here is the loading of the StatsBomb data. It really lives in another repo on Github called statsbomb / open-data, and in order to always have access to the StatsBomb data within this repo I was going to set up a git submodule for it. This means that any time in the future when this repo is cloned afresh, it has to be done with the recursive command switch.

However, I then noted that the StatsBomb data is over 3GB in size and that it doesn’t really make sense to have a couple of copies of this data on the one machine, so I just placed it in a directory higher. I then just added a soft link to the source data within the ‘Statsbomb’ folder.

ln -s ../../statsbomb-opendata/data .

Also modified the README file to point this out.

What was coded

The first exercise is to

  1. Edit the code to print out the result list for the Men’s World Cup
  2. Edit the code to find the ID for England vs. Sweden
  3. Write new code to write out a list of just Sweden’s results in the tournament.

I made the code changes to 1LoadInData.py and ran the code as

python3 1LoadInData.py

The output reads

The match between Croatia and Denmark finished 1 : 1
The match between Australia and Peru finished 0 : 2
.........
.........
The match between Spain and Russia finished 1 : 1
The match between Croatia and England finished 2 : 1
The Sweden match between Mexico and Sweden finished 0 : 3
The Sweden match between Sweden and South Korea finished 1 : 0
The Sweden match between Sweden and Switzerland finished 1 : 0
Sweden vs England has id:8651
The Sweden match between Sweden and England finished 0 : 2
The Sweden match between Germany and Sweden finished 2 : 1

I think the exercise is complete.
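For readers without the repo to hand, the three steps can be sketched as below. This is not the course script; it assumes the open-data match layout where team names are nested under home_team and away_team and the scores sit in home_score and away_score. The helper names are hypothetical and the sample holds just two of the results listed above.

```python
def format_result(match, prefix="The match"):
    """Build one result line in the same shape as the output above."""
    home = match["home_team"]["home_team_name"]
    away = match["away_team"]["away_team_name"]
    return (f"{prefix} between {home} and {away} "
            f"finished {match['home_score']} : {match['away_score']}")


def involves(match, team):
    """True when team is the home or the away side of the match."""
    return team in (match["home_team"]["home_team_name"],
                    match["away_team"]["away_team_name"])


def find_match_id(matches, home, away):
    """Return the match_id for a given home and away pairing, or None."""
    for match in matches:
        if (match["home_team"]["home_team_name"] == home
                and match["away_team"]["away_team_name"] == away):
            return match["match_id"]
    return None


# Two sample matches; the 9999 id is made up for the example.
sample_matches = [
    {"match_id": 8651, "home_score": 0, "away_score": 2,
     "home_team": {"home_team_name": "Sweden"},
     "away_team": {"away_team_name": "England"}},
    {"match_id": 9999, "home_score": 1, "away_score": 1,
     "home_team": {"home_team_name": "Spain"},
     "away_team": {"away_team_name": "Russia"}},
]

all_lines = [format_result(match) for match in sample_matches]
sweden_lines = [format_result(match, prefix="The Sweden match")
                for match in sample_matches if involves(match, "Sweden")]
sweden_england_id = find_match_id(sample_matches, "Sweden", "England")
print("\n".join(all_lines + sweden_lines))
print(f"Sweden vs England has id:{sweden_england_id}")
```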

What was learned

Learning how to extract match results from the StatsBomb open data is important, and being able to read in the StatsBomb open data is great because at the time of writing it has a number of competitions covered in it.

  • International Mens FIFA World Cup 2018 (competition_id=43)
  • Europe Champions League 2018/2019
  • Europe Champions League 2017/2018
  • Europe Champions League 2016/2017
  • Europe Champions League 2015/2016
  • Europe Champions League 2014/2015
  • Europe Champions League 2013/2014
  • Europe Champions League 2012/2013
  • Europe Champions League 2011/2012
  • Europe Champions League 2010/2011
  • Europe Champions League 2009/2010
  • Europe Champions League 2008/2009
  • Europe Champions League 2006/2007
  • Europe Champions League 2004/2005
  • Europe Champions League 2003/2004
  • Europe Champions League 1999/2000
  • Spain La Liga 2018/2019
  • Spain La Liga 2017/2018
  • Spain La Liga 2016/2017
  • Spain La Liga 2015/2016
  • Spain La Liga 2014/2015
  • Spain La Liga 2013/2014
  • Spain La Liga 2012/2013
  • Spain La Liga 2011/2012
  • Spain La Liga 2010/2011
  • Spain La Liga 2009/2010
  • Spain La Liga 2008/2009
  • Spain La Liga 2007/2008
  • Spain La Liga 2006/2007
  • Spain La Liga 2005/2006
  • Spain La Liga 2004/2005
  • England Premier League 2003/2004
  • International Women’s World Cup 2019 (competition_id=72)
  • United States of America NWSL (Female) 2018
  • England FA Women’s Super League 2019/2020
  • England FA Women’s Super League 2018/2019

Maths, Modelling, Software and Football what a match.

September 2nd, 2020

Back in 2018 I took to reading “Soccermatics” and really enjoyed what I would say is a different view on the sport. I say different because it’s less of the blaa blaa and gives strength to looking at how football problems are being solved on the pitch, with the data to back it up.

More recently, reading “Zonal Marking” has really excited the mind, and watching the “Friends of Tracking” video sessions has been brilliant for taking the little software skills I have, applying them to football and bringing both books’ content into even sharper focus.

I recently spotted the chance to participate in the Uppsala University course “Mathematical Modelling of Football” which to me is just an opportunity not to be missed, and with some help of Google Translate and some very late nights I’m going to take on Mathematical Modelling of Football.

Here’s to going back to school !

Bridging MQTT brokers and using security certs from Let’s Encrypt

October 14th, 2019

This is an item that came up while working on a project within the TSSG and so might be worth sharing.

Have you ever tried to use an MQTT broker? Message Queuing Telemetry Transport (MQTT) is a machine-to-machine (M2M), Internet of Things data protocol, which is in line with other data protocols such as XMPP, CoAP, AMQP, and Websockets. Invented in 1999, MQTT is now an OASIS (Organization for the Advancement of Structured Information Standards) standard, and ISO standard (ISO/IEC PRF 20922).

MQTT is extensively used in Amazon Web Services, Microsoft Azure IoT Hub, IBM WebSphere MQ, and is a publish/subscribe message exchange pattern, that can support persistent message storage on the broker and supports security in the form of authentication using user name and password, and encryption using SSL/TLS.

For something like the Eclipse Mosquitto broker, MQTT has a really small code footprint: libmosquitto (the client library) is about 1.3 MB, which is ideal if processor or memory resources are limited and also ideal if bandwidth is low or the network is unreliable. Classic problems in the IoT space.

In the case of using MQTT for the smart grid, scale and security are top priorities. To achieve scale I’ve looked at bridging MQTT brokers in a hub and spoke model, where a very light MQTT broker is at the edge of the network (at the end of the spoke) and there’s a large MQTT broker at the hub which can aggregate all the data.

However, the purpose of this post is to highlight the security aspects within MQTT, and in particular the application of encryption (SSL/TLS) when using Let’s Encrypt certificates. Applying a certificate to an MQTT broker is not too hard: there’s a nice guide here on Mosquitto SSL Configuration for MQTT TLS Security and here too on SSL/TLS Client Certs to Secure MQTT. However, in the vast majority of cases the examples use self-signed certs and not certs as provided by Let’s Encrypt.

By the way, if you don’t know, Let’s Encrypt is a non-profit certificate authority run by the Internet Security Research Group that provides X.509 certificates for Transport Layer Security encryption at no charge. The certificate is valid for 90 days and the renewal process is quite simple.

Now bridging two MQTT brokers can be relatively straightforward too; however, getting the certs right when you want that bridge to be encrypted can be a little tricky. Look at how much you have to do to bridge a Mosquitto MQTT Broker to AWS IoT.

In my case I wanted to bridge two Mosquitto MQTT Brokers, each with encryption enabled by a Let’s Encrypt cert. Firstly I created a special Docker container that could pick up the Let’s Encrypt cert, and having followed all the guides I kept getting the following error in the logs

OpenSSL Error: error:14037418:SSL routines:ACCEPT_SR_KEY_EXCH:tlsv1 alert unknown ca

OpenSSL Error: error:140370E5:SSL routines:ACCEPT_SR_KEY_EXCH:ssl handshake failure

Socket error on client , disconnecting.

I tried verifying the certs, by installing openssl

openssl verify cert.pem

But all was fine.

I thought I had to download the trusted root CA certificates for Let’s Encrypt and place them somewhere in the Alpine Linux system (the base OS of the broker), but I must admit this “somewhere” was not so clear to me.

The problem is that the MQTT broker does not know how to verify its own CA before starting the SSL exchange with any client. This is because the CA signing the Let’s Encrypt cert is not yet distributed and bundled by default into the Alpine Linux system and therefore has to be added manually.

In the Mosquitto MQTT broker configuration, instead of just pointing directly at the chain.pem file I decided to point at the default place where all ca certs should be.

#cafile /mosquitto/config/certs/chain.pem
capath /etc/ssl/certs

And this write-up on installing certificates in an Alpine Image to establish Secured Communication (SSL/TLS) really got to the heart of the matter: the cert needs to be copied to a special directory, /usr/local/share/ca-certificates/, and then you need to run the program update-ca-certificates so it gets placed in the right way into the folder /etc/ssl/certs.

After much head scratching, it all comes down to 2 command lines

cp /mosquitto/config/certs/chain.pem /usr/local/share/ca-certificates/chain.pem

update-ca-certificates

Once done (via a docker-entrypoint.sh command) the container is able to handle the CA issue, and bridging 2 Mosquitto MQTT brokers that are using Let’s Encrypt certificates can be achieved.
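A quick way to sanity-check the CA directory from the client side is sketched below using only Python’s standard ssl and socket modules. This was not part of the original setup; the function names are hypothetical, the host is whatever broker you are testing, and port 8883 is assumed because it is the conventional MQTT over TLS port.

```python
import socket
import ssl


def build_client_context(capath="/etc/ssl/certs"):
    """TLS client context that trusts the CAs found in an OpenSSL-style directory."""
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, capath=capath)


def broker_presents_trusted_cert(host, port=8883, capath="/etc/ssl/certs",
                                 timeout=5.0):
    """Return True if the TLS handshake with host:port verifies against capath."""
    context = build_client_context(capath)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake, including chain and hostname checks, happens here.
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers connection failures and ssl.SSLError (verification failures).
        return False
```

If the chain was not installed into the directory, the handshake fails verification and the function returns False, which mirrors the “unknown ca” alert seen in the broker logs.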

Splash World 10KM

April 12th, 2015

I did this race a few years ago and thought hey why not try it again and I’m glad I did, a new course which was super flat and nice to run. Grabbed a time of 42.12, which has me in a good mood! Also nice event afterwards for the kids to enjoy and play around Tramore.

Run Mount Juliet 10km

February 14th, 2015

What is it with me and hilly races. First of the year and I have to say this Mount Juliet race is a great one to start off with. There’s a steady incline for the first 1.5 kms then a down hill section and then the WALL …….. the hill that just seems to go on for ever.

Once I got over that the race was normal enough and I grabbed a time of 43.17, which is another few seconds scraped off from last year (2014 was 43.49).

Rathgormack 5 Mile Race

July 11th, 2014

Okay I was warned that this would be hilly …. but boy was that race hilly. Enjoyable though and in a time of 33:54 I’m pretty happy with that.

Deadmans 5 Mile

June 6th, 2014

Carrick, Deadman …. hills ahhh I was just not feeling right for this one, dare I say had a slight injury …. excuses anyway in a time of 33:24, which should have been better.

Tom Jordan 5 Miles

May 9th, 2014

The runs are coming thick and fast these days and up today was the Portlaw 5 mile run. I must admit I’m not too sure about these shorter than 10km runs but then again I should be running for pace so 5 miles it is.

This year the Portlaw run was on a new course and we had to walk a mile outside of the village to get to the top of a hill for the start line. It felt like I was being wound up, getting up the hill just to let it all out going back down again and sure enough once the horn went for the start I was off down that hill in a shot just like everyone else.

I controlled the enthusiasm though I’ve had these quick starts before and it never works out in the long run. Down through Portlaw was fun and really only mile 3 caught me out a little.

Hitting the 4 mile marker was funny as someone called for the time. I yelled 26 mins and it reminded me that I was on track for my goal time, so I went for it hell for leather in the last mile. That was fun as I left 4 people behind, although they nearly all caught me on the line, so maybe I went too early again!

All was well at the end and I grabbed a time of 32:42, which was 1:16 quicker than I did last year. Now I was super happy with that result.