So in my first post I talked about using a model that predicts the win probability at a particular point in a match to quantify how good a match is in terms of excitement, tension and surprise. In this post I'm going to talk a bit more about where that model comes from. It gets pretty mathsy so don't say I didn't warn you!
The inspiration comes from this paper I found which talks about using a Brownian Motion Model for the Progress of Sports Scores. I've taken some of its concepts and applied them to AFL to calculate in game win probabilities, and then used these numbers to quantify excitement, tension and surprise.
We first assume that the margin from the home team's perspective (referred to as \( X(t) \) from here on) can be modelled as a Brownian motion process with drift \( \mu \) per unit time (this represents the expected margin over an entire match and includes adjustment for relative team strength and home ground advantage) and variance \( \sigma ^2 \) per unit time. I'll scale all matches to be 1 unit of time, so \( X(0), X(0.25), X(0.5), X(0.75) \) and \( X(1) \) correspond to the margin at the start of the game, quarter time, half time, three quarter time and the end of the game respectively.
Under a Brownian motion model we need a few things to hold:
- \( X(t) \sim N(\mu t, \sigma ^2 t) \). This would mean that the margin can be modelled as normal with a growing spread as a match progresses. The growth would be consistent with a variance of \( \sigma ^2 \).
- \( X(s) - X(t), s>t \), is independent of \( X(t) \) with \( X(s) - X(t) \sim N(\mu (s-t), \sigma ^2 (s-t)) \). This would mean that the change in margin you see between two times (e.g. from quarter time to half time) is independent of the margin at the first time (e.g. quarter time), and depends solely on \(\mu, \sigma \) and the time available (0.25 of a match in this case).
- If we assume the margin is normally distributed, we need to assume it can take any value (not just integers). The journal paper I linked to at the start of this post explores this and concludes that adjusting for this makes only a small difference in any calculations.
It turns out the margin does closely align with these points, so our Brownian motion model might actually be a good fit. To kind of show this (in a very hand wavey kind of way!), below are plots of the distribution of margins (from the start of the previous quarter to the end of the specified quarter on the left and from the start of a match to the end of specified quarters on the right).
The chart on the left shows that the distribution of margins during each quarter is roughly the same. This is what we'd expect if the margins for each quarter are independent of each other, as they only seem to depend on the time (0.25 of a match for each quarter) and the values for \( \mu, \sigma \).
The chart on the right shows the distribution of the margin as the match progresses. The variance does increase, and if we model as a normal distribution we get the following estimates for \( \mu \) and \( \sigma \) for each point in the match:
| End of Quarter | Mean | Standard Deviation |
|---|---|---|
| 1 ( \( t=0.25 \) ) | 2.1 | 17.2 |
| 2 ( \( t=0.5 \) ) | 4.2 | 23.9 |
| 3 ( \( t=0.75 \) ) | 6.3 | 32.6 |
| 4 ( \( t=1 \) ) | 8.8 | 40.3 |
Turns out these are pretty consistent with our Brownian motion model with \( \mu \approx 8 \) and \( \sigma \approx 38 \). Here, \( \mu \approx 8 \) represents an average 8 point advantage to the home team across all matches in AFL/VFL history.
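To make that comparison concrete, here's a rough sketch (not the code behind this post) that lines up the table against what the model implies, namely a mean of \( \mu t \) and a standard deviation of \( \sigma \sqrt{t} \). The function name `implied_moments` is just something made up for illustration.

```python
import math

# Whole-match drift and standard deviation quoted in the post.
MU, SIGMA = 8.0, 38.0

# (t, observed mean, observed standard deviation) copied from the table.
observed = [(0.25, 2.1, 17.2), (0.5, 4.2, 23.9), (0.75, 6.3, 32.6), (1.0, 8.8, 40.3)]


def implied_moments(t, mu=MU, sigma=SIGMA):
    """Mean and standard deviation of X(t) under the Brownian motion model."""
    return mu * t, sigma * math.sqrt(t)


for t, obs_mean, obs_sd in observed:
    mean, sd = implied_moments(t)
    print(f"t={t:.2f}  mean {obs_mean:4.1f} vs {mean:4.1f}   sd {obs_sd:4.1f} vs {sd:4.1f}")
```

The model values won't match the table exactly (a single \( \mu \) and \( \sigma \) is fitted across all four rows), but they should land in the same ballpark.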
Where these assumptions break down is when you consider what the distribution of margins looks like over smaller time periods than a quarter. Below shows the distribution of margins over the first 1%, 5%, 10% and 15% of a match (you can toggle each series on and off by clicking it in the legend).
These show that the distribution only becomes kind of Normal when we consider the distribution over at least 10-15% of a match. This is because it's possible to score in 1's and 6's so until there is sufficient time to take many shots, there are fewer possible margins giving the squiggly distributions you see above.
As the purpose of this model is to generate margin distributions over the remainder of a match, this means the Normal assumption breaks down once roughly 10-15% or less of the match remains. I'm still going to use the model for these times anyway and will show the effect it has on win probability calculations.
Pushing on, given we're modelling the margin as a normal distribution with mean and standard deviation \( \mu, \sigma \) respectively, this means to find the probability the home team wins the match, we need to find the probability the margin is greater than 0 when \( t=1 \). I.e. we need to find \( Pr (X(t=1) > 0) \).
This would actually have to be \( Pr (X(t=1) > 0.5) \) if we include the possibility for draws, but I'm going to ignore draws to keep things simple and so that \(1 - Pr(\text{Home Win} ) = Pr(\text{Away Win}) \).
To find \( Pr (X(t=1) > 0) \), we transform a margin of 0 to a standard z score: \( z = \frac{0 - \mu}{\sigma} = -\frac{\mu}{\sigma} \). Then we use some maths to find the probability of finding a \( Z \) value greater than this:
\[ Pr(Z > z) = Pr(Z < -z) = \Phi (-z) = \Phi \left( \frac{\mu}{\sigma} \right) \]
where \( \Phi (x) \) is the cumulative standard normal distribution function.
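As a minimal sketch of that calculation, assuming we already have estimates for \( \mu \) and \( \sigma \), the pre-match probability is a one-liner using the standard library's normal distribution. The function name `prematch_home_win_prob` is hypothetical, and draws are ignored just like in the text.

```python
from statistics import NormalDist


def prematch_home_win_prob(mu, sigma):
    """Pr(X(1) > 0) = Phi(mu / sigma), ignoring the possibility of a draw."""
    return NormalDist().cdf(mu / sigma)


# Evenly matched teams with no home ground advantage are a coin flip.
print(prematch_home_win_prob(mu=0, sigma=38))

# The historical average home advantage quoted earlier (mu = 8, sigma = 38).
print(prematch_home_win_prob(mu=8, sigma=38))
```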
Now from this, I have a model that tells me the win probability of the home team at the start of the match given I can accurately estimate \( \mu \) and \( \sigma \). I built an ELO model (which I will go into more detail of in a future post) that predicts these parameters for each match to a fairly good accuracy (i.e. my mean absolute error is around 30 points, tipping accuracy is around 70% and bits per match is around 0.2).
Now here's where it gets cool. Using maths and assuming \( X(t) \) follows the Brownian motion model above, we can say that the probability the home team wins given a lead or deficit (\(\ell\)) at time \( t\) is:
\[ P(\ell, t; \mu, \sigma) = Pr(X(1) > 0 | X(t) = \ell) = Pr(X(1) - X(t) > -\ell) = \Phi \Biggl(\frac{\ell + (1-t)\mu}{\sigma \sqrt{1-t}} \Biggr) \]
The numerator here represents a forecast of the final margin given the net scoring ability of the teams and remaining time left. For example if we predicted the home team was a 20 point stronger team across the match, at half time we'd expect them to be a 10 point better team for the remainder of the match. If they were down by 15 points at half time we'd be forecasting a final margin of -5 points, or the away team to win by 5 points. I.e. \( \ell + \mu (1-t) = -15 + 20\times (1-0.5) = -5 \)
The denominator represents the spread in the margin for the remainder of the match, expressed as a standard deviation (the square root of the remaining variance, \( \sigma ^2 (1-t) \)). So in the example above, if we estimate the standard deviation, \( \sigma \), to be 40 for the whole match, we'd expect it to be \( 40 \sqrt{1-0.5} \approx 28 \) for the remaining half. This gives the home team a chance of winning of \( \Phi ( \frac{-5}{28} ) \approx 43\% \).
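The worked example is easy to reproduce in a few lines. This is only a sketch of the formula, with a made-up function name (`home_win_prob`), and it assumes \( t < 1 \) so that there is still some match left to play.

```python
from math import sqrt
from statistics import NormalDist


def home_win_prob(lead, t, mu, sigma):
    """Probability the home team wins given its lead at match fraction t."""
    if not 0 <= t < 1:
        raise ValueError("t must satisfy 0 <= t < 1")
    remaining = 1.0 - t
    forecast_margin = lead + mu * remaining       # numerator: forecast final margin
    remaining_sd = sigma * sqrt(remaining)        # denominator: sd over the rest of the match
    return NormalDist().cdf(forecast_margin / remaining_sd)


# Home team rated 20 points better over a match, down by 15 at half time, sigma = 40.
p = home_win_prob(lead=-15, t=0.5, mu=20, sigma=40)
print(f"{p:.0%}")  # about 43%, as in the worked example
```

Setting `lead=0` and `t=0` gives back the pre-match probability \( \Phi ( \mu / \sigma ) \).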
There's some really nice things about this model. As \( t \to 1 \), the relative ability of the teams becomes less important and the current lead becomes the major factor in which team wins (which makes sense).
Because it's a nice simple equation we can also differentiate it to understand how sensitive the win probability is to a change in the lead or to the passing of time. I've also pondered whether you could model in game injuries as a change in \( \mu \), or whether a storm blowing through may bring about a change in \( \mu \) and/or \( \sigma \).
But now you're probably wondering if this is any good? Bearing in mind I trained my ELO model to predict the result at the start of a match and did no tuning of parameters to maximise predictive power in game, I think it turns out pretty good.
Below I've plotted the calibration of the model at different points in a match. The x axis shows the predicted home team win probability, and the y axis shows the actual home team win probability given the predicted probability. A perfectly calibrated model would have dots lying on the calibration line, i.e. when the model predicts a team is a X% chance to win, they actually win X% of matches.
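For anyone wanting to build a similar plot, one simple way to get the dots is to group predictions into bins and compare the average prediction in each bin with how often the home team actually won. The sketch below assumes equal-width bins and uses toy inputs; the function name `calibration_table` and the binning choice are illustrative assumptions, not necessarily how the chart here was produced.

```python
def calibration_table(predicted, outcomes, n_bins=10):
    """Return (mean predicted prob, observed win rate, count) for each non-empty bin.

    predicted: home win probabilities in [0, 1]
    outcomes:  1 if the home team won that match, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, won))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        win_rate = sum(won for _, won in members) / len(members)
        rows.append((mean_pred, win_rate, len(members)))
    return rows


# Toy data only: four predictions and whether the home team won.
for mean_pred, win_rate, n in calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1]):
    print(f"predicted {mean_pred:.2f}  actual {win_rate:.2f}  (n={n})")
```

A well calibrated model has `mean_pred` close to `win_rate` in every bin.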
It's pretty good through a match up until about 95% of the game has been played. From this point on I see things like teams winning at less than expected rates when they're expected to lose and winning at higher than expected rates when they're expected to win.
This probably comes back to how the assumption of the margin being normally distributed breaks down when there is less than 15% of the match remaining.
Also, as my model doesn't take into consideration any other context/strategic moves (i.e. extra men behind the ball, taking time off the clock when in the lead and in possession, who has possession, where the ball is in the field of play, any injuries), I suspect after 85% of the match has passed and the game is still in the balance, this is when these factors play a big role in determining the result.
This suggests that uncertainty about the result falls away faster towards the end of games than my model assumes. I may look at tweaking the variance to decay faster to account for this, but for now I'm going to roll with this model in quantifying match context.
Now that I have a model to quantify in game win probability, I wanted to use it to give context around how good a game each match was.
I talk more about the results of these metrics in my first post, but to calculate the excitement, tension and surprise index I did the following:
I segmented each match into one data point per 1% of match time (plus an extra 12 data points corresponding to extra time in the one game with extra time) plus a data point for each score. I then calculated the following at each point:
- Win probability at each data point.
- Change in win probability since previous data point.
- Derivative of win probability with respect to lead at each data point.
I could then use these to quantify match excitement, tension and surprisal:
- Excitement: This represents the total change in win probability observed throughout a match. High scoring during close games makes the win probability jump around a lot so these games feature heavily when looking at the top games.
- Tension: The derivative of win probability with respect to lead represents the average change in win probability when the lead changes by 1 point either way. This is a great measure of tension as it places high weight on close margins at the end of games. To calculate the match tension I average the tension at each point across the whole game and multiply by 6. This represents the average worth of a goal in terms of win probability.
Because my model tends to rate matches closer than what they are towards the end of games, this is probably an overestimation of tension, but for comparison between matches I think it does a good job.
- Surprisal: This represents how much probability had to be overcome to reach the final result. If a team was rated a 1% chance of winning at its lowest point and came back to win, we'd say they overcame 99% of probability to win and so the Surprisal Index would be 99.
Similarly if the favourite was rated a 90% chance to win, got off to a great start and won without being challenged at all we'd give the match a surprise index of 10.
Because I ignore the fact that draws are possible, this means they overcome 100% of probability to occur and so draws have a surprise index of 100.
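Here's a minimal sketch of how excitement and surprisal could be computed from a match's series of home win probabilities. It assumes excitement is the sum of absolute changes (left in raw probability units, since any scaling isn't spelled out here) and that surprisal uses the eventual winner's lowest win probability. The function names and the `home_won` flag (`True`, `False`, or `None` for a draw) are made up for illustration.

```python
def excitement(win_probs):
    """Total absolute change in home win probability across the match."""
    return sum(abs(later - earlier) for earlier, later in zip(win_probs, win_probs[1:]))


def surprisal(win_probs, home_won):
    """Probability (out of 100) the eventual winner had to overcome.

    home_won: True for a home win, False for an away win, None for a draw.
    """
    if home_won is None:
        return 100.0  # the model gives draws zero probability
    winner_probs = win_probs if home_won else [1.0 - p for p in win_probs]
    return 100.0 * (1.0 - min(winner_probs))


# Toy series: the home team starts as a 50% chance, drops to 1%, then comes back to win.
series = [0.5, 0.2, 0.01, 0.4, 1.0]
print(excitement(series))
print(surprisal(series, home_won=True))  # close to 99, like the comeback example
```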
If you're wondering what the formula I used to calculate tension at a point of a match is, it is:
\[ \text{Tension} (\ell, t; \mu, \sigma ) = 6\frac{\partial P}{\partial \ell } (\ell, t; \mu, \sigma ) = \frac{6}{\sigma \sqrt{2 \pi (1-t)}} \exp \Biggl( - \frac{(\ell +\mu (1-t))^2}{2 \sigma ^2 (1-t)} \Biggr) \]
As an example, for evenly matched teams playing each other in a close match (say the margin is 0), the expression for tension becomes:
\[ \text{Tension} (\ell=0, t; \mu=0, \sigma ) = \frac{6}{\sigma \sqrt{2 \pi (1-t)}}\]
which does become very large as \( t \to 1 \).
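As a quick sketch, tension at a point is just 6 times the derivative of \( \Phi (z) \) with respect to the lead, i.e. the standard normal density \( \frac{1}{\sqrt{2 \pi}} e^{-z^2/2} \) evaluated at \( z = \frac{\ell + \mu (1-t)}{\sigma \sqrt{1-t}} \) and divided by \( \sigma \sqrt{1-t} \). The function name `tension` and the `points_per_goal` parameter are illustrative, and \( t < 1 \) is assumed.

```python
from math import exp, pi, sqrt


def tension(lead, t, mu, sigma, points_per_goal=6):
    """Approximate change in home win probability from one goal at match fraction t."""
    remaining = 1.0 - t
    remaining_sd = sigma * sqrt(remaining)
    z = (lead + mu * remaining) / remaining_sd
    density = exp(-0.5 * z * z) / sqrt(2 * pi)  # standard normal pdf at z
    return points_per_goal * density / remaining_sd


# Evenly matched teams, scores level: tension climbs as the match runs out.
for t in (0.0, 0.5, 0.9, 0.99):
    print(t, round(tension(lead=0, t=t, mu=0, sigma=40), 3))
```

Very close to the end of a level game this linear approximation can exceed 1, which is a reminder that it measures a slope rather than a true probability.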
Anyway that's it for now, I'm working on an app that will let you explore all the matches I've calculated excitement, tension and surprise for and look at the win probability graphs so keep an eye out for that!