ODI batting is just not Test cricket… it’s even better

The men’s ICC World Test championship final is scheduled to take place this month between India and New Zealand.

This upcoming marquee match prompted me to dig into an unusual pattern in men’s international batting that has emerged in recent years. ODI batting averages have surpassed Test batting averages for the first time in their 50-year coexistence.

That shouldn’t happen. Read on for why I think so.

Diverging formats

While keeping an eye on international cricket series in various formats over the past five to ten years, I started to notice the emergence of something unusual.

Watching many matches during the 2015 Men’s Cricket World Cup in Australia, I was astounded at the jump in what seemed to be par scores in the 50-over format. I remember that, only a decade or so earlier, a score of 220 or so would often have been considered ‘par’ on Australian pitches. But by 2015, it felt like teams were putting on close to half of that for fun in the final ten overs alone.

Since that tournament, it has felt like that trajectory has only continued in one day cricket, riding in the wake of the ultra-aggressive T20 cricket, with its hit-out-or-get-out approach.

Over the same period, my perception of the scoring ability in Test cricket has wavered. Maybe it was my Australian bias showing, but certainly in this country, where innings previously seemed to often reach above 400 and regularly 500, now both host and visitors were scrapping out Test innings scores of 200s and 300s.

This niggling came to a head when keeping an eye on the recent Indian tour of Australia over the summer of 2020/21.

All six of the lead-up white-ball cricket matches finished with higher averages (runs per wicket) than the 1st Test match, and of course at batting strike rates (runs per 100 balls) far in excess of the red-ball format.

Then, earlier this year, my interest was piqued enough following two international cricket matches completed on different continents, concluding on the same day:

The Test match – the format with 450 overs on offer, and 40 wickets to take – was over in under two days, with only 387 runs scored in total. Meanwhile, the T20I match – the format with 40 overs on offer, and 20 wickets to take – put on 434 runs in just an afternoon.

What was happening here?

Batting theory 101

I have to state the obvious – a team wins a cricket match when it scores more runs than its opponents.

A rational batting team in a cricket match is therefore looking to optimise its score in runs, controlling for two resources:

  1. The wickets it has to lose, and
  2. The number of overs it has available to bat

(This is true in all cricket formats. However, there is an additional wrinkle in long-format cricket: to win the match, a team must dismiss its opponents twice, so it must also leave itself enough time to take 20 wickets.)

In general, overs available to bat are plentiful in comparison to wickets remaining. As such, wickets are treated as somewhat to very precious. This balance starts to tilt the other way when there are considerably fewer overs on offer to bat – now wickets are considered more disposable, as the batting side must look to score from more of the balls it faces.

All standard cricket formats include 10 wickets per innings, while the number of overs available to bat varies. Overs are therefore relatively more scarce in shorter cricket formats, and wickets are therefore relatively more scarce in longer cricket formats.

As such, we would expect (and we have historically seen):

  • Shorter formats (e.g. ODI cricket) to feature higher batting strike rates (runs per 100 balls) and lower batting averages (runs per wicket)
  • Longer cricket formats (e.g. Test cricket) to feature lower batting strike rates and higher averages

Theoretically, there is a trade-off between the risk (batting strike rate) and the reward (batting average). Empirically, we have seen this play out over half a century. But data from recent years indicate this trade-off is being turned upside down…

The data and analysis

To investigate if there was anything to my hunch, I pulled the data from all innings across men’s Test, ODI and T20I matches from ESPN CricInfo StatsGuru. I am only going to look at Test matches since 1971 for comparison, since this was the year ODI cricket was first played.

I also chose to limit the competing teams to the eight top-ranked men’s Test-playing nations, to both keep the team samples comparable across all formats, and to eliminate any skewed results from major nations overpowering cricket minnows in shorter-form cricket.

I calculated the three standard cricket KPIs common to all followers at an aggregate level by year:

  1. Batting strike rate (runs scored per 100 balls)
  2. Bowling strike rate (balls per wicket taken)
  3. Batting average (runs scored per wicket) (this is also equivalent to bowling average at the aggregate level)
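
As a rough sketch of how these three figures can be derived (not the code used for this post), the snippet below sums runs, balls and wickets per format and year before dividing. The innings rows, the field order and the function name `aggregate_kpis` are all made up for illustration; they are not StatsGuru data or its export format.

```python
from collections import defaultdict

# Hypothetical innings rows: (format, year, runs, balls, wickets).
# The numbers are invented for illustration only.
innings = [
    ("Test", 2020, 244, 560, 10),
    ("Test", 2020, 191, 435, 10),
    ("ODI", 2020, 374, 300, 6),
    ("ODI", 2020, 308, 300, 8),
]


def aggregate_kpis(rows):
    """Sum runs, balls and wickets per (format, year), then derive the KPIs."""
    totals = defaultdict(lambda: [0, 0, 0])
    for fmt, year, runs, balls, wickets in rows:
        bucket = totals[(fmt, year)]
        bucket[0] += runs
        bucket[1] += balls
        bucket[2] += wickets

    kpis = {}
    for key, (runs, balls, wickets) in totals.items():
        if balls == 0 or wickets == 0:
            continue  # rates are undefined without balls bowled or wickets lost
        kpis[key] = {
            "batting_strike_rate": 100 * runs / balls,  # runs per 100 balls
            "bowling_strike_rate": balls / wickets,     # balls per wicket
            "batting_average": runs / wickets,          # runs per wicket
        }
    return kpis


kpis = aggregate_kpis(innings)
for (fmt, year), values in sorted(kpis.items()):
    print(fmt, year, {name: round(v, 2) for name, v in values.items()})
```

Summing first and dividing once gives the aggregate rate for the year, which is not the same as averaging each innings' own rate.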

An inverted batting market

For the first time in the 50-year history of the coexistence of men’s Test and ODI cricket formats, the year-on-year batting average curves have crossed. Batting sides in the ODI format are consistently scoring more runs per wicket than their Test counterparts even though they are also making their runs much more quickly.

The aggregate ODI batting average (runs scored per wicket lost) is now consistently above that in Test cricket.

You can see Test cricket batting averages were relatively flat until the mid-1990s, but perhaps off the back of the momentum and philosophy of ODI cricket, surged by more than 5 runs per wicket into the mid-to-late 2000s. In the last decade particularly, Test batsmen have had a much more difficult time of it with averages plummeting to easily the lowest in the past half century.

ODI cricket has featured a steadily increasing batting average since its 1971 inception. Indeed, since about 2010 this average is starting to run away further from a relatively consistent trend, with the 2019 (37 runs per wicket) and 2020 batting averages (39) easily the best on record.

Note that the much-narrower T20I trend is also steadily increasing, with its peak also in 2020 (29 runs per wicket). In fact, this was under a run per wicket less than the Test average!

It’s not that ODI batting averages have just peeked above Test batting averages. It’s that they have consistently done so for over five years, and they are actually pulling away further!

So how is this the case? How is ODI cricket having its cake (high batting strike rates) and eating it too (high batting averages)?

A deeper dive

Although I indicated earlier that the batting philosophy is about finding the optimal balance between scoring rate and number of runs for the format, this hides the true trade-off. The fundamental balancing act is between the scoring rate (batting strike rate) and the number of balls faced (bowling strike rate). Increase one, and you should expect to reduce the other. Typically what you want to optimise is the number of runs, which is actually the product of the two.

Although the denominators are a little funny, our three metrics can be linked together via the following expression:

Batting average = Batting strike rate (/100) × Bowling strike rate

or,

[Runs / Wicket] = [Runs / 100 balls] × [Balls / Wicket]
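
To make the expression concrete, here is a small sketch that applies it and then splits a change in batting average into the part coming from each strike rate. Because the average is a product, the log changes of the two factors add up to the log change of the average. The function names and the before/after numbers are hypothetical, chosen for round arithmetic rather than read off the charts.

```python
import math


def batting_average(batting_sr, bowling_sr):
    """Runs per wicket from runs per 100 balls and balls per wicket."""
    return batting_sr / 100 * bowling_sr


def decompose_change(before, after):
    """Split the log change in batting average into its two components.

    `before` and `after` are (batting_sr, bowling_sr) pairs.
    Returns (batting part, bowling part, total).
    """
    d_batting = math.log(after[0] / before[0])
    d_bowling = math.log(after[1] / before[1])
    return d_batting, d_bowling, d_batting + d_bowling


# Hypothetical values, not taken from the data in this post.
before = (80.0, 40.0)  # 80 runs per 100 balls, a wicket every 40 balls
after = (90.0, 42.0)   # scoring faster and surviving longer

d_batting, d_bowling, d_total = decompose_change(before, after)
print(round(batting_average(*before), 2), round(batting_average(*after), 2))
print(round(d_batting, 3), round(d_bowling, 3), round(d_total, 3))
```

In this made-up case both factors move the same way, so the average rises on both counts, with the larger share (about 0.12 of the log change against about 0.05) coming from the scoring rate.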

Let’s consider these fundamental variables individually to search for a better explanation.

1. Batting strike rates

You can see below that batting strike rates have generally increased over time across all formats. What is interesting to note is the diverging trend lines during the 2010s. Where batting strike rates in both T20I and ODI cricket have surged by over 10 per cent in that time, the potency of batting in Test cricket has tapered off slightly.

Batting strike rates (runs scored per 100 balls) have generally increased over time across all formats.

While the commentary of the mid-90s to the mid-00s often attributed the increasing attacking nature of Test batting to skills and philosophies honed in ODI cricket, it appears this link may have broken over the past decade.

2. Bowling strike rates

You can also see Test cricket’s bowling strike rates have fallen consistently over the past half-century. This indicates that the level of bowling has improved relative to batting defences. Indeed, prior to the 1970s, the data show it was fairly common for a wicket to fall only once every 80 or more deliveries. In recent years, wickets are tending to fall in fewer than 60 deliveries, once again with the chart showing a particular drop-off in the past decade.


Bowling strike rates (balls per wicket) have consistently fallen in Test cricket over the past 50 years, but are relatively flat in ODI and T20I cricket.

When it comes to short-form cricket, the above chart also shows that bowling strike rates have been noticeably flat for a long time. If anything, bowling strike rates have slightly increased in ODI and T20I cricket in the past five or so years, indicating bowlers are finding it harder to take wickets.

Bringing it all together

Using our equation from earlier, we can summarise the general trends by 2020 compared to the respective baseline levels from about a decade earlier:

Format   Batting strike rate   Bowling strike rate   Batting average
Test     Slightly lower        Lower                 Significantly lower
ODI      Higher                Slightly higher       Significantly higher
T20I     Higher                Slightly higher       Significantly higher

This is where we find some answers. If we consider batting philosophy as a trade-off between risk and return, we find that Test batting is losing ground on both counts. Rather than balancing the speed of runs scored with the number of balls faced, Test batsmen are scoring at slower rates at the same time as facing fewer balls in the process.

All the while, ODI and T20I formats are also paying no heed to the trade-off, winning on both counts. Batsmen in these shorter formats are scoring at faster rates and at the same time facing more balls in the process.

On face value, this doesn’t make much sense. Yes, a global, systemic factor across formats could explain one of these trends. For example, technology producing higher quality cricket bats could suggest why bat is becoming more dominant over the ball. But here we have contrasting patterns. Why is bat winning over ball in two formats, while ball is winning over bat in the other?

Across these data, the teams are the same, the grounds are (mostly) the same, the players are pulled from the same talent pool (and are often the same)… so what could explain the difference?

Possible explanations

If everything else was equal (players, teams, grounds, pitches) – and all players and teams were attempting to play optimally at all times – I can’t see a logical reason for this trend to occur (although I may be wrong, and feel free to correct me if I am!). As such, my hypothesis is that one or more structural divergences between formats have widened over the past decade, and that these may go some way to explaining the trend.

1. Diverging match and pitch conditions

A fundamental factor underlying all cricket scoring is the match conditions, and the biggest one that could be systematically controlled over time is the quality of the pitch.

My first hypothesis is that the quality of cricket pitches to bat on has likely improved more in short-format cricket than in Test cricket over the past decade.

ODI cricket found its position under attack from both flanks towards the back end of the first decade of the 2000s. On one flank, Test cricket roared to new life with a rekindled Ashes rivalry and a new India-Australia rivalry. From the other flank, T20 cricket marketed itself as the ‘best bits of ODI’ cricket without the dawdling middle overs. So how could cricket administrators keep selling tickets and broadcast rights for the once-most popular format? Keep the boundaries flowing in ODI matches, as close to T20 pace as possible for 100 overs. That would require batting-enticing pitches.

Following the same logic, it wouldn’t surprise me if, as a response to the batting fireworks produced by ODI and T20I cricket, executives steered Test cricket away from being merely the vanilla cousin, by positioning it as the unique format to view the traditional battle between bat and ball.

It is plausible that the art of curating cricket pitches is now completely distinct per format, and that Test pitches are attempting to bring the bowlers back as significant performers on the main stage, while ODI and T20I seek flatter and flatter roads.

This would also explain the relative dominance of the ball in Test cricket and the relative dominance of the bat in ODI and T20I cricket over the same period.

2. Diverging talent pool per format

Another possible contributing factor could be the elite cricket talent pool diverging or concentrating into discrete pools per format.

We are now effectively a whole generation into the Twenty20 era, where franchise-based domestic T20 tournaments provide lucrative professional cricket pathways for a player pool an order of magnitude larger than even the previous generation.

Where Gen Y talent grew up seeing limited international cricket as the only true professional pathway, Gen Z have grown up knowing of the opportunities on the T20 circuit. Perhaps these incentives, alongside shifting schedules towards shorter formats in the junior ranks, are shifting techniques to favour dare over grit. This would align with the ongoing and continued improvements of batting trends in the ODI and T20I formats.

Perhaps this has also resulted in a reduced depth of the talent pool of batsmen able to bat through a day of Test cricket. You may then ask why we are not seeing the run rate improving at Test level. Well, perhaps the talent pool has diverged so much that those left over with enough of a technique to ‘bat time’ don’t also have the gears to go along at the same clip as the previous generation. And if they once did, their techniques may have played second fiddle to their ability to six-hit at will.

I have no evidence for this hypothesis, nor do I necessarily think it is likely. But I do think it could partly explain the data.

Improving the analysis

While enlightening and potentially uncovering a legitimate signal, I note this analysis is shallow and could be improved by controlling for a number of factors and assumptions:

  1. Quality of participating batsmen per innings – Not all wickets are lost equally, with the top six or seven batsmen expected to produce both higher averages and strike rates than the tail. As the data have been pulled at innings-level (rather than at a batting position- or partnership-level), I am unable to control for the number of wickets lost per innings. For example, in a declared Test innings, or an innings which ends prematurely when a target score is achieved (with only recognised batsmen featuring), this would skew results to higher averages and strike rates. This could be controlled if the data were available at batting position- or partnership-levels.
  2. Frequency of team participation and matchups – Although I have selected the same standard list of the top eight men’s Test batting nations for consistency, I have not controlled for the number of matches played by each team in each format in the sample. For example, there are likely to be skews towards more or less dominant countries or matchups within the sample. Or perhaps England have played a higher proportion of Test matches but India have played a relatively higher proportion of T20 internationals.
  3. Frequency of venue use – A little like the frequency of team participation, my hypothesis here is that different venues are more or less suited to batting. I have not controlled for the number of matches played at certain venues. It is very possible that scheduling patterns mean that more shorter-format matches are being played on traditionally batting-friendly grounds (better pitches, shorter boundaries) while Test matches are being retained at other venues.

Feedback

I have followed cricket for almost thirty years, but this is my first ever cricket post and I may gain a few first-time readers. By no means is this post meant to be authoritative, but explorative. If I have missed something, made an error, or if you have any suggestions or ideas, please feel free to comment below, shoot me a direct message or hit me up on Twitter.

A concise data-driven approach to predicting the 2020 Australian Football Hall of Fame inductees

This morning I read on Twitter that the AFL decided to spread its Hall of Fame event over four nights, beginning tonight (1 June).

This move was made presumably to solve for both the lack of large events that are able to be currently held, and to help fill the footy void prior to the AFL season resuming on 11 June.

I had been keeping one eye on new plans for the Hall of Fame night but this news had somehow escaped me. I had planned to do a deep-dive and expand on the analysis I first put together ahead of last year’s ceremony.

If interested, you can read more about the context, the data, the statistical approach and some of the outcomes in that article.

I have thrown together an article in fits and spurts today to put some ideas on the table prior to the first inductions tonight.

Updating for 2019 inductions

Last year I used a combination of the data-driven player likelihood of induction, plus recent trends in types of selections in the previous few years, to make some ‘predictions’ (and I say that loosely given the very small sample size). I wrote: “I expect this year that the selection committee will once again focus on South Australia and induct two Croweaters, alongside two-three modern-era AFL players and possibly another dark horse.” 

My ‘predictions’ were:

  • Tom Leahy
  • Jim Deane
  • Tony McGuiness
  • Kelvin Templeton
  • Simon Black
  • Alastair Lynch

Pleasingly, I landed one Croweater, as Jim Deane (a name even most die-hard footy fans have probably never heard of) was inducted. Deane played 11 seasons for South Adelaide and two at Richmond, winning the Magarey Medal twice and South Adelaide’s best and fairest on six occasions. On the data-driven ratings (which are based on historical selector preferences, not any sense of objectivity), he was the top choice from all historical SANFL players.

On the face of it, Deane was my only direct ‘hit’ in a shallow year for inducted players (only four). The only other players inducted were Trevor Barker, Brad Hardie and Ken Hunter. The data had Hardie as the fourth ‘most likely’ from VFL players in the 1960s-80s era to be inducted, but I did not expect that the selectors would –  once again – go back to the 1970s and 1980s to induct more players out of the VFL/AFL. Since 2011, there had been a promising trend away from that cohort. If you have read ‘Footballistics’ you will understand that there exists already a vast overrepresentation of players from that demographic. 

Complementing their selections were Ron Evans (as an administrator) and Michael Malthouse (as a coach).

At the time, I couldn’t understand how Simon Black was not inducted. Black had become eligible that season, marking five years after his retirement. The selectors had shown an ongoing preference to induct ‘gun’ players the first year they became eligible, and the data (as well as recent memories) were very hot on him.

Turns out Black “had been voted in by the Hall of Fame committee last year” but “was unable to attend the function due to his filming of Australian Survivor overseas”. Perhaps that was not public knowledge at the time as the reality show had not been aired, but I remember his omission (and only four players inducted) left me confused.

For statistical purposes, I am going to consider Black ‘inducted’ in 2019. It also makes sense to update the data to take him out of the prospective pool for analysis for this year’s selection.

There was a stronger lean back towards the VFL/AFL (pre-1990) last year than I had anticipated

I’ll give myself two ticks out of six for last year’s predictions.

  • Tom Leahy
  • Jim Deane (correct)
  • Tony McGuiness
  • Kelvin Templeton
  • Simon Black (considered correct)
  • Alastair Lynch

Updating the data for 2020

Updating the data for analysis required a few tweaks:

  1. Update the inducted players from 2019 with a new status
  2. Remove the inducted coaches and administrators from the potential pool (as they are assumed ineligible for induction as players, even though some have great playing records)
  3. Re-run the model to include the ‘class of 2013’ retirees and the inducted cohort of 2019
  4. Re-run the predictions on the new pool, including the ‘class of 2014’ retirees who are eligible for the first time 
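
One way to picture these four steps is the sketch below, which is only an illustration and not the code behind this post. The player records, the field names and the function `update_pool` are hypothetical, and the retirement-year cut-offs are my reading of the ‘class of 2013’ and ‘class of 2014’ wording above.

```python
# Hypothetical records; names and fields are invented for illustration.
players = [
    {"name": "Player A", "retired": 2013, "inducted": False},
    {"name": "Player B", "retired": 1958, "inducted": False},
    {"name": "Coach C", "retired": 1990, "inducted": False},
    {"name": "Player D", "retired": 2014, "inducted": False},
    {"name": "Player E", "retired": 1975, "inducted": True},
    {"name": "Player F", "retired": 2001, "inducted": False},
]


def update_pool(players, new_player_inductees, non_player_inductees,
                train_cutoff, predict_cutoff):
    """Return (training_rows, prediction_pool) following the four steps."""
    training, pool = [], []
    for player in players:
        if player["name"] in non_player_inductees:
            continue  # step 2: coaches and administrators leave the pool
        row = dict(player)  # copy so the input list is left untouched
        if row["name"] in new_player_inductees:
            row["inducted"] = True  # step 1: new status for last year's inductees
        if row["retired"] <= train_cutoff:
            training.append(row)  # step 3: rows the model is re-fitted on
        if not row["inducted"] and row["retired"] <= predict_cutoff:
            pool.append(row)  # step 4: candidates to score this year
    return training, pool


training, pool = update_pool(
    players,
    new_player_inductees={"Player A", "Player B"},
    non_player_inductees={"Coach C"},
    train_cutoff=2013,
    predict_cutoff=2014,
)
print([p["name"] for p in training])
print([p["name"] for p in pool])
```

Keeping the training rows and the prediction pool as separate lists means a newly eligible retiree can be scored without having influenced the fit.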

I updated the chart I ran last year to show the relationship between various standard achievements across the leagues and how they contribute to the chances of induction.

Each dot represents an individual player – how many of that achievement they recorded (a count on the horizontal axis) and whether or not they were inducted (bottom means not inducted, top means inducted).

On top of the aggregate set of points is a smoothed line (logistic curve) which best fits all points for every player in each particular league. As all of these achievements indicate success, we would expect to see the average line of all players sloping up from bottom left to top right – which is exactly what we see. The slopes and points at which the lines tilt up differ, and here is where we see differences in how the selection committee has historically not deemed achievements in some leagues as worthy as others.
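
For readers unfamiliar with a logistic curve, the sketch below shows its shape for a single achievement count, and how a different intercept and slope per league shifts the point where the curve tilts up. The coefficients are hypothetical and are not the fitted values behind the chart; the function names are mine.

```python
import math


def induction_probability(count, intercept, slope):
    """Logistic curve: modelled chance of induction for an achievement count."""
    return 1 / (1 + math.exp(-(intercept + slope * count)))


def count_for_even_odds(intercept, slope):
    """Achievement count at which the curve crosses 50 per cent."""
    return -intercept / slope


# Hypothetical coefficients for two leagues, for illustration only.
league_a = {"intercept": -2.0, "slope": 2.0}
league_b = {"intercept": -3.0, "slope": 1.0}

for label, coef in (("League A", league_a), ("League B", league_b)):
    curve = [round(induction_probability(n, **coef), 2) for n in range(5)]
    print(label, curve, "even odds at", count_for_even_odds(**coef))
```

With these made-up coefficients, a League A player reaches even odds with one award while a League B player needs three, which is the kind of gap between the curves described here.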

The likelihood of induction of Australian Football Hall of Fame candidate players, by league and playing era

Marked by the blue curves rising almost exclusively faster than the red and yellow curves, the VFL/AFL pre- and post-1990 honours appear to be considered far more worthy in the eye of selectors than those in the SANFL or WAFL. In other words, it takes a much more glittery CV in those latter leagues to have the same chances of induction as in the VFL/AFL, even during the state-based eras.

Eyeing off the chart, it appears that a player with one Brownlow medal is more likely to be inducted than a player with two Magarey or Sandover medals. Or that players with five club best and fairest awards in South Australia and Western Australia are deemed to have similar chances to those with just two in Victoria or in the national era.

I interpret these results with the hypothesis that the Hall of Fame selection committee naively assesses SANFL and WAFL players, with the shortlist only coming from those players with absolutely stand-out CVs in some of the most prestigious award categories. It seems that some statistics, such as games played or premierships, are not considered whatsoever. This explains why there are a whole host of Western and South Australian footballers with large tallies of games played and premierships won who have not been inducted. 

Predictions for 2020

There have been few changes to the below visual from 2019, save for the inducted players dropping off and Jonathan Brown appearing right at the top of the list of recent players.

An article on the AFL website notes that “the likes of Luke Ball, Jonathan Brown, Dean Cox, Darren Glass, Lenny Hayes, Ryan O’Keefe and Ben Rutten are in contention for the first time” after retiring in 2014. These players, with Brown and perhaps Cox as exceptions, do not have CVs that tend to be acknowledged by the committee. Ben Cousins has been eligible for five years now, but given his current circumstances I expect him to be overlooked again.

The modelled likelihood of future Australian Football Hall of Fame induction of candidate players, by league and playing era (chart updated 3 June to fix a data error)

A Fox Footy article last week notes “each inductee [will take] part in a long-form interview to air alongside a career highlights package”. Given there is far more air time to fill in this made-for-television ‘event’ over four nights, and the expectation of in-person interviews and lots of footage, sadly once again I don’t expect the long list of “neglected heroes” to be honoured this week. 

Therefore my predictions have an ultra-modern focus, with some names from last year popping in again to round out the selections:

  • Simon Black (confirmed and as predicted in 2019)
  • Jonathan Brown (first year eligible)
  • Alastair Lynch (making it a real Brisbane premiership flavour)
  • Paul Couch (posthumously)
  • Kelvin Templeton
  • Tony McGuiness
  • Don Lindner/Steve Malaxos (one token ‘interstate’ player)

Using data to predict the 2019 Australian Football Hall of Fame inductees

Mel Whinnen giving his acceptance speech at the 2018 ceremony

Tonight (Tuesday 4 June), the Australian Football Hall of Fame will honour its next batch of players, coaches, administrators and media performers with induction. Each year in the lead up to the ceremony there is often a range of speculation from various media types on which ex-players will be in the latest batch to be inducted. The Victorian writers tend to focus on the newly eligible candidates from the recent pool of AFL retirees, while the focus interstate is often to wonder when the ledger will be tipped a little back in their favour to honour some of their long-overlooked football legends.

The Australian Football Hall of Fame was established in 1996 and “seeks to recognise and enshrine players, coaches, umpires, administrators and media representatives who have made significant contributions to Australian Football – at any level – since the game’s inception in 1858”. In total, 136 individuals were inducted into the Hall of Fame in the initial intake of 1996 and a further 121 have been added in the 22 years since. Players make up the bulk of the intake (coaches, administrators, umpires and media personalities also have a presence), with 202 inducted off the back of their playing career and another 28 who have since been elevated to Legend status. This post will focus on players, as the cohort with both the largest sample and most easily accessible records and playing honours.

It is important to note that the Australian Football committee “considers candidates from all parts of Australia and from all competitions within Australia” rather than merely the sole elite-level competition today, the AFL. Newer or younger enthusiasts of the sport may not realise that indeed the current pinnacle of the game is a somewhat recent change within the history of the structure of the sport at all levels, with state leagues dominating the landscape for over a century until the late 1980s. The three states which have always played the most prominent role founded formal football associations well back into the 19th century, namely Victoria, South Australia and Western Australia.

For a number of years, football historians have bemoaned the seemingly inequitable induction rates into the Hall, specifically favouring both playing careers in Victoria (over the other state leagues) through the VFL/AFL, and those that tended to remain in the memories of selectors (from the 1960s onwards) at the expense of those prior to the television era.

The challenge

Woven through a lovely narrative of the careers and legacy of South Australian champion footballers Sampson ‘Shine’ Hosking and Tom Leahy, one chapter of the 2018 book ‘Footballistics’ analysed the skew across states and seasons of the inductees into the Hall of Fame. It investigated the disproportionate induction rates in player groups by different eras and in different competitions. It also looked at how some of these trends have evolved over time since the inaugural Hall of Fame induction in 1996. Neither Hosking nor Leahy has been inducted, despite both objective and anecdotal evidence in their favour, whilst vast numbers of particularly VFL players from the 1960s-1980s have been inducted ahead of them.

What I also wanted to do was understand which career achievements may be meaningful in the eyes of the committee in order to gain induction. This was not an easy task, as there are no minimum achievements required to reach eligibility and the committee “considers candidates on the basis of record, ability, integrity, sportsmanship and character”. How could one even begin to objectively or quantitatively measure many of these attributes, such as “ability” or “character”? It is hard enough to even hypothetically consider a relevant metric for these, never mind go about finding an available data set where these metrics might exist for elite Australian Football! The one attribute that, of course, can be at least partly assessed is the playing record of individuals. The short biographical sketches written about each inductee provided on the Hall of Fame website give us a good starting point. They typically include statistics such as career span, total games and goals, league and club best and fairest awards, league and club leading goal kicker awards, league and club teams of the century, All-Australian selections, state appearances, grand final best on ground awards and years captained.

Given (to my knowledge) a lack of a comprehensive and complete database across the elite levels of Australian football, to research for ‘Footballistics’ I spent a lot of time collating (through best efforts) an aggregate data set which summarised player-level data spanning leagues, including attributes such as seasons played, games and goals, and various honours and achievements. This process was fairly labour intensive and detailed, so I’ll spare the details until the bottom of the chapter. It is fair to say that some of the numbers in the data set used in this analysis range greatly from the precise to the approximate, such is the time scale of the players’ careers assessed and evolving nature of competitions over more than a century – as well as a large number of ‘fuzzy’ joins on my end which also require a leap of faith. Accordingly, all values should be read as estimates only, providing a good overview and understanding of patterns but not as ‘ground truth’.

I ended up with a relatively broad data set of Australian footballers over time, including their career games and goals at top-league clubs, the seasons they played, competition-wide honours such as league best and fairests, league goal kicking awards and number of premierships, and team-based achievements such as club best and fairests, club goal kicking awards, and club captaincies. All of these achievements were summarised per player and split out over four major league categories, namely the VFL/AFL (until 1989), the SANFL (until 1990 prior to Adelaide joining the AFL), the WAFL (until 1986 before West Coast joined the AFL) and the VFL/AFL (the national era since 1990). I was also able to include representative selection, specifically All-Australian selection in both the Carnival and national eras. One piece I would have loved to include (and I think would have been particularly explanatory) was number of intra-state league and state-of-origin matches, however I couldn’t find a comprehensive data set with this information.

For simplicity, I chose to concentrate my efforts on the playing careers of inductees only. All inductees referred to in this section have been inducted as ‘Players’ and/or ‘Legends’ (unless otherwise specified) and all games referred to are as players (rather than coaches), as this by far comprises the biggest sample size of inductees for analysis purposes. The Hall of Fame places individuals who participated in more than one role into a single discrete category. For example, John Cahill is listed as ‘Coach’, even with a very respectable playing career, while five-time VFL premiership coach Jack Worrall is listed as a ‘Player’! Perhaps erroneously (a theme…), the website has Western Australia’s Jack Sheedy as the only individual listed on both the ‘Player’ and ‘Coach’ pages.

The approach

With this data set, I was able to fit a number of classical and machine learning models to ‘reverse engineer’ data-focused criteria for induction into the Hall of Fame. In the end I settled on the best performing logistic regression model, as it was philosophically suitable for modelling the binary induction status of players based on the aggregation of their achievements, performed well with many of the input variables statistically significant, and was more explainable to a pleb like myself.

Hall of Fame induction status and its relationship to a range of various playing honours and achievements, by league

The only set of attributes I dropped were the club leading goal kicking awards for players across all four league categories, as they showed no explanatory power. I would assume this is because leading a club’s goal kicking can be done by some relatively ordinary players in ordinary sides, and total career goals is a much better indicator of strong forwards who are deemed worthy. Some of the other attributes exhibited statistical significance across two or three of the league categories, and in that case I left the entire set in the model for consistency.

I also only considered players that commenced their careers after 1897 (to standardise the pool across the leagues), and stripped out the inducted Hall of Fame ‘Coaches’ entirely from the training data set. I didn’t want to muddy the data set by having their playing careers not matched to a Hall of Fame ‘Player’ induction, nor include them as inducted players either.

With the combination of the sum of all these playing achievements and attributes, I was able to generate a propensity for all yet-to-be inducted footballers. A rating towards 1 suggests the player is very likely to be inducted, while a rating closer to 0 suggests the player has little chance under the current criteria. For example, highly-decorated and newly-eligible champion Simon Black topped the board with a modelled rating of 0.97.
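To make the idea of a ‘propensity’ concrete, here is a minimal sketch of how a logistic regression turns a career summary into a rating between 0 and 1. It is not the model used in this post: the three features, the eight careers and the induction flags are all made up for illustration, and the fit is a plain gradient descent written out by hand rather than a statistics library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Batch gradient descent on the log-loss; returns (bias, weights)."""
    n, k = len(rows), len(rows[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights

def propensity(model, x):
    """Modelled induction rating for one career summary."""
    bias, weights = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical careers: (games / 100, club best and fairests, premierships).
careers = [
    (3.0, 4, 2), (2.5, 3, 1), (2.8, 2, 3),               # flagged as inducted
    (1.0, 0, 0), (0.5, 0, 1), (1.5, 1, 0), (0.8, 0, 0), (2.0, 1, 1),
]
inducted = [1, 1, 1, 0, 0, 0, 0, 0]

model = fit_logistic(careers, inducted)
decorated = propensity(model, (3.0, 4, 2))  # heavily decorated career
fringe = propensity(model, (1.0, 0, 0))     # short, undecorated career
print(round(decorated, 2), round(fringe, 2))
```

The decorated career lands closer to 1 and the fringe career closer to 0, which is all the rating is meant to convey: similarity to the careers the committee has already rewarded.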

It is also important to note that what this model does not do is suggest who ‘should’ and ‘should not’ be inducted. It merely tells us what factors may have been significant contributors to the induction of players in the eyes of Hall of Fame committees of the past, and gives us a rating or propensity for similar-type players to be inducted given the committee’s selection history. For this reason, the model does not try to explicitly account for any skews towards the VFL or the 1950s to 1970s – in fact, by applying it to the careers of all players not yet inducted, it will tend to favour the same type of players. You could say that you only get out what you put in. 

As such, it’s hard to compare like-for-like as the players with any history in the VFL/AFL and modern era tend to outshine all others numerically. Instead, I have forced its hand and broken out the top five results for a combination of league and era (two-decade increments). Players were allocated to both the league and the era in which they played the highest proportion of matches in their careers. This way, we can compare ‘like with like’ and at least understand which yet-to-be-inducted players our data suggests should be closer to Hall of Fame worthiness.

The modelled induction chances of prospective Australian Football Hall of Fame candidate players, by league and playing era

The skew towards the VFL/AFL and the more modern decades is pretty stark, with much higher top-five ratings in those selections.

Our South Australian pioneers ‘Shine’ Hosking and Tom Leahy pleasingly sit top two in the first two decades of the 20th century, while Jim Deane (two-time Magarey Medal winner and six-time South Adelaide best and fairest) and Don Lindner (Magarey Medallist and three-time North Adelaide best and fairest) lead the way for the Croweaters in other eras.

Over to the west, and Hugh ‘Bonny’ Campbell (four-time WAFL premiership player and once kicked 23 goals in a state game), Ted Flemming (Sandover Medallist, two-time West Perth best and fairest and three-time WAFL premiership player) and Steve Malaxos (Sandover Medallist and inaugural West Coast best and fairest) seem to be the most likely candidates in their respective eras from the WAFL. Prior to the 2018 induction ceremony, there were two more names at the top of this list who are now Hall of Famers. An earlier iteration of my model placed Bernie Naylor and Mel Whinnen within the top three selections for WAFL players at this time last year, which spurred me on to consider this analysis for 2019.

A 2018 tweet showing the then-top-ranked WAFL candidates heading into that year’s Hall of Fame ceremony

The VFL/AFL has had a stack of players inducted each year, and a case could be made that some ‘very good’ players have been a little lucky (if that is possible) to receive the honour. One name missing that has jumped out at me for years is that of Kelvin Templeton. He is one of just five players to have won both the Brownlow and Coleman Medals (two), and remains the only such player to not be honoured with an induction. Combine that record with two Footscray best and fairest awards and five leading goal kicker awards and surely his name must be floating near the top. Tony McGuiness doesn’t quite fit properly into the ‘1960s-1980s’ VFL era as he only played 87 of his career 335 matches in that dimension. I’ve decided to place him there as he played 60% of his career games in the 1980s, and played 66% of his games in the VFL/AFL – his career wasn’t largely in the SANFL in the 1980s, nor largely in the 1990s AFL either.

In previous generations, Bill Cubbins (one of the premier full-backs of his era and four-time St Kilda best and fairest) and Alby Morrison (five-time Footscray leading goal kicker and two-time club best and fairest winner) are rated highly in comparison to their peers.

In the national era of the AFL, Simon Black is eligible for the first time in 2019 and outshines all with a hat-trick of premierships at Brisbane, Brownlow and Norm Smith medals and three All-Australian guernseys.

The model is calling out some features it sees as important to have on the CV of a Hall of Fame footballer, but we must remember that it is kept in the dark from so many other features that football fans and historians could call out in an instant. How can it factor in the individual ability of Gary Ablett Senior, the courage of Francis Bourke or the defensive resolve of Vic Thorp, when they only shared four club best and fairests between the three of them? As such, perhaps it’s not surprising given its limitations – and the limited samples of inductees from South Australia and Western Australia – that there may be some notable omissions from the highly-rated candidate list. It may only take some further ‘squaring up’ by the committee in future years to help recalibrate the model and smooth the results.

It is important to understand in this analysis that our model – and indeed, all analytical models, to various degrees – is useful for understanding but must be considered flawed. And in this case, it is heavily flawed on multiple levels, due to the question we are trying to answer and the availability of structured information we have at our disposal with which to answer it. Before we even begin we have recognised we are unable to even measure most qualities considered by the committee, and once we consider playing records only, we miss anything qualitative or anecdotal and must be constrained to the objective list of achievements. But even then, few honours can easily be compared over many decades. For example, the current All-Australian selection system recognises the best players by position across a season, however prior to the modern era the team was selected from an inter-league pool based on performances at interstate carnivals. And then finally, due to the structure of most awards and statistics, certain types of players (particularly defenders) are likely to be statistically under-represented by any analysis, due to the lack of quantitative metrics that tend to relate to those who shut down, rather than create.

Predicting the 2019 inductees

Although not reflected on the official Hall of Fame website (the stale criteria outlined is now a number of iterations old), news articles in recent months have pointed to an expansion (to eight) of the number of possible inductees within a given year. I am foreseeing the first female to be inducted, along with perhaps another non-player (administrator, coach or media type) this year if the option is chosen.

That will leave five or six male players on the brink of induction into the Hall of Fame for 2019.

We have modelled our most likely player candidates for induction, however in recent years the selectors pleasingly are starting to lean a little towards a more representative mix of induction candidates, across both states and eras.

All inducted players following the inaugural intake, split by the amount of games played in major leagues

As discussed, last year there was a strong Sandgroper flavour to the ceremony with both Bernie Naylor and Mel Whinnen inducted. The year prior, South Australia’s John Halbert was honoured alongside ex-Collingwood but also VFA-legend Ron Todd. In 2016, there was again a decent lean outside Victoria with Paul Bagshaw representing South Australia, with Ray Sorrell and Maurice Rioli spending considerable chunks in the WAFL.

The selection committee have in recent years been pretty consistent with inducting two or three AFL-era players, including the immediately eligible Matthew Scarlett last season. The selectors also leant into the 1970s and 1980s with Terry Wallace and Wayne Johnston last season, however there have been few VFL-types in the seasons prior to 2018.

I expect this year that the selection committee will once again focus on South Australia and induct two Croweaters, alongside two-three modern-era AFL players and possibly another dark horse.

To put my money where my mouth is, bringing together everything the data has told us about the Australian Football Hall of Fame, my tips for 2019 induction are:

  • Tom Leahy
  • Jim Deane
  • Tony McGuiness
  • Kelvin Templeton
  • Simon Black
  • Alastair Lynch

Dedication

This post, and analysis, is dedicated to SANFL statistician and historian Mark Beswick, who passed away in April 2018. Mark was one of the first people I contacted when trying to hunt down footy data sets outside the VFL/AFL in 2017. 

Appendix: The data

As far as I am aware, there exists no comprehensive database or structured data sets of top flight Australian footballers, including clubs, tenure, games, goals, and league and club achievements and honours.

This meant that I needed to do my best to stitch one together myself. For the best ‘single view’ of all footballers over time and states, I took the view provided by the wonderful website AustralianFootball.com. Building on the work of footy history doyen (John Devaney / Full Points Footy), this provided the most comprehensive and consistent data set covering the VFL/AFL, SANFL, WAFL, VFA/VFL and indeed some other major leagues. Although I am aware of some missing players across the three major leagues (from Western Australia and South Australia), certainly this view contained a broad enough view that the majority of noteworthy players across the country (both inducted and otherwise) had been captured.

The human delineation of fluid history is always somewhat arbitrary, but effectively I wanted to pull out the three major state leagues prior to the national era, and then separate out the modern national competition from its state-based past. Therefore I summarised the league and club data (games and goals) into aggregates across various state- (or league-) based buckets:

  • VFL/AFL pre-1990 (“VFL” – the state era)
  • SAFA/SAFL/SANFL pre-1991 (“SANFL” – the state era)
  • WAFA/WANFL/WASFL/Westar Rules/WAFL pre-1987 (“WAFL” – the state era)
  • VFL/AFL post-1990 (“AFL” – the national era)

For consistency, I took players only from 1897 onwards, as this allows like-for-like comparison between the three major leagues (and also most of the league and club honours are recorded after this point anyway). I also used this data set to derive the career start (earliest season listed) and career end (latest season listed) for each player.

Next I had to join a varied selection of league and club honours and achievements. For availability and consistency purposes, I narrowed the chase down to the following:

  • League best and fairests
  • League leading goal kicker awards
  • League premierships
  • Club best and fairests
  • Club leading goal kicker awards
  • Club captains
  • Representative selections (i.e. All-Australian carnival and AFL All-Australian teams)

Most of the club and league honours were pulled league-by-league and team-by-team from various official and unofficial websites. The premiership counts were generously provided by Greg Wardell-Johnson, Ric Gauci and Steve Davies for the WAFL and Kyle Smith for the SANFL. The WAFL Footy Facts website was also very useful and clearly provides the best availability and accessibility of any league’s data outside of the VFL/AFL.

I wanted to chase down state league/state of origin games, however I was unable to find a comprehensive enough data set. This would be a good inclusion to the model going forward, as it would provide a deeper indication of the better players playing in and/or from each state at a given point in time.

Next required the arduous and tedious task of joining the achievements onto the player name details, which I did via ‘fuzzy’ joins with some supervision. There is no doubt that some of these joins will be incorrect, however for the most part (and definitely for all inductees), I can confirm the matches were sufficient for ‘good enough’ analysis purposes.
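As a rough illustration of what a supervised ‘fuzzy’ join can look like, the sketch below uses Python’s standard `difflib` module to match names from an honours list against a player table. The `fuzzy_join` helper, the 0.85 cutoff and the misspelt names are my own assumptions for the example, not the exact process used for this data set; anything left unmatched (`None`) is what would be reviewed by hand.

```python
from difflib import get_close_matches

def normalise(name):
    # Lower-case and keep only letters and spaces so punctuation and
    # capitalisation differences do not block a match.
    kept = "".join(ch for ch in name.lower() if ch.isalpha() or ch == " ")
    return " ".join(kept.split())

def fuzzy_join(honour_names, player_names, cutoff=0.85):
    """Map each honour-list name to its closest player name, or None."""
    lookup = {normalise(p): p for p in player_names}
    matches = {}
    for name in honour_names:
        best = get_close_matches(normalise(name), list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[best[0]] if best else None
    return matches

# Hypothetical inputs: two misspelt honour-list entries and one stranger.
players = ["Mel Whinnen", "Bernie Naylor", "Ted Flemming"]
honours = ["Mel Whinen", "Ted Fleming", "John Smith"]
joined = fuzzy_join(honours, players)
print(joined)
```

A stricter cutoff leaves more names for manual checking; a looser one produces more silent mismatches, which is the trade-off behind the ‘some supervision’ above.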

I also built out the data set of Hall of Fame inductees from the official website, which again was a little tedious as different induction years have their information structured in slightly different formats. With the same process, I created flags for the inducted players, including their induction year (and year induction as a Legend, if applicable).

For the purposes of our analysis, I set the eligibility to include players who commenced their careers from 1897, to create a common baseline to compare the new VFL against the SANFL and WAFL. The data includes all VFL/AFL players, all players and achievements in the SANFL up until 1990 (before Adelaide’s induction into the VFL/AFL in 1991) and all players in the WAFL up until 1986 (before West Coast’s induction into the VFL/AFL in 1987). As all inductees debuting after 1990 have played the vast majority (if not all) of their careers in the AFL, these exclusions attempt to capture all achievements from the three major leagues before the national modern era and only those records and honours in the AFL since that point. Further, the current eligibility criteria is for five years retired from the sport, so all players who were active from 2014 and onwards were likewise excluded.

I also stripped out the inducted Hall of Fame ‘Coaches’ entirely from the model training data set. I didn’t want to muddy the data set by having their playing careers not matched to a Hall of Fame ‘Player’ induction, nor include them as inducted players either.

Are the new rules ‘serving’ up another mode of footy in 2019?

Footy-Love

The new “6-6-6 rule” might have been a headline grabber in the early weeks of the 2019 AFL season, but what I was thinking more about leading into the 2019 season is more akin to a “6-4 6-4 6-4” scoreline.

That’s right, thinking about footy more like tennis.

Wait, what?

For a few years now I’ve noticed a few similarities between the team dynamic of football and the individual dynamic of tennis. In both sports, teams/players exchange ‘plays’ and essentially aim to transport the ball past their opponent/s to hit the scoreboard.

In Australian football these individual plays take the form of possession chains, with teams trading successive sequences of possessions essentially up and down the ground until one is able to ‘pass’ the other’s defence and a score is recorded. In tennis, players trade successive shots up and back across the net until finally one is able to ‘pass’ the other’s defence for a winner (or force an error) to register a point.

Map of possession chains in Australian football | Map of shots in tennis
Sources: Figuring Footy: http://figuringfooty.com/2016/09/22/a-fresh-way-to-think-about-footy-gws-v-western-bulldogs-guest-post/
ESRI: https://www.esri.com/arcgis-blog/products/arcgis-desktop/analytics/using-arcgis-for-sports-analytics/

I also like to think of the strategies and strengths of footy teams in the style of tennis player types. Here I think of a team’s ‘first serve’ as its first play out of the centre bounce when it has won the clearance. Like on the court, in football not only can this be a point scorer on its own, but it can heavily dictate the rest of the play until the next score. Then a team’s ‘shots’ are their possession chains, in which they are successively trying to gain territorial advantage (‘court position’) in order to eventually overwhelm the opposition and score. Here may be some analogies of teams’ strategies with regard to commonly accepted types of tennis players:

  • The big first server – These teams have their biggest strength in the initial play from the restarts at the centre bounce, both in quantity (number of clearance wins relative to the opposition) and quality (how much damage they can do from them). Through their centre clearances they either serve a lot of ‘aces’ (centre clearance leading to direct scores), or they set up a lot of the rest of their play via putting their opposition heavily on the back foot through winning strong field position via a forward press.
  • The aggressive baseliner – These teams are less likely to worry about damaging the opposition with their first serve (centre clearance), but are really successful at grinding down their opponent around the ground (‘around the court’). With each possession chain typically a little more damaging than the next one coming back the other way, they are able to score ‘winners’ from all over the ground. They typically prevail by being a little bit better for a little bit longer.
  • The counter-puncher – These teams are strong in defence and are effective in soaking up what ‘shots’ their opponent is hitting their way. They wait until their opponent has found themselves out of position after unsuccessfully attempting a few ‘winners’, and are able to cause damage on the turnover. These teams generate a higher percentage of scores from defensive intercepts, where they exploit their out-of-position opposition to convert defence into attack and score quickly the other way.

First serve wins?

In recent years in the AFL, I think it’s fair to say that ‘court speed has slowed’ and as a result there are ‘longer rallies’ between scores. The consistent and persistent underlying conditions of play have been a horde of midfielders surrounding the ball and one or two additional defenders outnumbering a couple of forwards a kick or so either side of the play.

One of my hypotheses heading into this season was that the “6-6-6 rule” (now to be known in this article as the ‘restart positions’ rule) was going to change the initial mode of play for some period of time following each restart. As restricted restart positions would, for the first time, significantly alter this status quo for at least some period after each restart, I was interested in both the first order and potential second order effects of these changes. I was expecting that the ‘first serve’ would become more significant as teams could either potentially score with more ‘aces’ or at least set up to be in more dangerous field position from the first possession chain following a centre clearance. I thought teams would be able to gain more ground from centre clearances (as there would be fewer players closing into the middle), both taking the opportunity to exploit the opposition backline and score directly as well as moving the ball as far away from their own equally-numbered defence, in turn finding scoring easier with quicker and cleaner entries.

Going into 2019, I was wondering to what extent the court would ‘speed up’ in the initial play following a centre bounce, counteracting the overall trend towards slower plays and longer rallies. I was interested in how far the needle would swing back towards favouring the ‘big first server’ and away from the ‘aggressive baseliner’ and/or ‘counter-puncher’.

The approach

There are some limitations in the publicly-available data in trying to address my hypothesis. The first is that play-by-play data is not available in any sense. Score source aggregates by game are available through some channels, but to my knowledge that requires manual collation. The other issue with that data is that it does not contain a temporal aspect so we cannot see the timeliness of the original starting position effects before players are once again able to move more freely over the ground.

Instead, my approach was to look to score progression data from AFL Tables and infer the ability to score based on each phase following a centre bounce.

I defined a ‘phase’ as the time it takes to realise an event (goal, behind or siren) following a previous event (goal, behind, or start of the quarter). For each restart from the centre bounce (either at the beginning of a quarter or following a goal within a quarter), I considered what was the following event and how long it would take for that event to occur. Because the data is recorded in ‘count up’ format and doesn’t account for stoppages in play, I used a manual adjustment based on sight of the data (it turned out to be about 50 seconds) to line up the plays following a goal with those from the start of a quarter (as best as I could). This time more or less accounts for the time taken for television broadcasters to play their ads and the ball to be returned back to the middle (where there is a rare double-goal from an immediate free-kick this time is reduced accordingly).
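The phase definition above can be sketched in a few lines. This is an illustration only: the input layout (a list of elapsed seconds and score types for one quarter), the `Phase` record and the `centre_bounce_phases` function are hypothetical and are not the AFL Tables format, while the 50-second restart delay is the approximate adjustment described above. Where the next score arrives before the assumed restart (the rare double goal), the duration is simply clipped at zero.

```python
from dataclasses import dataclass

RESTART_DELAY = 50  # seconds between a goal and the next bounce (approximate)

@dataclass
class Phase:
    start: str      # "quarter_start" or "goal"
    outcome: str    # "goal", "behind" or "siren"
    seconds: float  # estimated time from the centre bounce to the outcome

def centre_bounce_phases(events, quarter_length):
    """events: time-sorted (elapsed_seconds, kind), kind is 'goal' or 'behind'."""
    phases = []
    prev_kind, prev_time = "quarter_start", 0.0
    for time, kind in list(events) + [(quarter_length, "siren")]:
        # Only the start of a quarter or a goal leads to a centre bounce;
        # a behind is followed by a kick-in, so it does not open a phase here.
        if prev_kind in ("quarter_start", "goal"):
            delay = RESTART_DELAY if prev_kind == "goal" else 0
            bounce = prev_time + delay
            phases.append(Phase(prev_kind, kind, max(0.0, time - bounce)))
        prev_kind, prev_time = kind, time
    return phases

# Hypothetical quarter: goal at 95s, behind at 200s, goals at 400s and 470s.
quarter = [(95, "goal"), (200, "behind"), (400, "goal"), (470, "goal")]
phases = centre_bounce_phases(quarter, quarter_length=1800)
for p in phases:
    print(p)
```

In this made-up quarter the behind at 200 seconds closes the phase that began after the first goal but does not open a new one, so four phases come out of five events (including the siren).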

I then aggregated all of these phases following centre bounces each season back to 2008 and looked at three metrics over game duration:

  1. The ability to score in a phase (how frequently are scores recorded?)
  2. The accuracy of scores in a phase (of those scores, how many are goals?)
  3. The scoreboard impact per phase (what are the average points scored?)
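One way the three metrics could be computed per 15-second interval is sketched below. The exact denominators are my assumption for the example: here each interval counts the phases still running at its start, and a score is attributed to the interval in which the phase ended. The `phase_metrics` function and the four sample phases are hypothetical.

```python
BIN = 15       # interval width in seconds
WINDOW = 240   # first four minutes after the centre bounce
POINTS = {"goal": 6, "behind": 1}

def phase_metrics(phases):
    """phases: iterable of (outcome, seconds). Returns one dict per interval."""
    phases = list(phases)
    rows = []
    for lo in range(0, WINDOW, BIN):
        hi = lo + BIN
        live = [p for p in phases if p[1] >= lo]            # still running at lo
        scored = [p for p in live if p[1] < hi and p[0] in POINTS]
        goals = sum(1 for outcome, _ in scored if outcome == "goal")
        points = sum(POINTS[outcome] for outcome, _ in scored)
        rows.append({
            "start": lo,
            "phases": len(live),
            # 1. ability to score: share of live phases ending in a score
            "score_rate": len(scored) / len(live) if live else None,
            # 2. accuracy: share of those scores that were goals
            "accuracy": goals / len(scored) if scored else None,
            # 3. scoreboard impact: average points per live phase
            "points_per_phase": points / len(live) if live else None,
        })
    return rows

# Hypothetical phases: (outcome, seconds from bounce to outcome).
sample = [("goal", 10), ("behind", 12), ("goal", 20), ("siren", 100)]
rows = phase_metrics(sample)
print(rows[0])
print(rows[1])
```

The `phases` count per interval plays the role of the circle size in the charts: later intervals rest on fewer surviving phases and are correspondingly noisier.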

The data returns only a mini-break from the usual

1. The ability to score

My first hypothesis was that teams would find it easier to generate shots at goal (resulting in scores) for some period following the centre bounce, while positions are constrained. The data shows that, to date in 2019, any micro story in the uptick from the new ‘restart positions’ rule has been swamped by the macro trend of ongoing downward pressure on scores. There is some evidence that the rule has retained the ease of scoring from the previous decade for the first minute or so following each centre bounce, but after that there is quite a stark reduction in the ability to score this season (refer 2019 pane).

The average number of scores (per 15-second time interval across the first four minutes) following a restart at the centre bounce. The size of the circles refers to the number of phases in that interval. Note the high rate in 2008 and the gradual drop-off per year until the lowest values in the data set in 2019, countered only by a similar rate within the first minute of a centre clearance.

Within the first minute of game play following the restart, this year’s scoring ability tends to match the long-term trend, which suggests to me that the new rule has temporarily offset the increasing defensive prowess and structures of teams for around 60 seconds. After this time, the average number of scores per phase drops away well under the long-term trend.

Looking across the seasons, it’s interesting to note how much easier it was to generate scores in 2008 than it was in 2009 (where almost a goal per team per game was lost). Of course, we are only looking at a sub-set of all game play in this chart (within the first four minutes of play following each centre bounce), rather than all game play, but these trends should reflect game play and score dynamics following a clean restart (rather than, say, a kick out from one end of the ground).

There is always some risk with binning continuous data that it creates a signal that doesn’t otherwise exist. I played around with displaying this data in a number of different ways (including using smoothing methods) however they tended to have issues fitting to the first few data points (where there can often be a secondary stoppage and rarely a shot within the first few seconds) as well as the overall trend. I hope a set of 15-second intervals both displays the data simply enough to understand but doesn’t create signal where there is none. I think in this case it’s clear to see that until approximately one minute of game play, the 2019 scoring ability matches the long-term trend, whereas after this point it is consistently below the overall average.

2. The accuracy of scores

My second hypothesis was that teams would be able to generate cleaner entries from centre clearances and create shots from better positions, therefore improving conversion from goal kicking. The overall trend has the goal kicking conversion rate a little lower (50.5%) in the first minute than in all post-restart phases (overall 52.7%). The (admittedly a little noisy) data does support this hypothesis, with the goal kicking conversion rate above the long-term trend (52.6%) once again for about the first 60 seconds following a restart. After this point, the conversion rate hasn’t had quite the same drop off as the ability to create scores, however there has been a slight drop off this season (overall 51.4%) in that area too.

The average goal kicking conversion rate, or accuracy, (per 15-second time interval across the first four minutes) following a restart at the centre bounce. The size of the circles refers to the number of shots taken in that interval. Note the relative drop-off in 2019, other than in the first minute or so following a centre bounce, particularly as this period historically has been a little under the stable trend.

If you are interested in an in-depth discussion on factors affecting the reduction in accuracy in the modern game, refer to the ‘Footballistics’ chapter titled Goal Kicking Accuracy which addressed a number of these.

3. The scoreboard impact

My overall hypothesis was that through cleaner entries there would be greater ability to impact the scoreboard in the early moments after a centre bounce. The trend of average points per phase over game duration is effectively a combination of the average scores per phase and goal kicking accuracy.

Overall, there does seem to be some positive impact on the scoreboard within sixty seconds of the centre bounce, particularly in comparison to the overall slow-down of scoreboard impact more generally. The new ‘restart position’ rule looks to have retained a similar potency to the long-term modern trend, while the more macro evolution of the AFL has seen the unconstrained score dynamics dry up much further.

The average points scored (per 15-second time interval across the first four minutes) following a restart at the centre bounce. The size of the circles refers to the number of phases in that interval. Note the high rate in 2008 and the low rates across 2018 and then even more so in 2019. This suggests the uptick in 2019 following a centre bounce may be significant.

The ace in the pack?

I must admit that during the preseason I had expected the ‘restart position’ rule may have had a bigger impact than what we’ve seen in the first six rounds and that it would have fallen out of this analysis. I thought that we may have seen something similar to 2008-style scoring dynamic patterns for potentially 90-120 seconds following a centre bounce. Although this doesn’t seem to be the case, certainly one reason appears to be that any decent uptick in initial scoring conditions has been heavily diluted by the greater overall downward trend. I am fairly confident that the data shows the mode of football in the first minute post-centre bounce adheres to a different behaviour than the resulting game play from then on.

In summary, we are a quarter of the way through the 2019 AFL season and there is some evidence for a little extra dominance by teams on their ‘first serve’, however at the same time the ‘court has slowed’ significantly during the rallies. It may be slightly more advantageous to have a big first serve this year, but to be a successful side you would also need the ability to either wear your opponent down ‘from the baseline’ during gameplay or ‘counter-punch’ swiftly when the opportunity arises!

I have a few ideas for future analysis looking at this data set and analytical approach. It would be worth addressing the score dynamics at a team-level to see which clubs are stronger or weaker from the centre bounce or in general gameplay. Further, adding in phases following kickouts can add an extra dynamic and can potentially call out the counter-punch ability of sides. This way we could infer and categorise teams into different styles, perhaps equivalent to the tennis analogies I provided above.

I am very open to feedback on this or any of my analysis. If I have missed something, made an error, or if you have any suggestions or ideas, please feel free to comment below, shoot me a direct message or hit me up on Twitter.

All data provided for analysis on this page comes thanks to AFL Tables.

Buddy 900 – and benchmarking the greatest goal kickers in VFL/AFL history

Buddy and Plugger

Back in Round 17 against North Melbourne, Lance Franklin belted through his 900th AFL goal – somewhat appropriately from the forward flank outside the 50. It seems as though Buddy has kicked half his career tally from that spot, although that is probably not quite the case.

A total of 900 goals is indeed a significant milestone, as Franklin became just the ninth player to reach the mark in 122 VFL/AFL seasons. But what is perhaps more telling is that the all-time league record sits more than another 50% higher again than this mark.

When Tony Lockett wobbled a drop punt through for goal against Collingwood at the SCG in 1999, he became the first man in VFL/AFL history to reach 1300 career goals. He would play on, and finish his career with 1360 goals from 281 senior games. Lockett is the most prolific goal kicker in 122 years of the VFL/AFL – but can we say he is the best ever?

The spearheads

Lockett surpassed a mark set by legendary Collingwood spearhead Gordon Coventry which had stood untouched for 62 years. A generation earlier, the Magpies boasted the league’s first great full-forward in Dick Lee, who had debuted when the league record was only 144 career goals but by the time he retired had pushed it way out to 707 goals. Following World War II, Essendon’s John Coleman kicked 537 goals in just 98 games before a serious knee injury prematurely ended his career at just 25. Likewise, two decades later an injured knee also interrupted the career of Peter Hudson, however the Hawthorn superstar still averaged 5.6 goals per game in a golden era for full-forwards. Which takes us to the modern era, where Lance Franklin has kicked more goals than any other this century and now has over 900 goals in a period where team scoring has fallen to 50-year lows.

Lockett and Franklin have booted 2,277 VFL/AFL goals between them

It is hard to compare like-with-like on raw numbers alone, as over more than 120 seasons of the VFL/AFL scoring trends have evolved continuously like a living organism. Yes, no player has kicked more goals than Lockett, but Lockett played in an era where team scores were high and the full-forward thrived. In other eras, defence has been king or a team approach to scoring has been in vogue. In this post I look to benchmark the records of the VFL/AFL’s greatest goal kickers across multiple eras and propose, once and for all, the best we’ve seen.

Goal kicking trends

There are two major trends in scoring over 122 years of the VFL/AFL that have had major impacts on goal kicking tallies:

  1. The number of goals typically kicked in a game
  2. The typical spread of goals across players within a team

Scoring spreads

In the ‘Footballistics’ chapter titled Goal Kicking Accuracy, a number of factors which have impacted conversion rates to varying extents over time were assessed. One such factor was the changing nature of the ‘scoring spread’ of teams over time, and this effect was two-fold:

  1. Players who tend to have more shots tend to be more accurate, and
  2. The proportion of shots/goals taken by the more predominant goal kickers for a team in a given game has tended to fall consistently over time

You can read much more about this and other trends which have impacted goal kicking conversion rates in that chapter.

It is not just the proportion of goals kicked within games that has changed. Even adjusting for the proportion of goals kicked by predominant forwards within a game doesn’t fully account for changes as players have become more flexible and roles have become more blurred. The leading goal kickers of clubs within seasons are also more likely to have games where they are not one of the leading goal kickers for their team.

For benchmarking purposes, I want to introduce the concept of ‘era-adjusted goals’ (EAGs). The goals of each player will be adjusted accordingly for their rank of goals within a game so that the average proportions of goals in each season is equivalent (1). Then, the season tallies of each player will be adjusted accordingly for their rank of goals within a year, so that the average proportion of season goals in each year is equivalent (2).
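As a rough illustration of adjustment (1), here is a minimal sketch assuming each player's goals are rescaled by the ratio of the long-run share for their goal rank to that season's share for the same rank. The function name and the share figures are hypothetical, and the actual method may differ (for example, in how ties or team totals are handled).

```python
def adjust_for_rank_share(goals_by_rank, season_share, baseline_share):
    """Rescale goals by goal rank within a team for one match.

    goals_by_rank  : goals for a team's kickers, sorted highest first
    season_share   : typical share of team goals per rank in that season
    baseline_share : typical share per rank across all seasons
    All three sequences are assumed to be the same length.
    """
    adjusted = []
    for goals, season, baseline in zip(goals_by_rank, season_share, baseline_share):
        # A rank that took a bigger share than the long-run norm is scaled down
        adjusted.append(goals * baseline / season)
    return adjusted


# Hypothetical shares: top kickers dominated more in this season than usual
season_share = [0.40, 0.25, 0.20, 0.15]
baseline_share = [0.32, 0.26, 0.22, 0.20]
adjusted = adjust_for_rank_share([8, 3, 2, 1], season_share, baseline_share)
```

With these made-up shares, the top-ranked kicker's eight goals are scaled down while the lower-ranked kickers are scaled up. The same idea would apply one level up for adjustment (2), using season tallies instead of match tallies.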

Scoring rates

Average scores per game have roller-coasted over the league’s history. From the inaugural breakaway year of 1897, scores bottomed out only two years later when teams managed an average of just 5.01 goals per game. There was then a steady rise for the next four or so decades, peaking at 13.3 goals per game in 1941, before another dip to just 9.7 goals per game in 1952. Once again scores were on the climb for the next 30 seasons, with an all-time league high reached in 1982 with teams kicking an average of 16.2 goals per game. It has been well reported that scoring rates have fallen in more recent times, but this has typically been a gradual regression in the past 36 seasons to 12.0 goals per game this year.

For benchmarking purposes, each game will be scaled proportionally so that the average number of goals per game is equivalent in each and every season (3).
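A minimal sketch of this proportional scaling follows. The season averages are the ones quoted above, but the baseline of 12.0 is only an assumed stand-in, since the long-run average used for the benchmarking is not stated here.

```python
def scale_for_scoring_rate(goals, season_avg_goals, baseline_avg_goals):
    """Scale a goal tally so the season's average goals per game
    matches a common baseline."""
    return goals * baseline_avg_goals / season_avg_goals


# Assumed baseline for illustration only (not the benchmark actually used)
BASELINE = 12.0

# Ten goals in the high-scoring 1982 season (16.2 goals per game) scale down
ten_in_1982 = scale_for_scoring_rate(10, 16.2, BASELINE)

# Ten goals in the low-scoring 1899 season (5.01 goals per game) scale up
ten_in_1899 = scale_for_scoring_rate(10, 5.01, BASELINE)
```

The direction of the adjustment is the point of the example: the same raw tally is worth less in a high-scoring season and more in a low-scoring one.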

Benchmarking spearheads

To account for the evolution of the league, three separate adjustments were carried out to benchmark the conditions for all goal kickers over time. In order, these were:

  1. On a game-by-game basis, standardise the proportions of goals kicked by each team’s ‘goal rank’ player in each given match
  2. On a season-by-season basis, standardise the proportions of these new scaled goals kicked by each team’s ‘goal rank’ player across each given year
  3. Finally then scale each game goal tally to standardise the average number of goals per game across each season

The methodologies are a little clunky to explain without losing my entire audience, so instead I’ve chosen four examples from different eras to articulate how the benchmarking played out across some famous performances:

Jim McShane’s bag of 11 goals for Geelong in 1899 is scaled up to 17.4 era-adjusted goals (+6.4):

  1. First scaled down because in 1899 a typical team’s top goal kickers in a match kicked a higher proportion of goals in a game than the 122-season average
  2. Then further scaled down because in 1899 a typical team’s top goal kickers in a season kicked a higher proportion of team goals in a year than the 122-season average
  3. Then finally scaled up because in 1899 games featured far fewer goals per game than the 122-season average

Fred Fanning’s bag of 18 goals for Melbourne in 1947 is scaled down to 15.9 era-adjusted goals (-2.1):

  1. First scaled down because in 1947 a typical team’s top goal kickers in a match kicked a higher proportion of goals in a game than the 122-season average
  2. Then further scaled down because in 1947 a typical team’s top goal kickers in a season kicked a higher proportion of team goals in a year than the 122-season average
  3. Then finally scaled down because in 1947 games featured slightly more goals per game than the 122-season average

Lance Franklin’s bag of 13 goals for Hawthorn in 2012 is scaled up to 14.9 era-adjusted goals (+1.9):

  1. First scaled up because in 2012 a typical team’s top goal kickers in a match kicked a lower proportion of goals in a game than the 122-season average
  2. Then further scaled up because in 2012 a typical team’s top goal kickers in a season kicked a lower proportion of team goals in a year than the 122-season average
  3. Then finally scaled down because in 2012 games featured slightly more goals per game than the 122-season average

Jack Riewoldt’s bag of 10 goals for Richmond in 2018 is scaled up to 12.5 era-adjusted goals (+2.5):

  1. First scaled up because in 2018 a typical team’s top goal kickers in a match kicked a lower proportion of goals in a game than the 122-season average
  2. Then further scaled up because in 2018 a typical team’s top goal kickers in a season kicked a lower proportion of team goals in a year than the 122-season average
  3. Then finally scaled up because in 2018 games featured slightly fewer goals per game than the 122-season average

Every such goal tally from every player in every game from 122 years has received the exact same treatment based on the scoring characteristics of the league in that season. We therefore end with both actual goals and era-adjusted goals tallies for all players, in every match, season and career.

The greatest of them all

The best careers

This analytical approach to benchmarking the greatest goal kickers in VFL/AFL history presents… Tony Lockett as the most prolific sharpshooter ever! Was that a surprise? I’m not sure. However as a result of the era in which Plugger played, his tally is adjusted significantly downwards to 1209 EAGs (-151 on his actual tally). Jason Dunstall (1103 EAGs, also -151) hops into second position on the overall adjusted tally, jumping Gordon Coventry (1082 EAGs, -217) who slips into third.

It is significant that Lance Franklin’s career record is well respected by the analysis, jumping from eighth to fourth on the tally (1034 EAGs, +117). As big as Buddy has been in the modern era, do the current media and footy pundits still underrate the imprint he has left on this league? Only two others fare better in additional adjusted goals than Franklin, both from the early decades of the league – namely Dick Lee (863 EAGs, +157) and Jack Leith (279, +117), who played in such a dour era that he is awarded an additional 72% of his actual tally as a result.

The top 20 goal kicking performances across a career by total era-adjusted goals

With regard to averages, it probably comes as no surprise that the brightest star shining is that of John Coleman. The player whose name is enshrined on the annual medal for the season’s leading goal kicker stands above all others, averaging 5.26 EAGs per game (-0.22 on his actual average), leapfrogging Peter Hudson (4.99 EAGs per game, -0.64). Dick Lee (3.75 EAGs per game, +0.68) and Lance Franklin (3.56, +0.40) are the big winners on this measure, with many of the prolific goal kickers in history more harshly punished by the high-scoring eras they played in.

The top 20 goal kicking performances across a career by average era-adjusted goals per game (minimum 50 games)

The most rewarded players (or, those most underappreciated by raw goal measures) tend to be the primary forwards from three eras: the early days (until about 1920); an approximate decade between the mid-1950s and mid-1960s; and indeed the modern era (since about 2000). Perhaps expectedly, spearheads from the halcyon eras of the 1970s-1990s are more harshly penalised (or, had the benefit of playing in eras that more suited their craft).

The following dashboard contains two dynamic views summarising the top VFL/AFL goal kickers to 2018, with the ability to toggle between era-adjusted and actual goals. The default views filter those players with at least 500 actual goals, at least 2.5 actual goals per game, and at least 50 career games (but you can change these if you wish).

Please note – the above dashboard is correct to the end of the 2018 season, and this blog post. To view the most up-to-date numbers, please refer to my ongoing era-adjusted goals dashboard.

The first pane (‘By totals and averages’) effectively combines the two above charts into one, comparing goal kickers on both total era-adjusted goals (horizontal axis) and average era-adjusted goals per game (vertical axis). Those towards the right of the chart are those who have kicked the most era-adjusted/actual goals, while those towards the top of the chart are those who have averaged the most era-adjusted/actual goals per game.

The second pane provides a view of the running tally of career era-adjusted/actual goals by match number. Here you can compare goal kickers like-for-like, particularly in adjusted terms. Note the similarity of the paths of Gordon Coventry and Lance Franklin, for example, which we will refer to again later. Also note how the era-adjusted measure pushes John Coleman’s trajectory slightly above that of Peter Hudson.

The best seasons

Looking across the best performances in a given year, famously both Bob Pratt (1934) and Peter Hudson (1971) managed 150 goals in a season. In terms of EAGs, Jason Dunstall’s 1989 season (132 EAGs, -6) becomes the most impressive tally in a given year, with the 1971 season of Hudson (126 EAGs, -24) dealt with more harshly and demoted into second. Of the best adjusted seasons, Dunstall (1989, 1988) and Hudson (1971, 1970 and 1968) end up with the top five on the list. Pratt’s 1934 (105 EAGs, -45) is not rated anywhere near as rosily, falling even below Lance Franklin’s century topping year of 113 actual goals (110 EAGs, -3) and indeed out of the top 20. Most of the top 20, even on adjusted terms, tend to be scaled down – primarily because they tend to be from the modern era, and that is primarily because those players have tended to play in longer seasons.

The top 20 goal kicking performances across a season by total era-adjusted goals

For season averages (minimum ten games), Tony Lockett’s injury-affected 1989 tops the charts with an astronomical 6.82 EAGs per game from 11 matches (-0.27 goals per game down on the actual), followed by Peter Hudson in 1968 (6.37 EAGs per game, -0.21) and Lockett again in 1991 (6.20, -1.28). Alongside Hudson, John Coleman’s name appears three times, in 1952 (5.94 EAGs, +0.22), 1953 (5.71, +0.32) and 1950 (5.25, -1.06).

The best games

One of the best known records in the VFL/AFL is Fred Fanning’s mark of 18 goals in, remarkably, his final senior game in 1947 before taking up a coaching position in the sticks. But with our approach to benchmarking, this performance is relegated to third with an adjusted mark of 15.9 goals (-2.1 on the actual). It is surpassed by the exploits of Jim McShane (17.4 EAGs, +6.4), who kicked 11 majors for Geelong in 1899 against a hapless St Kilda, and Harold Robertson (17.2, +3.2) who booted 14 also against St Kilda in 1919.

The top 20 goal kicking performances in a game by era-adjusted goals

Thanks to Anthony Hudson, Lance Franklin’s 13 majors against North Melbourne in 2012 won’t be forgotten for a while, and the data shows this is for good reason. It is ranked as the sixth greatest individual era-adjusted goal kicking performance on record.

Buddy beyond

So Lance Franklin is ranked fourth all-time in our era-adjusted goals, behind only Coventry, Dunstall and Lockett. As you can view in the dynamic dashboard view (second pane, ‘By number of games played’), he has tracked alongside Coventry game-by-game in EAGs for his entire career, recently pulling ahead as Buddy heads towards 300 games. Can he continue and overcome all three on relative measures, if not absolute tallies?

Franklin has averaged 3.12 goals per game at Sydney, just under his Hawthorn average of 3.19 per game. Given the slight changes in scoring trends even in the past half-decade making goal kicking even harder for main spearheads, his adjusted average at the Swans (3.78 EAGs per game) even outrates his time at the Hawks (3.44).

Helped by finals appearances, he has averaged 21.6 games per season at the Swans. With four years left on his contract, let’s assume he averages 15 games per season in the twilight of his career, for a total of 60 games left to come. This would take him to 350 total, which would take him to equal-16th on the all-time VFL/AFL games tally list. At his current Sydney goals per game rate, we can expect another 187 goals or another 227 EAGs – taking Buddy to 1,104 goals (leaving him in fourth position, behind the same three players) and 1,261 EAGs (surpassing the lot of them). In fact at his current rates, it would only take another 47 games or so for Franklin to eclipse Lockett, Dunstall and Coventry and be ranked as the number one adjusted goal kicker of all time, holding scoring conditions equal.
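The projection above is simple arithmetic and can be checked with a short sketch. All rates and tallies are the figures quoted in this post; the current goal tally of 917 is implied by them (2,277 combined goals less Lockett's 1,360), and the variable names are only illustrative.

```python
import math

games_left = 4 * 15        # four contract years at an assumed 15 games each
goals_per_game = 3.12      # Franklin's average at Sydney
eags_per_game = 3.78       # his era-adjusted average at Sydney

current_goals = 917        # implied: 2,277 combined minus Lockett's 1,360
current_eags = 1034
lockett_eags = 1209        # the adjusted tally to beat

extra_goals = round(games_left * goals_per_game)
extra_eags = round(games_left * eags_per_game)

projected_goals = current_goals + extra_goals
projected_eags = current_eags + extra_eags

# Whole games needed at the current adjusted rate to move past Lockett
games_to_pass_lockett = math.ceil((lockett_eags - current_eags) / eags_per_game)

print(extra_goals, extra_eags, projected_goals, projected_eags, games_to_pass_lockett)
```

This reproduces the 187 goals, 227 EAGs, 1,104 goals, 1,261 EAGs and roughly 47 games quoted above, holding his current rates constant.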

As a result of this analysis I keep asking myself this question: is the career of Lance Franklin still underrated?

Please comment below or reply to me on Twitter!

The ‘Miracle of the Saints’ and more on Win Probabilities

The ‘Miracle of the Saints’

Three weeks ago, Carrara played host to a remarkable result, in what may have easily been scripted as an unremarkable match-up between 15th-placed Gold Coast and 16th-placed St Kilda at Metricon Stadium.

The off-Broadway match was one that may have easily slipped into the ether, with two no-hopers of 2018 playing in front of just over 10,000 spectators on the Gold Coast, and hardly demanding a television audience with a Saturday twilight time slot ahead of the Socceroos’ 2018 FIFA World Cup campaign launch against France.

The ScoreWorm as per the AFL Match Centre for 2018 R13 Gold Coast v St Kilda

Journalists were no doubt starting to sharpen the knives to finish off embattled coach Alan Richardson as the Saints trailed by 39 points with 26 minutes played in the third quarter. About 35 minutes – and 6.5 (41) to 0.0 (0) later – the Saints had completed the miracle with a dramatic two-point victory. It was the first match this season where a team had come back to win from a deficit of 30 points or greater.

‘Footballistics’ context

One of the 15 chapters in James Coventry’s new book ‘Footballistics’, named Win Probabilities, assesses various ‘heuristics’ or rules of thumb which are commonly adopted by the footy fan, pundit or commentator as a way to ‘call’ a victory for a particular team during a game. Thinking about it another way, this is when the individual believes that the in-game chance of victory for a particular team approaches close to 100 per cent.

In particular, the chapter considers the winning chances of teams meeting the criteria for three heuristics:

  1. Teams reaching 100 points before their opposition (which considers only points scored)
  2. Teams with at least a 30-point lead (which considers points scored and points conceded)
  3. Teams leading by more goals than minutes remaining (which considers points scored, points conceded and time remaining – but typically relates only to the dying minutes of a match)
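The three heuristics above can be written as simple checks on a score snapshot. This is only a sketch: the function names are made up for illustration, and treating "goals" as the margin divided by six is an assumption about how the third rule is counted.

```python
def first_to_100(team_score, opp_score):
    """Heuristic 1: the team has reached 100 points and the opposition has not."""
    return team_score >= 100 and opp_score < 100


def leads_by_30(team_score, opp_score):
    """Heuristic 2: the team leads by at least 30 points."""
    return team_score - opp_score >= 30


def more_goals_than_minutes(team_score, opp_score, minutes_remaining):
    """Heuristic 3: the lead in goals exceeds the minutes left to play."""
    goals_ahead = (team_score - opp_score) / 6  # a goal is worth six points
    return goals_ahead > minutes_remaining


# Hypothetical snapshot: a 39-point lead with about 35 minutes still to play
team, opp, minutes_left = 75, 36, 35
checks = (
    first_to_100(team, opp),
    leads_by_30(team, opp),
    more_goals_than_minutes(team, opp, minutes_left),
)
```

In this snapshot only the second rule would call the game, which mirrors the position Gold Coast held over St Kilda before losing (the individual scores here are invented; only the 39-point margin comes from the match).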

The winning rates of all these situations are discussed in some depth within the chapter.

The natural extension to these themes is the creation of a model which can provide an approximation to a probability of victory at a given point in a match – taking into consideration both scoring factors as well as time remaining. Left on the cutting room floor of the chapter was a list of the most unlikely comebacks, based on a simple logistic regression model I created at the time.

Coincidentally, the same topic was raised on Twitter the following weekend, after various throwbacks to the Bombers’ victory over the Kangaroos from a 69-point deficit in 2001.

I had dragged together much of this content following the St Kilda win over Gold Coast, but this Twitter conversation was the catalyst to (eventually) finishing the post (two weeks later).

A simple in-game model for win probabilities

Recently I have slightly updated the model to create a number of variations. One simpler but stronger combination, which I’ve named WinProb2, takes into account:

  • Percentage of game time duration remaining
  • In-game running margin of the chosen team
  • In-game running combined score (points) of both teams
  • In-game running combined scoring shots of both teams
  • In-game running difference in scoring shots of the chosen team relative to the opposition
  • Home team

For simplicity, estimated probabilities have been calculated at the time of each score, and a second prior to each score, for all in-game scoring events since 2008. In practice, what this means is that probabilities have been estimated as the ball has effectively carried through for a score but before the score has been officially recorded. It is an important point to note: the model doesn’t know where the play is situated, or what may have happened and is yet to be signalled – it only considers the simple attributes above.
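To make the shape of such a model concrete, here is a minimal sketch of a logistic function over the six inputs listed above. Every coefficient is hypothetical and chosen only so the example runs; none are fitted values from WinProb2, and a main-effects-only form is itself an assumption (a fitted model might, for instance, interact margin with time remaining).

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted WinProb2 values
COEFS = {
    "intercept": 0.0,
    "pct_time_remaining": -0.2,
    "margin": 0.08,
    "combined_score": 0.002,
    "combined_shots": -0.01,
    "shot_diff": 0.05,
    "home": 0.15,
}


def win_probability(pct_time_remaining, margin, combined_score,
                    combined_shots, shot_diff, is_home):
    """Logistic estimate of the chosen team's chance of winning."""
    z = (COEFS["intercept"]
         + COEFS["pct_time_remaining"] * pct_time_remaining
         + COEFS["margin"] * margin
         + COEFS["combined_score"] * combined_score
         + COEFS["combined_shots"] * combined_shots
         + COEFS["shot_diff"] * shot_diff
         + COEFS["home"] * float(is_home))
    # The logistic function maps the linear score onto a 0-1 probability
    return 1.0 / (1.0 + math.exp(-z))


# A team 39 points down versus the same situation only 10 points down
far_behind = win_probability(0.3, -39, 101, 30, -6, False)
close_behind = win_probability(0.3, -10, 101, 30, -6, False)
```

With any positive margin coefficient, the smaller deficit yields the higher probability, which is the behaviour one would expect from the real model.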

The most unlikely successful comebacks since 2008

Using WinProb2 as the basis for estimating in-game win probabilities, the following lists the top 100 most unlikely successful comebacks across all AFL Premiership Season matches between Round 1 2008 and Round 16 2018 inclusive.

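WP2 | Season | Round | Quarter | Time | Team | Opponent | In-game margin | Eventual margin
0.3% | 2013 | R13 | 3 | 25:08 | Brisbane Lions | Geelong | -51 | +5
0.4% | 2008 | R4 | 3 | 24:13 | Brisbane Lions | Port Adelaide | -47 | +20
0.5% | 2013 | R9 | 4 | 15:13 | Adelaide | North Melbourne | -30 | +1
0.6% | 2015 | R6 | 3 | 04:31 | St Kilda | Western Bulldogs | -55 | +7
0.7% | 2011 | R5 | 3 | 29:33 | Gold Coast | Port Adelaide | -40 | +3
0.7% | 2011 | R23 | 4 | 10:17 | Essendon | Port Adelaide | -34 | +7
0.8% | 2018 | R13 | 3 | 26:06 | St Kilda | Gold Coast | -39 | +2
0.8% | 2008 | R21 | 4 | 01:51 | Carlton | Brisbane Lions | -32 | +6
1.1% | 2013 | R23 | 3 | 21:30 | Carlton | Port Adelaide | -38 | +1
1.2% | 2013 | R5 | 3 | 20:00 | Port Adelaide | West Coast | -41 | +5
1.2% | 2012 | R8 | 4 | 13:42 | Port Adelaide | North Melbourne | -32 | +2
1.3% | 2008 | R7 | 4 | 02:15 | Melbourne | Fremantle | -32 | +6
1.5% | 2016 | R21 | 4 | 00:19 | Geelong | Richmond | -35 | +4
1.5% | 2008 | R11 | 3 | 19:06 | Carlton | Port Adelaide | -38 | +12
1.8% | 2017 | R15 | 4 | 05:10 | Brisbane Lions | Essendon | -27 | +8
1.9% | 2008 | R17 | 3 | 29:46 | Richmond | Brisbane Lions | -31 | +3
1.9% | 2015 | R1 | 3 | 25:42 | Sydney | Essendon | -41 | +12
2.1% | 2017 | R2 | 3 | 34:03 | Geelong | North Melbourne | -31 | +1
2.2% | 2009 | EF | 4 | 05:23 | Brisbane Lions | Carlton | -29 | +7
2.3% | 2008 | R17 | 3 | 23:44 | Carlton | Western Bulldogs | -31 | +28
2.4% | 2013 | R19 | 4 | 22:01 | Port Adelaide | Adelaide | -20 | +4
2.6% | 2010 | R14 | 3 | 22:25 | Richmond | Sydney | -33 | +4
2.6% | 2009 | R17 | 4 | 04:07 | Geelong | Hawthorn | -28 | +1
2.7% | 2011 | R20 | 4 | 13:48 | Adelaide | Brisbane Lions | -22 | +5
2.7% | 2014 | R6 | 4 | 14:26 | Carlton | West Coast | -24 | +3
3.1% | 2011 | R23 | 3 | 19:53 | West Coast | Brisbane Lions | -29 | +8
3.1% | 2014 | R13 | 3 | 05:06 | Melbourne | Essendon | -33 | +1
3.2% | 2008 | R6 | 4 | 02:08 | Geelong | Fremantle | -25 | +1
3.2% | 2008 | R5 | 4 | 12:51 | North Melbourne | Collingwood | -21 | +7
3.3% | 2011 | R17 | 3 | 15:38 | Essendon | Adelaide | -34 | +11
3.3% | 2008 | R11 | 3 | 04:31 | Sydney | West Coast | -37 | +5
3.3% | 2013 | R16 | 3 | 25:02 | Brisbane Lions | North Melbourne | -33 | +12
3.4% | 2017 | R9 | 4 | 05:30 | Greater Western Sydney | Richmond | -27 | +3
3.6% | 2013 | R22 | 4 | 05:05 | Essendon | Carlton | -19 | +6
3.6% | 2013 | R2 | 2 | 29:50 | Geelong | North Melbourne | -41 | +4
3.7% | 2011 | R11 | 4 | 03:58 | Hawthorn | Fremantle | -26 | +22
3.7% | 2017 | R13 | 3 | 13:10 | Sydney | Richmond | -31 | +9
3.8% | 2015 | R9 | 3 | 01:16 | Collingwood | North Melbourne | -39 | +17
3.8% | 2013 | R14 | 3 | 30:36 | Essendon | West Coast | -23 | +7
3.9% | 2016 | R21 | 4 | 23:57 | West Coast | Greater Western Sydney | -18 | +1
3.9% | 2013 | R3 | 3 | 02:23 | Essendon | Fremantle | -36 | +4
3.9% | 2015 | R15 | 3 | 11:48 | Western Bulldogs | Gold Coast | -37 | +22
4.0% | 2015 | R13 | 3 | 04:12 | Richmond | Sydney | -33 | +18
4.0% | 2017 | R14 | 4 | 25:13 | Sydney | Essendon | -19 | +1
4.1% | 2009 | R21 | 1 | 31:48 | Brisbane Lions | Port Adelaide | -47 | +15
4.4% | 2012 | R16 | 4 | 18:31 | Gold Coast | Richmond | -18 | +2
4.4% | 2017 | R9 | 2 | 11:54 | Collingwood | Hawthorn | -43 | +18
4.5% | 2008 | R8 | 4 | 14:00 | Western Bulldogs | Fremantle | -18 | +3
4.5% | 2012 | R22 | 2 | 14:29 | Hawthorn | Sydney | -38 | +7
4.5% | 2012 | R15 | 2 | 19:31 | West Coast | North Melbourne | -35 | +2
4.5% | 2016 | R12 | 3 | 16:36 | Adelaide | West Coast | -26 | +29
4.6% | 2014 | R18 | 4 | 06:38 | Essendon | Western Bulldogs | -20 | +7
4.8% | 2015 | R13 | 4 | 01:34 | Adelaide | Brisbane Lions | -24 | +13
5.1% | 2010 | R22 | 4 | 17:30 | Hawthorn | Collingwood | -19 | +3
5.1% | 2017 | R14 | 4 | 19:08 | Melbourne | West Coast | -16 | +3
5.3% | 2012 | R23 | 4 | 15:57 | St Kilda | Carlton | -15 | +15
5.4% | 2008 | R13 | 3 | 19:15 | Western Bulldogs | Collingwood | -23 | +10
5.5% | 2008 | R16 | 2 | 28:39 | Sydney | Carlton | -29 | +2
5.5% | 2011 | R9 | 4 | 08:52 | Collingwood | Adelaide | -23 | +43
5.6% | 2014 | R21 | 3 | 10:35 | Essendon | West Coast | -34 | +3
5.6% | 2008 | R16 | 2 | 29:47 | St Kilda | Hawthorn | -34 | +30
5.6% | 2013 | R11 | 3 | 15:47 | Essendon | Carlton | -31 | +5
5.6% | 2018 | R6 | 4 | 01:56 | Sydney | Geelong | -22 | +17
5.6% | 2010 | R15 | 2 | 10:17 | Collingwood | Port Adelaide | -36 | +26
5.8% | 2008 | R9 | 3 | 14:53 | North Melbourne | Western Bulldogs | -24 | +3
5.9% | 2016 | R8 | 4 | 16:15 | Carlton | Port Adelaide | -18 | +2
5.9% | 2011 | R17 | 2 | 00:44 | Gold Coast | Richmond | -36 | +15
6.0% | 2010 | R6 | 3 | 20:31 | St Kilda | Western Bulldogs | -23 | +3
6.2% | 2009 | R6 | 3 | 02:34 | Fremantle | West Coast | -23 | +13
6.2% | 2012 | R22 | 3 | 31:58 | Brisbane Lions | Port Adelaide | -19 | +11
6.2% | 2016 | R1 | 3 | 28:26 | Melbourne | Greater Western Sydney | -22 | +2
6.2% | 2009 | R1 | 2 | 10:06 | Brisbane Lions | West Coast | -38 | +9
6.2% | 2013 | EF | 3 | 08:16 | Carlton | Richmond | -31 | +20
6.2% | 2014 | R22 | 3 | 12:04 | Hawthorn | Geelong | -33 | +23
6.3% | 2011 | R1 | 4 | 16:07 | Fremantle | Brisbane Lions | -17 | +2
6.3% | 2017 | R4 | 2 | 31:12 | Fremantle | Melbourne | -27 | +2
6.3% | 2018 | R1 | 4 | 08:38 | Essendon | Adelaide | -20 | +12
6.3% | 2017 | R18 | 4 | 04:07 | Collingwood | West Coast | -24 | +8
6.3% | 2017 | R14 | 3 | 27:01 | Geelong | Fremantle | -27 | +2
6.3% | 2016 | R2 | 4 | 23:12 | Collingwood | Richmond | -17 | +1
6.4% | 2018 | R15 | 3 | 27:29 | Adelaide | West Coast | -27 | +10
6.4% | 2014 | R12 | 3 | 03:15 | North Melbourne | Richmond | -35 | +28
6.5% | 2010 | R3 | 3 | 30:25 | Fremantle | Geelong | -21 | +7
6.5% | 2011 | R23 | 3 | 15:00 | Richmond | Adelaide | -24 | +22
6.5% | 2014 | EF | 3 | 06:05 | North Melbourne | Essendon | -33 | +12
6.6% | 2009 | R11 | 3 | 15:55 | Adelaide | Essendon | -21 | +16
6.7% | 2008 | R14 | 2 | 09:36 | St Kilda | North Melbourne | -33 | +15
6.8% | 2013 | R21 | 2 | 10:00 | Carlton | Richmond | -30 | +10
6.8% | 2009 | R14 | 2 | 06:03 | Carlton | Fremantle | -36 | +15
6.9% | 2017 | R4 | 3 | 07:17 | Western Bulldogs | North Melbourne | -29 | +3
7.0% | 2017 | R10 | 3 | 12:21 | Melbourne | Gold Coast | -30 | +35
7.0% | 2014 | R4 | 3 | 24:01 | Western Bulldogs | Greater Western Sydney | -20 | +27
7.0% | 2017 | R8 | 2 | 19:37 | Melbourne | Adelaide | -28 | +41
7.0% | 2012 | R16 | 3 | 22:29 | Fremantle | Melbourne | -20 | +34
7.0% | 2016 | R3 | 4 | 01:28 | Hawthorn | Western Bulldogs | -19 | +3
7.1% | 2008 | R19 | 4 | 22:38 | Sydney | Fremantle | -14 | +4
7.1% | 2012 | R10 | 4 | 07:09 | Brisbane Lions | West Coast | -21 | +2
7.2% | 2013 | R21 | 3 | 25:09 | Western Bulldogs | Adelaide | -22 | +17
7.2% | 2013 | R1 | 2 | 26:52 | Geelong | Hawthorn | -30 | +7
7.3% | 2008 | R12 | 3 | 15:45 | Carlton | Collingwood | -24 | +30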
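One way a list like the one above could be assembled is to take, for each match, the lowest in-game probability recorded for the eventual winner, then sort ascending. The sketch below assumes a hypothetical event record with `match_id`, `team`, `win_prob` and `team_won` fields; those names and the sample values are invented for illustration.

```python
def most_unlikely_comebacks(events, top_n=100):
    """Return each winner's lowest in-game win probability, least likely first.

    events: iterable of dicts with keys match_id, team, win_prob, team_won
    """
    lowest = {}
    for event in events:
        if not event["team_won"]:
            continue  # only successful comebacks are of interest
        key = (event["match_id"], event["team"])
        if key not in lowest or event["win_prob"] < lowest[key]["win_prob"]:
            lowest[key] = event
    return sorted(lowest.values(), key=lambda e: e["win_prob"])[:top_n]


# Invented sample: two matches, several scoring events each
sample_events = [
    {"match_id": 1, "team": "X", "win_prob": 0.40, "team_won": True},
    {"match_id": 1, "team": "X", "win_prob": 0.03, "team_won": True},
    {"match_id": 1, "team": "Y", "win_prob": 0.97, "team_won": False},
    {"match_id": 2, "team": "Z", "win_prob": 0.20, "team_won": True},
    {"match_id": 2, "team": "Z", "win_prob": 0.10, "team_won": True},
]
ranked = most_unlikely_comebacks(sample_events, top_n=2)
```

Keeping one row per match (the winner's low point) is what stops a single dramatic game from filling the list with many near-identical entries.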

The most unlikely victory across ten years was the ‘miracle on grass’ by Brisbane against Geelong in 2013. The Lions trailed 5.6 (36) to 13.10 (88) into time-on in the third quarter, with our model giving them just an estimated 0.3 per cent chance of victory. In what really was a remarkable result, Brisbane managed 10.7 (67) to 1.4 (10) in just over a quarter to pinch the game – and even then still needed a kick after the siren, thanks to Ashley McGrath’s effort from about 50 metres in a fairytale 200th game. Clearly an unlikely result, whether you are talking in layman’s terms or mathematically.

It is of interest that even though only three of the top ten bleakest situations occurred in the final quarter, all but one still resulted in a winning margin of seven points or fewer. It suggests that, even with well over a quarter of play remaining in the match, many of the wins were so tight that the trailing team needed every last minute remaining. The only match which ended in a comfortable result was the remarkable 2008 match between Brisbane and Port Adelaide. In that game, the Power led by 47 points during time-on in the third quarter, before the Lions somehow found 11.8 (74) to 1.1 (7) from that point onwards to eventually win easily. Given nine of the top 31 results are from that season of 2008, I am conscious that the model is perhaps too closely calibrated to a modern, stodgier style of game play and may overstate how unlikely a large comeback was in a higher-scoring game ten years ago.

But other comebacks from particularly dire circumstances are found late in final terms. In Round 9 of 2013, the Crows were 30 points down halfway through the final quarter against the Kangaroos with an estimated chance of just 0.5 per cent, but managed to win by a point in a stunning result. At 21st position on the list, another game later in that season found Adelaide on the other side of the ledger. The Crows this time led the Power in the Showdown by 20 points well into time-on, with Port Adelaide given just a 2.4 per cent chance but still able to find a way to win by four points.

And lo and behold, well entrenched inside the top ten was the St Kilda performance against Gold Coast back in Round 13. The Saints may not have had too much to get excited about this season, but that result was one of the most unlikely in over a decade.

The ‘greatest comeback of all time’

Casting our minds back to the Bombers’ greatest comeback of all time against the Kangaroos, I applied the WinProb2 model according to the scores at the moment prior to Essendon’s goal eating into the then-69-point margin. Intriguingly, given the length of game duration remaining combined with the high scoring rate, the model estimated the Bombers still maintained a 2.4 per cent chance of victory – placing 20 comebacks as more unlikely in the past ten years alone. One thing I must note is that the match was an outrageously high-scoring game for the time, never mind for the dour matches of today. Because the model has only been trained on in-game score data since 2008, it has had very little exposure to such extreme scoring, which may weaken the level of certainty around its estimated probabilities.

Possible improvements

There were various other parameters I played with but they didn’t greatly improve the model, and given its simplicity I liked how well this model performed against versions with many other combinations of parameters. For example, I tried to control for a ‘scoring end’ (based on wind or weather conditions), by taking into account the scoring rate at each end and the time teams had played to each end. This may pick up a stronger signal in a future improved model by factoring for various grounds (say, University of Tasmania Stadium in Launceston, which seems to be often dominated by windy conditions).

One other definite improvement would be to understand contextualised chances of victory based on team quality and the pre-game estimated chance of victory. The only factor in the current models which takes into account any team context is a flag for the home team (which proves to be advantageous). This would come at the cost of moving the model away from its current simplicity, but would be a natural progression from considering factors based upon merely scoring, margin and time. To get a much better understanding of the chance of winning for a specific side against a specific opponent, a team- or player-based rating model would be required. I am yet to build one of these, but you will find team ratings at excellent sites like Matter of Stats, Squiggle, The Arc and a lovely player-based rating model at HPN.

Another element that needs to be modelled differently is the probability of victory within the last minutes of a match where the margin is small. The model is not well calibrated to deal with predicting results when margins are tight in the last five per cent or so of matches. I’ve played with a few combinations but as yet I haven’t clocked this one, so I would hesitate to use the existing model to provide estimated probabilities on these types of matches.

Finally, the bias towards 2008 in the results may suggest that the model is not adequately fitted to the different modes of play in seasons earlier in the sample.

Further analysis

I am looking to build out these models and add more analysis on this topic in future weeks. To view every ScoreWorm since 2008, feel free to browse and play with my dynamic visualisation.

To read more about footy analysis driven by data, why not support great footy writing and our nation’s broadcaster by purchasing ‘Footballistics: How the data analytics revolution is uncovering footy’s hidden truths’ from the ABC Shop.

Feedback

I am very open to feedback on this or any of my analysis. I expect many readers may have a deeper grounding in statistics than I do, and I love to learn. If I have missed something, made an error, or if you have any suggestions or ideas, please feel free to comment below, shoot me a direct message or hit me up on Twitter.

All data provided for analysis on this page comes thanks to AFL Tables.

Footballistics on sale today!

Over the past 18 months I have had the great opportunity to work with ABC journalist James Coventry to provide statistical analysis and data visualisation for three chapters of his new book ‘Footballistics’. James also drew on a raft of other very talented analysts from websites such as Figuring Footy, Matter of Stats, Ranking Software and HPN Footy.

This proved to be a lot of fun (and a lot of work!) and I’m looking forward to future questions and further investigation into a number of these areas. I contributed to three chapters, namely:

  1. Goal kicking accuracy
  2. Win probabilities
  3. Australian Football Hall of Fame

In each chapter there was some content that was left on the cutting room floor. In the next little while I will be looking to utilise this ‘extra content’ and fill out some of the analysis that was threaded throughout these chapters.

If you are keen to find out more or if you are interested in a copy, you can read more about Footballistics on the ABC Shop online.

IL

A new beginning

Welcome to the new InsightLane website. This has been a long time coming, two years after I launched my bits-and-pieces Twitter account and launched a temporary website without my own domain.

This website is the hub of my statistics, insights, analytics, and data visualisation across a range of Australian-centric domains. It primarily will cover Australian football (specifically the AFL), with plans for bits and pieces of other sports, as well as the odd snippet of weather and politics data.

The past 12 months have been a particularly busy time for me on multiple fronts. One of these is particularly exciting and I am looking forward to being able to announce some news later on in the year.

The new footy season approaches and this year the AFL analytics community has, seemingly and for the most part, lost the regular blog posts of pioneers @TheArc and @FiguringFooty. I have some new ideas and will look to take up some of their slack.

Please bear with me as I come to terms with the WordPress functionality and what I can and cannot do with this theme.

Always feel free to add me on Twitter or shoot me through a message with a question, compliment, criticism or idea.

Cheers,

IL