Using data to predict the 2019 Australian Football Hall of Fame inductees

Mel Whinnen giving his acceptance speech at the 2018 ceremony

Tonight (Tuesday 4 June), the Australian Football Hall of Fame will honour its next batch of players, coaches, administrators and media performers with induction. Each year in the lead up to the ceremony there is often a range of speculation from various media types on which ex-players will be in the latest batch to be inducted. The Victorian writers tend to focus on the newly eligible candidates from the recent pool of AFL retirees, while the focus interstate is often to wonder when the ledger will be tipped a little back in their favour to honour some of their long-overlooked football legends.

The Australian Football Hall of Fame was established in 1996 and “seeks to recognise and enshrine players, coaches, umpires, administrators and media representatives who have made significant contributions to Australian Football – at any level – since the game’s inception in 1858”. In total, 136 individuals were inducted into the Hall of Fame in the initial intake of 1996 and a further 121 have been added in the 22 years since. Players make up the bulk of the intake (coaches, administrators, umpires and media personalities also have a presence), with 202 inducted off the back of their playing career and another 28 who have since been elevated to Legend status. This post will focus on players, as the cohort with both the largest sample and the most easily accessible records and playing honours.

It is important to note that the Australian Football committee “considers candidates from all parts of Australia and from all competitions within Australia” rather than merely the sole elite-level competition today, the AFL. Newer or younger enthusiasts of the sport may not realise that indeed the current pinnacle of the game is a somewhat recent change within the history of the structure of the sport at all levels, with state leagues dominating the landscape for over a century until the late 1980s. The three states which have always played the most prominent role founded formal football associations well back into the 19th century, namely Victoria, South Australia and Western Australia.

For a number of years, football historians have bemoaned the seemingly inequitable induction rates into the Hall, specifically favouring both playing careers in Victoria (over the other state leagues) through the VFL/AFL, and those that tended to remain in the memories of selectors (from the 1960s onwards) at the expense of those prior to the television era.

The challenge

Woven through a lovely narrative of the careers and legacy of South Australian champion footballers Sampson ‘Shine’ Hosking and Tom Leahy, one chapter of the 2018 book ‘Footballistics’ analysed the skew across states and seasons of the inductees into the Hall of Fame. It investigated the disproportionate induction rates in player groups by different eras and in different competitions. It also looked at how some of these trends have evolved over time since the inaugural Hall of Fame induction in 1996. Neither Hosking nor Leahy has been inducted, despite both objective and anecdotal evidence in their favour, whilst vast numbers of players, particularly VFL players from the 1960s-1980s, have been inducted ahead of them.

What I also wanted to do was understand which career achievements may be meaningful in the eyes of the committee in order to gain induction. This was not an easy task, as there are no minimum achievements required to reach eligibility and the committee “considers candidates on the basis of record, ability, integrity, sportsmanship and character”. How could one even begin to objectively or quantitatively measure many of these attributes, such as “ability” or “character”? It is hard enough to even hypothetically conceive of a relevant metric for these, never mind go about finding an available data set where such metrics might exist for elite Australian Football! The one attribute that, of course, can be at least partly assessed is the playing record of individuals. The short biographical sketches written about each inductee on the Hall of Fame website give us a good starting point. They typically include statistics such as career span, total games and goals, league and club best and fairest awards, league and club leading goal kicker awards, league and club teams of the century, All-Australian selections, state appearances, grand final best on ground awards and years captained.

Given (to my knowledge) the lack of a comprehensive and complete database across the elite levels of Australian football, to research for ‘Footballistics’ I spent a lot of time collating (through best efforts) an aggregate data set which summarised player-level data spanning leagues, including attributes such as seasons played, games and goals, and various honours and achievements. This process was fairly labour intensive and detailed, so I’ll spare the details until the bottom of the post. It is fair to say that some of the numbers in the data set used in this analysis range greatly from the precise to the approximate, such is the time scale of the players’ careers assessed and the evolving nature of competitions over more than a century – as well as a large number of ‘fuzzy’ joins on my end which also require a leap of faith. Accordingly, all values should be read as estimates only, providing a good overview and understanding of patterns but not as ‘ground truth’.

I ended up with a relatively broad data set of Australian footballers over time, including their career games and goals at top-league clubs, the seasons they played, competition-wide honours such as league best and fairests, league goal kicking awards and number of premierships, and team-based achievements such as club best and fairests, club goal kicking awards, and club captaincies. All of these achievements were summarised per player and split out over four major league categories, namely the VFL/AFL (until 1989), the SANFL (until 1990 prior to Adelaide joining the AFL), the WAFL (until 1986 before West Coast joined the AFL) and the VFL/AFL (the national era since 1990). I was also able to include representative selection, specifically All-Australian selection in both the Carnival and national eras. One piece I would have loved to include (and I think would have been particularly explanatory) was the number of interstate league and state-of-origin matches, but I couldn’t find a comprehensive data set with this information.

For simplicity, I chose to concentrate my efforts on the playing careers of inductees only. All inductees referred to in this section have been inducted as ‘Players’ and/or ‘Legends’ (unless otherwise specified) and all games referred to are as players (rather than coaches), as this by far comprises the biggest sample size of inductees for analysis purposes. The Hall of Fame places individuals who participated in more than one role into a single discrete category. For example, John Cahill is listed as a ‘Coach’, even with a very respectable playing career, while five-time VFL premiership coach Jack Worrall is listed as a ‘Player’! Perhaps erroneously (a theme…), the website has Western Australia’s Jack Sheedy as the only individual listed on both the ‘Player’ and ‘Coach’ pages.

The approach

With this data set, I was able to fit a number of classical and machine learning models to ‘reverse engineer’ data-focused criteria for induction into the Hall of Fame. In the end I settled on the best performing logistic regression model, as it was philosophically suitable for modelling the binary induction status of players based on the aggregation of their achievements, performed well with many of the input variables statistically significant, and was more explainable to a pleb like myself.

Hall of Fame induction status and its relationship to a range of various playing honours and achievements, by league

The only set of attributes I dropped were the club leading goal kicking awards for players across all four league categories, as they showed no explanatory power. I would assume this is because leading a club’s goal kicking can be done by some relatively ordinary players in ordinary sides, and total career goals is a much better indicator of strong forwards who are deemed worthy. Some of the other attributes exhibited statistical significance across two or three of the league categories, and in that case I left the entire set in the model for consistency.

I also only considered players that commenced their careers from 1897 onwards (to standardise the pool across the leagues), and stripped out the inducted Hall of Fame ‘Coaches’ entirely from the training data set. I didn’t want to muddy the data set by having their playing careers not matched to a Hall of Fame ‘Player’ induction, nor include them as inducted players either.

With the combination of the sum of all these playing achievements and attributes, I was able to generate a propensity for all yet-to-be inducted footballers. A rating towards 1 suggests the player is very likely to be inducted, while a rating closer to 0 suggests the player has little chance under the current criteria. For example, highly-decorated and newly-eligible champion Simon Black topped the board with a modelled rating of 0.97.
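To make the idea of a modelled rating a little more concrete, here is a minimal sketch of a logistic regression fitted by plain gradient descent. It is not the model behind the ratings in this post: the three features, the toy rows and the training settings (`steps`, `lr`) are all invented for illustration, and it makes no attempt to test statistical significance.

```python
import math

# Hypothetical toy rows, invented for illustration only:
# (career games / 100, league best and fairests, club best and fairests)
# paired with 1 if inducted and 0 if not.
TRAIN = [
    ((3.0, 1, 4), 1),
    ((2.5, 0, 3), 1),
    ((2.8, 2, 2), 1),
    ((1.0, 0, 0), 0),
    ((1.5, 0, 1), 0),
    ((0.6, 0, 0), 0),
]


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def propensity(weights, bias, features):
    """Modelled rating between 0 and 1 for one player's achievements."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit(rows, steps=5000, lr=0.1):
    """Fit the weights by batch gradient descent on the log-loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            error = propensity(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


weights, bias = fit(TRAIN)
# Score two made-up, not-yet-inducted players.
decorated = propensity(weights, bias, (3.2, 1, 3))
journeyman = propensity(weights, bias, (0.8, 0, 0))
print(round(decorated, 2), round(journeyman, 2))
```

The decorated player lands closer to 1 and the journeyman closer to 0, which is all a rating like this is saying: how much a career resembles those the committee has already rewarded.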

It is also important to note that what this model does not do is suggest who ‘should’ and ‘should not’ be inducted. It merely tells us what factors may have been significant contributors to the induction of players in the eyes of Hall of Fame committees of the past, and gives us a rating or propensity for similar-type players to be inducted given the committee’s selection history. For this reason, the model does not try to explicitly account for any skews towards the VFL or the 1950s to 1970s – in fact, by applying it to the careers of all players not yet inducted, it will tend to favour the same type of players. You could say that you only get out what you put in. 

As such, it’s hard to compare like-for-like as the players with any history in the VFL/AFL and modern era tend to outshine all others numerically. Instead, I have forced its hand and broken out the top five results for a combination of league and era (two-decade increments). Players were allocated to both the league and the era in which they played the highest proportion of matches in their careers. This way, we can compare ‘like with like’ and at least understand which yet-to-be-inducted players our data suggests should be closer to Hall of Fame worthiness.
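The allocation and the top-five cut can be sketched in a few lines. This assumes each player's games have already been tallied per league and era; the bucket labels, player names and ratings below are hypothetical, and ties are simply broken by whichever bucket appears first.

```python
from collections import defaultdict


def dominant(games_by_key):
    """Return the key that holds the most games."""
    return max(games_by_key, key=games_by_key.get)


def allocate(games):
    """Pick a player's league and era independently of each other.

    `games` maps (league, era) to the games played in that cell.
    """
    by_league = defaultdict(int)
    by_era = defaultdict(int)
    for (league, era), count in games.items():
        by_league[league] += count
        by_era[era] += count
    return dominant(by_league), dominant(by_era)


def top_n_by_group(players, n=5):
    """Group (name, games, rating) rows and keep the n best per group."""
    groups = defaultdict(list)
    for name, games, rating in players:
        groups[allocate(games)].append((rating, name))
    return {
        group: [name for _, name in sorted(rows, reverse=True)[:n]]
        for group, rows in groups.items()
    }


# Hypothetical player who straddles leagues and eras.
straddler = {("SANFL", "1980s"): 110, ("VFL", "1980s"): 90, ("VFL", "1990s"): 130}
print(allocate(straddler))
```

Because league and era are chosen separately, a player can end up in a cell where he played only a minority of his games, as the made-up straddler above does.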

The modelled induction chances of prospective Australian Football Hall of Fame candidate players, by league and playing era

The skew towards the VFL/AFL and the more modern decades is pretty stark, with much higher top-five ratings in those selections.

Our South Australian pioneers ‘Shine’ Hosking and Tom Leahy pleasingly sit top two in the first two decades of the 20th century, while Jim Deane (two-time Magarey Medal winner and six-time South Adelaide best and fairest) and Don Lindner (Magarey Medallist and three-time North Adelaide best and fairest) lead the way for the Croweaters in other eras.

Over to the west, and Hugh ‘Bonny’ Campbell (four-time WAFL premiership player who once kicked 23 goals in a state game), Ted Flemming (Sandover Medallist, two-time West Perth best and fairest and three-time WAFL premiership player) and Steve Malaxos (Sandover Medallist and inaugural West Coast best and fairest) seem to be the most likely candidates in their respective eras from the WAFL. Prior to the 2018 induction ceremony, there were two more names at the top of this list who are now Hall of Famers. An earlier iteration of my model placed Bernie Naylor and Mel Whinnen within the top three selections for WAFL players at this time last year, which spurred me on to consider this analysis for 2019.

A 2018 tweet showing the then-top-ranked WAFL candidates heading into that year’s Hall of Fame ceremony

The VFL/AFL has had a stack of players inducted each year, and a case could be made that some ‘very good’ players have been a little lucky (if that is possible) to receive the honour. One name missing that has jumped out at me for years is that of Kelvin Templeton. He is one of just five players to have won both the Brownlow Medal and the Coleman Medal (the latter twice), and remains the only such player yet to be honoured with an induction. Combine that record with two Footscray best and fairest awards and five leading goal kicker awards and surely his name must be floating near the top. Tony McGuinness doesn’t quite fit properly into the ‘1960s-1980s’ VFL era as he only played 87 of his 335 career matches in that category. I’ve decided to place him there as he played 60% of his career games in the 1980s, and played 66% of his games in the VFL/AFL – his career wasn’t largely in the SANFL in the 1980s, nor largely in the 1990s AFL either.

In previous generations, Bill Cubbins (one of the premier full-backs of his era and four-time St Kilda best and fairest) and Alby Morrison (five-time Footscray leading goal kicker and two-time club best and fairest winner) are rated highly in comparison to their peers.

In the national era of the AFL, Simon Black is eligible for the first time in 2019 and outshines all with a hat-trick of premierships at Brisbane, Brownlow and Norm Smith medals and three All-Australian guernseys.

The model is calling out some features it sees as important to have on the CV of a Hall of Fame footballer, but we must remember that it is kept in the dark from so many other features that football fans and historians could call out in an instant. How can it factor in the individual ability of Gary Ablett Senior, the courage of Francis Bourke or the defensive resolve of Vic Thorp, when they only shared four club best and fairests between the three of them? As such, perhaps it’s not surprising given its limitations – and the limited samples of inductees from South Australia and Western Australia – that there may be some notable omissions from the highly-rated candidate list. It may only take some further ‘squaring up’ by the committee in future years to help recalibrate the model and smooth the results.

It is important to understand in this analysis that our model – and indeed, all analytical models, to various degrees – is useful for understanding but must be considered flawed. And in this case, it is heavily flawed on multiple levels, due to the question we are trying to answer and the availability of structured information at our disposal with which to answer it. Before we even begin, we have recognised that we are unable to measure most qualities considered by the committee, and once we consider playing records only, we miss anything qualitative or anecdotal and are constrained to the objective list of achievements. But even then, few honours can easily be compared over many decades. For example, the All-Australian selection system today recognises the best players by position across a season, however prior to the modern era the team was selected from an inter-league pool based on performances at interstate carnivals. And then finally, due to the structure of most awards and statistics, certain types of players (particularly defenders) are likely to be statistically under-represented by any analysis, due to the lack of quantitative metrics that tend to relate to those who shut down, rather than create.

Predicting the 2019 inductees

Although not reflected on the official Hall of Fame website (the stale criteria outlined is now a number of iterations old), news articles in recent months have pointed to an expansion (to eight) of the number of possible inductees within a given year. I am foreseeing the first female to be inducted, along with perhaps another non-player (administrator, coach or media type) this year if the option is chosen.

That will leave five or six male players on the brink of induction into the Hall of Fame for 2019.

We have modelled our most likely player candidates for induction, however in recent years the selectors pleasingly are starting to lean a little towards a more representative mix of induction candidates, across both states and eras.

All inducted players following the inaugural intake, split by the amount of games played in major leagues

As discussed, last year there was a strong Sandgroper flavour to the ceremony with both Bernie Naylor and Mel Whinnen inducted. The year prior, South Australia’s John Halbert was honoured alongside ex-Collingwood but also VFA-legend Ron Todd. In 2016, there was again a decent lean outside Victoria with Paul Bagshaw representing South Australia, with Ray Sorrell and Maurice Rioli spending considerable chunks in the WAFL.

The selection committee have in recent years been pretty consistent with inducting two or three AFL-era players, including the immediately eligible Matthew Scarlett last season. The selectors also leant into the 1970s and 1980s with Terry Wallace and Wayne Johnston last season, however there have been few VFL-types in the seasons prior to 2018.

I expect this year that the selection committee will once again focus on South Australia and induct two Croweaters, alongside two or three modern-era AFL players and possibly another dark horse.

To put my money where my mouth is, bringing together everything the data has told us about the Australian Football Hall of Fame, my tips for 2019 induction are:

  • Tom Leahy
  • Jim Deane
  • Tony McGuinness
  • Kelvin Templeton
  • Simon Black
  • Alastair Lynch

Dedication

This post, and analysis, is dedicated to SANFL statistician and historian Mark Beswick, who passed away in April 2018. Mark was one of the first people I contacted when trying to hunt down footy data sets outside the VFL/AFL in 2017. 

Appendix: The data

As far as I am aware, there exists no comprehensive database or structured data sets of top flight Australian footballers, including clubs, tenure, games, goals, and league and club achievements and honours.

This meant that I needed to do my best to stitch one together myself. For the best ‘single view’ of all footballers over time and states, I took the view provided by the wonderful website AustralianFootball.com. Building on the work of footy history doyen (John Devaney / Full Points Footy), this provided the most comprehensive and consistent data set covering the VFL/AFL, SANFL, WAFL, VFA/VFL and indeed some other major leagues. Although I am aware of some missing players across the three major leagues (from Western Australia and South Australia), certainly this view contained a broad enough view that the majority of noteworthy players across the country (both inducted and otherwise) had been captured.

The human delineation of fluid history is always somewhat arbitrary, but effectively I wanted to pull out the three major state leagues prior to the national era, and then separate out the modern national competition from its state-based past. Therefore I summarised the league and club data (games and goals) into aggregates across various state- (or league-) based buckets:

  • VFL/AFL pre-1990 (“VFL” – the state era)
  • SAFA/SAFL/SANFL pre-1991 (“SANFL” – the state era)
  • WAFA/WANFL/WASFL/Westar Rules/WAFL pre-1987 (“WAFL” – the state era)
  • VFL/AFL from 1990 (“AFL” – the national era)

For consistency, I took players only from 1897 onwards, as this allows like-for-like comparison between the three major leagues (and also most of the league and club honours are recorded after this point anyway). I also used this data set to derive the career start (earliest season listed) and career end (latest season listed) for each player.
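As a rough sketch of the bucketing above, assuming the raw rows have already had their league names normalised to one of `"VFL/AFL"`, `"SANFL"` or `"WAFL"` (the row layout and function names here are hypothetical, and the 1897 cut-off is left to a separate filtering step):

```python
# Last season counted in each state-era bucket, following the list above.
STATE_ERA_LAST_SEASON = {"VFL/AFL": 1989, "SANFL": 1990, "WAFL": 1986}


def bucket(league, season):
    """Return the bucket label for a season, or None if it is out of scope."""
    last_state_season = STATE_ERA_LAST_SEASON.get(league)
    if last_state_season is None:
        return None  # not one of the three major leagues
    if season <= last_state_season:
        return "VFL" if league == "VFL/AFL" else league
    # After the cut-off only the national competition is kept.
    return "AFL" if league == "VFL/AFL" else None


def summarise(rows):
    """Aggregate (player, league, season, games, goals) rows.

    Returns per-bucket [games, goals] totals and each player's career
    span as (earliest season listed, latest season listed).
    """
    totals = {}
    span = {}
    for player, league, season, games, goals in rows:
        # The career span uses every listed season, in scope or not.
        first, last = span.get(player, (season, season))
        span[player] = (min(first, season), max(last, season))
        label = bucket(league, season)
        if label is None:
            continue
        tally = totals.setdefault((player, label), [0, 0])
        tally[0] += games
        tally[1] += goals
    return totals, span


# Made-up rows for a single hypothetical player.
rows = [
    ("Player X", "SANFL", 1988, 20, 5),
    ("Player X", "SANFL", 1990, 18, 3),
    ("Player X", "VFL/AFL", 1991, 22, 10),
    ("Player X", "SANFL", 1992, 4, 0),
]
totals, span = summarise(rows)
print(totals, span)
```

The 1992 SANFL season in the example is dropped from the totals because it falls after the SANFL cut-off, but it still counts towards the career span.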

Next I had to join a varied selection of league and club honours and achievements. For availability and consistency purposes, I narrowed the chase down to the following:

  • League best and fairests
  • League leading goal kicker awards
  • League premierships
  • Club best and fairests
  • Club leading goal kicker awards
  • Club captains
  • Representative selections (i.e. All-Australian carnival and AFL All-Australian teams)

Most of the club and league honours were pulled league-by-league and team-by-team from various official and unofficial websites. The premiership counts were generously provided by Greg Wardell-Johnson, Ric Gauci and Steve Davies for the WAFL and Kyle Smith for the SANFL. The WAFL Footy Facts website was also very useful and clearly provides the best availability and accessibility of any league’s data outside of the VFL/AFL.

I wanted to chase down state league/state of origin games, however I was unable to find a comprehensive enough data set. This would be a good inclusion to the model going forward, as it would provide a deeper indication of the better players playing in and/or from each state at a given point in time.

Next required the arduous and tedious task of joining the achievements onto the player name details, which I did via ‘fuzzy’ joins with some supervision. There is no doubt that some of these joins will be incorrect, however for the most part (and definitely for all inductees), I can confirm the matches were sufficient for ‘good enough’ analysis purposes.
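For a feel of what a supervised ‘fuzzy’ join can look like, here is a small sketch using Python's standard `difflib` module. It is only an illustration and not necessarily how the joins for this post were done: the `normalise` helper, the 0.85 cut-off and the misspelt name are all assumptions made for the example.

```python
import difflib


def normalise(name):
    """Lower-case a name and tidy punctuation and spacing before matching."""
    return " ".join(name.lower().replace(".", " ").split())


def fuzzy_join(honour_names, player_names, cutoff=0.85):
    """Match each honour-list name to its closest player name.

    Returns a dict of confident matches plus a list of names left over
    for manual review (the 'supervision' part).
    """
    lookup = {normalise(p): p for p in player_names}
    matched = {}
    needs_review = []
    for raw in honour_names:
        hits = difflib.get_close_matches(
            normalise(raw), list(lookup), n=1, cutoff=cutoff
        )
        if hits:
            matched[raw] = lookup[hits[0]]
        else:
            needs_review.append(raw)
    return matched, needs_review


players = ["Tom Leahy", "Jim Deane"]
honours = ["Tom Leahey", "JIM DEANE", "Jack Smith"]  # made-up variants
matched, needs_review = fuzzy_join(honours, players)
print(matched, needs_review)
```

A higher cut-off means fewer wrong matches but more names to check by hand, which is the trade-off behind ‘good enough’ here.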

I also built out the data set of Hall of Fame inductees from the official website, which again was a little tedious as different induction years have their information structured in slightly different formats. With the same process, I created flags for the inducted players, including their induction year (and year of induction as a Legend, if applicable).

For the purposes of our analysis, I set the eligibility to include players who commenced their careers from 1897, to create a common baseline to compare the new VFL against the SANFL and WAFL. The data includes all VFL/AFL players, all players and achievements in the SANFL up until 1990 (before Adelaide’s entry into the VFL/AFL in 1991) and all players in the WAFL up until 1986 (before West Coast’s entry into the VFL/AFL in 1987). As all inductees debuting after 1990 have played the vast majority (if not all) of their careers in the AFL, these exclusions attempt to capture all achievements from the three major leagues before the national modern era and only those records and honours in the AFL since that point. Further, the current eligibility criterion is five years retired from the sport, so all players who were active from 2014 onwards were likewise excluded.

I also stripped out the inducted Hall of Fame ‘Coaches’ entirely from the model training data set. I didn’t want to muddy the data set by having their playing careers not matched to a Hall of Fame ‘Player’ induction, nor include them as inducted players either.
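Pulling those exclusions together, the scope of the modelling data can be sketched as a single check per player. The function and constant names are hypothetical; the cut-offs are the ones described above.

```python
FIRST_CAREER_START = 1897  # common baseline across the three leagues
INDUCTION_YEAR = 2019
YEARS_RETIRED = 5  # eligibility: five years retired from the sport


def in_scope(career_start, career_end, inducted_as_coach=False):
    """Return True if a player belongs in the modelling data set."""
    if inducted_as_coach:
        return False  # inducted 'Coaches' are stripped out entirely
    if career_start < FIRST_CAREER_START:
        return False  # commenced before the 1897 baseline
    # Anyone still active from 2014 onwards is not yet eligible for 2019.
    return career_end < INDUCTION_YEAR - YEARS_RETIRED


print(in_scope(1998, 2013), in_scope(1998, 2014))
```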

Footballistics on sale today!

Over the past 18 months I have had the great opportunity to work with ABC journalist James Coventry to provide statistical analysis and data visualisation for three chapters of his new book ‘Footballistics’. James also utilised a raft of very talented other analysts from other websites such as Figuring Footy, Matter of Stats, Ranking Software and HPN Footy.

This proved to be a lot of fun (and a lot of work!) and I’m looking forward to future questions and further investigation into a number of these areas. I contributed to three chapters, namely:

  1. Goal kicking accuracy
  2. Win probabilities
  3. Australian Football Hall of Fame

In each chapter there was some content that was left on the cutting room floor. In the next little while I will be looking to utilise this ‘extra content’ and fill out some of the analysis that was threaded throughout these chapters.

If you are keen to find out more or if you are interested in a copy, you can read more about Footballistics on the ABC Shop online.

IL

A new beginning

Welcome to the new InsightLane website. This has been a long time coming, two years after I launched my bits-and-pieces Twitter account and launched a temporary website without my own domain.

This website is the hub of my statistics, insights, analytics, and data visualisation across a range of Australian-centric domains. It primarily will cover Australian football (specifically the AFL), with plans for bits and pieces of other sports, as well as the odd snippet of weather and politics data.

The past 12 months has been a particularly busy time for myself on multiple fronts. One of these is particularly exciting and I am looking forward to being able to announce some news later on in the year.

The new footy season approaches and this year the AFL analytics community has (seemingly) (for the most part) lost the regular blog posts of pioneers @TheArc and @FiguringFooty. I have some new ideas and will look to take up some of their slack.

Please bear with me as I come to terms with the WordPress functionality and what I can and cannot do with this theme.

Always feel free to add me on Twitter or shoot me through a message with a question, compliment, criticism or idea.

Cheers,

IL