On Analysis And Statistics in Cricket - Part I
What are useful statistics? What is trivia? Can we formalise cricket analytics into a field of its own?
Since the very beginning, cricket has recorded and collated statistics of batting and bowling performance, and the general belief is that it is a game “heavy on numbers”. But what is the use and value of the statistics that we conventionally use in cricket? After all, baseball, which took inspiration from cricket to construct its own rudimentary statistics, has since moved well beyond its original repertoire of numbers. It now leans heavily on detailed measures to analyse and predict aspects of performance, and has spawned a whole field of study called Sabermetrics. Cricket, on the other hand, mostly remains confined to the same set of aggregate numbers and averages that it used a century ago. What is the present and future of cricket statistics?
To begin with, let us clearly dissociate statistics from what often masquerades as statistics in cricket: trivia. Random nuggets born of numbers, albeit good for a fun trivia night or a gasp-worthy fact, are of no real use in shaping discourse, settling debates, or helping strategy. Popular websites often publish “listicles” of such facts, labelling them “Stats”. And people who write these take the liberty of labelling themselves “statsmen”.
As an example: “30 – Number of centuries made by Virat Kohli as captain”. While this is a nice fact to know (for lack of a better term), there is no way to know whether captaincy fuels Virat Kohli’s form, or if it is by pure coincidence that the peak of his batting is at the same time as his captaincy stint. This kind of “stat” runs the risk of advancing the narrative that Kohli’s good batting is somehow causally connected to his leadership. While this might be true to an extent, there is no real way of knowing. And this is one of the less ridiculous examples.
So, let’s discount this manner of “stats” as pure trivia, for it is important to know what is an actual analytical tool, and what is a frivolous feel-good read. Correcting the terminology that plagues the discourse is the first step to making that distinction well-known.
***
Challenges of Discourse
Cricket discourse is infested by subjective opinion parroted as fact, clichés, abstractions, and general fluff that ignores the nuances of the game in the quest to make easy and generalized statements. Central to this malaise is a genuine illiteracy in statistics and in how to use them to guide debate and answer questions.
Immense complexities hide in the game: variations of era, conditions, inherent ability, competition. Aggregate statistics, while good descriptors of ability over large sample sizes, cannot correctly describe many situations in cricket. And yet, they are used to draw inferences with certainty, often as a proxy for analysing the intricacies of the sport. For instance, one might find that Cheteshwar Pujara’s average in South Africa is a meagre 34.45, and thus conclude that Pujara is incapable of doing very well in seaming conditions. Upon closer inspection, one can see that Pujara has had one good series there, scoring 153 and 70, and that the last series he played there was full of resolute ball-blunting knocks on devilish pitches.
As a cricketer’s career is broken into series, all coming with different conditions, roles and contexts, averages and other conventional numbers can be tricky tools to infer truths from.
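To make the point about aggregates concrete, here is a minimal sketch of how a single career average can hide very different series. The innings below are invented for an unnamed batter and are not taken from any real scorecard; the function names are likewise only illustrative.

```python
from collections import defaultdict

# Hypothetical innings: (series label, runs scored, was the batter dismissed?)
# All numbers are invented purely for illustration.
INNINGS = [
    ("Series A", 10, True), ("Series A", 4, True),
    ("Series A", 22, True), ("Series A", 8, True),
    ("Series B", 150, True), ("Series B", 70, True),
    ("Series B", 30, True), ("Series B", 26, True),
    ("Series C", 1, True), ("Series C", 50, False),
    ("Series C", 19, True), ("Series C", 28, True),
]


def batting_average(innings):
    """Conventional batting average: total runs divided by dismissals."""
    runs = sum(r for _, r, _ in innings)
    outs = sum(1 for _, _, dismissed in innings if dismissed)
    return runs / outs if outs else None  # undefined if never dismissed


def average_by_series(innings):
    """Group innings by series label and average each group separately."""
    groups = defaultdict(list)
    for row in innings:
        groups[row[0]].append(row)
    return {series: batting_average(rows) for series, rows in groups.items()}


if __name__ == "__main__":
    print("Overall:", batting_average(INNINGS))
    for series, avg in average_by_series(INNINGS).items():
        print(series, round(avg, 2))
```

With this made-up data the overall average is a middling 38, yet the three series average 11, 69 and roughly 33. The single number describes none of them well, which is the trap the Pujara example falls into.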
To take another common belief in cricket, it is said that MS Dhoni is one of the best “finishers” in the ODI game. What does this mean? No one clearly knows. No one can clearly define what a finisher is, or what the adequate fulfilment of his role is.
***
A Field of Study
There are more detailed analyses of aggregate performance, done by various interested watchers and fans, but these novel methods and their results hide on niche blogs, in mostly unknown corners of the internet, where a small but passionate population crunches numbers, draws conclusions, and debates methods and results, much like a scientific discipline. These ideas are largely disconnected from the mainstream, and almost unheard of by the average cricket fan.
First and foremost, the need is to collate and organize actual cricket statistics in the form of a scientific field of study. To borrow from Kartikeya Date, an astute analyst and essayist of the game, there are two aspects of an academic field that are missing from cricket: method, and peer review.
Date defines “method” as an “element of demonstrability”, which is the cornerstone of any empirical and serious scientific discipline. This entails using structured arguments, facts, and meaningful statistics to support a claim. As an example, he himself has analysed MS Dhoni’s actual role in India’s ODI chases to pinpoint his exact method, to dispel the lazy narrative of him being the perfect “finisher”.
The second element, peer review, is essential to discuss and refine the methods used for cricket analysis, making our tools better and our inferences more robust. This incrementally improves the relation between the statistics we use and the cricketing world it describes, while also checking for the correctness of our processes. Think of it as improving a theory of physics through constant debate and update.
Closely related to these two is a third characteristic: historical knowledge of the field. The work of each practitioner should ideally be added to a corpus of common knowledge upon which new work is built. Old conclusions and methods should be remembered and debated, for both learning and challenging them. For instance, I once wrote an article on survival curves for Test batsmen, and only later found out that Kartikeya Date had done something similar many years ago. There should have been a central compendium of all such work done.
The past and present work in the field, along with these three tenets, should codify cricket statistics into a serious field of study. Guiding this should be a “statistical logic”, which defines standards and practices for how to use numbers for debating narratives, answering questions, or proving assertions.
***
The Kinds of Statistical Exercises in Cricket
To my mind, there are two broad categories of cricket statistical analysis: retrospective and predictive.
Retrospective analysis uses contextual and relevant statistics to:
1. Give a nuanced description of a match/day gone by, in the style of a match report.
2. Analyse trends in gameplay over a short or long period, or describe cricketing performances in more detail than archaic methods can.
3. Make a claim/thesis about a certain cricketing phenomenon, and then investigate it.
For an example of the first kind, look to Karthik Krishnaswamy’s “Pujara swears by his survival guide”, from the India-Australia Test in Ranchi in 2017. Here, the progression of Pujara’s strike rate is used perfectly to describe how he paced the innings. A story is expertly constructed, building a narrative and a conclusion, while resting on a well-thought-out method and significant statistics.
For the second kind, I will point to my own article on contextual numbers in T20 cricket, a method which normalises scoring rates for the expected scoring rates in each phase of an innings (as I found out later, this method was used comprehensively by Date before). Another example is this great blog, trying to factor in the effect of home grounds on batting performances in county cricket.
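The phase-normalisation idea mentioned above can be sketched roughly as follows. This is only one plausible reading of such a method, not the exact calculation used in the article or by Date: the phase boundaries, the baseline strike rates and all function names are assumptions chosen for illustration.

```python
# Assumed phase boundaries (overs numbered 1 to 20) and assumed baseline
# strike rates (runs per 100 balls). Both are hypothetical placeholders;
# in practice the baselines would be estimated from a large set of matches.
PHASES = {"powerplay": (1, 6), "middle": (7, 15), "death": (16, 20)}
BASELINE_SR = {"powerplay": 120.0, "middle": 115.0, "death": 160.0}


def phase_of(over):
    """Return the name of the phase that a given over falls in."""
    for name, (start, end) in PHASES.items():
        if start <= over <= end:
            return name
    raise ValueError(f"over {over} is outside a 20-over innings")


def expected_runs(deliveries):
    """Runs an average batter would score off the same balls.

    deliveries is a list of (over, runs_scored) pairs, one per ball faced.
    """
    balls_per_phase = {name: 0 for name in PHASES}
    for over, _ in deliveries:
        balls_per_phase[phase_of(over)] += 1
    return sum(balls * BASELINE_SR[name] / 100
               for name, balls in balls_per_phase.items())


def normalised_scoring_index(deliveries):
    """Actual runs divided by expected runs; above 1.0 is quicker than baseline."""
    actual = sum(runs for _, runs in deliveries)
    return actual / expected_runs(deliveries)
```

Under these assumed baselines, a batter who faces ten balls in the powerplay and ten at the death is expected to score 28, so 35 runs off those balls gives an index of 1.25. The same 35 off 20 balls faced entirely at the death would score lower, which is the context a raw strike rate throws away.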
This category is broad, and largely encompasses all retrospective rankings / performance assessments. From relooking at batting through the ability to occupy the crease for recent batsmen, to looking at control percentages to analyse the error rates of batsmen, everything that uses new methods to take a deeper look at performance and explain it better falls in this bracket. Another great example is CricViz’s batting profiles.
A magnificent instance of this kind of work is Ananth Narayanan’s batting rankings, which account for a wide array of factors to weigh performances in context.
In fact, this kind of piece, which aims to use detailed novel methods to factor in the various vagaries of the game, is the beginning of cricket Sabermetrics.
The third style of analysis picks up a phenomenon, cricketing cliché or idea, and seeks to investigate it. A great example would be the aforementioned article on Dhoni’s role. From my own work, this article on Rohit Sharma’s ODI batting came forth from an investigation into the narrative that he was “inconsistent”. A perusal of his distribution of scores, and a look at the distance between his median and mean score, did confirm that narrative, but also showed very clearly his improvement in that regard over time. It also unearthed a characteristic shift in his playing style after crossing an initial barrier.
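The mean-versus-median check described above is easy to sketch. The two score lists below are invented and do not belong to any real batter, and for simplicity the sketch treats every innings alike, ignoring not-outs.

```python
import statistics

# Two hypothetical batters with the same mean score but different shapes.
STEADY = [30, 35, 40, 45, 50]
STREAKY = [0, 5, 10, 15, 170]


def score_profile(scores):
    """Summarise a list of innings scores by mean, median and their gap.

    A large positive gap means a few big innings are pulling the mean well
    above the typical (median) innings, one rough sign of inconsistency.
    """
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    return {"mean": mean, "median": median, "gap": mean - median}


if __name__ == "__main__":
    for label, scores in (("steady", STEADY), ("streaky", STREAKY)):
        print(label, score_profile(scores))
```

Both lists average 40, but the streaky batter's median is only 10, a gap of 30 runs. Running the same summary on the first and second halves of a career is one simple way to see whether that gap narrows over time.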
The other kind is predictive analysis, which uses detailed numbers and new methods to analyse trends with the objective of making predictions and/or strategic recommendations. This category too includes a variety of styles, beginning with this kind of scrutiny of Dhoni’s changing efficacy in ODIs, which attempts to recommend a possible use for him, and going up to a machine learning model that predicts matchups between batsmen and bowlers in order to guide tactics.
***
In this way, a story/essay in cricket statistics is exactly like a research project in the sciences. One picks up a cricketing question one wants to probe, creates a statistical model that best describes the various aspects of performance that are needed for this probe, runs the model, gets the data, and then analyses that, to finally write what is essentially a report.
This is the way analytical writing in cricket should function: taking a deeper look at cricketing questions of the past, present and future using well-built, well-debated and robust statistical methodology, in order to inform cricketing knowledge, dispel myths, and centrally drive discourse. As an example, the conclusion that Dhoni plays the anchor while other batsmen hit around him to win India’s chases should become mainstream and a part of cricketing canon, replacing the lazy and inaccurate idea of him being some supernatural “finisher”.
***
In a possible Part 2, I want to look at the challenges in cricket statistics, internal and external, and ask whether the cricketing public is even ready to give stats a bigger role to play in discourse.
This is an awesome post and should be required reading for cricket and/or statistics enthusiasts! It just elevates the discourse from folks looking at anecdotal evidence and making broad claims and worse ... cricketing decisions.
I'm interested in these kinds of analyses percolating down to the amateur levels of the game, and helping players understand percentage play, patterns of play and of course strategy and tactics. It might also help folks improve specific skills, given that amateur cricketers typically don't tend to invest time in deliberate practice. When I used to play club cricket a decade and a half ago, there was definitely less awareness.
Could you point me to the raw data sources you use?