Lindley Coetzee

edX Final Project - IPL data insights

Below is my final project submission for the DSE200x: Python for Data Science course.

Abstract

We take a dive into the complete IPL dataset, which stretches from 2008 to 2020. We attempt to answer questions about batsman strike rates, batsman versus bowler matchups and the boundary percentage of total runs during the powerplay. The answers were derived with groupings and pivot tables and visualized through charts and tables. For two of the three questions, the findings matched what any cricket fan would expect. The finding for the last question, however, was a surprise.

Motivation

Batting average, runs and strike rate are some examples of the stats you will find in any cricketer’s profile. I have always been interested in more than just the normal cricket batting stats, so my motivation for this project was to create some unique insights that fit narrower criteria. I hope that those who love cricket and analytics will find them useful and pleasant.

Dataset

The dataset is the IPL Complete Dataset (2008-2020). This is a comprehensive ball-by-ball record of every match played. There are 193,468 rows and 18 columns of data. The dataset can be found on Kaggle at: https://www.kaggle.com/patrickb1912/ipl-complete-dataset-20082020.

You can find my notebook here.

Data Preparation and Cleaning

Although the dataset is large, it is well organized. The main cleaning and preparation consisted of sorting the dataset by game, then by over and finally by ball, and removing data that was not needed for the calculations. There were no major problems after that.
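
As a rough illustration of that preparation step, here is a minimal pandas sketch. The column names (`id`, `inning`, `over`, `ball`, `batsman_runs`) are assumptions about the Kaggle file, `umpire_note` is a made-up stand-in for a column that is not needed, and adding `inning` to the sort keys is my assumption so that the two innings of a match do not interleave.

```python
import pandas as pd

# Tiny stand-in for the ball-by-ball data (column names are assumed).
balls = pd.DataFrame({
    "id":           [2, 1, 1, 1],
    "inning":       [1, 1, 1, 1],
    "over":         [0, 1, 0, 0],
    "ball":         [1, 1, 2, 1],
    "batsman":      ["C", "A", "B", "A"],
    "batsman_runs": [4, 1, 0, 6],
    "umpire_note":  ["", "", "", ""],  # hypothetical column not needed later
})

# Drop what the calculations do not need, then order by game, over and ball.
ordered = (
    balls.drop(columns=["umpire_note"])
         .sort_values(["id", "inning", "over", "ball"])
         .reset_index(drop=True)
)
print(ordered)
```

Resetting the index after sorting keeps the row labels in playing order, which makes later slicing easier to read.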

Research Question(s)


Question 1: Who are the top 10 batsmen with the best strike rate who have scored more than 1,000 runs?

Question 2 : How many batsmen have scored 100 runs or more against a single bowler?

Question 3: Which batsman has the highest boundary percentage per total runs in the powerplay overs?

Methods

The groupby() function was used the most, as the total runs scored by each batsman were required in all of the questions. It grouped the rows by unique batsman (a batsman appears in a row every time he faces a ball), then summed the runs for each batsman.
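
The grouping described above can be sketched as follows. This is only an illustration on made-up rows, and the column names `batsman` and `batsman_runs` are assumed rather than taken from the notebook.

```python
import pandas as pd

# One row per ball faced; names and values are made up for illustration.
balls = pd.DataFrame({
    "batsman":      ["A", "B", "A", "A", "B"],
    "batsman_runs": [4, 1, 6, 0, 2],
})

# Group the rows by batsman, sum the runs, and rank from highest to lowest.
runs_per_batsman = (
    balls.groupby("batsman")["batsman_runs"]
         .sum()
         .sort_values(ascending=False)
)
print(runs_per_batsman)
```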


df.pivot_table() calculated how many runs each batsman scored against each bowler. It was a crucial time saver, as a for loop would have been computationally expensive.
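
A minimal sketch of such a pivot table, assuming columns named `batsman`, `bowler` and `batsman_runs`. The data is made up, and the threshold is lowered from 100 to 10 so that the toy example has a match.

```python
import pandas as pd

# Made-up ball-by-ball rows (column names are assumed).
balls = pd.DataFrame({
    "batsman":      ["A", "A", "A", "B", "B"],
    "bowler":       ["X", "X", "Y", "X", "Y"],
    "batsman_runs": [4, 6, 1, 2, 0],
})

# Rows are batsmen, columns are bowlers, cells are total runs scored.
matchups = balls.pivot_table(
    index="batsman",
    columns="bowler",
    values="batsman_runs",
    aggfunc="sum",
    fill_value=0,  # pairs that never met count as 0 runs
)

threshold = 10  # the project uses 100; lowered for this toy data
# A batsman qualifies if any single bowler conceded at least `threshold` runs.
batsmen_over_threshold = int((matchups >= threshold).any(axis=1).sum())
print(matchups)
print(batsmen_over_threshold)
```

The comparison and `any(axis=1)` run over the whole table at once, which is what makes this approach cheaper than looping over every batsman and bowler pair.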


A scatter plot was used for question 1 to visualize the relationship between the strike rate and runs scored. For the other questions, a table was sufficient and simple enough to view the conclusion.

Important terms

Batsman strike rate
The runs a batsman scores per 100 balls faced. A batsman who scored 98 runs off 100 balls has a strike rate of 98.00, and a batsman who scored 20 runs off 10 balls has a strike rate of 200.00. The formula to calculate the strike rate is (runs ÷ balls) × 100.
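
The formula can be written as a small helper and combined with the 1,000-run cutoff from question 1. The function name, the player labels and the career totals below are all hypothetical.

```python
def strike_rate(runs: int, balls: int) -> float:
    """Runs scored per 100 balls faced."""
    if balls <= 0:
        raise ValueError("balls must be positive")
    return runs * 100 / balls


# The two worked examples from the definition above.
print(strike_rate(98, 100))
print(strike_rate(20, 10))

# Hypothetical career totals: (name, runs, balls faced).
totals = [("A", 1200, 800), ("B", 900, 450), ("C", 1500, 1200)]
MIN_RUNS = 1000

# Keep batsmen above the run cutoff, then rank by strike rate (top 10).
qualified = [(name, strike_rate(r, b)) for name, r, b in totals if r > MIN_RUNS]
top = sorted(qualified, key=lambda item: item[1], reverse=True)[:10]
print(top)
```

Player B has the highest strike rate in this toy data but is excluded by the run cutoff, which is the point of the filter: it removes batsmen with too few runs to judge.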


IPL powerplay overs
Powerplay in an IPL game is a mandatory fielding restriction imposed on every fielding side for the first six overs of the 20-over innings. This means that only two fielders can stay outside the inner circle, making batting comparatively easy. After the end of the powerplay, i.e., the first 6 overs, the fielding team can keep up to 5 players outside the inner circle, and 4 compulsory fielders must remain inside the inner circle. Source: https://www.cricindeed.com/the-importance-of-powerplay-overs-in-ipl/

Boundary
A boundary is the scoring of four or six runs from a single delivery, with the ball reaching or crossing the boundary of the playing field. Source: https://en.wikipedia.org/wiki/Boundary_(cricket)#Scoring_runs
Look at the wagon wheel below of Chris Gayle's innings against England. The 6s are displayed in red, where the ball crossed the boundary without touching the ground. The 4s are the blue lines, where the ball crossed the boundary after touching the ground at least once.

image : SkySports official website
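
Putting the powerplay and boundary definitions together, the boundary percentage for question 3 could be computed roughly as below. This is a sketch under assumptions: the column names are guesses, overs are assumed to be numbered from 0 (so the powerplay is overs 0 to 5), and any ball worth 4 or 6 runs to the batsman is treated as a boundary.

```python
import pandas as pd

# Made-up ball-by-ball rows (column names are assumed).
balls = pd.DataFrame({
    "batsman":      ["A", "A", "A", "A", "B", "B"],
    "over":         [0, 2, 5, 6, 1, 3],
    "batsman_runs": [4, 6, 2, 6, 1, 4],
})

# Assumes overs are numbered from 0, so the first six overs are 0..5.
powerplay = balls[balls["over"] < 6]

# Treat every 4 or 6 off the bat as a boundary (a simplifying assumption).
is_boundary = powerplay["batsman_runs"].isin([4, 6])

total = powerplay.groupby("batsman")["batsman_runs"].sum()
from_boundaries = powerplay[is_boundary].groupby("batsman")["batsman_runs"].sum()

# Batsmen with no boundaries are missing from `from_boundaries`, hence fillna.
boundary_pct = (from_boundaries / total * 100).fillna(0).round(2)
print(boundary_pct.sort_values(ascending=False))
```

If the dataset numbers overs from 1 instead, the filter would need to be `balls["over"] <= 6`.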

Findings for question 1

Who are the top 10 batsmen with the best strike rate who have scored more than 1,000 runs? Table below

Who are the top 10 batsmen with the best strike rate who have scored more than 1,000 runs? Chart below

The batsman with the best strike rate is Andre Russell aka Dre Rus

Source : IPL official website
Source : IPL official website

Findings for question 2

How many batsmen have scored 100 runs or more against a single bowler? Table below

There are 44 batsmen that have scored 100 runs or more against a single bowler. At the top of the list is Suresh Raina scoring 175 runs off Piyush Chawla.

Source : IPL official website

Findings for question 3

Which batsman has the highest boundary percentage per total runs in the powerplay overs? Table below

Source : IPL official website

Source : IPL official website

Sunil Narine scored 84.82% of all his runs in boundaries during the powerplay overs. Who would have thought that the "mystery bowler" would have a better boundary to total runs percentage than the "Universe Boss", Chris Gayle? Gayle has scored almost 4 times as many runs (2,541) as Narine (672), though.

Limitations

The findings are limited to the Indian Premier League. Most Indian and other subcontinent batsmen do not have great batting records away from home.

The IPL is also the most exclusive domestic cricket tournament, thus attracting higher quality players. Comparing the results to other domestic tournaments would not be fair.

Conclusions

The findings for question 1 had no surprises. The names of AB de Villiers, Chris Gayle, Andre Russell, Glenn Maxwell and Virender Sehwag will come to mind if you ask any avid cricket fan about the best batting strike rates.

The findings for question 2 make sense as well. The list is dominated by Indian players (who make up 64% (7 out of 11) of a team), by batsmen in the top four batting positions (who face the most balls and therefore score the most runs on average) and by spinners (bowlers who concede more runs on average than seam/swing bowlers).

The only surprise was the finding for question 3: not the entire result, but the player at the top of the list. Sunil Narine started his career as a spinner and batted at no. 11. His batting improved, and he has been an opening batsman for the past couple of seasons. Most of the players in that list (Jayasuriya, Gayle, Gilchrist etc.) are well known for being boundary-hitting IPL opening batsmen.

Acknowledgements

The presentation was sent out to one friend, three family members and one co-worker.

Feedback was as follows:

One friend

“Perhaps add the strike rate at question 2 to explore the insights further. Does the batsman have a high strike rate against the bowler, indicating that the batsman likes facing that bowler? Or are the runs scored at a lower or normal strike rate, which could mean that this batsman just happened to face a lot of balls from this bowler and while facing a lot of balls he happened to score a lot of runs.”

My friend has been playing cricket for more than 20 years, hence the technical question.

My friend appreciated the scatter plot in question 1 and the image in question 2 showing Raina and Chawla together.

Three family members

Two of the three have been watching cricket for more than thirty years so they had quite a lot to say. The main ones were:

they liked the deep dive, especially question 3;

the majority of the results for question 1 and question 3 are foreign players, while they only make up 36% (4 out of 11) of each team;

they would have liked to have a similar analysis of question 3 for the last 5 overs of the innings; and

they would have liked to have a similar analysis of question 3 for the bowlers.

The last of the family members does not understand cricket at all and needed more explanation of terms like “powerplay” and “fielding restrictions”, but liked the fact that the answers were easy to spot.

One co-worker

My co-worker has played a bit of street cricket, so had a very basic understanding, and liked the wagon wheel image showing the difference between a four and a six.

References

Credit needs to go to:

the lecturers in this course (Ilkay Altintas and Leo Porter);

patrickb1912 for the Kaggle dataset;

the IPL and SkySports’ official websites for images;

https://www.cricindeed.com/the-importance-of-powerplay-overs-in-ipl/ on the explanation of the powerplay overs; and

https://en.wikipedia.org/wiki/Boundary_(cricket)#Scoring_runs on the explanation of boundaries.