Introduction
I re-read Naked Statistics by Charles Wheelan. The book has many practical examples, so I decided to apply some of them to other data. This may or may not be the start of a Naked Statistics series. If it is, then this will be part 1. All errors or omissions that you may encounter are my own and not the fault of the author of the book.
We will explore the heights and weights of hockey players. As always, you can download the data set by following the link below.
If you are interested in the code, you can check out my Kaggle notebook here.
From the data we will try to get the following:
1. Mean (average)
2. Standard deviation
After calculating the above, we can plot the normal distribution with percentages.
The stats
Let us first look at how the data are spread.
Formula to calculate mean height = Sum of heights ÷ Number of players
Formula to calculate mean weight = Sum of weights ÷ Number of players
The means (averages) for the players are below.
Height 182.04cm
Weight 83.74kg
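The mean formulas above can be sketched in Python as follows. This is only a minimal illustration: the five heights and weights are made-up values, not rows from the hockey data set, so the printed means will not match the figures above.

```python
# Illustrative values only: NOT taken from the hockey data set.
heights_cm = [178.0, 185.0, 180.0, 190.0, 177.0]
weights_kg = [80.0, 88.0, 79.0, 95.0, 78.0]


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_height = mean(heights_cm)
mean_weight = mean(weights_kg)

print(f"Mean height: {mean_height:.2f}cm")
print(f"Mean weight: {mean_weight:.2f}kg")
```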
Now that we have the mean, we can calculate the standard deviation. The standard deviation is a number that shows how widely the observations are dispersed around the mean. The formula for calculating the standard deviation is a bit more complex. I will cover this in a separate post here. The figures for standard deviation are shown below.
Height-standard deviation 5.87cm
Weight-standard deviation 8.55kg
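For readers who want a feel for the calculation now, here is a rough sketch in Python. The heights are made-up illustration values, not the hockey data. The post does not say whether the figures above use the population formula (divide by n) or the sample formula (divide by n - 1), so the sketch shows both and cross-checks them against Python's statistics module.

```python
import math
import statistics

# Illustrative values only: NOT taken from the hockey data set.
heights_cm = [178.0, 185.0, 180.0, 190.0, 177.0]

mean_height = statistics.mean(heights_cm)

# Squared distance of every observation from the mean.
squared_deviations = [(h - mean_height) ** 2 for h in heights_cm]

# Population formula: divide by n. Sample formula: divide by n - 1.
population_sd = math.sqrt(sum(squared_deviations) / len(heights_cm))
sample_sd = math.sqrt(sum(squared_deviations) / (len(heights_cm) - 1))

print(f"Population SD: {population_sd:.2f}cm")
print(f"Sample SD: {sample_sd:.2f}cm")

# The standard library gives the same answers.
print(f"statistics.pstdev: {statistics.pstdev(heights_cm):.2f}cm")
print(f"statistics.stdev: {statistics.stdev(heights_cm):.2f}cm")
```

With a large data set the two versions give almost the same number, so the choice matters little here.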
All that this means (pun intended) is that the players who fall within one standard deviation of the mean are between 176.17cm and 187.91cm tall and weigh between 75.19kg and 92.29kg. See illustration below.
Now let's look at the normal distribution (bell curve).
Source: https://analystprep.com
What the normal distribution shows is that 68.3% of the observations lie within one standard deviation of the mean, 95.5% of the observations lie within two standard deviations of the mean and 99.7% of the observations lie within three standard deviations of the mean.
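These percentages do not have to be taken on faith. For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / √2), where erf is the error function available in Python's math module. A minimal sketch (the function name share_within is my own):

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {share_within(k) * 100:.2f}%")
```

To two decimals the shares come out at roughly 68.27%, 95.45% and 99.73%, in line with the rounded figures quoted above.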
The results
We can test the above with our data. You can use Excel or Python to do this (I used both).
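As a rough idea of what the Python check can look like, the sketch below counts how many observations fall within k standard deviations of the mean. It is not the exact code from my notebook: the ten heights are made-up illustration values, the function name share_within_k is hypothetical, and the population standard deviation (statistics.pstdev) is an assumption.

```python
import statistics

# Illustrative values only: NOT taken from the hockey data set.
heights_cm = [170, 175, 178, 180, 182, 183, 185, 187, 190, 200]


def share_within_k(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    lower, upper = mean - k * sd, mean + k * sd
    inside = [v for v in values if lower <= v <= upper]
    return len(inside) / len(values)


for k in (1, 2, 3):
    share = share_within_k(heights_cm, k)
    print(f"{share * 100:.1f}% of observations fall within {k} standard deviation(s) of the mean")
```

Swapping the sample list for the real height or weight column gives the kind of percentages listed in the findings.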
Findings for hockey heights
62.5% of observations fall within 1 standard deviation of the mean
96.0% of observations fall within 2 standard deviations of the mean
99.6% of observations fall within 3 standard deviations of the mean
Findings for hockey heights in plain English
62.5% of players are between 176.17cm and 187.91cm tall
96.0% of players are between 170.30cm and 193.78cm tall
99.6% of players are between 164.43cm and 199.65cm tall
Findings for hockey weights
68.4% of observations fall within 1 standard deviation of the mean
95.0% of observations fall within 2 standard deviations of the mean
99.4% of observations fall within 3 standard deviations of the mean
Findings for hockey weights in plain English
68.4% of players weigh between 75.19kg and 92.29kg
95.0% of players weigh between 66.64kg and 100.84kg
99.4% of players weigh between 58.09kg and 109.39kg
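The ranges in the plain English findings are simply the mean plus or minus 1, 2 and 3 standard deviations. A short sketch that reproduces them from the means and standard deviations reported in this post (the function name band is my own):

```python
# Mean and standard deviation as reported in this post.
stats = {
    "height (cm)": (182.04, 5.87),
    "weight (kg)": (83.74, 8.55),
}


def band(mean, sd, k):
    """Lower and upper bound of the range within k standard deviations of the mean."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)


for label, (mean, sd) in stats.items():
    for k in (1, 2, 3):
        low, high = band(mean, sd, k)
        print(f"{label}, {k} standard deviation(s): {low:.2f} to {high:.2f}")
```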
Conclusion
A word of caution. The normal distribution only applies in the realm of Type 1 randomness, which Nassim Taleb calls “Mediocristan”. This includes height, weight, IQ, calorie consumption etc. The bell curve will not produce the best result when dealing with Type 2 randomness, which he calls “Extremistan”. This includes wealth, book sales per author, commodities, sizes of planets etc.
Mediocristan and Extremistan are described in Nassim Taleb’s famous book The Black Swan. A full explanation can be found in Chapter Three: The Speculator and the Prostitute.
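To get a feel for the difference, here is a small simulation. It is my own illustration and not something taken from the book. Heights are drawn from a normal distribution using the mean and standard deviation from this post. The "wealth" values are drawn from a Pareto distribution with shape 1.1, an arbitrary stand-in I picked for an Extremistan-style quantity. The sketch then asks how much of the total the single largest observation accounts for.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
n = 10_000

# Mediocristan stand-in: normally distributed heights (mean and SD from this post).
heights = [random.gauss(182.04, 5.87) for _ in range(n)]

# Extremistan stand-in: Pareto-distributed values (shape 1.1 is an arbitrary assumption).
wealth = [random.paretovariate(1.1) for _ in range(n)]


def largest_share(values):
    """Share of the total contributed by the single largest observation."""
    return max(values) / sum(values)


height_share = largest_share(heights)
wealth_share = largest_share(wealth)

print(f"Tallest player's share of total height: {height_share * 100:.4f}%")
print(f"Largest value's share of total 'wealth': {wealth_share * 100:.4f}%")
```

No single player moves the total height by any noticeable amount, while a single heavy-tailed observation can account for a visible chunk of the total. That is why the mean and standard deviation say far less about Extremistan quantities.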
That is all for now. See you next time.