Lindley Coetzee

Naked Statistics by Charles Wheelan: Part 1 - Normal distribution

Introduction

I re-read Naked Statistics by Charles Wheelan. The book has many practical examples, so I decided to apply some of them to other data. This may or may not be the start of a Naked Statistics series. If it is, then this will be part 1. Any errors or omissions you encounter are my own and not the fault of the book's author.

We will explore the heights and weights of hockey players. As always, you can download the data set by following the link below.

hockey

If you are interested in the code, you can check out my Kaggle notebook here.

From the data we will calculate the:

1. Mean (average)

2. Standard deviation

After calculating the above, we can plot the normal distribution with percentages.

The stats

Let us first look at how the data are spread.

Formula to calculate mean height = Sum of heights ÷ Number of players

Formula to calculate mean weight = Sum of weights ÷ Number of players
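As a minimal sketch of the formula above in Python: the function name `mean` and the sample heights are made up for illustration and are not taken from the hockey data set.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical heights in cm, invented for illustration (not the real data).
sample_heights_cm = [175.0, 180.0, 185.0, 190.0]

# (175 + 180 + 185 + 190) / 4 = 182.5
print(mean(sample_heights_cm))
```

The same function works for weights; only the input list changes.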

The means (averages) for the players are below.

Height      182.04cm

Weight      83.74kg

Now that we have the mean, we can calculate the standard deviation. The standard deviation is a number that shows how widely the observations are dispersed around the mean. The formula for calculating it is a bit more complex, so I will cover it in a separate post here. The figures for the standard deviation are shown below.

 

Height-standard deviation      5.87cm

Weight-standard deviation     8.55kg
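For readers who want to see the calculation before the separate post, here is a rough sketch of one common version of the standard deviation. It assumes the population formula (dividing by the number of observations); the post does not say whether the population or the sample formula was used, and the function name and numbers below are illustrative only.

```python
import math
import statistics


def population_std(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)
    squared_diffs = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))


# Hypothetical numbers chosen so the arithmetic is easy to follow.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_std(sample))     # population formula (divide by N)
print(statistics.pstdev(sample))  # same thing from the standard library
print(statistics.stdev(sample))   # sample formula (divide by N - 1), slightly larger
```

With a large number of players the two formulas give almost the same figure, so the choice matters little here.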

All that this means (pun intended) is that heights within one standard deviation of the mean fall between 176.17cm and 187.91cm, and weights within one standard deviation of the mean fall between 75.19kg and 92.29kg. See illustration below.

Now let's look at the normal distribution (bell curve).

Source: https://analystprep.com

What the normal distribution shows is that 68.3% of the observations lie within one standard deviation of the mean, 95.5% of the observations lie within two standard deviations of the mean and 99.7% of the observations lie within three standard deviations of the mean.
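These percentages can be reproduced from the normal distribution itself. A small sketch, using the standard identity that the share within k standard deviations equals erf(k / √2); the function name `normal_coverage` is my own:

```python
import math


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints 68.27%, 95.45% and 99.73% for k = 1, 2 and 3.
    print(f"within {k} standard deviation(s): {normal_coverage(k):.2%}")
```

The figures quoted above are these values rounded to one decimal place.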

The results

We can test the above with our data. You can use Excel or Python to do this (I used both).
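One way such a test could look in plain Python is sketched below. This is not the code from the Kaggle notebook; the function name `share_within`, the use of the population standard deviation, and the sample numbers are all assumptions made for illustration.

```python
import statistics


def share_within(values, k):
    """Fraction of values lying within k standard deviations of their mean."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    inside = [v for v in values if abs(v - m) <= k * sd]
    return len(inside) / len(values)


# Hypothetical observations, not the hockey data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (1, 2, 3):
    print(f"{share_within(sample, k):.1%} of observations fall within {k} standard deviation(s)")
```

Passing the list of player heights or weights instead of `sample` would give figures comparable to the findings that follow.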

Findings for hockey heights

62.5% of observations fall within 1 standard deviation of the mean

96.0% of observations fall within 2 standard deviations of the mean

99.6% of observations fall within 3 standard deviations of the mean

 

Findings for hockey heights in plain English

62.5% of players are between 176.17cm and 187.91cm tall

96.0% of players are between 170.30cm and 193.78cm tall

99.6% of players are between 164.43cm and 199.65cm tall

Findings for hockey weights

68.4% of observations fall within 1 standard deviation of the mean

95.0% of observations fall within 2 standard deviations of the mean

99.4% of observations fall within 3 standard deviations of the mean

 

Findings for hockey weights in plain English

68.4% of players weigh between 75.19kg and 92.29kg

95.0% of players weigh between 66.64kg and 100.84kg

99.4% of players weigh between 58.09kg and 109.39kg
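The ranges in the plain-English findings are simply the mean plus or minus one, two or three standard deviations. A short sketch that recomputes them from the means and standard deviations reported earlier (the function and constant names are my own):

```python
def sd_range(mean, sd, k=1):
    """Lower and upper bound of the interval mean ± k standard deviations."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))


# Figures reported earlier in this post.
HEIGHT_MEAN_CM, HEIGHT_SD_CM = 182.04, 5.87
WEIGHT_MEAN_KG, WEIGHT_SD_KG = 83.74, 8.55

for k in (1, 2, 3):
    low_h, high_h = sd_range(HEIGHT_MEAN_CM, HEIGHT_SD_CM, k)
    low_w, high_w = sd_range(WEIGHT_MEAN_KG, WEIGHT_SD_KG, k)
    print(f"{k} SD: {low_h:.2f}cm to {high_h:.2f}cm, {low_w:.2f}kg to {high_w:.2f}kg")
```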

 

Conclusion

A word of caution. The normal distribution applies only to the realm of Type 1 randomness, which Nassim Taleb calls “Mediocristan”. This includes height, weight, IQ, calorie consumption, etc. The bell curve will not produce the best result when dealing with Type 2 randomness, “Extremistan”. This includes wealth, book sales per author, commodities, sizes of planets, etc.

Mediocristan and Extremistan are described in Nassim Taleb’s famous book The Black Swan. A full explanation can be found in Chapter Three: The Speculator and the Prostitute.

That is all for now. See you next time.