Lindley Coetzee

Naked Statistics by Charles Wheelan: Part 2 - Standard deviation

Introduction

As promised in the previous post, I will show how the standard deviation is calculated for our hockey players’ height and weight. We should arrive at the figures below.

Height standard deviation (cm)      5.87
Weight standard deviation (kg)      8.55

The distance from the mean

Let us have a look at our data first. I will only display the first 10 rows, since our list contains 82,424 players.

The standard deviation is a number that shows how widely the values are spread around the mean. Before we can calculate the standard deviation, we need to calculate each value's distance from the mean and then the variance.

We need to add two new columns: the distance from the mean in height and the distance from the mean in weight. The distance from the mean is calculated as an absolute value, so it is never negative. Let’s update our data set with these two new columns.

Our mean height and weight were 182.04 cm and 83.74 kg respectively. We can use Dave Taylor's data to check our calculations.

Dave Taylor height (cm)     183
Less mean (cm)              182.04
Difference (cm)               0.96

Dave Taylor weight (kg)      88
Less mean (kg)               83.74
Difference (kg)               4.26
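The same check can be scripted. This is a minimal sketch in plain Python: distance_from_mean is a hypothetical helper name, and the only figures used are Dave Taylor's height and weight and the means quoted above.

```python
def distance_from_mean(value: float, mean: float) -> float:
    """Absolute distance of a single value from the mean (never negative)."""
    return abs(value - mean)


# Means quoted in this post
MEAN_HEIGHT_CM = 182.04
MEAN_WEIGHT_KG = 83.74

# Dave Taylor's height and weight
height_diff = distance_from_mean(183, MEAN_HEIGHT_CM)
weight_diff = distance_from_mean(88, MEAN_WEIGHT_KG)

print(f"Height difference: {height_diff:.2f} cm")  # Height difference: 0.96 cm
print(f"Weight difference: {weight_diff:.2f} kg")  # Weight difference: 4.26 kg
```

Applying the same function to every row would give the two new columns.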

The variance

Now we can solve for the variance. The variance is the average of the squared distances from the mean, so we start by squaring each player's distance from the mean:

distance from the mean × distance from the mean

or

(distance from the mean)²
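Written as formulas, with x_i the height (or weight) of player i, μ the mean and N the number of players, the squared distance, the variance and the standard deviation are shown below. This is the population form, which divides by the number of players, as the calculation in this post does.

```latex
\begin{aligned}
d_i^2 &= (x_i - \mu)^2 \\
\sigma^2 &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \\
\sigma &= \sqrt{\sigma^2}
\end{aligned}
```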

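We square the distance from the mean for height and weight, then update our data set with two additional columns: variance in height and variance in weight. Strictly speaking, each entry in these columns is a squared distance; the variance itself is the average of these values, which we calculate next.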
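A minimal sketch of the squaring step, again using Dave Taylor's values; squared_distance is a hypothetical helper name. Because squaring removes the sign, the absolute value is not needed here.

```python
def squared_distance(value: float, mean: float) -> float:
    """Square of the distance from the mean; squaring also removes the sign."""
    return (value - mean) ** 2


# Dave Taylor: 183 cm vs mean 182.04 cm, 88 kg vs mean 83.74 kg
height_sq = squared_distance(183, 182.04)
weight_sq = squared_distance(88, 83.74)

print(f"{height_sq:.4f}")  # 0.9216
print(f"{weight_sq:.4f}")  # 18.1476
```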

The standard deviation

We can now finally calculate the standard deviation for height and weight.

We sum the column for variance in height (2,842,791.24), then divide the total by the number of players (82,424). This gives 34.489848112, the variance in height. Then we take the square root of 34.489848112. The answer comes to 5.872805813, which rounds to 5.87 (the figure we had in Part 1 of this series).
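The arithmetic above can be reproduced in a few lines of Python. This sketch only plugs in the totals quoted in this post (it does not read the player data), and it prints rounded values because the column sum itself is shown rounded to two decimals.

```python
import math

# Totals quoted in this post for height
sum_squared_height = 2_842_791.24  # sum of the "variance in height" column (cm²)
n_players = 82_424

variance_height = sum_squared_height / n_players  # average squared distance
sd_height = math.sqrt(variance_height)             # standard deviation in cm

print(f"{variance_height:.4f}")  # 34.4898
print(f"{sd_height:.2f}")        # 5.87
```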

We can repeat the process to calculate the standard deviation in weight.

Sum of variance in weight                  6,025,927.45
Number of players                             82,424
Sum ÷ number of players (variance)                73.108893615
Square root (standard deviation)                   8.550373887 ≈ 8.55
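Both calculations follow the same recipe, so it can be wrapped in one function. This is a minimal sketch assuming the raw values sit in a plain Python list; population_sd is a hypothetical name, and the weights below are made up for illustration rather than taken from the hockey data set. The standard library's statistics.pstdev is used as a cross-check.

```python
import math
import statistics


def population_sd(values: list[float]) -> float:
    """Standard deviation the long way: distances, squares, average, square root."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    distances = [abs(x - mean) for x in values]  # distance from the mean
    squared = [d ** 2 for d in distances]        # the "variance" column
    variance = sum(squared) / n                  # divide by the number of values
    return math.sqrt(variance)


# Hypothetical weights in kg, made up for illustration
weights = [88, 79, 92, 75, 84, 81]

print(round(population_sd(weights), 4))
print(round(statistics.pstdev(weights), 4))  # standard-library cross-check
```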

Function for standard deviation in Excel and Python

After all of those calculations, the good news is that Excel and Python both have functions that calculate the standard deviation quickly and easily.

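For Excel, the function is STDEV.P, the population standard deviation, which divides by the number of values exactly as we did above:

=STDEV.P(array)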

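For Python, use statistics.pstdev, the population standard deviation, which matches STDEV.P and our manual calculation. (statistics.stdev divides by n − 1 and returns the sample standard deviation instead.)

import statistics

# df is the pandas DataFrame holding the player data; "variable" is a placeholder column name
statistics.pstdev(df["variable"])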
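To see why the choice of function matters, here is a small sketch with a made-up list of eight values: pstdev divides the sum of squared distances by n (8), while stdev divides by n − 1 (7). In pandas, the equivalent of pstdev is Series.std(ddof=0); Series.std() on its own defaults to ddof=1, the sample version. With 82,424 players the two results are almost identical, but for small data sets the difference is noticeable.

```python
import statistics

# Made-up values for illustration; the mean is 5 and the sum of squared distances is 32
values = [2, 4, 4, 4, 5, 5, 7, 9]

population = statistics.pstdev(values)  # sqrt(32 / 8), like Excel's STDEV.P
sample = statistics.stdev(values)       # sqrt(32 / 7), like Excel's STDEV.S

print(population)        # 2.0
print(round(sample, 4))  # 2.1381
```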

I hope this short example was useful and I will see you in the next post.