What do you Mean?

July 9, 2023

I watched as the Amazon delivery guy strutted up my driveway yesterday to bring me a package. I’ve never been so excited to see one arrive, and guess what I got…

A math book.

Yeah, that might sound a bit crazy, but I bought the O’Reilly Practical Statistics for Data Scientists to hone my stats superpowers. The book is gonna take some time to finish, so it’s a good thing I’ve got this 66 Days of Math and Programming experiment.

Today, I’m writing about one of the key steps in exploratory data analysis: finding the “central tendency” for your data.

In case you don’t know what central tendency means, it’s the typical value that your data clusters around. I’m about to give you some flashbacks to high school math because the most common measure of central tendency is… drum roll…

The Mean.

You probably think your high school math teacher taught you all you needed to know about the mean, but I’m here to tell you you’re wrong. 100% wrong. Here are 3 types of means that matter in stats…

1) Mean

First, I’ll start with the basic mean you know from high school.

You find it by adding up all the values in your data and dividing that sum by the number of values.

Here’s the formula…

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_1, \dots, x_n$ are your values and $n$ is how many of them you have.
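If you’d rather see it in code, here’s a minimal Python sketch of the basic mean. The function name `mean` and the sample numbers are made up for illustration; they aren’t from the book.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Made-up sample data, purely for illustration.
data = [2, 4, 4, 5, 10]
result = mean(data)
print(result)
```

Python’s standard library also ships `statistics.mean`, which should give you the same answer on this data.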

2) Weighted Mean

The weighted mean is calculated by multiplying each value in your data by its weight and adding up the results. Then you divide that sum by the sum of all the weights.

This metric lets some values count more than others, which pulls the answer toward the values with the biggest weights. It helps you:

  1. Give less weight to less accurate measurements
  2. Correct for groups that are under- or over-represented in the dataset

Here’s the formula…

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where $w_i$ is the weight attached to the value $x_i$.
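Here’s a minimal Python sketch of the weighted mean. The function name `weighted_mean`, the values, and the weights are all made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Made-up data: the last value gets twice the weight of the others.
values = [10, 20, 30]
weights = [1, 1, 2]
result = weighted_mean(values, weights)
print(result)
```

The plain mean of these three values is 20, so doubling the weight on 30 pulls the answer up. If you already use NumPy, `numpy.average(values, weights=weights)` does the same calculation.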

3) Trimmed Mean

And last but not least, we’ve got the trimmed mean.

You find this by sorting your data, dropping a fixed number of values from each end, then adding up the remaining values and dividing by the number of values you kept.

This is useful when your data has extreme outliers on either end. The regular mean isn’t robust to outliers and gets dragged toward them, while the trimmed mean ignores them.

Here’s the formula…

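$$\bar{x}_{\text{trim}} = \frac{\sum_{i=p+1}^{n-p} x_{(i)}}{n - 2p}$$

where $x_{(1)}, \dots, x_{(n)}$ are your values sorted from smallest to largest and $p$ is the number of values dropped from each end.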
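And here’s a minimal Python sketch of the trimmed mean. It assumes you trim by a count `p` of values from each end; the function name `trimmed_mean` and the sample data are made up for illustration.

```python
def trimmed_mean(values, p):
    """Mean of the values left after dropping the p smallest and p largest."""
    n = len(values)
    if p < 0 or 2 * p >= n:
        raise ValueError("p must be non-negative and leave at least one value")
    ordered = sorted(values)   # trimming only makes sense on sorted data
    kept = ordered[p:n - p]    # drop p values from each end
    return sum(kept) / len(kept)


# Made-up data with one extreme value on each side.
data = [1, 48, 50, 52, 55, 300]
result = trimmed_mean(data, 1)
print(result)
```

With the 1 and the 300 dropped, the answer lands at 51.25, while the plain mean of the same data is a bit over 84. If you use SciPy, `scipy.stats.trim_mean` does a similar job, but it takes the proportion to cut from each end rather than a count.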