Statistics Jargon

July 15, 2023

Yesterday was a boring day. I didn’t learn anything exciting enough to share here, so I’m gonna just freestyle today. I just spent some time reading my statistics book and decided to regurgitate everything I read here.

But to make it fun, everything I’m sharing is what I memorized.

I’m not going back to the book to double-check myself or fill in any gaps.

So without further ado, let’s see what definitions I remember…

Sample

A sample is a subset of a larger group of data.

Population

A population is the entire set of data you want to draw conclusions about.

Size

The size of a population is denoted N, and the size of a sample is denoted n. While this number may not seem crucial to data analysis, you need it to compute the mean: the population mean is the sum of all the values divided by N.

Random Sampling

Random sampling is when you create a sample set from the population set, where every element of the population has an equal probability of being chosen for the sample.
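Here’s a minimal sketch of that idea in Python. The population (the integers 1 through 1000), the sample size of 50, and the seed are all made up for illustration. It uses the standard library’s `random.sample`, which draws without replacement and gives every element the same chance of being picked.

```python
import random

# Hypothetical population: the integers 1 through 1000.
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw 50 elements without replacement; each element is equally likely.
sample = rng.sample(population, k=50)

# Population mean divides by N, sample mean divides by n.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"N = {len(population)}, population mean = {population_mean}")
print(f"n = {len(sample)}, sample mean = {sample_mean:.1f}")
```

The sample mean won’t match the population mean exactly, but with a truly random sample it should land reasonably close.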

Selection Bias

Selection bias is when you consciously or unconsciously choose the elements of your sample in a way that doesn’t represent the population, which leads to misleading outcomes.

A great example of this is the 1936 Literary Digest presidential poll. They asked a “random sample” of their readers who they were voting for in the election. And Alfred Landon seemed to absolutely destroy Franklin D. Roosevelt.

But guess what…

They fell victim to selection bias.

It turns out the majority of their readers were upper-class citizens with a lot of money. So there was a bias that favored Alfred Landon. The sample set didn’t accurately represent the whole population, so it produced a badly misleading prediction: Roosevelt actually won the election in a landslide.
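To see how this plays out, here’s a rough simulation. Every number in it is invented for illustration (20% of voters are wealthy, wealthy voters back Landon 70% of the time, everyone else 35% of the time); none of it is real 1936 data. It compares a truly random sample against a sample drawn only from wealthy voters.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_voter():
    """Return (is_wealthy, votes_landon) using made-up probabilities."""
    wealthy = rng.random() < 0.20
    p_landon = 0.70 if wealthy else 0.35
    return wealthy, rng.random() < p_landon

population = [make_voter() for _ in range(100_000)]

def landon_share(voters):
    """Fraction of the given voters who pick Landon."""
    return sum(votes_landon for _, votes_landon in voters) / len(voters)

# Unbiased: every voter has the same chance of being polled.
random_sample = rng.sample(population, 2000)

# Biased: only wealthy voters can end up in the poll.
wealthy_only = [voter for voter in population if voter[0]]
biased_sample = rng.sample(wealthy_only, 2000)

print(f"Whole population: {landon_share(population):.1%} for Landon")
print(f"Random sample:    {landon_share(random_sample):.1%} for Landon")
print(f"Biased sample:    {landon_share(biased_sample):.1%} for Landon")
```

The random sample tracks the population closely, while the wealthy-only sample calls the race for the wrong candidate. Polling more people wouldn’t fix it, because the problem is who gets asked and not how many.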

Data Snooping

Data snooping is when you hunt through your data extensively until something interesting turns up. Test enough combinations and some pattern will appear by chance alone, so the “result” you find is often misleading.
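A quick way to convince yourself is to snoop through pure noise. In the sketch below, the outcome and all 200 “features” are random numbers with no relationship whatsoever (the sizes and the seed are arbitrary choices of mine). It computes the Pearson correlation of each feature with the outcome and then reports only the strongest one.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

n_rows, n_features = 30, 200

# Everything here is independent noise: no feature truly predicts the outcome.
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

correlations = [pearson(feature, outcome) for feature in features]

# Snooping: keep only the most impressive-looking correlation.
best = max(correlations, key=abs)
typical = sorted(abs(c) for c in correlations)[n_features // 2]

print(f"Typical |correlation|: {typical:.2f}")
print(f"Best after snooping:   {best:+.2f}")
```

The “best” feature looks far more correlated than a typical one, even though it’s just as meaningless. Report only that one and you’ve got a misleading finding.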

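Regression to the Mean

Regression to the mean says that when a measurement comes out extreme, the next measurement of the same thing is likely to be closer to the mean.

What this means is that an unusually extreme outcome is partly luck, and that luck is unlikely to repeat to the same degree.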

Here are two examples:

1. Sophomore slumps

In most major sports, the coveted Rookie of the Year tends to do worse in their second season. An award-winning rookie year usually takes both skill and some good luck, and the luck doesn’t reliably carry over.

2. Tall Parents, Tall Children

The children of very tall parents are likely to be tall too, but on average they end up closer to the mean height than their parents, so it’s unlikely that the children will be taller than the parents.
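The sophomore slump is easy to reproduce with a toy model. This sketch assumes (my assumption, not anything from the book) that a season’s performance is a player’s fixed skill plus fresh random luck each year. It picks the top 100 performers from season one and checks how that same group does in season two.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

n_players = 10_000

# Assumed model: performance = fixed skill + new luck every season.
skill = [rng.gauss(0, 1) for _ in range(n_players)]
season1 = [s + rng.gauss(0, 1) for s in skill]
season2 = [s + rng.gauss(0, 1) for s in skill]

# The 100 best players of season one (our "Rookies of the Year").
top = sorted(range(n_players), key=lambda i: season1[i], reverse=True)[:100]

top_avg_season1 = sum(season1[i] for i in top) / len(top)
top_avg_season2 = sum(season2[i] for i in top) / len(top)
league_avg_season2 = sum(season2) / n_players

print(f"Top group, season 1: {top_avg_season1:.2f}")
print(f"Top group, season 2: {top_avg_season2:.2f}")
print(f"League,    season 2: {league_avg_season2:.2f}")
```

The top group is still well above the league average in season two, because their skill is real, but they drop noticeably from their season-one peak because the luck didn’t come with them.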