Understanding Averages: What are Estimates of Location?
In an effort to spread the love on data science, I’m going to try to tackle some of the most common concepts and keywords and re-explain them in my own words, without all the boring jargon. Join me.
As you begin your journey into data and data science, one of the first things you’ll learn about is exploratory data analysis, or EDA. Part of that exploration is about understanding the average. But because the “average” can be measured in different ways, it’s often referred to as an estimate of location or a measure of central tendency, terms that cover the various formulas used for finding it. Let’s take a look at some of them.
Mean
The mean is what we traditionally call the average, and it’s the one most of us are familiar with. You take the sum of all values and divide it by the count of those values.
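Here is a minimal sketch of that calculation in Python. The salary figures are made up for illustration, and the hand-rolled version is compared against `statistics.mean` from the standard library.

```python
from statistics import mean

# Hypothetical salaries (in thousands) for a small team.
salaries = [40, 45, 50, 55, 60]

# Sum of all values divided by the count of values.
manual_mean = sum(salaries) / len(salaries)

# The standard library computes the same thing.
library_mean = mean(salaries)

print(manual_mean)   # 50.0
print(library_mean)  # 50
```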
Sadly, statisticians have decided this isn’t good enough.
Trimmed Mean
Consider a dataset with all employee salaries from one company. The CEO of that company may be making a lot more money than everyone else. A value that far from the rest is known as an outlier. If we calculated the standard mean, that single salary would pull the result upward, and it may not paint an accurate picture of what the majority of the employees experience. A trimmed mean handles this by sorting the values, dropping a fixed number (or percentage) of them from each end, and averaging what remains. Removing extreme values like this gives a more truthful insight into the average salary at the company. Because the trimmed mean is less sensitive to these extreme values, or outliers, it is considered more robust.
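To make the idea concrete, here is a rough sketch of a trimmed mean in Python. It assumes the common approach of sorting the data and dropping the same number of values from each end. The function name `trimmed_mean`, its `trim_count` parameter, and the salary figures are all made up for illustration.

```python
def trimmed_mean(values, trim_count):
    """Mean after dropping `trim_count` values from each end of the sorted data."""
    if trim_count < 0 or 2 * trim_count >= len(values):
        raise ValueError("trim_count must leave at least one value")
    ordered = sorted(values)
    # Keep everything except the lowest and highest `trim_count` values.
    kept = ordered[trim_count:len(ordered) - trim_count]
    return sum(kept) / len(kept)


# Hypothetical salaries (in thousands); 1000 is the CEO.
salaries = [40, 45, 50, 55, 60, 1000]

plain = sum(salaries) / len(salaries)  # pulled far above what most people earn
trimmed = trimmed_mean(salaries, 1)    # drops 40 and 1000 before averaging

print(plain)
print(trimmed)  # 52.5
```

Note that trimming removes values from both ends, so the lowest salary is dropped along with the CEO’s.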
Weighted Mean
Another variation on the standard mean is the weighted mean. Here, you, the data specialist, assign a “weight” of your choosing to each data point. You then multiply each value by its weight, add up the results, and divide by the sum of the weights. This could come up if you have data from a variety of sources and some sources are more important, less accurate, or underrepresented. The weights increase or decrease the impact each source has on the overall mean. For example, a teacher calculating a student’s overall grade for the year may give more weight to exam scores than to quiz scores.
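The grading example could be sketched like this in Python. The scores and weights are invented for illustration, and `weighted_mean` is a hypothetical helper, not a library function.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical grades: two quizzes and one exam that counts double.
scores = [80, 90, 70]
weights = [1, 1, 2]

plain = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)

print(plain)     # 80.0
print(weighted)  # 77.5
```

The weaker exam score drags the weighted result down more than it drags the plain mean, because it counts twice.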
Median
Another way of finding the average is finding the literal middle of a dataset. You would first need to sort the data from smallest to largest and then find the middle number. This limits the influence of outliers, like the CEO’s high salary, without using complicated math. If the dataset has an even number of values, you would take the average of the middle two numbers. For example, a dataset consisting of the numbers 1 through 10 would have a median of 5.5 (the average of 5 and 6).
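The sort-and-pick-the-middle steps might look like this in Python. `manual_median` is a hypothetical helper written for illustration, checked against `statistics.median` from the standard library. The salary list is made up.

```python
from statistics import median


def manual_median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = list(range(1, 11))  # the numbers 1 through 10
print(manual_median(numbers))  # 5.5
print(median(numbers))         # 5.5

# Hypothetical salaries (in thousands); the CEO's 1000 barely matters here.
salaries = [40, 45, 50, 55, 60, 1000]
print(manual_median(salaries))  # 52.5
```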
Weighted Median
We had a weighted mean, so is there also a weighted median? Of course there is. For the same reasons you would use a weighted mean, a weighted median can be calculated in such a way that the sum of the weights on each side of it is equal (or as close to equal as the data allows). There’s no single formula for this the way there is for the mean, and the exact method will depend on the context. One way to do it is by sorting the values from smallest to largest and then adding up their weights until the running total reaches half of the total weight. The value where that happens is the weighted median.
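Here is one way the sort-and-accumulate approach could be sketched in Python. This is only one convention: it assumes non-negative weights and returns the first value whose running weight reaches half of the total, without averaging when the split lands exactly between two values. The name `weighted_median` and the sample data are made up for illustration.

```python
def weighted_median(values, weights):
    """First value (in sorted order) at which the running weight reaches half the total."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    running = 0
    # Sort by value, keeping each value paired with its weight.
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= total_weight / 2:
            return value


# Hypothetical data: the last value carries most of the weight.
values = [10, 20, 30, 40]
weights = [1, 1, 1, 5]

print(weighted_median(values, weights))       # 40
print(weighted_median(values, [1, 1, 1, 1]))  # 20 (the lower of the two middle values)
```

With equal weights and an even number of values, this convention returns the lower middle value rather than the 25 a regular median would give, which is one of the places where methods differ.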
And there you have it. Consider yourself above average on finding the average.
If you enjoyed this and found it helpful (or totally unhelpful), please let me know by leaving a reaction or comment, or find me on LinkedIn.