Using the Standard Deviation Formula on the GMAT
BY KARISHMA, VERITAS PREP
We have discussed standard deviation (SD) in detail above. We know the formula for finding the standard deviation of a set of numbers, but we also know that the GMAT will not ask us to actually calculate the standard deviation because the calculations involved would be way too cumbersome. It is still a good idea to know this formula, though, as it will help us compare standard deviations across various sets, a concept we should know well.
Today, we will look at some GMAT questions that involve sets with similar standard deviations such that it is hard to tell which will have a higher SD without properly understanding the way it is calculated. Take a look at the following question:
Which of the following distributions of numbers has the greatest standard deviation?
(A) {-3, 1, 2}
(B) {-2, -1, 1, 2}
(C) {3, 5, 7}
(D) {-1, 2, 3, 4}
(E) {0, 2, 4}
At first glance, these sets all look very similar. If we try to plot them on a number line, we will see that they also have similar distributions, so it is hard to say which will have a higher SD than the others. Let’s quickly review their deviations from the arithmetic means:
For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice C, the mean = 5 and the deviations are 2, 0, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2
For answer choice E, the mean = 2 and the deviations are 2, 0, 2
We don’t need to worry about the arithmetic means (they just help us calculate the deviation of each element from the mean); our focus should be on the deviations. The SD formula squares the individual deviations and then adds them, then the sum is divided by the number of elements and finally, we find the square root of the whole term. So if a deviation is greater, its square will be even greater and that will increase the SD.
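If it helps to see those steps written out, here is a minimal sketch in Python. The function name `population_sd` is our own label, not something from the GMAT or any library, and it assumes the "divide by the number of elements" version of the formula described above.

```python
import math

def population_sd(values):
    """Square each deviation from the mean, average the squares, take the root."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Answer choice A: deviations 3, 1, 2, so the squares add up to 9 + 1 + 4 = 14
sd_a = population_sd([-3, 1, 2])
print(round(sd_a, 2))
```

You would never do this on test day, of course; the point is only that larger deviations feed larger squares into the sum.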
If the deviation increases and the number of elements increases, too, then we cannot be sure what the final effect will be – an increased deviation increases the SD but an increase in the number of elements increases the denominator and hence, actually decreases the SD. The overall effect as to whether the SD increases or decreases will vary from case to case.
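To see that the effect really can go either way, here is a small sketch using `pstdev` from Python's standard `statistics` module, which divides by the number of elements. The sets `wider_few` and `wider_many` are illustrative; the second one is a made-up set that does not appear in any question here.

```python
from statistics import pstdev

base = [0, 2, 4]                          # deviations 2, 0, 2
wider_few = [-1, 2, 3, 4]                 # a deviation of 3, one extra element
wider_many = [-1, 2, 2, 2, 2, 2, 2, 5]    # deviations of 3, five extra elements (hypothetical set)

sd_base = pstdev(base)
sd_few = pstdev(wider_few)
sd_many = pstdev(wider_many)

# Both wider sets have a larger maximum deviation and more elements than base,
# yet one ends up with a higher SD and the other with a lower SD.
print(sd_few > sd_base, sd_many < sd_base)
```

In the first case the larger deviation wins; in the second, the many extra elements sitting on the mean inflate the denominator enough to pull the SD down.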
First, we should note that answers C and E have identical deviations and numbers of elements, hence, their SDs will be identical. This means the answer is certainly not C or E, since Problem Solving questions have a single correct answer.
Let’s move on to the other three options:
For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2
Comparing answer choices A and D, we see that they share the deviations 3, 1 and 2, but D has one more element, whose deviation is 0. The sum of the squared deviations is therefore the same for both (9 + 1 + 4 = 14), but D’s denominator is greater (4 instead of 3), so the SD of answer D is smaller than the SD of answer A. This leaves us with options A and B:
For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
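Now notice that answers A and B share the deviations 1 and 2, but A’s remaining deviation is 3, whereas B’s two remaining deviations are only 1 and 2. The squared deviations of A add up to 9 + 1 + 4 = 14, spread over 3 elements, while those of B add up to 4 + 1 + 1 + 4 = 10, spread over 4 elements. A has the greater sum and the smaller denominator, so the SD of A will be higher than the SD of B, which makes the SD of A the highest. Hence, our answer must be A. This question is discussed HERE.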
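If you would like to check this reasoning away from the test, a quick sketch like the following works. It uses `pstdev` from Python's standard `statistics` module; the dictionary name `choices` is simply our own label for the five answer options.

```python
from statistics import pstdev

choices = {
    "A": [-3, 1, 2],
    "B": [-2, -1, 1, 2],
    "C": [3, 5, 7],
    "D": [-1, 2, 3, 4],
    "E": [0, 2, 4],
}

# Population SD (divide by the number of elements) for every answer choice
sds = {label: pstdev(values) for label, values in choices.items()}
greatest = max(sds, key=sds.get)

for label, sd in sds.items():
    print(f"{label}: {sd:.2f}")
print("Greatest SD:", greatest)
```

The output should agree with what we found by comparing deviations: C and E tie, and A comes out on top.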
Let’s try another one:
Which of the following data sets has the third largest standard deviation?
(A) {1, 2, 3, 4, 5}
(B) {2, 3, 3, 3, 4}
(C) {2, 2, 2, 4, 5}
(D) {0, 2, 3, 4, 6}
(E) {-1, 1, 3, 5, 7}
How would you answer this question without calculating the SDs? We need to arrange the sets in increasing SD order. Upon careful examination, you will see that the number of elements in each set is the same, and the mean of each set is 3.
Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice B: 1, 0, 0, 0, 1 (lowest SD)
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3
Deviations of answer choice E: 4, 2, 0, 2, 4 (highest SD)
Obviously, option B has the lowest SD (the deviations are the smallest) and option E has the highest SD (the deviations are the greatest). This means we can automatically rule these answers out, as they cannot have the third largest SD.
Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3
Out of these options, answer choice D has a higher SD than answer choice A, since it has higher deviations of two 3s (whereas A has deviations of two 2s). Also, C is more tightly packed than A, with four deviations of 1. If you are not sure why, consider this:
The sum of the squared deviations for C will be 1 + 1 + 1 + 1 + 4 = 8
The sum of the squared deviations for A will be 4 + 1 + 0 + 1 + 4 = 10
So, A will have a higher SD than C but a lower SD than D. Arranging from lowest to highest SD, we get: B, C, A, D, E. Answer choice A has the third largest SD, and therefore, A is our answer. This question is discussed HERE.
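Since all five sets here have the same number of elements, ranking them by the sum of squared deviations gives the same order as ranking them by SD. Here is a minimal sketch of that shortcut; the helper name `sum_squared_deviations` and the dictionary `data_sets` are our own labels.

```python
def sum_squared_deviations(values):
    """Add up the squared distances of each element from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

data_sets = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 3, 3, 4],
    "C": [2, 2, 2, 4, 5],
    "D": [0, 2, 3, 4, 6],
    "E": [-1, 1, 3, 5, 7],
}

# Same number of elements everywhere, so this order is also the SD order
ranked = sorted(data_sets, key=lambda label: sum_squared_deviations(data_sets[label]))
third_largest = ranked[-3]

print("Lowest to highest:", ranked)
print("Third largest SD:", third_largest)
```

Note that this shortcut only works because the denominators are identical; in the first question, where the sets had different sizes, we had to account for the number of elements as well.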
Although we didn’t need to calculate the actual SD, we used the concepts of the standard deviation formula to answer these questions.