
Math Expert
Joined: 02 Sep 2009
Posts: 61385

Statistics Made Easy  All in One Topic!
19 Aug 2015, 01:05
The Meaning of Arithmetic Mean

Let’s start today with statistics: mean, median, mode, range and standard deviation. The topics are simple, but the fun lies in the questions. Some questions on these topics can be extremely tricky, especially those dealing with median, range and standard deviation. Anyway, we will tackle mean today.

So what do you mean by the arithmetic mean of some observations? I guess most of you will reply that it is ‘Sum of observations/Total number of observations’. But that is how you calculate the mean. My question is ‘what is the mean?’

Loosely, the arithmetic mean is the number that represents all the observations. Say I know that the mean age of a group is 10; I would then guess that the age of Robbie, who is a part of that group, is 10. Of course, Robbie’s actual age could be anything, but the best guess would be 10.

Say I tell you that the average age of a group of 10 people is 15 yrs. Can you tell me the sum of the ages of all 10 people? I am sure you will say that it is 10*15 = 150. You can think of it in two ways:

Mean = Sum of all ages/Number of people, so Sum of all ages = Mean * (Number of people) = 15*10

Or

Since there are 10 people and each person’s age is represented by 15, the sum of their ages = 10*15. Basically, the total sum was distributed evenly among the 10 people and each person got 15 yrs.

Now, let’s say you made a mistake: a boy whose age you thought was 20 was actually 30. What is the correct mean? Again, you can think of it in two ways:

New sum = 150 + 10 = 160, so New average = 160/10 = 16

Or

You can say that there is an extra 10 that has to be distributed evenly among the 10 people, so each person gets 1 extra. Hence, the average becomes 15 + 1 = 16.

As you might have guessed, we will work with the second interpretation. Let’s look at an example now.

Example 1: The average age of a group of n people is 15 yrs. One more person aged 39 joins the group and the new average is 17 yrs.
What is the value of n?
(A) 9 (B) 10 (C) 11 (D) 12 (E) 13

Solution: First, tell me: if the age of the additional person were 15 yrs, what would have happened to the average? The average would have remained the same, since this new person’s age would have been the same as the age that represents the group. But his age is 39 – 15 = 24 more than the average. We know that we need to split the extra evenly among all the people to get the new average. When 24 is split evenly among all the people (including the new guy), everyone gets 2 extra (since the average age increased from 15 to 17). There must be 24/2 = 12 people now (including the new guy), i.e. n must be 11 (not including the new guy). Answer (C)

This question is discussed HERE.

Let’s look at another similar example, though a little trickier. Try solving it on your own first. If not logically, try using the formula approach. Then see how elegant the solution becomes once you start ‘thinking’ instead of just ‘calculating’.

Example 2: When a person aged 39 is added to a group of n people, the average age increases by 2. When a person aged 15 is added instead, the average age decreases by 1. What is the value of n?
(A) 7 (B) 8 (C) 9 (D) 10 (E) 11

Solution: What is the first thing you can say about the initial average? It must lie between 15 and 39, because adding a person aged 39 increases the average and adding a person aged 15 decreases it.

Let’s look at the second case first. When the person aged 15 is added to the group, the average becomes (initial average – 1). If instead the person aged 39 were added, there would be 39 – 15 = 24 extra, which would make the average (initial average + 2). This difference of 24 creates a difference of 3 in the average, so there must be 24/3 = 8 people (after adding the extra person). The value of n must be 8 – 1 = 7. Answer (A)

This question is discussed HERE.
If you use the formula instead, it would take you quite a while to manipulate the two variables to get the value of n. I hope you see the beauty of this method. Next week, we will discuss some GMAT questions based on Arithmetic mean!
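For readers who like to verify, here is a minimal Python sketch of the "spread the excess evenly" reasoning from this post. The function names are illustrative, not from the post, and `fractions.Fraction` is used only so that a non-integer group size is caught instead of silently rounded. The last function checks Example 2 the "formula way" for comparison.

```python
from fractions import Fraction


def corrected_mean(old_mean, count, wrong_value, right_value):
    """Mean after fixing one misrecorded value: the error is shared by all."""
    return old_mean + Fraction(right_value - wrong_value, count)


def group_size_after_join(newcomer, old_avg, new_avg):
    """Group size, newcomer included, when one person shifts the average.

    The newcomer's excess over the old average is shared equally by everyone
    in the enlarged group, and each share equals the rise in the average.
    """
    size = Fraction(newcomer - old_avg, new_avg - old_avg)
    if size.denominator != 1:
        raise ValueError("data does not describe a whole number of people")
    return int(size)


def size_from_two_alternatives(high, rise, low, drop):
    """Example 2: adding `high` raises the average by `rise`; adding `low`
    instead lowers it by `drop`. Swapping low for high adds (high - low) to
    the total and moves the new average by (rise + drop)."""
    size = Fraction(high - low, rise + drop)
    if size.denominator != 1:
        raise ValueError("data does not describe a whole number of people")
    return int(size)


print(corrected_mean(15, 10, 20, 30))               # 16
n1 = group_size_after_join(39, 15, 17) - 1          # Example 1, newcomer excluded
n2 = size_from_two_alternatives(39, 2, 15, 1) - 1   # Example 2, newcomer excluded
print(n1, n2)                                        # 11 7


def formula_check_example_2(n):
    """Cross-check with the 'formula way'. From (n*A + 39)/(n + 1) = A + 2,
    the old average is A = 39 - 2*(n + 1); adding 15 must then give A - 1."""
    old_avg = Fraction(39 - 2 * (n + 1))
    return (n * old_avg + 15) / (n + 1) == old_avg - 1


print(formula_check_example_2(n2))                   # True
```

The formula check needs the old average A as an intermediate unknown, which is exactly the extra variable the logical method avoids.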




Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 01:17
Some Mean Questions

I hope the theory of arithmetic mean we discussed above is clear to you. Let’s see the theory in action today. I will pick some mean questions from various sources (Official Guide, GMAT Prep tests, etc.) and we will try to use the concepts we learned last week to solve them. Let’s start with a simple question.

Question 1: For the past n days, the average daily production at a company was 60 units. If today’s production of 100 units raises the average to 65 units per day, what is the value of n?
(A) 30 (B) 18 (C) 10 (D) 9 (E) 7

Solution: If today’s production were also 60 units, what would have happened to the average? Obviously, it would have stayed the same! But today’s production is 40 units extra, and that is what raised the average. It raised the average by 5 units, which means that each of the n observations and today’s observation got an extra 5. Since 40 was distributed and each observation was given 5, there must have been a total of 40/5 = 8 observations including today’s. Therefore, the value of n must be 8 – 1 = 7. Answer (E)

This question is discussed HERE.

I know you can solve the question using the formula for averages. In fact, you can solve every question using the formula and working out the values. But the point is that the logical method helps you solve the question very quickly, and you are less likely to make calculation errors since there aren’t many calculations to perform! Let’s go on now.

Question 2: When Anna makes a contribution to a charity fund at school, the average contribution size increases by 50%, reaching $75 per person. If there were 5 other contributions made before Anna’s, what is the size of her donation?
(A) $100 (B) $150 (C) $200 (D) $250 (E) $450

Solution: After Anna’s contribution, the average size increases by 50% and reaches $75. What must have been the average contribution before Anna’s donation? It must have been $50, since a 50% increase takes $50 to $75.
So $50 was the average size of the 5 donations before Anna made hers. Had Anna donated $50 as well, the average would have stayed the same, i.e. $50. But the average increased to $75, which means that, on top of the $50 that would have kept the average unchanged, Anna donated an extra $25 for each of the 6 contributions (including her own). Hence, the amount Anna donated = 50 + 6*25 = $200. Answer (C)

This question is discussed HERE.

Again, this was a relatively straightforward question. Let’s look at a tricky one now.

Question 3: A set of numbers has an average of 50. If the largest element is 4 greater than 3 times the smallest element, which of the following values cannot be in the set?
(A) 85 (B) 90 (C) 123 (D) 150 (E) 155

Solution: This question might look a little ominous, but it isn’t very tough, really! The set has an average of 50, so we can represent each element of the set by 50. If there is an element that is a little less than 50, there must be another element that is a little more than 50.

The largest element is 4 greater than 3 times the smallest element, so L = 4 + 3S. The smallest element must be less than 50 and the largest must be greater than 50. (They cannot all equal 50, because then L would be 50, but 4 + 3*50 = 154.) Say the smallest element is 20; then the largest will be 4 + 3*20 = 64.

Is there any limit on how large the largest element can be? Yes, because there is a limit on how large the smallest element can be: it must be less than 50. The smallest member of the set can be 49.9999…, so the limiting value of the smallest number is 50. As long as the smallest number is a tiny bit less than 50, the greatest number can be a tiny bit less than 4 + 3*50 = 154. The number 154 and all numbers greater than 154 cannot be part of the set.

Say the smallest element is 49; then the largest element will be 4 + 3*49 = 151.
So the set could look something like this: S = {49, 49, 49, 49, … (101 times, to balance out the extra 101 in 151), 50, 50, 151}. Only option (E) cannot be part of the set. Answer (E)

This question is discussed HERE.

These were some of the basic (and not so basic) mean questions that we could come across on the GMAT. We will look at some more statistics concepts in the next post. Till then, keep practicing!
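The same redistribution logic can be sketched in Python, assuming in each case that a single newcomer moves the average; `joiner_value` and `example_set` are hypothetical helper names, not from the post. The second half rebuilds the set from Question 3 and checks that it really has mean 50 and L = 4 + 3S.

```python
from fractions import Fraction


def joiner_value(prior_count, old_avg, new_avg):
    """Value a single newcomer must have to move the average old -> new.

    Contributing old_avg would leave the average unchanged; on top of that the
    newcomer must supply the rise (new_avg - old_avg) for every member of the
    enlarged group, themself included.
    """
    return old_avg + (prior_count + 1) * (new_avg - old_avg)


# Question 1: today's 100 units = 60 + (n + 1) * 5, so n + 1 = 40 / 5.
n_days = Fraction(100 - 60, 65 - 60) - 1
# Question 2: $75 is 50% above the old average, so the old average is 75 / 1.5.
old_avg = Fraction(75) / Fraction(3, 2)
anna = joiner_value(5, old_avg, 75)
print(n_days, old_avg, anna)                        # 7 50 200


def example_set(smallest, mean=50):
    """Question 3: build a set like the one in the post, with L = 4 + 3S.

    Copies of `smallest` offset the largest element's excess over the mean;
    two extra elements equal to the mean are included as in the post.
    """
    largest = 4 + 3 * smallest
    copies, leftover = divmod(largest - mean, mean - smallest)
    if leftover:
        raise ValueError("choose a smallest value that balances exactly")
    return [smallest] * copies + [mean, mean, largest]


s = example_set(49)
print(len(s), min(s), max(s), Fraction(sum(s), len(s)))   # 104 49 151 50

# S < 50 forces L = 4 + 3S < 154, so no element can be 154 or more.
options = {"A": 85, "B": 90, "C": 123, "D": 150, "E": 155}
impossible = [k for k, v in options.items() if v >= 4 + 3 * 50]
print(impossible)                                   # ['E']
```

Note that `example_set` only works when the excess divides evenly by the shortfall per copy; it is meant to confirm the post's example, not to enumerate every valid set.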




Intern
Joined: 06 Jul 2018
Posts: 2

Re: Statistics Made Easy  All in One Topic!
02 Mar 2019, 09:29
Bunuel you are the best thing that happened to me on this GMAT quants journey. God bless you BIG!




Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 01:20
Finding Arithmetic Mean Using Deviations

This post again focuses on arithmetic mean. Let’s start our discussion by considering the arithmetic mean of an arithmetic progression. We will start with an example.

What is the mean of 43, 44, 45, 46, 47? (Hint: If you are thinking about adding the numbers, that’s not the way I want you to go.)

As we discussed in our previous posts, the arithmetic mean is the number that can represent/replace all the numbers of the sequence. Notice that in this sequence, 44 is one less than 45 and 46 is one more than 45. So essentially, two 45s can replace both 44 and 46. Similarly, 43 is 2 less than 45 and 47 is 2 more than 45, so two 45s can replace both of these numbers too. The sequence is essentially 45, 45, 45, 45, 45. Hence, the arithmetic mean of this sequence must be 45! (If you have doubts, you can calculate it and find out.)

It makes sense, doesn’t it? When there is an odd number of consecutive integers, the middle number is the mean: the deviations of all the numbers to the left of the middle number balance out the deviations of all the numbers to the right of it. (In this post, we will assume that the given numbers are in increasing/decreasing order. If that is not the case, you can always put them in increasing order and use these concepts.)

Once again, what is the mean of 192, 193, 194, 195, 196, 197, 198? It is 195, since it is the middle number!

OK, what about 192, 193, 194, 195, 196, 197? What is the mean in this case? There is no middle number here, since there are 6 numbers. The mean will be midway between the two middle numbers, which is 194.5 (the middle of the third and fourth numbers). It doesn’t matter that 194.5 is not part of this list; if you think about it, the arithmetic mean of some numbers needn’t be one of the numbers.

What about 71, 73, 75, 77, 79? What will the mean be in this case?
Even though these numbers are not consecutive integers, the difference between any two adjacent numbers in the list is the same (it is an arithmetic progression). So the deviations of the numbers to the left of the middle number cancel out the deviations of the numbers to the right of it (71 is 4 less than 75 and 79 is 4 more than 75; 73 is 2 less than 75 and 77 is 2 more than 75). Hence, the mean here will be 75 (just like in our first example). Just to reinforce:

102, 106, 110 –> Mean = 106
102, 106, 110, 114 –> Mean = 108 (middle of the second and third numbers)

Let’s twist this concept a little now. What is the mean of 36, 40, 42, 43, 44, 47? This is not an arithmetic progression. So do we need to sum and then divide by 6 to get the mean? Not so fast! Let’s try to use the deviations concept we have just learned.

Given sequence: 36, 40, 42, 43, 44, 47

It seems that the mean would be around 42, right? Some numbers are less than 42 and others are more than 42.

36 is 6 less than 42.
40 is 2 less than 42.
Overall, the numbers less than 42 are 6 + 2 = 8 less than 42.

43 is 1 more than 42.
44 is 2 more than 42.
47 is 5 more than 42.
Overall, the numbers more than 42 are 1 + 2 + 5 = 8 more than 42.

The deviations of the numbers less than 42 are balanced out by the deviations of the numbers greater than 42! Hence, the average must be 42. This method is especially useful with big numbers that are close to each other.

Example 1: What is the average of 452, 453, 463, 467, 480, 499, 504?

What would you say the average is here? Perhaps around 470? Let’s see:

452 is 18 less than 470.
453 is 17 less than 470.
463 is 7 less than 470.
467 is 3 less than 470.
Overall, the numbers less than 470 are 18 + 17 + 7 + 3 = 45 less.

480 is 10 more than 470.
499 is 29 more than 470.
504 is 34 more than 470.
Overall, the numbers more than 470 are 10 + 29 + 34 = 73 more than 470.

The shortfall is not balanced by the excess. There is an excess of 73 – 45 = 28.
So what is the average? If we assume the average of these 7 numbers to be 470, there is an excess of 28. We need to distribute the excess evenly among all the numbers, and hence the average will increase by 28/7 = 4. (Go back to the first post on arithmetic mean if this is not clear.) Hence, the required mean is 470 + 4 = 474. (If we had assumed the mean to be 474, the shortfall would have balanced the excess.)

Let’s go through one more example using this concept:

Example 2: What is the mean of 99, 103, 104, 109, 120, 123, 128, 130?

Let’s start by guessing a mean for this sequence, say around 115. Let’s see if the shortfall is balanced by the excess.

99 is 16 less, 103 is 12 less, 104 is 11 less and 109 is 6 less than 115.
Overall shortfall = 16 + 12 + 11 + 6 = 45

120 is 5 more, 123 is 8 more, 128 is 13 more and 130 is 15 more than 115.
Overall excess = 5 + 8 + 13 + 15 = 41

We are close, but not quite there yet! There is a net shortfall of 45 – 41 = 4. Since there are a total of 8 numbers, the average must be 4/8 = 0.5 less than 115. Hence, the average here is 114.5.

Once you get the hang of this method and understand what you are doing, it is much faster than adding all the big numbers and then dividing the sum, since you only deal with small numbers. Let’s wrap up this post here. In the next post, we will see these concepts in action!
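A small Python sketch of the deviations method, using exact arithmetic via `fractions.Fraction`; `mean_by_deviations` and `ap_mean` are illustrative names, and `ap_mean` assumes its input is already a sorted arithmetic progression. Whatever guess you start from, the correction term lands on the same mean; a good guess just keeps the deviations small.

```python
from fractions import Fraction


def mean_by_deviations(values, guess):
    """Mean from a guessed anchor: only small deviations are summed, and the
    net excess (or shortfall) is spread evenly over all the numbers."""
    net = sum(v - guess for v in values)    # excess minus shortfall
    return guess + Fraction(net, len(values))


def ap_mean(progression):
    """Mean of a sorted arithmetic progression: midway between the ends."""
    return Fraction(progression[0] + progression[-1], 2)


print(ap_mean([192, 193, 194, 195, 196, 197]))            # 389/2, i.e. 194.5
print(ap_mean([102, 106, 110, 114]))                      # 108
print(mean_by_deviations([36, 40, 42, 43, 44, 47], 42))   # 42
print(mean_by_deviations([452, 453, 463, 467, 480, 499, 504], 470))   # 474
print(float(mean_by_deviations([99, 103, 104, 109, 120, 123, 128, 130], 115)))  # 114.5

# The guess only changes the size of the deviations, never the result.
values = [99, 103, 104, 109, 120, 123, 128, 130]
assert all(mean_by_deviations(values, g) == Fraction(sum(values), len(values))
           for g in (0, 100, 115, 130))
```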



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 01:49
Application of Arithmetic Means

In the above post, we discussed arithmetic means of arithmetic progressions. Now, let’s see those concepts in action in GMAT math problems.

Question 1: If x is the sum of the even integers from 200 to 600 inclusive, and y is the number of even integers from 200 to 600 inclusive, what is the value of x + y?
(A) 200*400 (B) 201*400 (C) 200*402 (D) 201*401 (E) 400*401

Solution: There are various ways of getting the answer here. We will use the concepts we learned last week.

The given sequence is 200, 202, 204, … 600. It is an arithmetic progression. What is the total number of terms here? You can use one of two methods to get the number of terms:

Method 1: Using Logic
In every 100 consecutive integers, there are 50 odd integers and 50 even integers. So we will get 50 even integers from each of 200 – 299, 300 – 399, 400 – 499 and 500 – 599, i.e. a total of 50*4 = 200 even integers. Also, since the sequence includes 600, the number of even integers = 200 + 1 = 201.

Method 2:
Recall from our arithmetic progressions post that the last term of a sequence with n terms is first term + (n – 1) * common difference.
\(600 = 200 + (n - 1)*2\)
\(n = 201\)

Hence \(y = 201\) (because y is the number of even integers from 200 to 600).

Let’s go on now. What is the average of the sequence? Since it is an arithmetic progression with an odd number of terms, the average must be the middle number, i.e. 400. Notice that since this arithmetic progression looks like this:
(n – m), … (n – 6), (n – 4), (n – 2), n, (n + 2), (n + 4), (n + 6), … (n + m)
we can find the middle number, i.e. the average, by just averaging the first and the last terms:
\(\frac{(n - m) + (n + m)}{2} = \frac{2n}{2} = n\)
\(\text{Average} = \frac{200 + 600}{2} = 400\)

Sum of all terms in the sequence = x = Arithmetic Mean * Number of terms = 400*201
\(x + y = 400*201 + 201 = 401*201\)
Answer (D)

This question is discussed HERE.

This question was simple.
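A quick Python check of the counting and averaging steps in Question 1; the variable names are illustrative, and the AP sum formula n/2 * (2a + (n - 1)d) appears only as a cross-check.

```python
# Even integers 200, 202, ..., 600 form an AP with common difference 2.
first, last, step = 200, 600, 2

y = (last - first) // step + 1      # last = first + (y - 1) * step  ->  201 terms
average = (first + last) // 2       # mean of an AP = average of first and last terms
x = average * y                     # sum = mean * number of terms

print(y, average, x + y)            # 201 400 80601

# Cross-checks: "50 evens per hundred" count, direct sum, and the AP sum formula.
assert y == 4 * 50 + 1
assert x == sum(range(first, last + 1, step))
assert 2 * x == y * (2 * first + (y - 1) * step)
assert x + y == 201 * 401
```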
You could have found the sum using the formula \(\frac{n}{2}*(2a + (n-1)d)\) that we saw in the AP post. But this method is more intuitive: if you don’t want to, you don’t have to use any formula here. Anyway, let’s go on to our second question for today.

Question 2: The sum of n consecutive positive integers is 45. What is the value of n?
Statement I: n is even
Statement II: n < 9

Solution: First I will give the solution of this question and then discuss the logic used to solve it. In how many ways can you write n consecutive positive integers such that their sum is 45? Let’s see whether we can get such numbers for some values of n.

n = 1 –> Numbers: 45
n = 2 –> Numbers: 22 + 23 = 45
n = 3 –> Numbers: 14 + 15 + 16 = 45
n = 4 –> No such numbers
n = 5 –> Numbers: 7 + 8 + 9 + 10 + 11 = 45
n = 6 –> Numbers: 5 + 6 + 7 + 8 + 9 + 10 = 45

Let’s stop right here.

Statement I: n is even. n could be 2 or 6. Statement I alone is not sufficient.

Statement II: n < 9. n can take many values less than 9, hence Statement II alone is not sufficient.

Both statements together: Since n can be 2 or 6, both of which are even and less than 9, both statements together are not sufficient.

Answer (E)

This question is discussed HERE.

Now, the interesting thing is how we get these numbers for different values of n. How do we know the values that n can take? It’s pretty easy, really. Follow my thought here.

Of course, n can be 1. In that case we have only one number, i.e. 45.

n can be 2. Why? When we divide 45 by 2, we get 22.5. Since 2*22.5 is 45, we have to find 2 consecutive integers whose arithmetic mean is 22.5. The integers are obviously 22 and 23.

n can be 3. When we divide 45 by 3, we get 15. So we need 3 consecutive integers whose mean is 15. They are 14, 15, 16.

When we divide 45 by 4, we get 11.25. Do we have 4 consecutive integers whose mean is 11.25? No, because the mean of an even number of consecutive integers is always of the form x.5.

n can be 5.
When we divide 45 by 5, we get 9, so we need 5 consecutive integers whose mean is 9. They must be 7, 8, 9, 10, 11.

n can be 6. When we divide 45 by 6, we get 7.5. We need 6 consecutive integers whose mean is 7.5. The integers are 5, 6, 7, 8, 9, 10.

Obviously, we just need to find 2 even values of n that are less than 9. So we check 2, 4 and 6, and we immediately know that the answer is (E). We don’t have to repeat this process for all numbers less than 9, and we don’t have to do it for odd values of n. We will move on to median in the next post. Till then, keep practicing!
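The "divide 45 by n and look at the mean" test can be sketched in Python; `consecutive_run` is a hypothetical helper name. It computes the first term as mean - (n - 1)/2 and rejects n whenever that is not a positive whole number.

```python
from fractions import Fraction


def consecutive_run(total, n):
    """n consecutive positive integers summing to `total`, or None if impossible.

    Their mean must be total / n, and the first term is mean - (n - 1) / 2.
    That is a whole number exactly when the mean is an integer (odd n) or ends
    in .5 (even n), which is the test used in the post.
    """
    first = Fraction(total, n) - Fraction(n - 1, 2)
    if first.denominator != 1 or first < 1:
        return None
    start = int(first)
    return list(range(start, start + n))


for n in range(1, 9):
    print(n, consecutive_run(45, n))

# Both statements together: even n below 9 still leaves more than one value.
possible = [n for n in range(2, 9, 2) if consecutive_run(45, n)]
print(possible)   # [2, 6]
```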



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 01:59
Mean Questions on Median

As promised, we discuss medians here! Conceptually, the median is very simple. It is just the middle number. Arrange all the numbers in increasing/decreasing order and the number you get right in the middle is the median. So it is quite straightforward when you have an odd number of numbers, since you have a “middle” number. What about the case when you have an even number of numbers? In that case, it is just the average of the two middle numbers.

Median of [2, 5, 10] is 5
Median of [3, 78, 102, 500] is \(\frac{78 + 102}{2} = 90\)

If it’s that simple, why are we discussing it? Because it isn’t “that simple”! Conceptually it is, but when the test writers make questions using median and arithmetic mean together, they make some very mean questions! I will show you with an example, but first, we will look at a simpler question.

Question 1: A, B and C have received their Math midterm scores today. They find that the arithmetic mean of the three scores is 78. What is the median of the three scores?
(1) A scored a 73 on her exam.
(2) C scored a 78 on her exam.

Solution: Recall from the arithmetic mean posts that the sum of the deviations of all scores from the mean is 0, i.e. if one score is less than the mean, there has to be at least one score that is more than the mean. E.g., if the mean is 78, one of the following must be true:
All scores are equal to 78.
At least one score is less than 78 and at least one is greater than 78.

For example, if one score is 70, i.e. 8 less than 78, the other scores have to make up this deficit of 8. Therefore, there could be a score of 86 (8 more than 78), or two scores of 82 each, etc.

Statement 1: A scored 73 on her exam. For the mean to be 78, there must be at least one score higher than 78. But what exactly are the other two scores? We have no idea! Various cases are possible: 73, 78, 83 or 73, 74, 87 or 70, 73, 91 etc. In each case, the median is different. Hence this statement alone is not sufficient.
Statement 2: C scored 78 on her exam. Now we know that one score is 78. Either the other two will also be 78, or one will be less than 78 and the other greater than 78. In either case, 78 will be the middle number and hence the median. This statement alone is sufficient. Answer (B)

This question is discussed HERE.

Were you tempted to say (C) is the answer? I hope this question shows you that median can be a little tricky. Let’s go on to the tougher question now.

Question 2: Five logs of wood have an average length of 100 cm and a median length of 116 cm. What is the maximum possible length, in cm, of the shortest piece of wood?
(A) 50 (B) 76 (C) 84 (D) 96 (E) 100

Solution: First thing that comes to mind: the median is the 3rd term out of 5, so the lengths arranged in increasing order must look like this:
___ ___ 116 ___ ___

The mean is given and we need to maximize the smallest number. Basically, the smallest number should be as close to the mean as possible. This means the greatest number should be as close to the mean as possible too (if the shortfall deviation is small, the excess deviation should be equally small). If this doesn’t make sense, think of a set with mean 20:
19, 20, 21 (the smallest number is very close to the mean; the greatest number is very close to the mean too)
1, 20, 39 (the smallest number is far away from the mean; the greatest number is far away too)

Using the same logic, let’s make the greater numbers as small as possible (so the smallest number can be as large as possible). The two greatest numbers must each be at least 116 (since 116 is the median), so take them both equal to 116. Now the lengths arranged look like this:
___ ___ 116 116 116

Since the mean is 100 and each of the 3 large numbers is already 16 more than 100, i.e. a total of 16*3 = 48 more than the mean (the excess deviation is 48), the deviations of the two small numbers must total 48 below the mean.
To make the smallest number as great as possible, each of the small numbers should be 48/2 = 24 less than the mean i.e. they both should be 76. Answer (B). This question is discussed HERE. Hopefully, it made sense to you.
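A Python sketch using the standard library's `statistics.median` to replay both questions. The brute-force search in `max_shortest_log` (a hypothetical helper) assumes whole-centimetre lengths, which the question does not state, so it only confirms answer (B) rather than proving it for real-valued lengths.

```python
from statistics import median

# Median of an even-sized list = average of the two middle values.
print(median([2, 5, 10]), median([3, 78, 102, 500]))   # 5 90.0

# Question 1, Statement 1: A = 73 and the total is 3 * 78 = 234,
# yet different sets give different medians.
for scores in ([73, 78, 83], [73, 74, 87], [70, 73, 91]):
    assert sum(scores) == 234
    print(scores, median(scores))                      # medians 78, then 74, then 73

# Statement 2: C = 78, so the other two total 156; whatever the split,
# one is at most 78 and the other at least 78, and the median stays 78.
assert all(median([low, 78, 156 - low]) == 78 for low in range(0, 79))


def max_shortest_log(mean=100, med=116):
    """Question 2 by brute force over whole-centimetre lengths (an assumption):
    five sorted lengths a <= b <= med <= d <= e with a + b + med + d + e = 5*mean."""
    total = 5 * mean
    for a in range(med, -1, -1):              # try the largest shortest log first
        for b in range(a, med + 1):
            rest = total - a - b - med        # what d + e must add up to
            if rest >= 2 * med:               # possible with d, e >= med
                return a
    return None


print(max_shortest_log())                              # 76
```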



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:08
A Range of Questions

Let’s discuss the idea of “range” today. It is simply the difference between the greatest and the smallest number in a set. Consider the following examples:
Range of {2, 6, 10, 25, 50} is 50 – 2 = 48
Range of {-20, 100, 80, 30, 600} is 600 – (-20) = 620
and so on…

That’s all the theory we have on the concept of range! So let’s jump on to some questions now (therein lies the challenge)!

Question 1: Which of the following cannot be the range of a set consisting of 5 odd multiples of 9?
(A) 72 (B) 144 (C) 288 (D) 324 (E) 436

Solution: There are infinitely many possibilities for the multiples of 9 that can be included in the set. The set could be any one of the following (or any one of infinitely many others):
S = {9, 27, 45, 63, 81} or
S = {9, 63, 81, 99, 153} or
S = {99, 135, 153, 243, 1071}

The range in each case will be different. The question asks us for the option that ‘cannot’ be the range. Let’s figure out the constraints on the range.

A set consisting of only odd multiples of 9 will have a range that is an even number (Odd Number – Odd Number = Even Number). Also, the range will be a multiple of 9, since both the smallest and the greatest numbers are multiples of 9, so their difference is also a multiple of 9. Only one option does not satisfy these constraints.

Do you remember the divisibility rule for 9? The sum of the digits of a number must be divisible by 9 for the number to be divisible by 9. The sum of the digits of 436 is 4 + 3 + 6 = 13, which is not divisible by 9. Hence 436 is not divisible by 9 and therefore cannot be the range of the set. Answer (E)

This question is discussed HERE.

On to another one now:

Question 2: If the arithmetic mean of n consecutive odd integers is 20, what is the greatest of the integers?
(1) The range of the n integers is 18.
(2) The least of the n integers is 11.

Solution: We have discussed the mean of arithmetic progressions in the previous posts.
If the mean of consecutive odd integers is 20, what do you think the integers will look like?
19, 21 or
17, 19, 21, 23 or
15, 17, 19, 21, 23, 25 or
13, 15, 17, 19, 21, 23, 25, 27 or
11, 13, 15, 17, 19, 21, 23, 25, 27, 29 etc.

Does it make sense that the required numbers will form one such sequence? The numbers in the sequence will be equally distributed around 20. Every time you add a number on the left, you need to add one on the right to keep the mean at 20. The shortest sequence has 2 numbers, 19 and 21; there is no limit on how long the sequence can be.

Did you notice that each of these sequences has a unique “range,” a unique “least number” and a unique “greatest number”? So if you are given any one of these statistics, you will know the entire sequence.

Statement 1: Only one possible sequence, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, has range 18. The greatest number here is 29. This statement alone is sufficient.

Statement 2: Only one possible sequence, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, has 11 as the least number. The greatest number here is 29. This statement alone is sufficient too.

Answer (D)

This question is discussed HERE.

Note that you don’t actually have to find the exact sequence. All you need to understand is that each sequence has a unique “range” and a unique “least number.”
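A Python sketch of both range questions. For Question 1, "even and a multiple of 9" is the same as "a multiple of 18", which is what the code tests. For Question 2, `odd_runs_around` (a hypothetical helper) lists only the first ten balanced runs around 20, since there is no upper limit on their length.

```python
# Question 1: odd - odd is even and both ends are multiples of 9,
# so the range is a multiple of 18.
options = {"A": 72, "B": 144, "C": 288, "D": 324, "E": 436}
cannot = [k for k, r in options.items() if r % 18 != 0]
print(cannot)   # ['E']

# Every other option is achievable, e.g. with {9, 27, 45, 63, 9 + r}.
for r in options.values():
    if r % 18 == 0:
        s = [9, 27, 45, 63, 9 + r]
        assert all(x % 9 == 0 and x % 2 == 1 for x in s)
        assert len(set(s)) == 5 and max(s) - min(s) == r


def odd_runs_around(mean, pairs):
    """Consecutive odd integers balanced around an even `mean`: `pairs` numbers
    on each side (mean - 1, mean - 3, ... and mean + 1, mean + 3, ...)."""
    return list(range(mean - (2 * pairs - 1), mean + 2 * pairs, 2))


# Question 2: list the first few runs (there is no upper limit on the length).
runs = [odd_runs_around(20, k) for k in range(1, 11)]
assert all(sum(run) == 20 * len(run) for run in runs)

by_range = [run for run in runs if run[-1] - run[0] == 18]
by_least = [run for run in runs if run[0] == 11]
print(by_range, by_least)   # the same single run, 11 ... 29
```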



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:17
Dealing with Standard Deviation

In this post, we will work our way through the concepts of standard deviation (SD). Let’s take a look at how you calculate standard deviation first:
\(SD = \sqrt{\frac{\sum_{i=1}^{n}(A_i - A_{avg})^2}{n}}\)
where
\(A_i\) – the numbers in the list
\(A_{avg}\) – arithmetic mean of the list
\(n\) – number of numbers in the list

Say you have 3 numbers: 11, 13 and 15. Their standard deviation is the “square root of the average of their squared deviations from the arithmetic mean.” Let’s see what we mean by this. The mean of 11, 13 and 15 is 13, so the deviations from the mean are -2, 0 and 2, the squared deviations are 4, 0 and 4, their average is 8/3, and the SD is \(\sqrt{8/3} \approx 1.63\).

Focus on these words: “deviations from mean.” The important point to note is that SD is a measure of dispersion, or deviation from the mean (the mean is approximately the middle of the list if there are no outliers). In other words, SD measures whether the numbers are spread far from the mean or clustered close to it.

Since the GMAT isn’t calculation-intensive, you probably won’t need to calculate the actual SD on the test. The calculations are shown here only to illustrate the concept. But you must have a feel for how the numbers are distributed around the mean and what that implies for the SD.

Your statistics book explains in detail how to visualize SD using the number line, so I am not going to delve deep into it, but will quickly recap so that we can move ahead. Recall that if you plot the numbers on the number line, it gives you a sense of how far the numbers are from the mean. The farther the numbers are from the mean, the higher the SD.

Let’s check out a few different cases to internalize the SD concept. Do not calculate anything in these questions. Just look at the number line for each case and figure out whether it makes sense to you.
Question: Which set, S or T, has the higher SD?
Case 1: S = {3, 3, 3} or T = {0, 10, 20}
Case 2: S = {3, 4, 5} or T = {5, 6, 7}
Case 3: S = {3, 4, 5, 6} or T = {2, 3, 4, 5, 6, 7}
Case 4: S = {1, 3, 5} or T = {1, 1, 3, 5, 5}
Case 5: S = {1, 3, 5} or T = {1, 3, 3, 5}
Case 6: S = {6, 8, 10} or T = {12, 16, 20}
Case 7: S = {6, 8, 10} or T = {3, 4, 5}

Let me represent the first four cases on the number line. Check them out and then think about which set should have the higher SD. Let’s discuss each of these four cases now.

Case 1: S = {3, 3, 3} or T = {0, 10, 20}
T has the higher SD. We could obtain the SD of T by calculating it as shown in the example above. But we don’t really need to, because we see that for set S, SD = 0: each number is at the mean and hence has 0 deviation from the mean. Since SD cannot be negative and the numbers in T are not all equal, the SD of T must be higher than the SD of S, which is 0.

Case 2: S = {3, 4, 5} or T = {5, 6, 7}
Both sets have the same SD. We can see from the number line that they are equally dispersed around their respective means.

Case 3: S = {3, 4, 5, 6} or T = {2, 3, 4, 5, 6, 7}
Set T has the higher SD. T has two extra numbers that are farther from the mean. Hence these 2 numbers will add to the total deviation. (There is a caveat here, which we will discuss next week.)

Case 4: S = {1, 3, 5} or T = {1, 1, 3, 5, 5}
T has the higher SD. It has two extra numbers far from the mean. (There is a caveat here too!)

What do you think about cases 5, 6 and 7? I will give you the answers to these three cases in the next post!
[Figures: number-line plots of the sets in Cases 1 to 4]
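Not for test day, but for checking your number-line intuition afterwards, here is a small Python sketch of the quoted definition, "square root of the average of their squared deviations from the arithmetic mean". `population_sd` is an illustrative name; it divides by n, which is what the standard library's `statistics.pstdev` computes (`statistics.stdev` would divide by n - 1 instead).

```python
import math
from statistics import pstdev


def population_sd(values):
    """Square root of the average squared deviation from the mean (divide by n)."""
    n = len(values)
    avg = sum(values) / n
    return math.sqrt(sum((a - avg) ** 2 for a in values) / n)


print(population_sd([11, 13, 15]))      # sqrt(8/3), roughly 1.633

cases = {
    1: ([3, 3, 3], [0, 10, 20]),
    2: ([3, 4, 5], [5, 6, 7]),
    3: ([3, 4, 5, 6], [2, 3, 4, 5, 6, 7]),
    4: ([1, 3, 5], [1, 1, 3, 5, 5]),
}
for case, (s, t) in cases.items():
    print(f"Case {case}: SD(S) = {population_sd(s):.3f}, SD(T) = {population_sd(t):.3f}")

# The standard library's population SD agrees with the hand-rolled version.
assert all(math.isclose(population_sd(v), pstdev(v)) for pair in cases.values() for v in pair)
```

The printed values match the verdicts above: T is higher in Cases 1, 3 and 4, and the two sets tie in Case 2.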



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:25
Dealing with Standard Deviation II

In this post, we pick up where we left off in the post above. Let’s discuss the last 3 cases first.

Question: Which set, S or T, has the higher SD?

Case 5: S = {1, 3, 5} or T = {1, 3, 3, 5}
The standard deviation (SD) of T will be less than the SD of S. Why? The mean of 1, 3 and 5 is 3. If you add another 3 to the list, the mean stays the same and the sum of the squared deviations also stays the same, but the number of elements increases. Hence, the SD decreases.

Case 6: S = {6, 8, 10} or T = {12, 16, 20}
Put the numbers on the number line. You will see that the SD of T is greater than the SD of S. When you multiply each element of a set by the same positive number greater than 1 (T is obtained by multiplying each element of S by 2), the SD is multiplied by that number, so it increases.

Case 7: S = {6, 8, 10} or T = {3, 4, 5}
Put the numbers on the number line. You will see that the SD of T is less than the SD of S. When you divide each element of a set by the same positive number greater than 1 (T is obtained by dividing each element of S by 2, or equivalently, S is obtained by multiplying each element of T by 2), the SD decreases.

Now that we have an understanding of how SD behaves, let’s look at a question.

Question 1: A certain list of 300 test scores has an arithmetic mean of 75 and a standard deviation of d, where d is positive. Which of the following two test scores, when added to the list, must result in a list of 302 test scores with a standard deviation less than d?
(A) 75 and 80 (B) 80 and 85 (C) 70 and 75 (D) 75 and 75 (E) 70 and 80

Solution: As discussed above, the standard deviation of a set measures the deviation from the mean. A low standard deviation indicates that the data points are very close to the mean, whereas a high standard deviation indicates that the data points are spread far from the mean. When we add numbers that are far from the mean, we stretch the set and hence increase the SD.
When we add numbers that are close to the mean, we are shrinking the set and hence decreasing the SD. Therefore, adding the two numbers closest to the mean will shrink the set the most, decreasing the SD by the greatest amount. The numbers closest to the mean are 75 and 75 (they are equal to the mean), so adding them will decrease the SD the most. Answer: D. This question is discussed HERE.

Now that we have seen that difficult-looking questions on SD can be quite simple, I want you to think about something: when you add some new numbers to a set, how do you decide whether the SD increases or decreases? We have seen two different cases (case 4 and case 5): in one of them the SD increases when numbers are added to the set, and in the other it decreases. Say, what happens in the case S = {3, 4, 5, 6, 7} and T = {3, 4, 4, 5, 6, 6, 7}? Will the SD increase or decrease in this case? How do you decide the point at which the increase in the numerator offsets the increase in the denominator?

Meanwhile, let’s look at one more question.

Question 2: If 100 is included in each of sets A, B and C (given A = {30, 50, 70, 90, 110}, B = {-20, -10, 0, 10, 20} and C = {30, 35, 40, 45, 50}), which of the following represents the correct ordering (largest to smallest) of the sets in terms of the absolute increase in their standard deviation?
(A) A, C, B (B) A, B, C (C) C, A, B (D) B, A, C (E) B, C, A

Solution: The question looks a little convoluted, but you don’t actually have to calculate anything. SD measures the deviation of the elements from the mean. If a new element is added far from the mean, it adds much more to the deviations than if it were added close to the mean. The means of A, B and C are 70, 0 and 40, respectively. 100 is farthest from 0 (a distance of 100), so it will change the SD of set B the most (in terms of absolute increase). It is 60 away from 40, so set C comes next. It is closest to 70 (a distance of 30), so it will change the SD of set A the least.
Hence the correct ordering is B, C, A. Answer (E). This question is discussed HERE. Simple enough, right? SD questions are generally straightforward once you understand the basics well.
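A quick numerical check of Cases 5 to 7, the open example (S = {3, 4, 5, 6, 7} vs T = {3, 4, 4, 5, 6, 6, 7}) and Question 2 can make these comparisons concrete. This is a minimal sketch, not part of the original solutions; it assumes the population SD (divide by the number of elements), which is what Python's standard `statistics.pstdev` computes and what the formula in this thread uses.

```python
from statistics import mean, pstdev

# Cases 5-7 and the open example: compare SD(S) with SD(T).
cases = {
    "Case 5": ([1, 3, 5], [1, 3, 3, 5]),     # extra element placed at the mean
    "Case 6": ([6, 8, 10], [12, 16, 20]),    # every element multiplied by 2
    "Case 7": ([6, 8, 10], [3, 4, 5]),       # every element divided by 2
    "Open example": ([3, 4, 5, 6, 7], [3, 4, 4, 5, 6, 6, 7]),
}
for name, (s, t) in cases.items():
    print(f"{name}: SD(S) = {pstdev(s):.3f}, SD(T) = {pstdev(t):.3f}")

# Question 2: change in SD when 100 is added to each set.
sets = {
    "A": [30, 50, 70, 90, 110],
    "B": [-20, -10, 0, 10, 20],
    "C": [30, 35, 40, 45, 50],
}
change = {name: pstdev(xs + [100]) - pstdev(xs) for name, xs in sets.items()}
for name, xs in sets.items():
    print(f"{name}: mean = {mean(xs)}, change in SD = {change[name]:+.2f}")

ordering = sorted(change, key=change.get, reverse=True)
print("Largest to smallest change:", ", ".join(ordering))
```

The ordering comes out B, C, A. Set A's SD actually dips slightly, because 100 sits close to A's mean compared with how spread out A already is; that is consistent with A changing the least.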
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:31
Some Tricky Standard Deviation Questions

In the above post, we promised you a couple of tricky standard deviation (SD) GMAT questions. We start with a 600-700 level question and then look at a 700-800 level one.

Question 1: During an experiment, some water was removed from each of the 8 water tanks. If the standard deviation of the volumes of water in the tanks at the beginning of the experiment was 20 gallons, what was the standard deviation of the volumes of water in the tanks at the end of the experiment?
Statement 1: For each tank, 40% of the volume of water that was in the tank at the beginning of the experiment was removed during the experiment.
Statement 2: The average volume of water in the tanks at the end of the experiment was 80 gallons.

Solution: We have 8 water tanks. This implies that we have 8 elements in the set (the volume of water in each of the 8 tanks). The SD of the volumes of water in the tanks is 20 gallons. We need to find the new SD, i.e. the SD after water was removed from the tanks.

Statement 1: For each tank, 40% of the volume of water that was in the tank at the beginning of the experiment was removed during the experiment.
The initial SD is 20. When 40% of the water is removed from each tank, the leftover water is 60% of the initial volume, i.e. 0.6 * initial volume. This means that each element of the initial set was multiplied by 0.6 to obtain the new set, so the SD also gets multiplied by 0.6: it becomes 0.6 * 20 = 12 (think of the formula of SD we discussed in the first SD post). This statement alone is sufficient.

Statement 2: The average volume of water in the tanks at the end of the experiment was 80 gallons.
The average volume doesn’t give us the SD of the new set. Hence, this statement alone is not sufficient.

Answer (A). This question is discussed HERE.

Now that we are done with the easier one, let’s go on to the tougher one.

Question 2: M is a collection of four odd integers. The range of set M is 4.
How many distinct values can the standard deviation of M take?
(A) 3 (B) 4 (C) 5 (D) 6 (E) 7

Solution: Since the range of M is 4, the greatest difference between any two elements is 4. One way of writing M is {1, x, y, 5} (obviously, there are innumerable other ways of writing M). Here, x and y can each take one of 3 values: 1, 3 or 5 (x and y must be odd, and they cannot be less than 1 or greater than 5 because the range of the set is 4).

Both x and y could be the same. This can be done in 3 ways. Or x and y could be different. This can be done in 3C2 = 3 ways. In total, x and y can take values in 3 + 3 = 6 ways. (Note that the number of ways in which you can select x and y is not 3*3 = 9. Why?)

For clarification, let me enumerate the 6 ways in which you can get the desired set:
{1, 1, 1, 5}, {1, 3, 3, 5}, {1, 5, 5, 5}, {1, 1, 3, 5}, {1, 1, 5, 5}, {1, 3, 5, 5}

Note that the standard deviations of {1, 1, 1, 5} and {1, 5, 5, 5} are the same. Why? Because SD measures deviation from the mean. It has nothing to do with the actual value of the mean or the actual values of the numbers. The mean of {1, 1, 1, 5} is 2: three of the numbers are at distance 1 from the mean and one number is at distance 3. The mean of {1, 5, 5, 5} is 4: three of the numbers are at distance 1 from the mean and one number is at distance 3. The sum of the squared deviations is the same in both cases, and so is the number of elements. Therefore, both sets have the same SD. Similarly, {1, 1, 3, 5} and {1, 3, 5, 5} have the same SD. Of the remaining sets, {1, 3, 3, 5} has a distinct SD and {1, 1, 5, 5} has a distinct SD. In all, there are 4 different values that the SD can take in such a case.

Note: It doesn’t matter what the actual numbers are. Since we have found 4 distinct values of SD, we will always have 4 distinct values of SD for a set under the given constraints.

Answer (B). This question is discussed HERE. Hope the question was fun for you too!
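The enumeration can be checked by brute force. This is a minimal sketch, assuming (as the solution argues) that it is enough to look at sets of the form {1, x, y, 5}, since shifting every element by the same amount leaves the SD unchanged. It compares exact population variances using `fractions.Fraction` instead of floating-point SDs; because the SD is the square root of the variance, distinct variances mean distinct SDs.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def variance(xs):
    """Exact population variance: average squared deviation from the mean."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Smallest element 1, largest 5 (range 4); the middle two odd values
# satisfy x <= y and are drawn from {1, 3, 5}.
sets = [(1, x, y, 5) for x, y in combinations_with_replacement([1, 3, 5], 2)]
for s in sets:
    print(s, "variance =", variance(s))

distinct = {variance(s) for s in sets}
print(len(sets), "sets,", len(distinct), "distinct SD values")
```

`combinations_with_replacement` produces each unordered pair once, which is exactly the 3 + 3 = 6 count from the solution.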
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:38
3 Important Concepts for Statistics Questions on the GMAT

We have discussed these three concepts of statistics in detail:
– Arithmetic mean is the number that can represent/replace all the numbers of the sequence. It lies somewhere between the smallest and the largest values.
– Median is the middle number (if the total number of numbers is odd) or the average of the two middle numbers (if the total number of numbers is even).
– Standard deviation is a measure of the dispersion of the values around the mean.

A conceptual question is how these three measures change when all the numbers of the set are varied in a similar fashion. For example, how does the mean of a set change when all the numbers are increased by, say, 10? How does the median change? And what about the standard deviation? What happens when you multiply each element of a set by the same number? Let’s discuss all these cases in detail, but before we start, we would like to point out that the discussion will be conceptual. We will not get into formulas, though you can arrive at the same answers by manipulating the respective formulas.

When you talk about the mean, median or standard deviation of a list of numbers, imagine the numbers lying on the number line. They would be spread on the number line in a certain way. For example:

--0-a---b-c-------d---e--------f-g-------

Case I: When you add the same positive number (say x) to all the elements, the entire bunch of numbers moves ahead together on the number line. The new numbers a’, b’, c’, d’, e’, f’ and g’ would look like this:

--0------a’---b’-c’-------d’---e’--------f’-g’------

The relative placement of the numbers does not change. They are still at the same distance from each other. Note that the numbers are now further to the right of 0, to show that they have moved ahead on the number line. The mean lies somewhere in the middle of the bunch and will move forward by the added number. Say, if the mean was d, the new mean will be \(d’ = d + x\).
So when you add the same number to each element of a list, New mean = Old mean + Added number.

On similar lines, the median is the middle number (d in this case) and will move ahead by the added number. The new median will be \(d’ = d + x\). So when you add the same number to each element of a list, New median = Old median + Added number.

Standard deviation is a measure of the dispersion of the numbers around the mean, and this dispersion does not change when the whole bunch moves ahead as it is. Standard deviation does not depend on where the numbers lie on the number line; it depends on how far the numbers are from the mean. So the standard deviation of 3, 5, 7 and 9 is the same as the standard deviation of 13, 15, 17 and 19, because the relative placement of the numbers is the same in both cases. Hence, if you add the same number to each element of a list, the standard deviation will stay the same.

Case II: Let’s now move on to multiplying each element by the same positive number. The original placement of the numbers on the number line looked like this:

--0-a---b-c-------d---e--------f-g-------

The new placement of the numbers (for a multiplier greater than 1) will look something like this:

--0---a’------b’---c’------------d’---------e’ etc.

The numbers spread out. To understand this, take an example. Say the initial numbers were 10, 20 and 30. If you multiply each number by 2, the new numbers are 20, 40 and 60. The difference between consecutive numbers has increased from 10 to 20.

If you multiply each number by x, the mean also gets multiplied by x. So, if d was the mean initially, the new mean d’ will be \(x*d\). New mean = Old mean * Multiplied number. Similarly, the median also gets multiplied by x. New median = Old median * Multiplied number.

What happens to the standard deviation in this case? It changes! For a multiplier greater than 1, the numbers are now further from the mean, so their dispersion increases, and hence the standard deviation also increases.
The new standard deviation will be x times the old standard deviation. You can also establish this using the standard deviation formula. New standard deviation = Old standard deviation * Multiplied number.

The same concept applies when you increase each number by the same percentage, since that is akin to multiplying each element by the same number. Say, if you increase each number by 20%, you are, in effect, multiplying each number by 1.2. So our Case II applies here.

Now, think about what happens when you subtract the same number from each element, or divide each element by the same number.
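Here is a short sketch of Case I and Case II using the list 3, 5, 7, 9 from the text. The shift of 10, the factor of 2 and the 20% increase are illustrative choices, not values from the post. It uses Python's standard `statistics` module, with `pstdev` as the population SD.

```python
from statistics import mean, median, pstdev

data = [3, 5, 7, 9]                  # example list from the text
x = 10                               # illustrative constant to add

shifted = [v + x for v in data]      # Case I: add x to every element
scaled = [v * 2 for v in data]       # Case II: multiply every element by 2
raised = [v * 1.2 for v in data]     # a 20% increase = multiplying by 1.2

for label, xs in [("original", data), ("+10", shifted), ("*2", scaled), ("+20%", raised)]:
    print(f"{label:>8}: mean={mean(xs):.2f}  median={median(xs):.2f}  SD={pstdev(xs):.4f}")
```

Adding 10 moves the mean and median by 10 and leaves the SD alone; multiplying by 2 (or by 1.2) scales the mean, median and SD by that same factor.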
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:44
How to Quickly Solve Standard Deviation Questions on the GMAT

The quantitative section of the GMAT is designed to test your understanding and application of concepts you learned in high school. The exam focuses on core mathematical concepts such as algebra, geometry and statistics. However, some concepts are more ingrained in the high school curriculum than others. Everyone’s done addition, multiplication, subtraction and division, but figuring out factorials or square roots may be a little more unusual. Perhaps no concept perplexes students on the GMAT more than the standard deviation.

The standard deviation (often represented by σ) is a measure of dispersion around the mean. It indicates how close the numbers in a set are to the set’s average. As a simple example, the sets {5, 10, 15} and {8, 10, 12} both have the same mean (10); however, they do not have the same standard deviation.

Knowing how to calculate the standard deviation is not required on the GMAT, but knowing how it’s calculated gives you a tremendous edge in answering questions. It’s a four-step process:
1) Find the average (mean) of the set.
2) Find the differences between each element of the set and that average.
3) Square all the differences and take the average of the squared differences. This gives you the variance.
4) Take the square root of the variance.

In this example, the average of the first set is clearly 10. The differences between the three elements and the mean are -5, 0 and 5. Squaring these numbers, we get 25, 0 and 25. The average of these squares is 50/3, or about 16.67. The square root of this number is not an integer, but it is very close to 4 (roughly 4.1).

In contrast, the second set of numbers has a much smaller standard deviation. The average is still 10, but the differences are now -2, 0 and 2. Squaring these numbers, we get 4, 0 and 4. The average of these squares is 8/3, or about 2.67. The square root of 2.67 is roughly 1.6, but it’s very hard to pin down without a calculator or a lot of extra time.

This example should help highlight why the standard deviation is not explicitly calculated on an exam without a calculator: the chances of it being an integer are relatively low. However, the concept it represents and the idea behind it are fair game on the test. One of the simple takeaways from the math behind the process is that the farther a number is from the mean of the set, the more it increases the standard deviation. Specifically, each number’s contribution grows with the square of its distance from the mean, so a deviation of 5 (which contributes 25) counts for much more than a deviation of 2 (which contributes 4). This kind of concept can be tested on the exam, and if you know what you’re looking for, you can answer standard deviation questions very quickly.

Let’s look at an example: For the set {2, 2, 3, 3, 4, 4, 5, 5, x}, which of the following values of x will most increase the standard deviation?
(A) 1 (B) 2 (C) 3 (D) 4 (E) 5

If you recall the steps to calculating the standard deviation, what we really need to do first is to calculate the mean (i.e. how mean are you?). You can add the eight known elements together and divide by eight, but the fact that these elements follow a fairly obvious pattern helps us as well. The numbers each appear twice, and they are evenly spaced.
This means that the average will be the same as the median, and the median is 3.5. Even if you take the long way, it shouldn’t take you more than 20 seconds to find that the mean of this set is 3.5.

The next step would be to find the difference between each element and the mean, but we need to do that only if the goal is to actually calculate the standard deviation. All we’re being asked to do here is determine which number will increase the standard deviation the most. For that, all we need to do is figure out which answer choice is farthest from the mean. That number will produce the biggest distance, which will then be squared and in turn produce the biggest increase in the standard deviation. So although you can spend a lot of time calculating every last detail of this question, what it actually comes down to is “which of these numbers is farthest from 3.5?” Asking about the distance from a specific number is much more straightforward, and probably an elementary school level question. Yet, if you understand the concept, you can turn a GMAT question into something a 5th grader could answer (Are you smarter than a 5th grader?). The answer is thus choice A, as 1 is as far from 3.5 as possible given these five choices. This question is discussed HERE.

The important thing about the standard deviation is that you will never have to formally calculate it, but understanding the underlying concept will help you excel at the quantitative section of the GMAT. Most standard deviation questions hinge primarily on the distance from the mean, as everything else is just rote division or addition. Much like taking five practice exams and getting wildly different scores, having a high variance is bad for knowing what to expect. Understanding the way standard deviations are tested on the GMAT will help you consistently get the questions right and reduce the variance of your results (hopefully with a very high mean).
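The four steps translate directly into code. In this sketch (the function name `sd_four_steps` is hypothetical, chosen for illustration) the two worked examples are reproduced, and then every answer choice in the example question is simply tried, instead of reasoning about distances as the solution does. It computes the population SD, dividing by the number of elements as in step 3.

```python
from math import sqrt

def sd_four_steps(xs):
    """Population SD computed with the four steps listed above."""
    avg = sum(xs) / len(xs)                          # 1) find the mean
    diffs = [x - avg for x in xs]                    # 2) differences from the mean
    variance = sum(d * d for d in diffs) / len(xs)   # 3) average of the squared differences
    return sqrt(variance)                            # 4) square root of the variance

print(round(sd_four_steps([5, 10, 15]), 3))   # 4.082
print(round(sd_four_steps([8, 10, 12]), 3))   # 1.633

# Example question: which x most increases the SD of {2, 2, 3, 3, 4, 4, 5, 5, x}?
base = [2, 2, 3, 3, 4, 4, 5, 5]
choices = [1, 2, 3, 4, 5]
print("SD without x:", round(sd_four_steps(base), 4))
for x in choices:
    print("x =", x, "-> SD =", round(sd_four_steps(base + [x]), 4))
best = max(choices, key=lambda x: sd_four_steps(base + [x]))
print("Largest SD when x =", best)
```

x = 1 gives the largest SD, matching choice A. Note that x = 3 and x = 4, which sit only 0.5 from the mean of 3.5, actually lower the SD of the original eight numbers.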
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:51
A 750 Level GMAT Question on Statistics!

In this post, we have a very interesting statistics question for you. Above, we have already discussed statistics concepts such as mean, median and range. This question needs you to apply all these concepts but can still be done in under two minutes. Now, without further ado, let’s go on to the question; there is a lot to discuss there.

Question: An automated manufacturing unit employs N experts. Their average monthly salary is $7000 while the median monthly salary is only $5000. If the range of their monthly salaries is $10,000, what is the minimum value of N?
(A) 10 (B) 12 (C) 14 (D) 15 (E) 20

Solution: Let’s first assimilate the information we have. We need to find the minimum number of experts there must be. Why should there be a minimum number of people satisfying these statistics? Let’s try to understand that with some numbers.

Say, N cannot be 1, i.e. there cannot be a single expert in the unit, because then you cannot have a range of $10,000. You need at least two people to have a range; the difference of their salaries would be the range in that case. So there are at least 2 people, say one with a salary of 0 and the other with 10,000. No salary will lie outside this range.

The median is $5000, i.e. when all salaries are listed in increasing order, the middle salary (or the average of the middle two) is $5000. With 2 people, one at 0 and the other at 10,000, the median will be the average of the two, i.e. (0 + 10,000)/2 = $5000. Since the answer choices tell us there are at least 10 people, there is probably someone earning $5000. Let’s put in 5000 there for reference.

0 … 5000 … 10,000

The arithmetic mean of all the salaries is $7000. Now, the mean of 0, 5000 and 10,000 is $5000, not $7000, so we need to add some more people. We need to add them closer to 10,000 than to 0 to get a higher mean, since we are aiming for a mean of $7000. Let’s use the deviations-from-the-mean method to find where we need to add more people.
0 is 7000 below 7000 and 5000 is 2000 below 7000, which means we have a total deviation of $9000 below 7000. On the other hand, 10,000 is 3000 above 7000. The deviations on the two sides of the mean do not balance out. To balance them, we need to add two more people with a salary of $10,000 so that the total deviation above 7000 is also $9000. Note that since we need the minimum number of experts, we should add the new people at 10,000 so that they quickly make up the deficit in the deviation. If we added them at 8000 or 9000 etc., we would need more people to make up the deficit on the right.

Now we have
0 … 5000 … 10000, 10000, 10000

Now the mean is 7000, but note that the median has gone awry. It is 10,000 now instead of the required 5000. So we will need to add more people at 5000 to bring the median back to 5000. But that will disturb our mean again! So when we add some people at 5000, we will need to add some at 10,000 too, to keep the mean at 7000. 5000 is 2000 below 7000 and 10,000 is 3000 above 7000. We don’t want to disturb the total deviation from 7000. So every time we add 3 people at 5000 (a total deviation of 6000 below 7000), we will need to add 2 people at 10,000 (a total deviation of 6000 above 7000) to keep the mean at 7000. This is the most important step; ensure that you have understood it before moving ahead. When we add 3 people at 5000 and 2 at 10,000, we are in effect adding one extra person at 5000, and hence the median moves a bit to the left.

Let’s try one such set of additions:
0 … 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000
The median is not $5000 yet (it is the average of 5000 and 10000, i.e. $7500). Let’s try one more set of additions:
0 … 5000, 5000, 5000, 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000, 10000, 10000
The median is now $5000 and we have maintained the mean at $7000. This gives us a total of 15 people.

Answer (D). This question is discussed HERE.
Granted, the question is tough, but note that it uses very basic concepts, and that is the hallmark of a good GMAT question! Try to come up with some other methods of solving this.
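The walkthrough can be double-checked with a small search. This is a sketch under the solution's own framing, not a general proof: it only considers salaries of $0, $5000 and $10,000 (the three levels the solution works with), requires at least one person at $0 and one at $10,000 so the range is $10,000, and returns the first N that gives a mean of $7000 and a median of $5000. The function name `smallest_team` is hypothetical. Note that the search inherits the solution’s assumption that the lowest salary is $0; the question text itself does not state this.

```python
from statistics import median

def smallest_team(max_n=30):
    """Return (N, a, b, c) for the smallest N found, where a, b and c are the
    numbers of people earning $0, $5000 and $10,000 respectively."""
    for n in range(2, max_n + 1):
        for a in range(1, n):              # people at $0 (at least one)
            for c in range(1, n - a + 1):  # people at $10,000 (at least one)
                b = n - a - c              # people at $5000
                salaries = [0] * a + [5000] * b + [10000] * c
                if sum(salaries) == 7000 * n and median(salaries) == 5000:
                    return n, a, b, c
    return None

print(smallest_team())
```

The result corresponds to the final list in the solution: one person at $0, seven at $5000 and seven at $10,000, for N = 15.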
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 02:59
A 750+ Level Question on SD

Above, we looked at a 750+ level question on the mean, median and range concepts of statistics. Here we have a 750+ level question on the standard deviation concept. We do hope you enjoy checking it out. Before you begin, you might want to review the post that discusses standard deviation: Dealing With Standard Deviation. So here goes the question.

Question: Given that set S has four odd integers and their range is 4, how many distinct values can the standard deviation of S take?
(A) 3 (B) 4 (C) 5 (D) 6 (E) 7

Solution: Recall what standard deviation is. It measures the dispersion of all the elements from the mean. It doesn’t matter what the actual elements are and what the arithmetic mean is: the standard deviation of set {1, 3, 5} will be the same as the standard deviation of set {6, 8, 10}, since in each set there are 3 elements such that one is at the mean, one is 2 below the mean and one is 2 above the mean. So when we calculate the standard deviation, it will give us exactly the same value for both sets. Similarly, the standard deviation of set {1, 3, 3, 5, 6} will be the same as the standard deviation of {10, 12, 12, 14, 15}, and so on. But note that the standard deviation of set {25, 27, 29, 29, 30} will be different because it represents a different arrangement on the number line.

Let’s look at the given question now. Set S has four odd integers such that their range is 4. So it could look something like {1, x, y, 5} when the elements are arranged in ascending order. Note that we have taken just one example of what set S could look like. There are innumerable other ways of representing it, such as {3, x, y, 7} or {11, x, y, 15}, etc. Now in our example, x and y can take 3 different values: 1, 3 or 5. x and y could be the same or different, but x would always be smaller than or equal to y.
– If x and y are the same, we can select their values in 3 different ways: both could be 1, both could be 3, or both could be 5.
– If x and y are different, we can select their values in 3C2 = 3 ways: x could be 1 and y could be 3; x could be 1 and y could be 5; or x could be 3 and y could be 5.

For clarification, let’s enumerate the different ways in which we can write set S:
{1, 1, 1, 5}, {1, 3, 3, 5}, {1, 5, 5, 5}, {1, 1, 3, 5}, {1, 1, 5, 5}, {1, 3, 5, 5}

These are the 6 ways in which we can choose the numbers in our example. Will all of them have unique standard deviations? Do all of them represent different distributions on the number line? Actually, no! The standard deviations of {1, 1, 1, 5} and {1, 5, 5, 5} are the same. Why? Standard deviation measures distance from the mean. It has nothing to do with the actual value of the mean or the actual values of the numbers. Note that the distribution of numbers on the number line is the same in both cases; the two sets are just mirror images of each other. For the set {1, 1, 1, 5}, the mean is 2: three of the numbers are at distance 1 from the mean and one number is at distance 3. For the set {1, 5, 5, 5}, the mean is 4: three of the numbers are at distance 1 from the mean and one number is at distance 3. The distances from the mean are the same in both cases: 1, 1, 1 and 3. So when we square the deviations, add them up, divide by 4 and then take the square root, we will get the same figure.

Similarly, {1, 1, 3, 5} and {1, 3, 5, 5} will have the same SD. Again, they are mirror images of each other on the number line. The remaining two sets, {1, 3, 3, 5} and {1, 1, 5, 5}, will have distinct standard deviations since their distributions on the number line are unique. In all, there are 4 different values that the standard deviation can take in such a case.

Answer (B). This question is discussed HERE.
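The mirror-image argument, and the earlier claim about shifted sets, can be checked with exact arithmetic. In this sketch, `mirror` and `shift` are hypothetical helper names: `mirror` reflects a set across the midpoint of its smallest and largest elements (so 1 and 5 swap roles), and `shift` adds the same constant to every element. Variances are compared exactly with `fractions.Fraction`; equal variances mean equal SDs.

```python
from fractions import Fraction

def variance(xs):
    """Exact population variance (the SD is its square root)."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

def mirror(xs):
    """Reflect a set across the midpoint of its smallest and largest elements."""
    lo, hi = min(xs), max(xs)
    return sorted(lo + hi - x for x in xs)

def shift(xs, k):
    """Add the same constant k to every element."""
    return [x + k for x in xs]

# Mirror images share the same variance, hence the same SD.
for s in ([1, 1, 1, 5], [1, 1, 3, 5]):
    print(s, "->", mirror(s), variance(s), variance(mirror(s)))

# Shifting preserves the variance; a different arrangement does not.
base = [1, 3, 3, 5, 6]
print(variance(base), variance(shift(base, 9)), variance([25, 27, 29, 29, 30]))
```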
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
19 Aug 2015, 03:06
Other Resources on Statistics
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
18 May 2016, 10:45
Using the Standard Deviation Formula on the GMAT

We have discussed standard deviation (SD) in detail above. We know the formula for finding the standard deviation of a set of numbers, but we also know that the GMAT will not ask us to actually calculate the standard deviation, because the calculations involved would be far too cumbersome. It is still a good idea to know this formula, though, as it will help us compare standard deviations across various sets, a concept we should know well. Today, we will look at some GMAT questions that involve sets with similar standard deviations, such that it is hard to tell which will have a higher SD without properly understanding the way it is calculated.

Take a look at the following question:
Which of the following distributions of numbers has the greatest standard deviation?
(A) {-3, 1, 2} (B) {-2, -1, 1, 2} (C) {3, 5, 7} (D) {-1, 2, 3, 4} (E) {0, 2, 4}

At first glance, these sets all look very similar. If we plot them on a number line, we will see that they also have similar distributions, so it is hard to say which will have a higher SD than the others. Let’s quickly review their deviations from the arithmetic means:
For answer choice A, the mean = 0 and the deviations are -3, 1, 2
For answer choice B, the mean = 0 and the deviations are -2, -1, 1, 2
For answer choice C, the mean = 5 and the deviations are -2, 0, 2
For answer choice D, the mean = 2 and the deviations are -3, 0, 1, 2
For answer choice E, the mean = 2 and the deviations are -2, 0, 2

We don’t need to worry about the arithmetic means (they just help us calculate the deviation of each element from the mean); our focus should be on the deviations. The SD formula squares the individual deviations and adds them up; then the sum is divided by the number of elements, and finally, we take the square root of the whole term. So if a deviation is greater, its square will be even greater, and that will increase the SD.
If the deviations increase and the number of elements increases too, then we cannot be sure what the final effect will be: an increased deviation increases the SD, but an increase in the number of elements increases the denominator and hence actually decreases the SD. Whether the SD increases or decreases overall will vary from case to case.

First, we should note that answers C and E have identical deviations and numbers of elements; hence, their SDs will be identical. This means the answer is certainly not C or E, since Problem Solving questions have a single correct answer. Let’s move on to the other three options:
For answer choice A, the mean = 0 and the deviations are -3, 1, 2
For answer choice B, the mean = 0 and the deviations are -2, -1, 1, 2
For answer choice D, the mean = 2 and the deviations are -3, 0, 1, 2

Comparing answer choices A and D, we see that they have the same nonzero deviations, but D has one more element (with a deviation of 0). This means its denominator will be greater, and therefore, the SD of answer D is smaller than the SD of answer A. This leaves us with options A and B:
For answer choice A, the mean = 0 and the deviations are -3, 1, 2
For answer choice B, the mean = 0 and the deviations are -2, -1, 1, 2

Now notice that although two deviations of answers A and B are the same, answer choice A has a larger deviation of 3 (in size) but fewer elements than answer choice B. Its sum of squared deviations is larger (9 + 1 + 4 = 14 versus 4 + 1 + 1 + 4 = 10) and is divided by a smaller number (3 versus 4). This means the SD of A will be higher than the SD of B, so the SD of A will be the highest. Hence, our answer must be A. This question is discussed HERE.

Let’s try another one:
Which of the following data sets has the third largest standard deviation?
(A) {1, 2, 3, 4, 5} (B) {2, 3, 3, 3, 4} (C) {2, 2, 2, 4, 5} (D) {0, 2, 3, 4, 6} (E) {-1, 1, 3, 5, 7}

How would you answer this question without calculating the SDs? We need to arrange the sets in increasing SD order. Upon careful examination, you will see that the number of elements in each set is the same, and the mean of each set is 3.
Deviations of answer choice A: -2, -1, 0, 1, 2
Deviations of answer choice B: -1, 0, 0, 0, 1 (lowest SD)
Deviations of answer choice C: -1, -1, -1, 1, 2
Deviations of answer choice D: -3, -1, 0, 1, 3
Deviations of answer choice E: -4, -2, 0, 2, 4 (highest SD)

Obviously, option B has the lowest SD (the deviations are the smallest) and option E has the highest SD (the deviations are the greatest). This means we can rule these answers out, as they cannot have the third largest SD.
Deviations of answer choice A: -2, -1, 0, 1, 2
Deviations of answer choice C: -1, -1, -1, 1, 2
Deviations of answer choice D: -3, -1, 0, 1, 3

Out of these options, answer choice D has a higher SD than answer choice A, since it has larger deviations of two 3s (whereas A has deviations of two 2s). Also, C is more tightly packed than A, with four deviations of size 1. If you are not sure why, consider this:
The sum of the squared deviations for C will be 1 + 1 + 1 + 1 + 4 = 8
The sum of the squared deviations for A will be 4 + 1 + 0 + 1 + 4 = 10
So, A will have a higher SD than C but a lower SD than D. Arranging from lowest to highest SD, we get: B, C, A, D, E. Answer choice A has the third highest SD, and therefore, A is our answer. This question is discussed HERE.

Although we didn’t need to calculate the actual SDs, we used the concepts of the standard deviation formula to answer these questions.
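If you want to confirm the rankings in both questions, a brute-force sketch with `statistics.pstdev` (population SD) is enough; `rank_by_sd` is a hypothetical helper name. Of course, it computes the SDs outright, which is exactly what the post shows you can avoid on test day.

```python
from statistics import pstdev

q1 = {"A": [-3, 1, 2], "B": [-2, -1, 1, 2], "C": [3, 5, 7],
      "D": [-1, 2, 3, 4], "E": [0, 2, 4]}
q2 = {"A": [1, 2, 3, 4, 5], "B": [2, 3, 3, 3, 4], "C": [2, 2, 2, 4, 5],
      "D": [0, 2, 3, 4, 6], "E": [-1, 1, 3, 5, 7]}

def rank_by_sd(choices):
    """Return answer labels sorted from lowest to highest population SD."""
    return sorted(choices, key=lambda label: pstdev(choices[label]))

for name, choices in [("Question 1", q1), ("Question 2", q2)]:
    order = rank_by_sd(choices)
    print(name, "lowest to highest SD:", ", ".join(order))
    for label in order:
        print(f"  {label}: {pstdev(choices[label]):.3f}")
```

In Question 1, C and E tie (same deviations) and A comes out on top; in Question 2 the order is B, C, A, D, E, so A is the third largest.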
_________________



Math Expert
Joined: 02 Sep 2009
Posts: 61385

Re: Statistics Made Easy  All in One Topic!
18 May 2016, 10:56
Solving GMAT Standard Deviation Problems By Using as Little Math as Possible

The other night I taught our Statistics lesson, and when we got to the section of class that deals with standard deviation, there was a familiar collective groan, not unlike the groan one encounters when doing compound interest, or any mathematical concept that, when we learned it in school, involved an intimidating-looking formula. So, I think it’s time for me to coin an axiom: the more painful the traditional formula associated with a given topic, the simpler the actual calculations will be on the GMAT. (Please note, though the axiom is awaiting official mathematical verification by Veritas’ hardworking team of data scientists, the anecdotal evidence in support of the axiom is overwhelming.)

So, let’s talk standard deviation. If you’re like my students, your first thought is to start assembling a list of increasingly frantic questions: Do we need to know that horrible formula I learned in Stats class? (No.) Do we need to know the relationship between variance and standard deviation? (You just need to know that there is a relationship, and that if you can solve for one, you can solve for the other.) Etc. So, rather than droning on about what we don’t need to know, let’s boil down what we do need to know about standard deviation. The good news: it isn’t much. Just make sure you’ve internalized the following:

* The standard deviation is a measure of the dispersion of the elements of the set around the mean. The farther away the terms are from the mean, the larger the standard deviation.
* If we were to increase or decrease each element of the set by “x,” the standard deviation would remain unchanged.
* If we were to multiply each element of the set by “x,” the standard deviation would also be multiplied by “x” (strictly speaking, by the absolute value of x, since a standard deviation can never be negative).
* If the mean of a set is “m” and the standard deviation is “d,” then to say that something is within 3 standard deviations of the mean is to say that it falls within the interval of (m - 3d) to (m + 3d). And to say that something is within 2 standard deviations of the mean is to say that it falls within the interval of (m - 2d) to (m + 2d).

That’s basically it. Not anything to get too worked up about. So, let’s see some of these principles in action to substantiate the claim that we won’t have to do too much arithmetical grinding on these types of questions:

If d is the standard deviation of x, y, z, what is the standard deviation of x+5, y+5, z+5?
A) d B) 3d C) 15d D) d+5 E) d+15

If our initial set is x, y, z, and our new set is x+5, y+5, and z+5, then we’re adding the same value to each element of the set. We already know that adding the same value to each element of the set does not change the standard deviation. Therefore, if the initial standard deviation was d, the new standard deviation is also d. We’re done: the answer is A. (You can see this with a simple example. If your initial set is {1, 2, 3} and your new set is {6, 7, 8}, the dispersion of the set clearly hasn’t changed.) This question is discussed HERE.

Surely the questions get harder than this, you say. They do, but if you know the aforementioned core concepts, they’re all quite manageable. Here’s another one:

Some water was removed from each of 6 tanks. If the standard deviation of the volumes of water at the beginning was 10 gallons, what was the standard deviation of the volumes at the end?
1) For each tank, 30% of the water at the beginning was removed.
2) The average volume of water in the tanks at the end was 63 gallons.

We know the initial standard deviation. We want to know if it's possible to determine the new standard deviation after water is removed. To the statements we go!

Statement 1: If 30% of the water is removed from each tank, then each term in the set is multiplied by the same value: 0.7. If each term in a set is multiplied by 0.7, then the standard deviation of the set is also multiplied by 0.7. If the initial standard deviation was 10 gallons, then the new standard deviation would be 10*(0.7) = 7 gallons. And we don't even need to do the math; it's enough to see that it's possible to calculate this number. Therefore, Statement 1 alone is sufficient.

Statement 2: Knowing the average of a set is not going to tell us very much about the dispersion of the set. To see why, imagine a simple case in which we have two tanks, and the average volume of water in the tanks is 63 gallons. It's possible that each tank has exactly 63 gallons and, if so, the standard deviation would be 0, as everything would equal the mean. It's also possible to have one tank with 126 gallons and another tank that is empty, creating a standard deviation that would, of course, be significantly greater than 0. So, simply knowing the average cannot possibly give us our standard deviation. Statement 2 alone is not sufficient to answer the question.

And the answer is A.

Maybe at this point you're itching for more of a challenge. Let's look at a slightly tougher one:

7.51; 8.22; 7.86; 8.36
8.09; 7.83; 8.30; 8.01
7.73; 8.25; 7.96; 8.53
A vending machine is designed to dispense 8 ounces of coffee into a cup. After a test that recorded the number of ounces of coffee in each of 1000 cups dispensed by the vending machine, the 12 listed amounts, in ounces, were selected from the data above. If the 1000 recorded amounts have a mean of 8.1 ounces and a standard deviation of 0.3 ounces, how many of the 12 listed amounts are within 1.5 standard deviations of the mean?
A) Four
B) Six
C) Nine
D) Ten
E) Eleven

Okay, so the standard deviation is 0.3 ounces. We want the values that are within 1.5 standard deviations of the mean. 1.5 standard deviations would be (1.5)(0.3) = 0.45 ounces, so we want all of the values that are within 0.45 ounces of the mean. If the mean is 8.1 ounces, this means that we want everything that falls between a lower bound of (8.1 – 0.45) and an upper bound of (8.1 + 0.45). Put another way, we want the number of values that fall between 8.1 – 0.45 = 7.65 and 8.1 + 0.45 = 8.55. Looking at our 12 values, we can see that only one value, 7.51, falls outside of this range (the closest call is 8.53, which is 0.43 above the mean). If we have 12 total values and only 1 falls outside the range, then the other 11 are clearly within the range, so the answer is E.

As you can see, there's very little math involved, even on the more difficult questions. Takeaway: remember the axiom that the more complex-looking the formula is for a concept, the simpler the calculations are likely to be on the GMAT. An intuitive understanding of a topic will always go a lot further on this test than any amount of arithmetical virtuosity.
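If you'd like to see the rules above confirmed numerically, here is a minimal sketch using Python's standard `statistics` module. It uses `pstdev` (the population standard deviation); the shift and scale rules hold for the sample version `stdev` as well. The tank volumes are hypothetical numbers made up for illustration (the problem never lists them), and `count_within` is a hypothetical helper name, not anything from the original questions.

```python
from math import isclose
from statistics import pstdev

# Property: adding the same constant to every element leaves the SD unchanged.
original = [1, 2, 3]
shifted = [v + 5 for v in original]  # {6, 7, 8}, the article's example
assert isclose(pstdev(shifted), pstdev(original))

# Property: multiplying every element by a positive constant c multiplies the SD by c.
# Hypothetical tank volumes (not given in the problem); removing 30% keeps 0.7 of each.
volumes = [40, 55, 60, 70, 80, 95]
remaining = [0.7 * v for v in volumes]
assert isclose(pstdev(remaining), 0.7 * pstdev(volumes))

# Statement 2 of the tank question: the same mean (63) allows very different SDs.
print(pstdev([63, 63]), pstdev([126, 0]))  # 0.0 63.0


def count_within(values, mean, sd, k):
    """Count values in the closed interval [mean - k*sd, mean + k*sd]."""
    return sum(1 for v in values if abs(v - mean) <= k * sd)


# The 12 listed coffee amounts (ounces) from the vending-machine question.
amounts = [7.51, 8.22, 7.86, 8.36,
           8.09, 7.83, 8.30, 8.01,
           7.73, 8.25, 7.96, 8.53]
print(count_within(amounts, mean=8.1, sd=0.3, k=1.5))  # 11
```

Note that the coffee check uses the given mean (8.1) and standard deviation (0.3) of all 1000 cups rather than recomputing them from the 12 listed amounts, exactly as the question intends.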
_________________



Current Student
Joined: 04 Jun 2018
Posts: 155
GMAT 1: 610 Q48 V25 GMAT 2: 690 Q50 V32 GMAT 3: 710 Q50 V36

Re: Statistics Made Easy  All in One Topic!
10 Mar 2019, 21:57
Bunuel wrote: A 750-Level GMAT Question on Statistics! In this post, we have a very interesting statistics question for you. Above, we have already discussed statistics concepts such as mean, median and range. This question needs you to apply all these concepts but can still be done easily in under two minutes. Now, without further ado, let's go on to the question – there is a lot to discuss there.

Question: An automated manufacturing unit employs N experts. Their average monthly salary is $7000 while the median monthly salary is only $5000. If the range of their monthly salaries is $10,000, what is the minimum value of N?
(A) 10
(B) 12
(C) 14
(D) 15
(E) 20

Solution: Let's first assimilate the information we have. We need to find the minimum number of experts that must be there. Why should there be a minimum number of people satisfying these statistics? Let's try to understand that with some numbers. Say, N cannot be 1, i.e., there cannot be a single expert in the unit, because then you cannot have a range of $10,000. You need at least two people to have a range – the difference of their salaries would be the range in that case. So there are at least 2 people – say one with salary 0 and the other with 10,000. No salary will lie outside this range. Median is $5000, i.e., when all salaries are listed in increasing order, the middle salary (or the average of the middle two) is $5000. With 2 people, one at 0 and the other at 10,000, the median will be the average of the two, i.e., (0 + 10,000)/2 = $5000. Since there are at least 10 people (every answer choice is at least 10), there is probably someone earning $5000. Let's put in 5000 there for reference.

0 … 5000 … 10,000

Arithmetic mean of all the salaries is $7000. Now, the mean of 0, 5000 and 10,000 is $5000, not $7000, so we need to add some more people. We need to add them more toward 10,000 than toward 0 to get a higher mean. So we will try to get a mean of $7000. Let's use the deviations-from-the-mean method to find where we need to add more people.
0 is 7000 less than 7000 and 5000 is 2000 less than 7000, which means we have a total of $9000 below 7000. On the other hand, 10,000 is 3000 more than 7000. The deviations on the two sides of the mean do not balance out. To balance, we need to add two more people at a salary of $10,000 so that the total deviation on the right of 7000 is also $9000. Note that since we need the minimum number of experts, we should add new people at 10,000 so that they quickly make up the deficit in the deviation. If we add them at 8000 or 9000 etc., we will need to add more people to make up the deficit on the right. Now we have

0 … 5000 … 10000, 10000, 10000

Now the mean is 7000 but note that the median has gone awry. It is 10,000 now instead of the 5000 that is required. So we will need to add more people at 5000 to bring the median back to 5000. But that will disturb our mean again! So when we add some people at 5000, we will need to add some at 10,000 too to keep the mean at 7000. 5000 is 2000 less than 7000 and 10,000 is 3000 more than 7000. We don't want to disturb the total deviation from 7000. So every time we add 3 people at 5000 (a total deviation of 6000 below 7000), we will need to add 2 people at 10,000 (a total deviation of 6000 above 7000) to keep the mean at 7000 – this is the most important step. Ensure that you have understood this before moving ahead. When we add 3 people at 5000 and 2 at 10,000, we are in effect adding an extra person at 5000, and hence it moves our median a bit to the left. Let's try one such set of additions:

0 … 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000

The median is not $5000 yet (with 10 people it is (5000 + 10000)/2 = 7500). Let's try one more set of additions.

0 … 5000, 5000, 5000, 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000, 10000, 10000

The median now is $5000 and we have maintained the mean at $7000. This gives us a total of 15 people. Answer (D)
Granted, the question is tough, but note that it uses very basic concepts, and that is the hallmark of a good GMAT question! Try to come up with some other methods of solving this.

The answer seems WRONG. The set of {5000,5000,5000,5000,5000,5000,7500,7500,10000,15000} solves this in N=10.
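Taking the question exactly as worded above (mean $7000, median $5000, range $10,000), a quick check with Python's standard `statistics` module confirms that this ten-salary set meets all three conditions. This is a minimal sketch; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median


def summarize(salaries):
    """Return (N, mean, median, range) for a list of monthly salaries."""
    return len(salaries), mean(salaries), median(salaries), max(salaries) - min(salaries)


# nitesh50's proposed set of ten monthly salaries
candidate = [5000] * 6 + [7500, 7500, 10000, 15000]
print(summarize(candidate))  # (10, 7000, 5000.0, 10000)
```

The median prints as 5000.0 because, with an even count, `median` averages the two middle values (both 5000 here).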



Intern
Joined: 11 Oct 2018
Posts: 21
Location: Germany

Statistics Made Easy  All in One Topic!
18 Mar 2019, 05:10
nitesh50 wrote: The answer seems WRONG. The set of {5000,5000,5000,5000,5000,5000,7500,7500,10000,15000} solves this in N=10.

That's true. Bunuel, can you elaborate on the approach to this result?

EDIT: I found the correct version of the question for the given answer. The missing information is that the average is not $7000 but $7000 more than the least salary.
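As a rough check of this EDIT, the sketch below tests both sets against the corrected condition, assuming (as the EDIT implies) that only the mean condition changes and the range and median conditions stay as stated. `meets_corrected_conditions` is a hypothetical helper name. The 15-person set from the quoted solution passes, while the ten-salary set fails because its mean is only $2000 above its lowest salary.

```python
from statistics import mean, median


def meets_corrected_conditions(salaries):
    """Range of 10,000, median of 5,000, and a mean exactly 7,000
    above the lowest salary (the corrected wording from the EDIT)."""
    low = min(salaries)
    return (max(salaries) - low == 10000
            and median(salaries) == 5000
            and mean(salaries) - low == 7000)


# The 15-person set reached in the quoted solution (lowest salary taken as 0).
solution_set = [0] + [5000] * 7 + [10000] * 7
# nitesh50's ten-person set: mean 7000, but its lowest salary is 5000.
counterexample = [5000] * 6 + [7500, 7500, 10000, 15000]

print(meets_corrected_conditions(solution_set))    # True
print(meets_corrected_conditions(counterexample))  # False
```

Since the solution's set has a lowest salary of 0, "mean of $7000" and "mean $7000 above the lowest salary" coincide for it, which is why the quoted solution could work with plain values.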



Intern
Joined: 08 Jun 2019
Posts: 27

Re: Statistics Made Easy  All in One Topic!
12 Jul 2019, 04:07
Hi, is there a PDF version of the content on this thread?










