# GMATPrep Tests Scoring Errors


12 Aug 2018, 18:22
Hey friends,

I'm a GMAT tutor in Chicago. I've recently had two students report to me egregious scoring errors on their GMATPrep tests. In each case a similar thing happened - the student received a score on their quant section that was impossibly low given the number of questions they got wrong. In one case a student got 9 questions wrong and scored in the 34th percentile, and in the other the student got 8 questions wrong and scored in the 28th percentile. Anybody familiar with how the scoring on the GMAT works knows that even after taking into account the adaptive nature of the test, there is absolutely no possible combination of questions that would give such a low score with so few questions wrong. In addition, the tests themselves seem to have stopped adapting about 10 questions into the quant section and as a result the students were seeing nothing but easy questions no matter how many right they were getting.

Has anybody else experienced this? Any possible explanation? Let me know your thoughts. I don't think it's a stretch to ask: if students can't trust their scores on GMAC's own practice tests, how are they supposed to trust their score on the real thing?

I have attached screenshots of each of the score reports.


13 Aug 2018, 18:30
Any thoughts on this? I'm really trying to figure out if other people are having the same issue and, if not, why it has now happened twice with my students.

14 Aug 2018, 14:03
Measurement nerd here with an (unfortunately) incomplete answer. The measurement theory behind the GMAT scoring algorithm has three parts: 1) how easy a question is to guess on, 2) the difficulty of the question, and 3) the amount of information the question gives the algorithm about the test taker.
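
Those three parts map onto the textbook three-parameter logistic (3PL) item response function. Here is a minimal sketch of it, assuming the standard 3PL form; the parameter names `a`, `b`, `c` and the example values are illustrative conventions, and the actual parameters and scale GMAC uses are not public.

```python
import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability that a test taker with ability theta answers correctly.

    a: discrimination (how much the item tells us about the test taker)
    b: difficulty
    c: pseudo-guessing floor (chance of getting it right by guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with five answer choices, so a guessing floor of 0.2.
# When ability equals difficulty, the probability sits halfway between c and 1.
print(round(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
```

Note that even a very weak test taker never drops below the guessing floor `c`, which is one reason a single right or wrong answer does not carry a fixed number of points.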

Let's assume that the algorithm is working correctly. (If you really feel like it isn't working properly, I'd report it to GMAC - definitely something they'd appreciate knowing if it is in fact broken!) Now, without seeing these values for the questions your students got right or wrong, or knowing the specific GMAT scoring algorithm, it is literally impossible to know exactly why they got the scores they did. That said, if I had to guess, it's some combination of:

1. Some questions are just going to give the test more information about a test taker than other questions are and are therefore going to be valued more by the algorithm both when it's choosing questions and when it's scoring.
2. The difficulty of the questions you miss is as important as the number of questions you miss, if not more so.
3. For the quant section, percentiles tend to drop off precipitously. A Q49, for example, is only in the 74th percentile even though anyone would agree that it's a pretty darn good score.
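
Point 1 can be made concrete with the textbook item information function for a 3PL-style model. This is only a sketch under that assumption; the item values are invented for illustration and are not GMAT parameters.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer (a: discrimination, b: difficulty, c: guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information a 3PL item provides about a test taker at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

# Two hypothetical items with the same difficulty but different discrimination.
weak = item_information(theta=0.0, a=0.5, b=0.0, c=0.2)
strong = item_information(theta=0.0, a=2.0, b=0.0, c=0.2)
# The same discriminating item, but far too hard for this test taker.
mismatched = item_information(theta=0.0, a=2.0, b=3.0, c=0.2)

print(strong > weak)        # True
print(strong > mismatched)  # True
```

In this sketch an item is most informative when it discriminates well and its difficulty sits near the test taker's ability, which is why some questions would weigh more than others.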

That said, one of the important things to remember about the GMAT scoring algorithm is that you don't break the world record for the fastest marathon by knowing how electronic timing works - you do it by running really quickly.

14 Aug 2018, 14:10
That seems quite a bit low. But if you get the first 10 Questions wrong in Quant, you will get Q30 or around that (19th percentile) https://gmatclub.com/forum/new-format-g ... 69682.html

14 Aug 2018, 15:01
bb wrote:
That seems quite a bit low. But if you get the first 10 Questions wrong in Quant, you will get Q30 or around that (19th percentile) https://gmatclub.com/forum/new-format-g ... 69682.html

Thanks for the response. Even taking into account how the first 10 questions seem to carry more weight, I'm still having a hard time seeing how the scores could have been so low except in an extreme scenario such as getting the first nine wrong and then everything else right. This isn't based on any understanding I claim to have of the algorithm - I'm really just trying to apply some common sense to my existing familiarity with the GMAT's adaptive format.

Below is the full breakdown for each student. It's true that each begins poorly - Student #1 gets 5 of the first 10 incorrect and Student #2 gets the first 3 incorrect - but I'm just not convinced that this should have put them in a hole so deep it would be impossible to climb out of.

I'm not necessarily saying that the test itself is at fault. It's possible the students did something like exit the section in the middle which then triggered some sort of glitch. I'm just hoping to get an explanation so I don't have to tell students the GMATPrep tests are unreliable.

Let me know if you have any more thoughts given the breakdowns.

Student #1 - 9 Incorrect, 34th percentile
1. Incorrect
2. Correct
3. Incorrect
4. Incorrect
5. Correct
6. Incorrect
7. Correct
8. Incorrect
9. Correct
10. Correct
11. Incorrect
12. Incorrect
13. Correct
14. Correct
15. Incorrect
16. Correct
17. Correct
18. Correct
19. Correct
20. Correct
21. Correct
22. Correct
23. Correct
24. Correct
25. Correct
26. Correct
27. Correct
28. Correct
29. Correct
30. Incorrect
31. Correct

Student #2 - 8 Incorrect, 28th Percentile
1. Incorrect
2. Incorrect
3. Incorrect
4. Correct
5. Correct
6. Correct
7. Correct
8. Correct
9. Correct
10. Correct
11. Incorrect
12. Incorrect
13. Correct
14. Correct
15. Incorrect
16. Correct
17. Incorrect
18. Correct
19. Correct
20. Correct
21. Incorrect
22. Correct
23. Correct
24. Correct
25. Correct
26. Correct
27. Correct
28. Correct
29. Correct
30. Correct
31. Correct
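
As a sanity check on the counts quoted in this thread, the two breakdowns above can be tallied with a short script. This is only bookkeeping on the right/wrong pattern; it says nothing about question difficulty or about how GMATPrep actually scores. The function name `summarize` is just illustrative.

```python
# Response patterns transcribed from the breakdowns above (C = correct, I = incorrect).
student_1 = "ICIICICICC" + "IICCI" + "C" * 14 + "IC"
student_2 = "III" + "C" * 7 + "IICCICICCCI" + "C" * 10

def summarize(pattern: str) -> dict:
    """Count misses and correct-answer streaks in a C/I response string."""
    longest = current = 0
    for ch in pattern:
        current = current + 1 if ch == "C" else 0
        longest = max(longest, current)
    return {
        "questions": len(pattern),
        "wrong": pattern.count("I"),
        "wrong_in_first_10": pattern[:10].count("I"),
        "longest_correct_streak": longest,
        "closing_correct_streak": len(pattern) - len(pattern.rstrip("C")),
    }

print(summarize(student_1))
print(summarize(student_2))
```

Student #1 comes out at 9 wrong with 5 of them in the first 10, and Student #2 at 8 wrong with 3 of them in the first 10, matching the totals in the headers.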
14 Aug 2018, 15:09
It is not possible to determine whether the scoring is correct just from the number of questions missed, but I can see that the student had a horrible performance on the first 10 questions of the first test. There are also experimental questions that don't count, for example, and those could throw things off even more. There is another project a user started that looks at the difficulty of the questions a person answered: https://gmatclub.com/forum/data-gmat-ev ... 73092.html - identifying the number of easy, medium, and difficult questions can help.

At this time we do not have any indication that the GMAT Prep Scores are not accurate or cannot be trusted though I see there is another similar question here: https://gmatclub.com/forum/gmat-prep-of ... 73060.html
Has your student taken other trusted CATs and scored a lot better? Do they want to take the old GMAT Prep? Or perhaps here is an idea for you: take a GMAT Prep test (the old one) and make the exact same mistakes - e.g. miss the first 3, then get a few correct, and then get the following ones wrong. Follow their footprint, basically. For the last 6 questions, don't even answer them; just exit the test (we can back the score out).


14 Aug 2018, 19:09
bb wrote:
It is not possible to determine if scoring is correct or incorrect just based on pure number of questions [...]

Thanks, I'll try taking an old GMATPrep test and compare results. One thing I wanted to clarify is that I have dozens of students and have only noticed this issue in these two cases so I'm not trying to suggest there is a systemic error in everyone's score reports. I really am just trying to figure out how to explain these two results.

The poor performance in the first 10 questions could be the answer and is the most supported explanation at this time based on the link you posted. I'm not entirely convinced though.

For student #1: Yes, he gets half of the first 10 wrong. That's quite a bit less extreme than getting all of the first 10 wrong though like in the other post. Starting at question #16 he gets 14 in a row correct. Even if it's true that by this point the test has adjusted to the test-taker, and the difficulty level shouldn't fluctuate much from question to question, it just seems like it's not adapting at all which is bizarre. What would be the logic behind designing the test this way?

The best logical case I can make is by comparing the two students to each other. Student #2 gets fewer questions wrong but his percentile score is worse. As you have mentioned, it is impossible to explain someone's score based on the number of wrong answers alone. However, I would say that in addition to having fewer wrong answers, student #2's distribution is also clearly more favorable. He ends up getting 7 of the first 10 right despite getting the first three wrong. The only way I could see this being substantially worse than 5 out of 10 would be if those first three questions were so important that they doomed him for the rest of the section. After that, he gets some right and some wrong without any real streaks in one direction before finishing by getting 10 correct answers in a row. I just can't imagine the scenario where this results in a 28th percentile, because it would have to mean he is getting absolutely creamed for every incorrect answer and then getting almost no credit for his right answers. It really seems like it simply stopped adapting.

15 Aug 2018, 12:36
What we have observed in the GMAT Prep iterations and scenarios is that it is impossible to recover after missing the first 10 questions. On the other hand, if you miss the last 10, you still get Q49 or Q50 vs Q29, so that gives you an idea how much fluctuation there is depending on which questions are missed.
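
The first-10-versus-last-10 gap can be illustrated with a deliberately crude toy rule in which the ability estimate moves by a shrinking step of 1/k after question k. This rule is an assumption made purely for illustration, not the GMATPrep algorithm, and the function name `toy_final_estimate` is made up.

```python
def toy_final_estimate(pattern: str) -> float:
    """Toy adaptive rule: move the estimate up or down by 1/k after question k."""
    estimate = 0.0
    for k, ch in enumerate(pattern, start=1):
        step = 1.0 / k  # early questions move the estimate far more than late ones
        estimate += step if ch == "C" else -step
    return estimate

# Same number of misses (10 of 31), placed at opposite ends of the section.
miss_first_10 = "I" * 10 + "C" * 21
miss_last_10 = "C" * 21 + "I" * 10

print(toy_final_estimate(miss_first_10) < toy_final_estimate(miss_last_10))  # True
```

Under this toy rule, 21 correct answers in a row are not enough to climb back above the starting point after 10 early misses, while 10 late misses barely dent a high estimate.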

P.S. Curious if you ran the GMAT Prep Experiment.

03 Sep 2018, 08:27
I apologize in advance for a not-terribly-fun response, but this isn't actually surprising at all, for two reasons:

• Both of these students got the bejeezus beat out of them at the beginning of the section. The first test-taker missed 7 of the first 12, so he was seeing really, really easy questions by the time he got to the middle third of the test. (The second test-taker had a stranger distribution of errors, but the same idea basically applies, especially after he missed the first three questions.) The difficulty level of every question is determined by the test-taker's performance up to that point, so once a test-taker gets demolished early in the test, it's nearly impossible to "convince" the algorithm to hand out more difficult questions. And as LauraOrion pointed out, the number of questions you miss generally matters far less on an adaptive test than WHICH questions you miss. So if they missed really easy stuff early, it's not a surprise at all that they never really saw harder stuff.
• Percentiles are desperately warped on the quant section of the GMAT. When the test was originally designed, the idea was that scores would be roughly normally distributed, with a mean of around 30. But that's not how things have worked out: the distribution is massively skewed, with a mean of 40, a median score of 44, and modes of 49 and 50 (!!). So when your two students earn raw quant scores of 35 and 37, they're earning scores that were supposed to be well above average when the test was designed. It's just that the test-taking pool has gotten MUCH better at quant over the years, and the percentiles have moved so much that they're nearly meaningless now.
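
To see how a skewed pool warps percentiles, here is a small sketch that computes a percentile rank as the share of test takers scoring strictly below a given score. The pool of 20 scores is invented to be top-heavy for illustration; it is not GMAC data and does not reproduce the real distribution.

```python
def percentile_rank(score: int, population: list) -> float:
    """Percentage of the population scoring strictly below the given score."""
    below = sum(1 for s in population if s < score)
    return 100.0 * below / len(population)

# Hypothetical, top-heavy pool of quant scores (made up for this example).
pool = [20, 28, 33, 37, 40, 42, 44, 44, 45, 46,
        47, 48, 49, 49, 49, 49, 50, 50, 50, 50]

print(percentile_rank(37, pool))  # 15.0
print(percentile_rank(49, pool))  # 60.0
```

When most of the pool is bunched at the top of the scale, a mid-30s score lands in a low percentile and even a 49 sits well short of the 90s.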

Sure, there are always legitimate, lingering questions about the GMATPrep software and the extent to which it perfectly mimics the actual exam. But what you're showing here is absolutely NOT a sign of GMATPrep scoring errors. This is exactly the sort of not-great result that we should expect on an adaptive test if a test-taker gets whacked early. And because the percentiles are so deceptive now, your students' results really aren't as apocalyptic as they might seem. They aren't nearly good enough for top business schools in 2018, but when the test was designed, mid-30s quant scores were supposed to be pretty good.

I hope this helps a bit! And if you're ever having a hard time sleeping, give me a call, and I'll tell you all about the differences between the three-parameter logistic model and the four-parameter logistic model in computer adaptive testing, and you'll be asleep instantly.

03 Sep 2018, 17:02
GMATNinja wrote:
I apologize in advance for a not-terribly-fun response, but this isn't actually surprising at all, for two reasons: [...]

Thanks for the thorough response. I'm conceding this argument to you guys. I was trying to use logic, but clearly the algorithm is anything but logical, so I can't really continue this line of defense without getting a PhD in machine learning, which I'm not prepared to do. I don't think I'm going to have time to do the experiment recreating the result, but if there were a way to hand grade these tests, I could submit the exact questions to you guys. I would like to know the exact moment they doomed themselves.

04 Sep 2018, 08:58
Unfortunately there really isn't a good way to run something like this by hand. Even if we had the exact difficulty, importance, and guessing parameters, we'd still need information like the scaling vector the GMATPrep software uses and a good deal of other variables. The function itself is way too time consuming to run by hand. There's a reason that we use classical test theory to grade tests in schools - tests based on item response theory (especially those that are computer adaptive) require many, many calculations in order to give an accurate picture of a student's abilities. It's just not practical to do it unless you've built the computer program to do it.
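
To give a feel for the amount of computation involved, here is a minimal sketch of ability estimation under a 3PL-style model using a brute-force grid search over the likelihood. Everything specific here is an assumption: the item parameters are invented, the ability range of -4 to 4 is arbitrary, and real software would add score scaling and a smarter optimizer.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer (a: discrimination, b: difficulty, c: guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a full response pattern at a candidate ability theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Return the grid point between lo and hi with the highest log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical (a, b, c) parameters for five items; not real GMAT values.
items = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2),
         (1.5, 0.5, 0.2), (1.0, 1.0, 0.2)]
easy_right = [True, True, True, False, False]
hard_right = [False, False, False, True, True]

print(estimate_ability(items, easy_right))
print(estimate_ability(items, hard_right))
```

Even this toy version evaluates the response function 801 × 5 = 4,005 times for a five-question test, and an adaptive test would repeat something like it after every question to choose the next one.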
