Thank you for taking the new Veritas Prep Practice Test! And thanks, actually, for bringing up the scoring – we love talking about the scoring algorithm and how it works.
Our tests (beginning in about April of 2013; if you didn’t take the scoring as seriously under the previous tests, I don’t blame you) are scored using Item Response Theory (http://en.wikipedia.org/wiki/Item_response_theory), the system used by the GMAT, GRE, and other computer adaptive tests. And it’s a difficult system to understand; it took our team (which included math majors from multiple continents, MIT grads, and members of the International Association for Computerized Adaptive Testing…) a healthy year or more to fully understand and implement it. So, as I’ve long said: if your choice is between explaining to Usain Bolt how electronic timing works and just letting him go out and run faster than anyone else in the world, he’s better off training to run fast. Learning the nuances of IRT/CAT theory isn’t going to help you much on the GMAT, but understanding the basics can help you feel comfortable with the test administration and avoid bad advice. So with that in mind…
Why “Percent Correct” Doesn’t Matter Nearly As Much As You Think
The IRT system has two main functions – item administration (which questions you see) and ability estimation (your score) – and each informs the other. Once the ability estimate is confident that you’re above average, for example, the system delivers the questions most likely to answer “just how far above average?” That means you’ll miss several questions even if you’re in the 90th percentile: the system is trying to determine whether you’re above that level, and the only way to know is to keep testing your upper limit.
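A common textbook rule for the item-administration half is to pick the unused item that carries the most statistical information at the current ability estimate. The sketch below assumes a two-parameter logistic (2PL) model; the item pool and the names p_correct, item_information, and pick_next_item are hypothetical, and neither Veritas Prep nor the GMAT is confirmed to use exactly this rule (an operational test would also have to balance content areas).

```python
import math

def p_correct(theta, a, b):
    """2PL probability that someone with ability theta answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta_hat, pool, used):
    """Index of the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical pool of (discrimination a, difficulty b) pairs.
POOL = [(1.2, -1.5), (1.2, -0.5), (1.2, 0.2), (1.2, 0.9), (1.2, 1.6), (1.2, 2.3)]

# Someone currently estimated well above average (theta_hat = 1.5)
# is shown the item whose difficulty sits closest to that estimate.
print(pick_next_item(1.5, POOL, used=set()))  # 4
```

With equal discriminations, information peaks where difficulty matches ability, so a strong test-taker keeps seeing items near the edge of what they can do, which is exactly why the misses keep coming.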
Now, the simplified explanation strips out a good amount of IRT nuance and basically says this: once the system has narrowed in on your ability, you should theoretically get about half of the remaining questions wrong. If you’re at the 75th percentile, you should be getting all the 80th-percentile questions wrong and all the 70th-percentile questions right, and the system will keep pinballing you between those levels. That’s not 100% accurate, but it’s close enough for a working understanding of the scoring system. Almost everyone will miss a lot of questions once the system has started to figure out their threshold. And according to IRT, it doesn’t take long to get there: within 6-7 questions the system usually has a pretty good feel for your ability level.
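Under a 2PL model, someone whose ability exactly matches an item’s difficulty has a 50% chance of answering it correctly, which is where the “half wrong” rule of thumb comes from. This sketch assumes a standard-normal ability scale (so percentiles map to ability through the normal quantile function), a hypothetical discrimination of 1.2, and an optional guessing floor c as in the three-parameter (3PL) model; none of these values come from Veritas Prep or GMAC.

```python
import math
from statistics import NormalDist

to_theta = NormalDist().inv_cdf  # percentile (as a fraction) -> ability, standard-normal scale assumed

def p_correct(theta, a, b, c=0.0):
    """3PL probability of a correct answer; c is the guessing floor (c = 0 gives the 2PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

theta = to_theta(0.75)  # a 75th-percentile test-taker
for pct in (0.70, 0.75, 0.80):
    b = to_theta(pct)  # item pitched at that percentile
    print(f"{pct:.0%} item: 2PL {p_correct(theta, 1.2, b):.2f}, "
          f"3PL {p_correct(theta, 1.2, b, c=0.2):.2f}")
```

Under these assumptions the matched item comes out at exactly 0.50 (0.60 with a 0.2 guessing floor), while the 70th- and 80th-percentile items land only about five points above or below it. The “all right / all wrong” picture is the simplification; the probabilities are what the system actually works with.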
What’s really happening in ability estimation is that the system calculates the probability that someone with your responses has each possible score. And here’s where conventional forum wisdom tends to miss the nuance of IRT: we often see “you get a question right, it gives you a harder question; you get it wrong, it gives you an easier one.” What the system is really doing after each response is using all of your responses to date to estimate the probability of your having each score, and not all questions carry equal weight. Again, IRT relies heavily on probability: some questions are much more potent at determining whether you’re above or below a certain threshold, and others are a little less telling. The system takes these weights into account, particularly as your score moves. The weights also have to account for content delivery: the system might want to ask you a “more potent” Sentence Correction question but need to deliver another Reading Comprehension passage, so those RC questions might not carry the same weight as the questions before them.
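A minimal sketch of that “probability of each score” calculation, assuming a 2PL model, a standard-normal prior, and a grid of candidate ability values. The function names and item parameters are hypothetical; the discrimination parameter a stands in for the “potency” described above, and a real implementation may use a different prior, item model, or estimator than the posterior mean (EAP) used here.

```python
import math

# Hypothetical grid of candidate abilities from -4 to 4 (standard-normal scale assumed).
GRID = [i / 20 for i in range(-80, 81)]

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior(responses):
    """Probability of each GRID ability given (a, b, correct) responses and a N(0, 1) prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard-normal prior
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def eap(responses):
    """Expected a posteriori ability estimate: the posterior mean over GRID."""
    return sum(t * w for t, w in zip(GRID, posterior(responses)))

# Same difficulty, different discrimination.
print(round(eap([(2.0, 0.0, True)]), 2), round(eap([(0.5, 0.0, True)]), 2))
```

Both items have the same difficulty, but a correct answer on the sharper (higher-a) item moves the estimate further, which is one way to read “some questions are more potent than others.”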
Which is all a long way of saying “it’s complicated” – so let’s take a look under the hood at your test so I can show you how the system worked.
Your Test
(BTW – since your GMATClub handle was also part of your login for your test, this was relatively easy for me to find…hope you don’t mind my pulling it up.)
You answered the first quant question correctly, and here’s where “conventional wisdom” may lead you to an incorrect conclusion about how the system should have treated your next two incorrect answers. After one question, the system had only one data point on you: you were correct. So at that point you had a high probability of being well above average (after all, if you answer a 50th-percentile question right, you’ll probably answer a 60th-percentile question right, too), and the system wanted to ask you a hard question. It did: only about 15% of people get that Venn Diagram problem right. When you got that wrong, the probability that you were above the 90th percentile decreased dramatically, but the probability that you were above the 60th percentile was still very high; the system had a data point in that range, and you answered it correctly. Getting one right and then one wrong did not take you right back to the starting point; that’s not how a probability-based system works. Then you saw another substantially above-average problem and got it wrong, so at that point the system saw the most likely outcome as a score a little above average, but not much above.
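To see why one right answer followed by one wrong one doesn’t reset the estimate, this sketch recomputes a posterior-mean ability estimate after each response. It assumes a 2PL model, a standard-normal prior, and made-up item parameters loosely patterned on the sequence above (an average opener, then two well-above-average items); these are not the actual parameters of your test.

```python
import math

GRID = [i / 20 for i in range(-80, 81)]  # candidate abilities, standard-normal scale assumed

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses):
    """Posterior-mean ability after all (a, b, correct) responses, N(0, 1) prior."""
    num = den = 0.0
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        num += theta * w
        den += w
    return num / den

# Hypothetical items: an average opener, then two well-above-average items.
q1, q2, q3 = (1.5, 0.0), (1.5, 1.5), (1.5, 1.0)
history = [(*q1, True), (*q2, False), (*q3, False)]
for n in range(1, 4):
    print(n, round(eap(history[:n]), 2))
```

After right-then-wrong the estimate stays above zero (above average): with equal discriminations, the likelihood of that pair of answers peaks midway between the two items’ difficulties, not back at the starting point. Only the second miss pulls it further down.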
So after three questions, the system saw you as around the 60th percentile and gave you a question consistent with that estimate…and then you rattled off three straight correct answers on some relatively tough problems. After you missed the 7th question, you went on a nice run of more-right-than-wrong before you hit your first real trouble patch of three straight wrong…and by that point the system had already pegged you at around one standard deviation above average, so those wrong answers were all on hard questions. Had you free-fallen a little more, you might have been in trouble, but you stemmed the tide nicely with a couple of correct answers to break up those trouble patches, and the system moved its estimate down more gradually. Then, when you got back on track with a few runs of 4+ correct, you limited the damage. At your peak you would have gotten a Q49, so those rough patches came on some really tough problems.
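The post moves between percentiles and standard deviations (“around the 60th percentile”, “around a standard deviation above average”). If ability is assumed to be normally distributed across test-takers with mean 0 and standard deviation 1 (a common IRT convention, not something confirmed here for the Veritas Prep scale), the conversion is just the normal CDF and its inverse. Mapping an estimate onto a reported score such as Q49 needs a test-specific scaling table that isn’t reproduced here.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # assumed ability distribution

def theta_to_percentile(theta):
    """Percentile rank of an ability estimate under the standard-normal assumption."""
    return 100.0 * STANDARD_NORMAL.cdf(theta)

def percentile_to_theta(percentile):
    """Ability value at a given percentile rank under the same assumption."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100.0)

print(round(theta_to_percentile(1.0), 1))  # one SD above average: 84.1
print(round(percentile_to_theta(60), 2))   # 60th percentile: 0.25
```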
Which I guess brings me back to why your percent correct doesn’t tell the whole story. Since your first question’s “most potent” point (the ability level at which it best distinguishes between test-takers) is a tick above absolutely average, you never saw a below-average problem on the quant section. So you were supposed to get a lot of those questions wrong, as do most others who see 37 straight “difficult” problems. And because you’re graded on your ability level compared with that of others, your performance looks all the better: most people who saw the run of questions you did, under test conditions, would have performed worse.
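A quick way to see why percent correct misleads: the expected share of correct answers is just the average of the per-item probabilities, so it depends on which items you were shown. This sketch assumes a 2PL model and a hypothetical five-item section where every item is at or above average difficulty; the numbers are illustrative only.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_percent_correct(theta, items):
    """Expected share of items answered correctly: the average 2PL probability."""
    return 100.0 * sum(p_correct(theta, a, b) for a, b in items) / len(items)

# Hypothetical section in which every item is at or above average difficulty.
HARD_SECTION = [(1.2, b) for b in (0.1, 0.4, 0.7, 1.0, 1.3)]
for theta in (0.0, 1.0):
    print(theta, round(expected_percent_correct(theta, HARD_SECTION)))
```

Here an average test-taker would be expected to get only about 31% of these items right, and even someone a full standard deviation above average only about 58%, so a modest percent correct on a run of hard items can still reflect a strong ability estimate.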
On the verbal side, the setup was similar, and I should also mention that two of your verbal mistakes came on unscored, experimental questions (we include those like the GMAT does and for the same reason – to get good statistics on questions for use in subsequent tests). So if you take those out, your percent correct goes even higher.
What you can learn from this
While I can’t guarantee that the official GMAT uses “stock” Item Response Theory (it may add a few scaffolds or tweaks for its own purposes, but I know it adheres very closely, if not 100%, to IRT), you can learn from this experience with IRT that:
1) Getting “the first 10 questions right” isn’t as critical as many say it is. You can miss 4-5 of the first 10 and still leave that set with the computer estimating you well above average.
2) The key, instead, is to make sure you don’t miss many easy questions. Missing them affects both the ability estimate (you’ve given the system a data point that you now have to overcome) and the item delivery (if you miss an easy question, the system is more apt to deliver questions around that level to test your “floor” instead of your “ceiling”, so you leave yourself a narrow margin of error and don’t get as many opportunities to prove that you’re elite).
3) The IRT system is nearly impossible to game. Remember: your ability is calculated from the probability of each possible score, and it takes into account all of your previous responses. Please don’t take from this “get the first one right and you’re golden!” or anything like that; the system has a lot more nuance than that. As I said above, understanding electronic timing doesn’t make you a better sprinter, and understanding GMAT scoring won’t help you game it. The best advice you can take from this is “don’t make careless mistakes in the first 10 questions” (or really ever…but make sure you slow down and don’t let jitters trip you up at the start) and “don’t be a perfectionist at the expense of having time for later questions”. Getting 6-7 of the first ten right can easily be a good start on the GMAT.
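Points 2 and 3 both fall out of the same probability calculation. In this sketch (2PL model, standard-normal prior, hypothetical items, not the GMAT’s or Veritas Prep’s actual parameters), a strong start followed by one miss is scored two ways: the miss comes on an easy item or on a hard one. A strong test-taker is very unlikely to miss an easy item, so that response pulls the estimate down further. The final estimate is also a product over every response, so in this simplified model the first answer gets no special weight; early answers matter mainly through which items you are shown next.

```python
import math

GRID = [i / 20 for i in range(-80, 81)]  # candidate abilities, standard-normal scale assumed

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses):
    """Posterior-mean ability after all (a, b, correct) responses, N(0, 1) prior."""
    num = den = 0.0
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        num += theta * w
        den += w
    return num / den

# Hypothetical strong start: four correct answers on above-average items.
start = [(1.3, b, True) for b in (0.2, 0.6, 0.9, 1.2)]
miss_easy = eap(start + [(1.3, -1.0, False)])  # miss on a below-average item
miss_hard = eap(start + [(1.3, 1.8, False)])   # miss on a very hard item
print(round(eap(start), 2), round(miss_easy, 2), round(miss_hard, 2))
```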
As for our practice tests, I invite everyone to take them and see how the experience goes! Since the new test went live this spring, we’ve heard from quite a few students that their scores on our new tests have been identical to their scores on the official test or on GMAT Prep (you can see some of those reviews at http://gmatclub.com/forum/has-anyone-taken-the-veritas-prep-s-gmat-simulator-cat-exams-92320-20.html#p1227821), and as a teacher I’m even happier that people are getting an authentic experience.