How Does the GMAT Algorithm Work?

Question

How the GMAT Algorithm Works  Written and originally posted by David Kuntz, Vice President of Research at Knewton, where he builds the CATs for its online GMAT course.

1. What’s an algorithm? An algorithm, generally, is a usually efficient set of well-defined steps that are followed to solve some pre-defined problem. In the case of a CAT algorithm, the problem is to reliably and efficiently estimate a student’s ability in a reasonable amount of time. Some CAT algorithms seek to solve this problem by selecting one question at a time, with each subsequent question selected based on all of the student’s prior responses. Other algorithms look only at the most recently answered question. Still others evaluate responses to specific groups of questions.

CAT algorithms also vary with regard to the explicit criteria they use to select the next question (or sets of questions) to administer. Some try to minimize total measurement error. Others try to maximize the precision and accuracy of measurement for each question administered. Still others try to select questions that will most refine the current ability estimate. As a consequence, CAT algorithms can vary greatly from one to another, depending on the specific implementation of the algorithm, and the intent of the algorithm developers.

2. Why does the GMAT use an algorithm when the linear LSAT seems to be a pretty decent gauge of proficiency?  One of the common goals in using a CAT algorithm is to reduce the number of questions a student needs to answer in order to establish, to a specified level of reliability, an estimate of the student’s ability. CATs are often more efficient than linear tests, and so fewer questions are needed to reach a desired level of reliability. The LSAT needs over 100 items to reach that level, while the GMAT needs fewer than 80 to reach a comparable level.

3. Is the entire GMAT adaptive? Almost all large-scale standardized tests contain some number of “experimental” or “pretest” questions that are administered to the student but do not count toward the student’s final score. This is simply a way for the test makers to gather data on the questions, in order to determine how difficult they are and how well they distinguish between students at different ability levels. They also use the data collected to identify bad questions, so that they can eliminate or fix them before they count.

Some tests, like the LSAT, include all of the pretest questions in a single section. Others, like the GMAT, intermingle the pretest questions with the operational ones. Which section is the pretest section, and which questions are the pretest questions, is usually a well-guarded secret. It is generally bad strategy to spend time trying to guess whether a given question is operational or not. The price of guessing incorrectly is just too high.

4. How does the GMAT select which questions I get? CATs like the GMAT have a blueprint — a set of specifications (difficulty, question type, content area, etc.) that define which questions you see. At the same time, each question has certain statistical characteristics that the algorithm uses, based on your response, to estimate your quantitative or verbal ability. The algorithm looks at your performance on the questions you have already answered and the characteristics of each question remaining in the pool and then selects for you the question that simultaneously best satisfies the blueprint and provides the most statistical information it can, to generate the best estimate of your ability.
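The selection step described above can be sketched in a few lines. This is a minimal illustration only, not GMAC's actual algorithm, which is not public. It assumes a two-parameter logistic (2PL) item response model, uses Fisher information as the "statistical information" criterion, and reduces the blueprint to a single required question type. The `Item` class, the item parameters, and `select_next` are all hypothetical names and values.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Item:
    item_id: str
    question_type: str   # e.g. "PS" or "DS"; stands in for the full blueprint
    a: float             # discrimination (hypothetical value)
    b: float             # difficulty on the ability scale (hypothetical value)

def p_correct(theta: float, item: Item) -> float:
    """2PL probability that a test taker of ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))

def information(theta: float, item: Item) -> float:
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, item)
    return item.a ** 2 * p * (1.0 - p)

def select_next(theta: float, pool: Sequence[Item], needed_type: str) -> Item:
    """Pick the most informative remaining item that satisfies the blueprint."""
    eligible = [it for it in pool if it.question_type == needed_type]
    if not eligible:
        raise ValueError(f"no remaining items of type {needed_type!r}")
    return max(eligible, key=lambda it: information(theta, it))

pool = [
    Item("q1", "PS", a=1.0, b=-1.5),
    Item("q2", "PS", a=1.0, b=0.9),
    Item("q3", "DS", a=1.2, b=1.0),
    Item("q4", "PS", a=1.0, b=2.5),
]
chosen = select_next(theta=1.0, pool=pool, needed_type="PS")
print(chosen.item_id)  # q2: the PS item whose difficulty is closest to theta
```

Under this model an item is most informative when its difficulty sits near the current ability estimate, which is why a strong test taker tends to see harder questions.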

5. My score doesn’t seem to match my performance: I only got a few questions wrong, but my score isn’t as high as I thought it would be / I got a bunch of questions wrong, yet my score seems higher than it should be.  Most exams are linear assessments, like the SAT or your 10th grade history final. These are scored by counting the number of questions you answer correctly, and sometimes by penalizing for each question you answer incorrectly. The result, a raw score, is then converted to a scaled score, like the 600-2400 range for the SAT.

A computer-adaptive test (CAT) works very differently. It cares less about how many questions you get right or wrong than about which questions you get right and wrong. The CAT algorithm estimates your ability based on a variety of criteria, including the difficulty of a question. After each question, it evaluates your response and updates this estimate. When the test is over, the algorithm converts your quantitative and verbal ability estimates into the quantitative and verbal scaled scores, and then separately combines your quantitative and verbal ability estimates to calculate the overall score.
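To make the "evaluate the response, then update the estimate" loop concrete, here is a rough sketch. It is an assumption-laden illustration, not the GMAT's scoring code: it assumes a 2PL response model, a standard normal prior over ability, and a posterior-mean (EAP) estimate computed on a coarse grid. The function names and the three example questions are made up.

```python
import math

GRID = [i / 10.0 for i in range(-40, 41)]  # candidate ability values, -4.0 to 4.0

def p_correct(theta, a, b):
    """2PL probability of a correct answer for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses):
    """Posterior-mean ability given (a, b, correct) triples and a N(0, 1) prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard normal prior
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total

# Re-estimate ability after every answer (hypothetical items and outcomes).
responses = []
estimates = []
for a, b, correct in [(1.0, 0.0, True), (1.0, 0.7, True), (1.0, 1.3, False)]:
    responses.append((a, b, correct))
    estimates.append(eap_estimate(responses))

print([round(e, 2) for e in estimates])
```

The estimate rises after each correct answer and falls after the miss; how far it moves depends on the difficulty of the question involved, not only on the running count of right answers.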

6. Do the first X number of questions matter more?  Many variables come into play when the CAT selects your next question. One of them is the CAT’s current estimate of your ability. It uses this estimate to select questions that will be most useful in refining that estimate (if you’re a high-performing student, giving you low-difficulty questions isn’t usually as useful in discerning your true ability as giving you harder questions, and vice versa). What is important to remember is that you should not try to guess how you are doing by whether the question in front of you seems easy or difficult; every question deserves your full attention. With that understood, unless you have completely bombed the test, it is usually the case that missing a couple of very hard questions late in the test will have a smaller effect on your final score than missing a couple of very easy questions earlier, not because of their position within the test but because of their levels of difficulty.
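The point about difficulty can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: a three-parameter logistic (3PL) model with a guessing floor of 0.2, equal discriminations, a standard normal prior, and five made-up difficulties. None of these values come from GMAC. Both test takers answer three of five questions correctly; only the pattern differs.

```python
import math

GUESS = 0.2  # assumed chance of guessing a five-option question correctly

def p_correct(theta, b, a=1.0, c=GUESS):
    """3PL probability of a correct answer (a, b, c values are illustrative)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def ability_estimate(difficulties, correct_flags):
    """Posterior-mean ability on a grid from -4 to 4 with a N(0, 1) prior."""
    grid = [i / 20.0 for i in range(-80, 81)]
    num = den = 0.0
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)
        for b, ok in zip(difficulties, correct_flags):
            p = p_correct(theta, b)
            w *= p if ok else 1.0 - p
        num += theta * w
        den += w
    return num / den

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # ordered easy to hard
missed_hard = ability_estimate(difficulties, [True, True, True, False, False])
missed_easy = ability_estimate(difficulties, [False, False, True, True, True])

print(f"missed the two hardest: {missed_hard:.2f}")
print(f"missed the two easiest: {missed_easy:.2f}")
```

Under this assumed model the test taker who missed the two easiest questions ends up with the lower estimate, because a miss on an easy question is strong evidence of low ability while a correct answer on a hard one can be partly explained by guessing.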

7. How severe is the penalty for not finishing a section?  The penalty is significant. You can expect your scaled score to decrease by roughly 1 point for every question that you don’t answer. For example, if you correctly answer every question you encounter but fail to answer the last five, you generally won’t score higher than a 46.
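The arithmetic behind that example is simple enough to write down. This sketch just encodes the article's rule of thumb (roughly 1 scaled point per unanswered question, starting from the top reported section score of 51); the function name is made up and the real penalty is not published as a formula.

```python
MAX_REPORTED_SECTION_SCORE = 51  # the article says section scores are truncated at 51

def section_score_ceiling(unanswered, penalty_per_question=1):
    """Best reachable scaled score when `unanswered` questions are left blank.

    Assumes the article's rule of thumb of roughly 1 scaled point per
    unanswered question; treat the result as an approximation.
    """
    if unanswered < 0:
        raise ValueError("unanswered must be non-negative")
    return MAX_REPORTED_SECTION_SCORE - penalty_per_question * unanswered

print(section_score_ceiling(5))  # 46, matching the article's example
```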

8. I took the GMAT and got a 710, 44q/44v/6 AWA. A friend of mine took the test 6 days later and got the exact same quant/verbal scaled scores, but he got a 720. How could this happen?  Both the individual section scores and the overall score are calculated using an estimate of your Math and Verbal abilities derived from your performance on the CAT. Your overall score is not calculated from your section scores. Because your underlying ability estimate might be slightly different from your friend’s, your overall scores might be different.

For example, there is a range of ability estimates that translate into a Verbal score of 40, and there is a range of ability estimates that translate into a Math score of 42. Depending on which specific estimate is calculated for you, your overall score could range from 660 to 680. Please note that the Standard Error of Measurement (SEM) on the overall score for GMAT is 29 points, so scores of 660 and 680 both fall within the standard error.
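A toy version of this effect is sketched below. The two conversion formulas are invented purely for illustration and are not GMAC's; the constants were picked so that the output lands on the 40 / 42 and 660 to 680 figures from the example above. The only point is that rounding each ability estimate to a section score throws away detail that the overall score still uses.

```python
def section_score(theta):
    """Hypothetical conversion: round 30 + 10 * theta to a whole scaled score."""
    return round(30 + 10 * theta)

def overall_score(theta_quant, theta_verbal):
    """Hypothetical conversion from both ability estimates to a 10-point total scale."""
    raw = 450 + 100 * (theta_quant + theta_verbal)
    return 10 * round(raw / 10)

# (quant, verbal) ability estimates; made-up values near the edges of the
# ranges that map to section scores of 42 and 40 under the formulas above.
you = (1.16, 0.96)
friend = (1.24, 1.04)

for name, (tq, tv) in [("you", you), ("friend", friend)]:
    print(name, section_score(tq), section_score(tv), overall_score(tq, tv))
```

Both test takers show Math 42 and Verbal 40, yet the totals come out as 660 and 680 because the overall score is computed from the unrounded estimates.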

How can my overall percentile be higher than both my quantitative and verbal percentiles?

Your overall score is calculated separately from your section scores, so you can score in the 99th percentile on the GMAT even if you didn’t score in the 99th percentile on either of the sections. For example, you could get a 48 on Quantitative (86th percentile), a 45 on Verbal (98th percentile), and a 760 overall (99th percentile).

Are the quantitative and verbal sections weighted equally in the total score?

Technically, yes — the estimates of your quantitative and verbal abilities that the CAT produces contribute the same amount to your overall score. However, the verbal section has a greater effect on your percentile rank because it is generally more difficult. If, for example, you scored a 40 on both the Quantitative and Verbal sections, your percentile rank for Quantitative would be 61st, but for Verbal it would be 91st. Your overall score (650) would be in the 84th percentile.

Why are scores above 51 rare? Why does the scale go up to 60? Can anyone get a 52?

For psychometric reasons, GMAC has truncated the scale at 51 (they do not report section scores higher than 51).

Why is it so difficult to create a good CAT?

A CAT needs to do many things well in order to reliably and accurately estimate your ability. It requires a robust algorithm to estimate your ability, a complex but speedy mechanism to identify the best question for you to see next, a rich pool of questions from which to select the questions, and a powerful scoring algorithm that translates the ability estimate into something meaningful.

Each test question has many characteristics that need to be simultaneously considered in the selection. The statistical characteristics of the questions all need to be determined beforehand through a process known as pretesting. Many, many questions are needed in order to be able to provide accurate assessment for all ability levels. And all of those questions need to be carefully constructed, reviewed, and statistically aligned so that they contribute meaningfully to your ability estimate.

David Kuntz is Vice President, Research at Knewton, where he builds the CATs for its online GMAT course. He is one of the brilliant brains behind the accuracy of Knewton CATs. This is a series of posts combined into one about the algorithm behind the GMAT.

abhicoolmax · Answer

This is a revelation! I didn't know this. So one who scores q51 with 0 incorrect should expect to see a higher overall score than another who scored q51 with 5 incorrects - considering the verbal is close to same. This theory explains why some q51,v40 in this forum are even 760, while others are 730-750!! Thanks BB for sharing this. It's good to know, and thus it motivates to perform the best possible one can in any section!

jeprince112 · Answer

very interesting information to take into consideration. I still am confused as to whether the beginning portion of the test is truly more important to get right. I have heard that a lot, but also the case that it does not weigh as heavily as some believe.

catfreak · Answer

Awesome post. Kudos bb !

321kumarsushant · Answer

just a quick question..
ref to point no 8.

Is it possible that the variation in total score exists because of the time spent on each individual question, and whether it was answered right or wrong?

Yekrut · Answer

Interesting article to see how everything works.

Kurai · Answer

bb  I know there's no way to REALLY know, but is there a possibility that the amount of time you spend on a particular question can affect your overall score? What I mean is, is it possible that the algorithm contains "time" as part of the equation to determine not only the next question's difficulty, but your overall score? I first thought of this when I was looking over my ESR. I know most people use the ESR's time per question as a gauge of whether one is spending too much time on a specific type of question, but what if it's one of the factors that determines a score?

For example, if you spend 4 minutes on a Medium level PS question and get it wrong, it may decrease your overall score more than, say, completely guessing on a Hard level DS question after 1 minute. Or say there are 2 questions of the same difficulty. If someone gets the question right or wrong after 1 minute versus 4 minutes, it could be that the score gets affected more for one than the other. There could be some correlation between the difficulty of the question and the time you spend on each question.

I apologize if this has been covered or seems silly. Just thought I would throw a random thought out there!

mcelroytutoring · Answer

I seriously doubt that this is true, based on my own anecdotal experiences of taking the test. I've taken the GMAT 4 times, and a couple of those times, I've finished the Verbal section about 10 minutes early. However, it didn't help my score to have finished the questions more quickly. On the tests where I used 100% of the available time (and got a similar percent correct), my Verbal scores weren't any lower than the scores I got when I finished the test with plenty of time to spare.

In addition, GMAC has given us exactly zero evidence to support the idea that question values are weighted depending on how long one takes to answer them.

Narenn · Answer

If it helps, here is the recording of a webinar we had conducted in collaboration with GMATWhiz on GMAT adaptive algorithm.

https://www.youtube.com/watch?v=fz2Ws6IBNeE
