greenoak wrote:
Hi, IanStewart,
It seems that you know this industry very well. May I ask your opinion on the following issues?
a) Do you think it is probable that GMAC has recently changed the scoring algorithm for verbal?
b) In your opinion, is the scoring principle for verbal different from that for quant, and is it because of this that the results of practice tests in verbal are less indicative of the real score? Perhaps the factor of chance plays a bigger role in verbal?
Thanks,
Greenoak.
For a),
The GMAT scoring algorithm is based on 40 years of research, so there's no way they'll make any significant changes. Still, they have scope to change certain parameters (for example, they can decide, more or less freely without affecting the integrity of the scoring, how much more difficult the second question is than the first if you answer the first correctly), so it's possible they've made minor adjustments. If scores are off, as anecdotal evidence on this and other forums suggests (though my students have consistently scored in the 40s on the verbal on recent GMATs, so I don't know what to think), it may not be the algorithm at fault. The ScoreTop issue may have affected question calibration, which could affect current scores. It's not entirely straightforward to explain clearly, but I'll try:
-suppose a diagnostic question is inserted on a test, and ends up in a ScoreTop JJ document, so every test taker who reads the JJs knows the correct answer in advance. Suppose also that there are *a lot* of people reading the JJs;
-the question appears as a diagnostic. Imagine that this question is, in truth, ballistically difficult- a true 51-level verbal question. GMAC doesn't know how difficult the question is until test-takers see it, as a diagnostic. If a lot of ScoreTop readers know the answer in advance, and get it right, GMAC will think, by analyzing responses, that the question is in fact quite easy.
-the question then shows up on a real GMAT, calibrated as an 'easy' question. Most honest test-takers will see it and get it wrong (after all, in truth it is bleeding hard). But because the test thinks the question is easy, test-takers will get heavily penalized for answering incorrectly, dragging down their scores (especially if this happens on more than one question). If the question had been correctly calibrated as a 'damn difficult' question, there would be almost no penalty for getting it wrong; indeed it would be expected.
That example is exaggerated- the effect on any single question wouldn't be so pronounced- but if many questions were miscalibrated, the cumulative effect on scores could still be substantial. So, if we grant that there is a problem with recent verbal scores (and while I don't doubt many think that's the case, I would need more evidence to be convinced), it may have nothing to do with the algorithm, and nothing to do with computer error; it may come down to calibration error. There are many places to look if there is a scoring problem. That said, based on the numbers available in the GMAC v ScoreTop court documents, I did a very quick, and very rough, estimate of how much calibration might be affected (posted in the ScoreTop thread on this forum), and my conclusion was that no one should have suffered more than a one-point scaled score loss because of this calibration effect. If the number of ScoreTop users was substantially higher than what I found in the court documents, the effect could be much greater.
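The calibration story above can be simulated crudely. This is entirely a toy model of my own- the correct-answer rates and the share of pretest takers who read the JJs are assumptions for illustration, not numbers from the court documents:

```python
import random

random.seed(0)

TRUE_P_CORRECT = 0.20   # honest test-takers: a genuinely hard item, ~20% get it right
CHEATER_SHARE = 0.60    # assumed fraction of pretest takers who saw the answer in a JJ
N_PRETEST = 1000        # assumed number of people who see the item as a diagnostic

correct = 0
for _ in range(N_PRETEST):
    if random.random() < CHEATER_SHARE:
        correct += 1                      # JJ readers answer correctly every time
    elif random.random() < TRUE_P_CORRECT:
        correct += 1                      # honest takers succeed at the true rate

observed_rate = correct / N_PRETEST
print(f"true correct rate: {TRUE_P_CORRECT:.0%}, observed: {observed_rate:.0%}")
# The observed rate comes out near 70%, so the item gets calibrated as easy,
# and an honest test-taker who misses it is penalized as if missing an easy item.
```

With these made-up numbers, a 51-level question would look like a roughly average one to the calibration process, which is the mechanism I described in the bullets.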
To answer b), there's an underlying assumption that the quant section of the GMAT tests a single ability, and that the verbal section tests a (different) single ability. It's a very questionable assumption; mathematical ability is, to my mind, a combination of many separate abilities. The verbal section seems to comprise even more disparate abilities- SC, CR and RC seem to test very different things- so one would expect verbal scores to have greater 'standard error' than quant scores: you would expect your verbal score to depend on how many of each question type you see, and where your abilities lie. Still, the GMAT is designed to have enough questions that standard error is minimized, and I've yet to see any research indicating that the verbal section has greater standard error than the math; while I intuitively think verbal scores should be less reliable than quant scores, that's really just speculation.
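The question-mix point can be illustrated with a quick calculation. The candidate's ability numbers and the two form mixes below are hypothetical- nothing here comes from real GMAT data:

```python
# Hypothetical candidate: strong on SC, weaker on CR and RC
# (probability of answering each question type correctly).
P_CORRECT = {"SC": 0.90, "CR": 0.60, "RC": 0.60}

def expected_score(mix):
    """Expected number correct on a form with the given counts per question type."""
    return sum(count * P_CORRECT[qtype] for qtype, count in mix.items())

form_a = {"SC": 20, "CR": 11, "RC": 10}  # SC-heavy form, 41 questions
form_b = {"SC": 10, "CR": 15, "RC": 16}  # SC-light form, 41 questions

print(round(expected_score(form_a), 1))  # 30.6
print(round(expected_score(form_b), 1))  # 27.6
```

Same candidate, same number of questions, a three-question swing in expected raw score purely from the mix of question types- which is why, if verbal really is a bundle of distinct abilities, you'd expect extra variability in verbal scores.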