AndrewN wrote:
Furthermore, the scaled score for the Verbal section is not an average of your three sub-scores, although your particular scores might lead you to believe as much.
This is definitely true, and it's fairly easy to see, intuitively, why it would be incorrect to average test subscores to arrive at an overall score. If a test has two sections, and those sections are independent (a test taker's skill at one is independent of their skill at the other), and a test taker gets a 90th percentile score in each section, you might, at first, think that test taker deserves a 90th percentile score overall when you combine the two section scores. But this 90th/90th percentile combination is extremely rare: only 10% of test takers did better on the first section, and only 10% did better on the second. If the sections are independent, then, multiplying probabilities, only 1% of test takers did better on both sections.

Now, some test takers outside of that 1% would also deserve a higher overall score than the 90th/90th percentile test taker (someone with an 88th/98th percentile split, for example), but this just illustrates that two high scores are much less likely than just one, and when a test taker has 90th/90th percentile scores, their total score should be much higher than the 90th percentile. The mathematically correct way to combine scores in this situation is to sum two normal random variables (something you'd learn about in an undergraduate statistics course), and if you do that with two 90th percentile scores, you end up with a 96.5th percentile score overall.
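If you want to verify that 96.5 figure yourself, a few lines of Python using the standard library's NormalDist will do it (this models each section score as a standard normal, per the independence assumption above):

```python
from statistics import NormalDist

# Model each section score as a standard normal, N(0, 1)
nd = NormalDist()
z = nd.inv_cdf(0.90)  # z-score of a 90th percentile section score, ~1.28

# The sum of two independent standard normals is N(0, sqrt(2))
combined = NormalDist(0, 2 ** 0.5)

# Overall percentile of a test taker who scored at the 90th percentile
# on both sections (total z-score of 2*z)
print(round(combined.cdf(2 * z) * 100, 1))  # -> 96.5
```

The same calculation also shows why averaging is wrong: averaging would leave the combined score at the 90th percentile, ignoring how rare it is to score that high twice.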
AndrewN wrote:
I also remember seeing that scaled 50 score in Verbal with no apparent errors, giving rise to the notion within tutoring circles that perhaps those integrated experimental questions really do count. Only GMAC™ knows for sure.
No, we do know for sure, and in the original thread about that V50 score, which is
here, I explained in a lot of detail why it is mathematically impossible for an adaptive test to use experimental questions to calculate a score. I'll summarize here: if a question is experimental, the algorithm does not yet know the question's difficulty, discrimination, or guessing parameter values, so it would have no idea what a right or wrong answer even meant. Worse, it doesn't even know whether the question is a valid test question at all. Invalid questions give false information about a test taker if they're used on a test, so experimental questions can never and would never be used as part of the score calculation.
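Those three parameters are the standard three-parameter logistic (3PL) model from item response theory. A minimal sketch makes the point concrete (the specific numbers here are invented for illustration; GMAC's actual item calibrations are not public):

```python
import math

def p_correct(theta, a, b, c):
    """3PL item response function: probability that a test taker of
    ability theta answers correctly, given discrimination a,
    difficulty b, and guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# For a calibrated question, a response updates the ability estimate:
print(round(p_correct(theta=1.0, a=1.2, b=0.5, c=0.25), 3))  # -> 0.734

# For an experimental question, a, b, and c are unknown, so this
# likelihood term cannot be evaluated at all -- a right or wrong
# answer gives the algorithm nothing to update the estimate with.
```

In other words, until pretesting has estimated a, b, and c for an item, a response to that item literally cannot enter the score calculation.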
There's a different explanation for that V50 score when the test taker answered every Verbal question correctly, and the associated ESR illustrates what happened (the ESR can be found in the above thread): by fluke, that test taker had a very easy Verbal test on average. That can occasionally happen, because the GMAT does not adapt nearly as predictably as most prep books claim. And if a test taker gets everything right, all the algorithm can really say is "the test taker is above level X". If a test taker only sees V30-level questions and answers them all correctly -- well, that's what a V50 test taker will usually do, and it's also what a V51 test taker will usually do. The algorithm can only decide between V50 and V51 by asking "which is more likely?", and V50 is the test taker's more likely level, because more people are V50-level than are V51-level.
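That "which is more likely?" decision is essentially a maximum a posteriori tie-break: when a perfect run on easy questions makes the likelihoods nearly identical, the population prior decides. A toy sketch with entirely made-up numbers (the real population proportions and answer probabilities are GMAC's, not mine):

```python
# Hypothetical numbers for illustration only -- the real population
# proportions and per-question probabilities are not public.
priors = {"V50": 0.020, "V51": 0.008}    # share of test takers at each level
p_correct = {"V50": 0.98, "V51": 0.99}   # chance of acing one easy V30 item
n_questions = 36

# Posterior is proportional to prior times likelihood of a perfect run
posterior = {level: priors[level] * p_correct[level] ** n_questions
             for level in priors}
print(max(posterior, key=posterior.get))  # -> V50: the prior dominates
```

With these (made-up) inputs, both levels are almost equally likely to produce a perfect run, so the more common level wins, which mirrors why the algorithm landed on V50 rather than V51.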
Of course, it wasn't the test taker's fault that they didn't get many hard questions -- it was a fault in the question pool (and the algorithm didn't handle this very unusual case correctly) -- which is why GMAC revised the test taker's score to a V51 on appeal.
Even if someone disbelieves everything I've said on this topic, and thinks "this V50 score report is evidence that GMAC uses experimental questions to calculate test scores", there is one obvious question that needs an answer: if that V50 was the test taker's correct score because the test taker had a wrong answer to an experimental question, then why did GMAC revise the score to a V51?