jallenmorris
Interesting that these two sentences come from the same poster within the same post because, to me, they do not make sense together.
We all know GMAC puts "test" questions on the GMAT to see how people do and whether they will be used later on. It does lengthen the test, as those questions are not scored. I don't like the fact that they do this. Does GMAC factor in the difficulty of an unscored question and how the mere presence of that question changes the entire test?
Out of 37 questions, if you have 34 that count, those 3 questions will change the entire test. If they factor in how those 3 unscored questions alter the variance of the test, then those questions aren't really "unscored" in a literal sense. Just my thoughts on it. If a question is on the test, it has an impact. I think they should put only questions on it that are actually going to be scored.
(Any PhDs or statistics or psych majors on this board will be able to contribute/refine/elaborate/correct this, as not many folks read up on psychometrics )
I can see where you are confused, but experimental questions are not part of the test. On test day, you are actually taking part in two different activities: (1) the test and (2) a pretest of future questions. These experimental questions aren't scored as part of your exam, so they aren't lengthening (1) but (2). The unscored items are used for an item analysis later. This seems counterintuitive, I know, but a test is traditionally considered to be composed of the stimuli to which your responses are scored or measured in some meaningful way.
Also, when someone in educational measurement speaks of length, s/he isn't usually referring to time, but to the number of items on a given instrument. Yes, more questions would = more time, but in the literature you would see more specific reference to the nature of the speeded test in units of time. As far as unscored items contributing to variance, you have to approach it from the perspective of what variance is and how it is calculated. Variance is the square of the standard deviation; if something is never scored, it doesn't contribute in any way to this calculation. I know where you are coming from - what if an experimental question is really tough, and it messes with your head and causes you to waste time and screw up later?
So, in a sense, it can influence variance that way... but it isn't approached like that, because so can a million other things (from the color of the carpet and walls, how many other people are in the room, the room's temperature, whether your girlfriend just dumped you - that may affect your score ). In test construction sometimes certain assumptions have to be made, and sometimes certain things have to be simplified or demonstrated insignificant. This is where the concept of random error is convenient. If you chalk up the effect diagnostic pretest questions have as a random error component (the mean of random errors in a population = 0), it's like saying it doesn't really matter.
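The variance point can be made concrete with a toy sketch (all numbers hypothetical, not real GMAT scoring): unscored responses never enter the raw score, so they can't enter the group-level variance calculation either.

```python
import statistics

# One examinee's responses on a hypothetical 37-item form: 1 = correct, 0 = incorrect.
responses = [1] * 25 + [0] * 12
# Suppose the items at positions 10, 20, 30 are unscored experimental items.
unscored_positions = {10, 20, 30}

# Only scored responses enter the raw score...
scored = [r for i, r in enumerate(responses) if i not in unscored_positions]
raw_score = sum(scored)

# ...so only scored items can feed into the group's score variance.
# Toy raw scores for a small group of examinees (made up):
group_scores = [28, 31, 25, 30, 27]
var = statistics.pvariance(group_scores)  # variance is the square of the SD
sd = statistics.pstdev(group_scores)
assert abs(var - sd ** 2) < 1e-9
```

Changing what the unscored items contain changes nothing in `group_scores`, which is the whole point: they are invisible to the statistic.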
Unscored items chosen randomly from some pretest pool and introduced randomly as you progress through an exam is very different conceptually from the idea of lengthening a test with scored questions to which people may or may not have already had access. Besides the obvious waste of time, it opens up a Pandora's box - it can artificially inflate the mean, it can lower prediction (remember, your score is used to predict how well you will do in bschool), and it can introduce measurement bias (who has the OG and who doesn't). It can even shift the desired factor composition of the exam (while long-term memory plays a role in many knowledge-based tests in a general sense, more variance than desired would be accounted for by the memory factor than by the spatial/mathematical skills/reasoning factors). When you change the factor composition of a test, it becomes a different test.
All this might seem circular, since error reduces reliability... why would they put up with any potential random errors introduced by pretesting? The answer is because they *have* to do item analysis on new questions, and this really is probably the best method of doing it; they accept the trade-off as something inescapable. Adding unscored questions as a way to mess with your head is not an efficient method of discrimination b/c it throws out valuable information (arguably the most valuable info - did you get the ? right or wrong). If GMAC wanted more accurate estimates of examinees' true scores, they would achieve this with scored questions.
Ian - you make an even better point that I overlooked, which is that there is no reason to even include a known question as an unscored diagnostic item b/c they don't have to test it - it's retired! Parsimony is a wonderful thing. I was trying to be polite and not say to the OP "hey, dude, you are either lying or crazy because you didn't see an OG question on the test".
But I doubt that the diagnostic questions are first sorted on any a priori grounds. Empirical test construction on this scale is quite scientific; in this case randomization is a key component of the process that can't be overlooked. It would be questionable if they decided a priori that X is a 700-level question and then confirmed that hypothesis by testing it on a group that displays a restricted range of scores (correlation analysis on restricted ranges can present other problems). It would be more acceptable to collect data from random samples and let this aggregate data tell them it is a 700-level question.
Again, this can seem illogical at first, b/c it means a question like 2x=4 could possibly make its way into the difficult item pool... but hey, welcome to the world of psychological measurement

Not everything makes sense here! (Why would answering false to "I sometimes tease animals" show up on a scale that measures hysteria? This is true, BTW.)