Chetan1286 wrote:
Hello GMAT Club experts,
I posted a few days ago about a GMAT Club CAT I took whose score was not believable given the questions I answered correctly. Two days later I took another one and was satisfied with the result: a V34 with 26 questions correct.
But here comes the problem. Today I took a free Veritas Prep full-length test and was genuinely shocked by the scores, which are as follows.
In Verbal I got 28 out of 36 questions correct:
- 8 of the first 10 correct
- 8 of the second 10 correct
- 12 of the last 16 correct
And I got a V31. I was bitterly surprised; experts say you have to get most questions correct in the beginning of the test, and I did, so I cannot understand the score.
If someone gets a V42, does that mean the person got 100% of the questions correct, or what?
And the shock does not end there.
When I took an official GMAT Prep test, I got 16 of 31 questions correct and scored Q43. Likewise, on a GMAT Club quant CAT I got 16 of 31 correct and scored Q42, in line with the official test. But on the same Veritas Prep test I got 22 of 31 quant questions correct and scored only Q41, which I think is absolutely rubbish, as I did well at the beginning of that test too.
So my question to the experts: can we, or rather should we, believe these prep tests? Or do they deliberately give low scores on free tests to make us nervous so that we subscribe to their tests or prep courses?
Am I correct to say that we can really trust only an official GMAC practice test, whose scoring I think is always consistent with the questions we get correct or incorrect?
Please do reply and let me know what you think.
Chetan1286

1. Here is a post that is extremely relevant to this query:
https://www.gmatclub.com/forum/veritas-prep-resource-links-no-longer-available-399979.html#/2016/06 ... ems-wrong/

The points discussed in detail are:
- The number of right/wrong answers is much less predictive than you think.
- Of the "ABCs" of Item Response Theory, difficulty level is only one element (b).
- Question delivery values "content balance" more than you think.
- Some questions don't count at all.
- Every test has a margin of error.
2. Always, on every test, we aim to give you the most accurate estimate of your current ability. Test takers today are tech savvy and smart, with access to various prep resources, including the official ones. If a test-prep company tried to deceive them by giving a non-predictive score, for whatever reason, it would be setting itself up for failure and ridicule, not to mention violating the highest ethical standards the education sector demands by its very nature and because of what is at stake: the careers of test takers.
Also, I am putting down some thoughts by our Academic Head, Brian Galvin, on this topic from another post.
1. Experimental Questions:
On your GMAT you will see *several* unscored, experimental problems that GMAC is running through the pool to gather data. There's a decent likelihood that some of those problems are flawed in ways including:
- A question is culturally biased (the right answer is right and the wrong answers are wrong, but something in the subject matter favors people from a particular region or background)
- A question is ambiguously worded (in trying to "hide" the key to unlocking the problem, the author failed to include enough information for it to be solved reasonably or concretely)
- A question is too labor-intensive (it isn't "wrong," but it takes too much time to be fair within the time limits of the exam)
- A question has a second correct or defensible answer choice
- An "insufficient" DS statement is actually sufficient if one draws on a field of study (say, trig or calculus) that the authors didn't anticipate
- A question is formatted poorly (like what you saw)
- A question is missing key information ("not drawn to scale" on a geometry figure, for example)
- Etc.
Like GMAC said, those questions won't count until they've been statistically confirmed and "graduated" into the live, scored pool. But like you saw, it can be distracting or unnerving to feel like you've found a flaw or are facing an unfair question. So just know in advance that there is a fair likelihood that you'll see an unfair question. And if you do see an unfair question, there's an incredibly low likelihood that that item itself will affect your score. So just do your best, and if you're really distracted, tell yourself "it's experimental."

I've had multiple students come back from their tests and claim that there was an unsolvable Problem Solving question or a verbal question with the same answer repeated twice or something like that. And there's a good chance that was the "fog of war" talking...their minds were spinning and they were exhausted and they just blew it. But maybe they did see that, and if so you have to be able to tell yourself "it probably doesn't count, so make a reasonable decision and move on."
This is also why it's a terrible idea to try to read into your performance by the difficulty level of the problem in front of you. If you see a dead-easy problem at #15 it may, indeed, be dead easy and you might as well go home because you're performing/scoring so poorly. BUT there's also a fair likelihood that it's an easy experimental problem and they're gathering data on how well a genius like you handles such a problem (if more than a few 750 scorers get a 300-level problem wrong, it's probably flawed and GMAC needs to know that). Or what makes it hard is that most people don't even see the little trap lurking there. Either way, if you see a way-too-easy question or a way-too-hard problem or a potentially-flawed problem, just tell yourself that it could be experimental and do your time-efficient best just in case it does count. And then be ready for the next question.
2. How the *NEXT* question is chosen:
-Adaptive scoring is all about probabilities. The system tries to gauge your ability by looking at your responses and calculating the probability that someone with those responses would be at the 99th percentile, the 95th, the 90th, etc., and its "ability estimate" of you is based on which ability level carries the highest probability at that point. And it delivers questions, too, by scanning the pool of available questions and looking for questions that would have a high probability of providing valuable information about you. So, it's very common for the system to deliver you a question that's a bit below its current estimate of your ability, just because that problem has a high probability of helping the system learn more about you in that range (say, right now the system thinks you're in the 610-670 range; your missing that "easier" problem may help the system realize that you're highly unlikely to be above 660, but getting it right might help to cement your floor at 620). Because of that, you can't conclude that "a 550-level question must mean that the system thinks I'm below 600." It may just have a high probability of helping the system learn more about your ability near, but not exactly at, the "difficulty level" of that problem.
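As a rough illustration of that "highest probability" idea (this is not GMAC's actual algorithm; the two-parameter IRT model, the grid search, and all item parameters below are invented for the sketch), an engine can score a response pattern by asking which ability level makes the observed answers most likely:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT: probability that an examinee of ability theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses):
    """Log-probability of an observed response pattern at ability theta.
    responses: list of (a, b, answered_correctly) tuples."""
    total = 0.0
    for a, b, correct in responses:
        p = p_correct(theta, a, b)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_ability(responses):
    """Scan a grid of ability levels and return the one under which the
    observed responses are most probable -- the 'ability estimate'."""
    grid = [i / 10.0 for i in range(-40, 41)]  # theta from -4.0 to 4.0
    return max(grid, key=lambda t: log_likelihood(t, responses))

# Made-up pattern: right on two easier items, wrong on a harder one
responses = [(1.0, -1.0, True), (1.2, 0.0, True), (0.9, 1.5, False)]
theta_hat = estimate_ability(responses)
```

Note that the estimate lands between the difficulties of the hardest item answered correctly and the item missed, which is the behavior the paragraph above describes.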
-Which brings up another nuanced point about Item Response Theory - the psychometricians behind IRT don't use the term "difficulty level" for questions...that's a test-taker and tutor kind of way of thinking about the problems. They look at the "b-value" which is the ability level at which the question provides the most information about examinees. It's similar to difficulty but not really "difficulty," and what's important about that is that wherever the b-value may lie (say at the 60th percentile) that problem still has a lot of predictive value for ability levels surrounding that. So, again, if the system serves you a 600-level problem it's not necessarily because it doesn't think you can handle a 650...it's just that the system believes it will get more information from that problem than from one at the 650 level, even though it might think you're closer to 650 than to 600 at that moment.
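To make the b-value point concrete, here is a small sketch using the standard 2PL Fisher information formula, I(theta) = a^2 * p * (1 - p). The item parameters, and the mapping of "600-level" to theta = 0, are invented for illustration; the takeaway is that an item's information peaks at its b-value but remains substantial for nearby ability levels:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * p * (1 - p). Maximized when theta == b."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# An item whose b-value sits at a '600-level' ability, coded as theta = 0.0
a, b = 1.0, 0.0
peak = item_information(0.0, a, b)    # information at the b-value itself
nearby = item_information(0.5, a, b)  # a somewhat stronger examinee
```

With these made-up parameters the item still yields over 90% of its peak information half a standard deviation above its b-value, which is why serving a "600-level" item to a near-650 examinee is still informative.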
Which even as I'm reading that back may not sound all that convincing, but consider an example like professional sports. The best team in the English Premiership or the NBA never goes undefeated. Even though a great team may never have less than a 60% chance of winning any given game (after all, it's better than any other team), you can learn a lot about that team by seeing how it performs over a 10-game stretch when its likelihood of winning any one game is 70%. (Think about that probability...a 70% chance of winning 1 game means a 49% chance of winning two in a row, and less than 25% of winning 4 in a row). Question delivery is similar - the system can learn a lot about you from seeing how you handle problems that are below your ability level, as well as learning from problems that are above your ability level.
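The streak arithmetic in that parenthetical can be checked directly, since independent games multiply:

```python
# A 70% chance of winning any one game compounds quickly over a streak.
p_win = 0.7
p_two_in_a_row = p_win ** 2   # 0.49 -- already less than a coin flip
p_four_in_a_row = p_win ** 4  # roughly 0.24, under one in four
```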
-And I think that builds to this really important part - we in these forums and in classrooms and in textbooks and blog posts...we try to personify the scoring algorithm to make it make sense. But it's just a big data computer. So it's not "thinking" about your ability (hey, so AK125 got this 600-level problem right...I wonder if he's ready for a 650...). It's just assessing the data and assigning questions, whether at, above, or below its estimate of your ability, based on how much more information it can get about you with the next question. Which can sometimes feel a little underwhelming or disappointing, again because we tend to personify the test and feel like if we got 2-3 questions right in a row we've "earned" a "harder" question. But the system doesn't work that way - it isn't concerned with appearances; it just mathematically goes about its job.
*THAT* said...remember it's all probabilities, so a Q46 means that of all the available scores, it's most likely that you're a 46 and less-likely-but-still-reasonable that you're a 45 or 47 and even less likely but not out of the realm of possibility that you're a 44 or 48. So with any practice test score keep that in mind.
All in all, the actual point of a practice test is to help you figure out your weaknesses. Focusing on how the algorithm works is not very productive, and your score depends on many, many variables. Try to plug all the gaps instead.