A Conceptual Overview of Adaptive Tests [#permalink]
22 Oct 2004, 10:36
Many observers have criticized the difficulty of the questions that appear in the OG (the Official Guide). The problem is not so much that the OG items are too easy as it is that the OG, like a traditional pencil-and-paper test, features items that range across the p scale. The high-probability items (the easy questions) are generally of little interest to students who are seeking scaled scores at the far right tail. Suppose we group items into seven difficulty strata, with the middle stratum called stratum x. The majority of items that one would encounter on a traditional pencil-and-paper test would come from the middle strata near x (i.e. x-1, x, x+1). In a traditional test, you would encounter very few questions at the top of the range (x+3), since these questions have undesirable characteristics for most test takers (they would provide low differentiation (discrimination) among low and medium ability test takers).
Now suppose we create an adaptive test from this body of traditional questions. The first test item administered to each student would come from the middle stratum (stratum x). If it is answered correctly, we move to stratum x+1 for the second question. If the first question is answered incorrectly, we move to stratum x-1 for the second question. If the first two questions are both correct, we move to stratum x+2. If this question is answered incorrectly, we move back down to stratum x+1. At the end of the test we would combine the information concerning difficulty and number of questions answered correctly and incorrectly to obtain an estimate of the net number of items this student would have answered correctly on a traditional paper-based test (this is the raw score). The estimated raw score on the paper test is then used to read the associated scaled score from the paper test's raw to scaled score conversion table.
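The up-one, down-one movement between strata can be sketched in a few lines of Python. This is only an illustration of the conceptual scheme in this post, not the actual GMAT algorithm. The function names are hypothetical, strata are numbered -3 to +3 with 0 standing for stratum x, and holding the test taker at the top or bottom stratum when there is nowhere further to move is my own assumption.

```python
# Seven strata, numbered relative to the middle stratum x (0 means x).
MIN_STRATUM, MAX_STRATUM = -3, 3


def next_stratum(current, correct):
    """Move up one stratum after a correct answer, down one after an incorrect one.

    Clamping at the ends of the range is an assumption; the post does not say
    what happens at x+3 or x-3.
    """
    step = 1 if correct else -1
    return max(MIN_STRATUM, min(MAX_STRATUM, current + step))


def administered_strata(responses):
    """Return the stratum of each question, plus the one that would come next."""
    path = [0]  # the first item always comes from stratum x
    for correct in responses:
        path.append(next_stratum(path[-1], correct))
    return path


# Two correct answers, then a miss: x, x+1, x+2, then back down to x+1.
print(administered_strata([True, True, False]))  # [0, 1, 2, 1]
```

Turning such a path into an estimated raw score would require the difficulty of each stratum, which the post does not specify, so that step is left out here.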
Last edited by Hjort on 15 Aug 2005, 14:47, edited 1 time in total.
Some readers might question what to do with test takers who do not finish a section. Suppose that there are 36 questions in each section that contribute to the final score (we have eliminated any validation questions). Suppose that a test taker performs perfectly on the first 18 questions but runs out of time. Should this student get a perfect score for having missed none of the questions she attempted? No, this would seem to benefit this student and penalize other students who rationed their time to answer all 36 questions. One way to correct this problem is to adjust each student's raw score by the proportion of questions she answered. Thus, we would cut this student's raw score in half since she only finished half of the section. Of course, cutting the raw score in half would not necessarily cut the scaled score in half.
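As a rough illustration of the proportional adjustment described here, the arithmetic might look like the following. The function name and the 36-question default are only for the example, and this is a sketch of the idea rather than the scoring rule actually used.

```python
def adjusted_raw_score(raw_score, answered, total=36):
    """Scale an estimated raw score by the proportion of questions answered."""
    if not 0 <= answered <= total:
        raise ValueError("answered must be between 0 and total")
    return raw_score * answered / total


# A perfect estimated raw score of 36, but only 18 of 36 questions answered.
print(adjusted_raw_score(36, 18))  # 18.0
```

The adjusted raw score would still have to be looked up in the raw to scaled conversion table, which is why halving the raw score need not halve the scaled score.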
Another important issue for adaptive and pencil-paper examinations is the apparent variability of scores. Suppose two students receive a 500 each the first time they take the exam. Student A then receives a 520 on the second exam and sees this score as proof that her study techniques are working. Student B takes the exam again and scores 480 and thus believes that his study techniques are actually causing his skills to decrease. Unfortunately, all of these scores are consistent with the two students having a true score of about 500. Since the SEM is nearly 30 points, we would expect about two thirds of students to receive scores within 30 points of their true score on any given administration of the test. Thus, it would not be a great surprise for a student to take the GMAT on Saturday and receive a 570 and then receive a 600 the next Monday. Indeed, given a large number of test takers, we should not be surprised to see several students with observed scores 50 or more points above their true score.
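To put rough numbers on this, here is a small sketch that assumes measurement error is normally distributed around the true score with a standard deviation equal to the SEM of 30 points. The normality assumption and the function names are mine, not something stated in the score reports.

```python
import math

SEM = 30.0  # standard error of measurement, in scaled-score points


def prob_within(points, sem=SEM):
    """P(|observed - true| <= points), assuming normally distributed error."""
    return math.erf(points / (sem * math.sqrt(2)))


def prob_at_least_above(points, sem=SEM):
    """P(observed - true >= points), assuming normally distributed error."""
    return 0.5 * math.erfc(points / (sem * math.sqrt(2)))


print(round(prob_within(30), 3))          # about two thirds
print(round(prob_at_least_above(50), 3))  # a few percent
```

Under these assumptions a few percent of test takers land 50 or more points above their true score, so in a large testing population such results would be common in absolute terms.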
Many of the impressive claims made by "test preparation" companies do not fare well when one considers the impact of the inherent variability of observed scores.
There is a common argument that the GMAT is irrelevant to MBA admissions since schools can use so many other admissions factors in making their choices.
It is crucial to remember, however, that the GMAT has far more predictive power than most other admissions criteria. For instance, the median correlation of V, Q, and AWA scores with first year MBA grades was 0.42, while the median correlation of undergrad grades with first year MBA grades was only 0.25. When the GMAT and undergrad grades are combined, the correlation increases to 0.47. Thus, assertions by schools that they weigh all admissions factors equally should be viewed skeptically (of course, the correlation varies from school to school).
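One common way to compare correlations like these is to square them, which gives the share of variance in first year grades accounted for in a linear model. The sketch below only redoes that arithmetic on the three figures quoted above; the dictionary labels are illustrative.

```python
# Median correlations with first year MBA grades, as quoted in this post.
correlations = {
    "GMAT (V, Q, AWA)": 0.42,
    "Undergrad grades": 0.25,
    "GMAT + undergrad grades": 0.47,
}

# Squaring a correlation gives the proportion of variance explained.
variance_explained = {name: r ** 2 for name, r in correlations.items()}

for name, share in variance_explained.items():
    print(f"{name}: r = {correlations[name]:.2f}, r^2 = {share:.3f}")
```

On this reading the GMAT alone accounts for roughly 18% of the variance, undergrad grades about 6%, and the two together about 22%.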
The median correlation for undergrad GPA and first year MBA is 0.25 as corrected above.
A similar study of Executive MBA programs has revealed that the median correlation between first year MBA GPA and GMAT score was 0.49. The correlation between undergrad grades and first year MBA was only .22.
The GMAT V score alone had a correlation of 0.38 while the Q score alone had a correlation of 0.44. Even the often marginalized AWA score had a correlation similar to that of the undergrad grades (.22).
What might be the most interesting revelation of this study is the limited predictive value of some other variables. For instance, the number of years of work experience had virtually no association with academic success (correlation of -.02), while entering base salary had an extremely weak association (correlation of .09).
While it is important to stress that these data are for EMBA programs, it is intriguing how the other variables have even less predictive power than the AWA alone!
By chance I happened to check these posts only today. Interesting findings. I would like to know how these studies were formulated. Are you referring to a simple regression of grades on, say, GMAT score? What were the partial regression coefficients? Were they significant?
What was the goodness of fit when you used all the variables: GMAT score, AWA, grades, work experience and salary?
I also had a feeling that work experience might in fact have a negative correlation with grades....
All good questions regarding the admissions validity studies. I have only been able to read very brief summaries of these studies so I cannot comment on them in any detail. They appear to be based on simple regressions. Further, the overall goodness of fit for the multiple regression model is probably not great (but still pretty good when compared to other variables used to predict academic success).
I have been looking through the Hjort Test Library and found some interesting tidbits from the early 1980s. In this set of tests, the order of difficulty was extremely strict: most sections started with "Easy" questions, had many medium questions in the middle, and some hard questions at the end. In a few sections there were very easy questions at the beginning and a few very hard ones at the end. Not surprisingly, the test with the most very difficult Quant questions had the highest quant scaled score.
The subscores of the GMAT have exhibited some interesting trends. For instance, the mean for the Q section from 6/1994 through 3/1997 was 32 with an SD of 9. The mean for 1/2000 through the end of 2002 was 35 with an SD of 10. Thus, a score of 41 which was once one SD above the mean is now only about half an SD above. Comparing the same two periods the verbal subscore mean fell from 28 to 27 with the SD remaining at 9. At the same time, the mean for the AWA has increased from 3.8 to 4.0 while the SD increased from 0.9 to 1.0.
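The shift in where a Q score of 41 sits can be checked with a quick standardized-score calculation using the means and SDs quoted above. The function name is just for illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above the mean."""
    return (score - mean) / sd


# Q subscore of 41 against the two periods quoted in this post.
print(z_score(41, 32, 9))   # 6/1994 through 3/1997: 1.0
print(z_score(41, 35, 10))  # 1/2000 through 2002: 0.6
```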
Some interesting insights into changes in test scores:
It appears that in both nominal scores and percentiles the lower edge of the GMAT distribution has increased greatly over the past twenty years. In the late 1970s and early 1980s some of the most selective schools still enrolled at least 10% of their students with scores near average. Not surprisingly, the difference between the score of the 90th percentile and 10th percentile matriculant has decreased considerably as well.
Duke in the early 1980s had a 10th percentile of 500, whereas in the early 2000s it was about 650. Thus, the lower edge at Duke went from about average to considerably above average in 20 years. Likewise, the center of the distribution increased about 150 points, from 550 to 700.
Columbia had a difference of 180 points in the early 1980s from the 90th to the 10th. Twenty years later the spread was only some 90 points.
Yale had a difference of 180 that has since fallen to about 100.
I read your thread and it is good. You will be happy to know it is a sticky now. Keep posting. I like the information about the test. I would like to ask you which is the most appropriate day to take the test to maximise my score.