Let the debate begin!!!
15 Nov 2006, 15:06

For GSB, odds of an interview. From 2009 data, source admissions411

Coefficients in question, 1 equating to interview, 0 equating to not.

Intercept -0.460851231
GMAT 0.003873504
VERB 0.0023856
QUANT -0.029936274
TIMES TAKEN 0.066353128
AGE -0.001569024
GPA 0.215000635

Where are nationality and job function, you ask? I'm too lazy to map them to values and use them as dummies. Maybe if I get unlazy I'll do it.
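
For reference, mapping a categorical field such as job function to dummies is mostly mechanical. The following is only a minimal sketch: the category labels are made up (the thread never lists the actual values) and `make_dummies` is a hypothetical helper. One level is dropped as the baseline so the dummies are not collinear with the intercept.

```python
def make_dummies(values, prefix):
    """Return the baseline level and one 0/1 column per remaining category."""
    levels = sorted(set(values))
    baseline, kept = levels[0], levels[1:]
    columns = {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in kept
    }
    return baseline, columns

# Hypothetical job functions for five applicants (not from the admissions411 data).
job_function = ["consulting", "finance", "tech", "finance", "consulting"]
baseline, dummies = make_dummies(job_function, "JOB")
print(baseline, dummies)
```

Each dummy coefficient would then be read relative to the dropped baseline level.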

From 2008 data, odds of acceptance, assuming unknown = ding. Again 1 equating to accept, 0 to not.

Coefficients

Intercept -0.501643016
GMAT 0.000804336
VERB 0.008032123
QUANT -0.005540867
TIMES TAKEN -0.023042997
AGE -0.026421802
GPA 0.260872977
ALUMNI RECS 0.015136345

This is decidedly disconcerting if there's any truth to it. It puts my odds at pretty crappy right now.

On the other hand, the adjusted R-squared is 0.08175456, so the model explains only about 8% of the variance.

The whole thing means nothing, because I did a sloppy job.

Some other tidbits I need to do to add some value:

Set up an over-21 proxy for age, or change it to years of experience assuming graduation at 21. (Right now the implication is that being 10 years old increases my odds.)

Set up a proxy for industry and country.

Pull in a much larger data set across the top 20 schools and see what happens with a data set on the order of 2000+.

Maybe if I have some time tomorrow I'll do it.

Ok, I did the work exp thing. No real change here:

Coefficients
Intercept -1.056500848
SCORE 0.000804336
VERB 0.008032123
QUANT -0.005540867
TIMES TAKEN -0.023042997
WORK EXP -0.026421802
GPA 0.260872977
RECS 0.015136345
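
The lack of change is what the algebra predicts. If work experience is defined as age minus 21, the slope stays the same and only the intercept shifts. A small check using the coefficients posted above; the 21-year graduation age is the post's own assumption, and the variable names are made up for illustration:

```python
# Coefficients copied from the 2008 acceptance regression above.
AGE_COEF = -0.026421802
INTERCEPT_WITH_AGE = -0.501643016
GRAD_AGE = 21  # the post's assumption: everyone graduates at 21

# If WORK_EXP = AGE - 21, then b*AGE = b*WORK_EXP + b*21, so the slope is
# unchanged and the constant b*21 is absorbed into the intercept.
intercept_with_work_exp = INTERCEPT_WITH_AGE + AGE_COEF * GRAD_AGE
print(round(intercept_with_work_exp, 6))  # -1.056501
```

That matches the reposted intercept of -1.056500848 to within rounding, so the recoded model is the same fit expressed in different units.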

The implication is that there is some bias against age. The whole data set blows because all I have is 700+ to begin with, so it's crap. Someone find me data sets in the 600 range.

Wharton, GSB, Kellogg, Haas data set, 2008: removed incomplete entries, removed all unknowns (as I expect at least some of these are people who just never came back to update); sample size of about 1000 after cleanup.

I'd like to reply, but I'm just not sure what to make of that cr@p. I need to take a closer look at it later.

Heh. Crap is the right word.

I was trying to run a regression to determine how different variables play into the model of accept or deny.

In short, the data suggests that GMAT is worth nothing and GPA is worth everything. The reason for this is that there simply isn't enough data below the 700 mark. There's plenty above, but little below. So the regression is worthless.

That was my first reaction to your message: that the most important factor was GPA, and that there was a negative correlation with GMAT Quant. As you have said, the real problem is with the data set. It is reasonable to believe that GMAT becomes more of a factor the further it falls below the average; by the time GMAT is below 640, it might be the single most important factor of all (nearly impossible to overcome).

On the other hand, if scores are artificially limited to 700+ (as they are here), but other factors are allowed to flow freely (more or less), then the other factors will clearly gain in importance.

I think of it this way. Each of the following could result in an "easy deny":
1. really low GMAT
2. really low GPA
3. really outrageous age
4. really horrendous recs
5. really low-grade work experience

If you remove the "easy denies" with low GMATs, but leave the other "easy denies" in place, then clearly they will factor in more obviously.
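
The range-restriction argument above can be illustrated with a toy example. This sketch uses synthetic applicants, not the thread's data, and the admit rates (2% below 640, a flat 25% from 640 up) are assumptions chosen only to mimic an "easy deny" threshold. `ols_slope` is a hypothetical helper for a one-variable least-squares fit.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic applicants, NOT the thread's data: 100 per GMAT score, 500 to 800.
# Assumed admit rates: 2% below 640 (an "easy deny"), a flat 25% from 640 up.
gmat, admitted = [], []
for score in range(500, 810, 10):
    n_admitted = 2 if score < 640 else 25
    for i in range(100):
        gmat.append(score)
        admitted.append(1 if i < n_admitted else 0)

slope_all = ols_slope(gmat, admitted)

# Keep only the 700+ applicants, mirroring the restriction in the thread's data.
high = [(g, a) for g, a in zip(gmat, admitted) if g >= 700]
slope_700_plus = ols_slope([g for g, _ in high], [a for _, a in high])

print(f"slope, all scores: {slope_all:.6f}")
print(f"slope, 700+ only:  {slope_700_plus:.6f}")
```

On the full range the GMAT slope is clearly positive. On the 700+ subset it vanishes, because every remaining score has the same admit rate, even though GMAT drives most of the denials in the full population.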

i also tried to make some sense of the relative importance of the parameters.

I think that linear regression is not suitable here...
there are too many non-parametric variables that are arbitrarily modeled as [0,1]. in general, regression models tend to work better for variables that are parametric, and preferably linear.
also, the GMAT score in itself is not a linear parameter, i.e. the difference between 620 and 650 has a different meaning (and effect) than the difference between 720 and 750. a linear model cannot capture such a difference.

so i'm not surprised that rhyme's results are not so good (and it's not that rhyme's work is crap... actually it seems that you did good work... but you ran into the theoretical limitations of regression).

i'd approach it differently. to check the effect of GMAT score, i'd compare the GMAT scores of those who were accepted vs. dinged using a t-test. these tests are better at modeling the connection between parametric and non-parametric variables.
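
A minimal sketch of that comparison, assuming Welch's unequal-variance form of the t-test (the post does not say which variant it has in mind). The scores are made up for illustration, and `welch_t` is a hypothetical helper built on the standard library only.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up GMAT scores, purely for illustration.
accepted = [720, 740, 710, 760, 730]
dinged = [700, 690, 720, 680, 710]
t_stat, df = welch_t(accepted, dinged)
print(f"t = {t_stat:.2f}, df = {df:.1f}")  # t = 2.87, df = 7.7
```

The statistic would still need to be compared against a t distribution to get a p-value, and a two-group test like this looks at GMAT alone without controlling for GPA or the other variables.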

to see the multi-dimensional (or multivariate) connections, i'd use factor analysis, which would help explain the source of the variance in the target variable in terms of the variances of the independent variables.

reading the thread (and my post) again...
it might be that i confused factor analysis with "analysis of variance" (also known as ANOVA)... i can check it if you'd like. it's been a long time since i used them in practice...

also, if we still want to pursue the linear model, there are 3 things that may help make it more accurate:
a) normalizing scale. i'm not sure if you did that or not, but if you'd like the coefficients to represent relative importance, you need all parameters to be on the same scale. i.e. if you work with gmat score, divide it by 800 to get a 0...1 scale (or better, since there is no data on lower scores, subtract 550 from the score and divide by 250). same with GPA (divide by 4, or subtract something and normalize), etc...
if all variables are normalized to a 0..1 scale the correlation remains the same, but the coefficients can be compared. but again... you might have already done that
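
To see what that rescaling does to the numbers already posted: in a least-squares fit, replacing x with (x - lo) / width multiplies its coefficient by width and moves a constant into the intercept, leaving predictions unchanged. A rough sketch using the 2008 acceptance coefficients from earlier in the thread and the scales suggested in (a); the variable names are made up for illustration.

```python
# Coefficients copied from the 2008 acceptance regression earlier in the thread.
GMAT_COEF = 0.000804336   # per GMAT point
GPA_COEF = 0.260872977    # per GPA point

# Rescaling x to (x - lo) / width multiplies its coefficient by width;
# the constant coef * lo moves into the intercept, so predictions are unchanged.
gmat_coef_scaled = GMAT_COEF * 250   # GMAT mapped as (score - 550) / 250
gpa_coef_scaled = GPA_COEF * 4       # GPA mapped as gpa / 4

print(f"GMAT, per full 550-800 range: {gmat_coef_scaled:.4f}")  # 0.2011
print(f"GPA, per full 0-4 range:      {gpa_coef_scaled:.4f}")   # 1.0435
```

Even on a common 0..1 scale GPA still dominates in this fit, so the scale difference alone does not explain the small GMAT coefficient.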

b) to overcome the non-linearity of gmat score (and quant/verbal score as well), you can use percentiles instead. percentiles are, by definition, a linear parameter.
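
One way to get percentiles without an external table is to rank each score within the sample itself. This is only a sketch under that assumption: published GMAT percentile tables would be the better source if you have them, the scores are made up, and `percentile_ranks` is a hypothetical helper.

```python
def percentile_ranks(scores):
    """Percent of the sample scoring strictly below each score."""
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]

# Made-up GMAT totals, for illustration only.
scores = [700, 710, 710, 730, 760]
ranks = percentile_ranks(scores)
print(ranks)  # [0.0, 20.0, 20.0, 60.0, 80.0]
```

In a sample that is almost entirely 700+, in-sample ranks will look very different from population percentiles, so the choice of reference group matters.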

c) instead of a 0/1 target variable (0-dinged, 1-accepted), you can elaborate it further to represent more information. for example: 0-dinged, 1-interviewed but dinged, 2-accepted
or even better (if you have the data): 0-dinged, 1-interviewed but dinged, 2-waitlisted but rejected, 3-waitlisted and accepted, 4-accepted.
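
The five-level coding in (c) can be written down as a simple lookup. The label strings and the `encode_outcomes` helper are hypothetical; only the ordering and the codes come from the post.

```python
# Ordered outcome codes from option (c); the label strings are hypothetical.
OUTCOME_CODES = {
    "dinged": 0,
    "interviewed_dinged": 1,
    "waitlisted_rejected": 2,
    "waitlisted_accepted": 3,
    "accepted": 4,
}

def encode_outcomes(labels):
    """Map outcome labels to their ordinal codes; unknown labels raise KeyError."""
    return [OUTCOME_CODES[label] for label in labels]

targets = encode_outcomes(["accepted", "dinged", "waitlisted_accepted"])
print(targets)  # [4, 0, 3]
```

Feeding these codes to a linear regression treats each step as equally large (for example, that going from 0 to 1 is worth as much as going from 3 to 4), which is itself an assumption worth checking.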