Glad to hear you're enjoying the tests, Haik, and congratulations on achieving so much success with them! A few thoughts on scoring reliability:
-Our tests are administered and scored using Item Response Theory (the same system that the GMAT and GRE use), and so both user ability and question difficulty are determined using the ~4.5 million user responses in our system. That's why we've been able to enjoy such (relatively) accurate scoring - we're not guessing at question difficulty via "easy/medium/hard," but instead using the same kind of data-driven system that powers all the official tests.
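(If you're curious what "IRT" means mechanically, here's a minimal Python sketch of the standard 2-parameter logistic model - this is the textbook formulation, not our actual production code, and the difficulty/ability numbers are made up for illustration:)

```python
import math

def p_correct(ability, difficulty, discrimination=1.0):
    """2-parameter logistic (2PL) IRT model: the probability that a user
    of the given ability answers an item of the given difficulty correctly.
    When ability equals difficulty, the probability is exactly 0.5."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def estimate_ability(responses, difficulties):
    """Crude maximum-likelihood grid search for a user's ability given a
    response pattern (1 = correct, 0 = incorrect) on items of known
    difficulty. Real systems use smarter optimizers, but the idea is the
    same: find the ability that best explains the pattern of responses."""
    best_theta, best_ll = 0.0, float("-inf")
    for theta in (x / 10 for x in range(-40, 41)):  # scan -4.0 .. 4.0
        ll = 0.0
        for r, d in zip(responses, difficulties):
            p = p_correct(theta, d)
            ll += math.log(p if r else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# A user who gets the easier items right and the harder ones wrong lands
# at an ability estimate between those difficulty levels:
est = estimate_ability([1, 1, 1, 0, 0], [-2.0, -1.0, 0.0, 1.0, 2.0])
```

The key point is that "difficulty" here is a number calibrated from millions of real responses, not an editor's easy/medium/hard label.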
-Similar to the official tests, though, the data is a little less robust at the ends of the bell curve. The folks at GMAC have admitted that they're less confident in the accuracy of a 750 vs. a 770 than they are in the accuracy of a 650 vs. a 670, and we'd have to admit the same. Given that most users will at some point see 650-level questions (whether they're 700-level scorers who missed a couple "easy" ones or 600-level scorers who got a few hard ones right), but significantly fewer will ever see the 750-level questions, the data just isn't as powerful at the poles.
-Pursuant to that...I just went into the system to check the error margins on your tests, and as I'd kind of expect you're probably looking at a 40-50 point standard error on those as opposed to the normal ~30. So I'd probably take from your test that you're "poised to score well above 700" but not necessarily that you're a 790 scorer.
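(To make that concrete, here's a quick back-of-the-envelope sketch in Python of what a standard error does to a single score - the 790 and the SE values are illustrative numbers, not anyone's actual report, and this assumes a roughly normal error distribution:)

```python
def score_range(score, standard_error, z=1.96):
    """~95% confidence interval around a practice-test score, assuming
    normally distributed measurement error, clamped to the 200-800 scale."""
    lo = max(200, round(score - z * standard_error))
    hi = min(800, round(score + z * standard_error))
    return lo, hi

print(score_range(790, 30))  # typical ~30-point SE -> (731, 800)
print(score_range(790, 45))  # wider SE near the top -> (702, 800)
```

Notice that even with the wider error margin the bottom of the range stays above 700 - which is exactly why "poised to score well above 700" is the safe read, while the point estimate itself is less certain.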
-Anecdotally, I trust our quant scores almost to the exact point...I've had students take as many as 10 quant sections with really accurate scoring each time, consistent with their test-day scores. Verbal is just a little less precise, in large part because the item difficulties and user abilities are all calculated from user stats, and the verbal skills haven't quite produced IRT data curves as concrete as the quant skills have (which, as I've read in the academic literature, is what a lot of these tests find, too). So I've recommended that my own students treat their verbal score as a range of a few points on either side, whereas their quant score is something they can be very, very confident in. The overall net effect is that the scores have been even more accurate than we would have predicted when we started, but particularly at the upper limits that "within 30 points" stat widens just a little bit, largely due to the increased verbal error margin.
I hope that helps...