Optimizing the GRE
Ideally, the ETS would administer an IQ test and call it the GRE, but because we can’t have nice things, we have to settle for testing something that is close to intelligence. However, that doesn’t mean that the GRE (aka the crystallized intelligence test) cannot be improved upon.
Currently, the GRE uses this format:
The whole thing takes 158 minutes to complete, which is better than the 4 hours the GRE used to take, though this has come at the cost of having only 54 questions to measure V/Q. Currently, these are the biggest problems I have with the test:
The math is way too easy. ~3% of people score at the ceiling.
The score increments are too large. I really don’t see the point in going from the 200-800 system to the 130-170 one.
The test has 54 questions. Even on the old GRE, which took 4 hours to administer, the reliability was “only” .92 for the verbal section, .93 for the quant section, and .77 (!) for the writing section. I wouldn’t be surprised if the reliability of the new GRE is sub-.9 for all sections.
I would suggest the following changes:
Keep the writing section. This may be somewhat controversial, but on the SAT, the score on the writing section is the best predictor of first-year GPA out of the 3 sections, meaning that it is a valid measure.
Add two speeded sections for Quant/Verbal where the participant answers as many quick questions of easy-to-average difficulty (e.g. sentence completion, is A or B bigger) as possible in 5 minutes.
Add ability estimate sections for each question type. In these sections, the participant starts by being asked questions of average difficulty, and has 30 seconds to 1 minute (depending on the question type) to answer each one. The difficulty of the next question is then adjusted based on prior performance on the test. Each of the 6 sections would take 10 minutes, except for the data interpretation and reading sections, which would take 20 minutes.
The ability estimate sections could be improved by having prior performance on the test contribute to the difficulty of the starting question, improving the reliability of the estimate.
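To make the adaptive idea concrete, here is a minimal sketch using a simple one-up/one-down staircase: difficulty goes up after a correct answer and down after an incorrect one, converging on the level where the test-taker answers about half the items correctly. This is a toy stand-in for whatever item-response model ETS would actually use, and the simulated test-taker (logistic answer model, "true ability" of 1.2) is invented for illustration:

```python
import math
import random

def adaptive_section(answer_fn, n_items=40, start=0.0, step=0.5):
    """One-up/one-down staircase: raise the difficulty after a correct
    answer, lower it after an incorrect one. The ability estimate is
    the mean difficulty over the second half of the run, after the
    staircase has (roughly) converged."""
    difficulty = start
    history = []
    for _ in range(n_items):
        history.append(difficulty)
        if answer_fn(difficulty):
            difficulty += step
        else:
            difficulty -= step
    tail = history[n_items // 2:]
    return sum(tail) / len(tail)

# Simulated test-taker with true ability 1.2: the probability of a
# correct answer falls off logistically as the difficulty rises
random.seed(0)
taker = lambda d: random.random() < 1 / (1 + math.exp(d - 1.2))
estimate = adaptive_section(taker)
```

Starting the staircase at the prior-performance-based estimate (rather than 0) is exactly the improvement described above: the walk spends fewer items climbing toward the right region, so more items contribute to the estimate.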
The GRE is then scored the following way:
The quantitative and verbal ability estimates are computed by taking the general factor of each ability estimate (including the speeded test) and computing factor scores.
The essay is scored with machine learning, using the essay text as the predictor and the composite verbal/mathematical score as the dependent variable. The model would best be kept secret, as students would try to game the algorithm by adding a lot of superfluous vocabulary. (Note: I wouldn’t be that surprised if machine learning is already used to grade these essays.) Then, the essay could be sent to an LLM, which checks that the essay is actually about the assigned topic and then grades it. The machine learning and LLM ratings are then weighted based on how well they measure quant/verbal ability.
All scores are converted to the 200-800 scale.
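The general-factor step could look something like this in practice. Here the first principal component of the standardized section scores stands in for the general factor — a common approximation to factor scores, though a real implementation would fit a proper factor model on far more data. The five test-takers and three quant sections are made up:

```python
import numpy as np

# Hypothetical scores for 5 test-takers (rows) across three quant
# sections: ability-estimate, data interpretation, speeded
scores = np.array([
    [ 0.8,  0.6,  1.1],
    [-0.2,  0.1, -0.5],
    [ 1.5,  1.2,  1.4],
    [-1.0, -0.8, -1.2],
    [ 0.3, -0.1,  0.2],
])

# Standardize each section, then take the first principal component
# of the correlation structure as the general factor
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
loading = eigvecs[:, -1]          # eigenvector of the largest eigenvalue
if loading.sum() < 0:             # fix the sign so higher = more ability
    loading = -loading
factor_scores = z @ loading       # one general-factor score per person
```

Because the sections are weighted by their loadings on the general factor, the more g-loaded sections automatically count for more — which is the point made below.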
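A toy version of the essay-scoring regression: bag-of-words counts and least squares stand in for whatever feature set and model ETS would actually use, and the four essays and composite scores are invented. The point is only the shape of the pipeline — text in, predicted composite V/Q out:

```python
import numpy as np
from collections import Counter

# Toy training set: essays paired with their writers' composite
# verbal/quant scores (all invented)
essays = [
    "the evidence clearly supports the conclusion",
    "i think the argument is bad",
    "the argument presupposes evidence it never supplies",
    "this is bad and i do not like it",
]
composites = np.array([1.2, -0.5, 1.5, -1.0])

# Bag-of-words features over the training vocabulary
vocab = sorted({w for e in essays for w in e.split()})
def featurize(text):
    counts = Counter(text.split())
    return np.array([counts[w] for w in vocab], dtype=float)

X = np.stack([featurize(e) for e in essays])
# Least squares: weights mapping word counts to the composite score
weights, *_ = np.linalg.lstsq(X, composites, rcond=None)

def grade(text):
    return float(featurize(text) @ weights)
```

The gaming worry is visible even here: the learned weights are per-word, so padding an essay with high-weight vocabulary inflates the grade — hence the argument for keeping the model secret and adding an LLM topicality check.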
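The conversion back to the 200–800 scale is just a linear transform of the standardized factor score; assuming the classic mean of 500, SD of 100, and 10-point increments, clamped at the ends:

```python
def to_gre_scale(z, mean=500.0, sd=100.0, lo=200, hi=800):
    """Map a standardized factor score (z-score) onto the classic
    200-800 scale: mean 500, SD 100, rounded to the nearest 10,
    clamped to [200, 800]."""
    raw = mean + sd * z
    stepped = round(raw / 10) * 10
    return int(min(hi, max(lo, stepped)))
```

With 10-point increments this scale has 61 possible scores, against 41 on the current 130–170 scale — which is the complaint about score increments above.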
This would be better than the status quo because:
The speeded sections would allow for a higher ceiling, as there is a lot of natural variance in the speed at which people can get correct answers on simple questions (see the Wonderlic).
More g-loaded sections on the GRE would be given more weight.
Adjusting the difficulty of the questions while the test is being taken would be more efficient.
ML and LLMs would grade the essay faster than human raters.