LLMs are acing the MCAT, the bar exam, the SAT, etc. like they're nothing. At this point their performance is superhuman. However, they'll often trip on super simple common-sense questions, and they'll struggle with creative thinking.
Is this literally proof that standardized tests are not a good measure of intelligence?
Sure, tests are bad, but another option is that AI is simply better.
It'd better be. Why would anyone want to create unintelligent artificial intelligence?
It kind of bothers me that we work hard on making AIs intelligent, and then when one actually starts performing well we go, "oh, the test must be bad, let's change it to make sure the AI still scores poorly compared to humans." I agree that tests are generally flawed, but this reaction makes one of the biases we build into them obvious.
Eventually we'll be too dumb to tell whether it's smarter than us, regardless of the tests we invent.