

12·
1 day agoThey also polled 10,000 people to compare against a human baseline:
Turns out GPT-5 (7/10) answered about as reliably as the average human (71.5%) in this test. Humans still outperform most AI models with this question, but to be fair I expected a far higher “drive” rate.
That 71.5% is still a higher success rate than 48 out of 53 models tested. Only the five 10/10 models and the two 8/10 models outperform the average human. Everything below GPT-5 performs worse than 10,000 people given two buttons and no time to think.
What assumptions do you mean? I’ve seen a few people say that, but I don’t actually understand what they’re referring to. Here’s the text of the question posed in the article:
The question specifically notes they want to wash their car, so that part isn’t left to assumption. Even if you don’t assume an automatic car wash, would you assume they have a 50m hose? Or that you could plausibly walk that far away with something from the car wash to wash your car?
Personally, I’d agree with the assessment of the article, that the only plausible way to get the question “wrong” would be to focus too much on the short distance, missing/forgetting that the purpose of the trip requires you to have the car at the destination. (Not too surprising that 30% of people did lol)