In 2024, several unusual AI benchmarks gained popularity. One of the most talked-about was an AI-generated video of actor Will Smith eating spaghetti.
This bizarre test has become a meme and a benchmark to evaluate the realism of AI video generators. Smith himself even parodied the trend on Instagram in February.
While this is an odd example, it is not the only quirky benchmark that took off this year. A 16-year-old developer created an app that allows AI to control Minecraft and build structures, while a British programmer developed a platform for AI to play games like Pictionary and Connect 4.
These strange tests have captured the attention of the AI community, even though there are more formal academic benchmarks used to assess AI performance.
These unconventional benchmarks may have taken off because many traditional AI tests focus on complex, academic challenges, such as solving Math Olympiad problems or working through PhD-level tasks.
While these are impressive, they do not resonate with the average person, who may use AI for simpler tasks like email responses or basic research.
Furthermore, popular public AI benchmarks such as Chatbot Arena allow anyone to rate AI performance, but the ratings often reflect the preferences of industry insiders, making them less relevant to the general public.
Weird benchmarks like Will Smith eating spaghetti or AI playing Minecraft may not be scientifically rigorous, but they are entertaining and easier for most people to understand. Despite their lack of empirical value, these fun tests have become a popular, more accessible way to gauge AI’s capabilities.