
Meta’s Llama 4 Maverick AI Falls Behind Rivals in Benchmark

Meta's unmodified Maverick AI model ranks below OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro in the LM Arena chat benchmark.

Editorial Staff, Editor

Published September 13, 2025 · 1 min read

Originally reported by TechCrunch

Meta’s Llama 4 Maverick model has fallen behind its competitors in the LM Arena chat benchmark, which ranks AI models based on human ratings of conversations. Meta had faced criticism for using an experimental, unreleased version of Maverick to achieve a high score on LM Arena. In response, the LM Arena team revised its policies and scored the unmodified version of Maverick, which has not performed well against rival models such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro. As of April 11, 2025, the unmodified version, "Llama-4-Maverick-17B-128E-Instruct," ranked below these established models.

The experimental version had been specifically optimized for conversational interactions, which appeared to give it an advantage with LM Arena’s human raters. The unmodified version, designed for broader applications, did not match those results in the standard evaluation. Meta acknowledged the situation, explaining that the “Llama-4-Maverick-03-26-Experimental” version had been optimized to perform well in specific settings, including LM Arena.

The company emphasized that it regularly experiments with different variants of its models to gather feedback and improve performance. Meta has now released the open-source version of Llama 4 and said it looks forward to seeing how developers customize it for diverse use cases. Despite the lower ranking, Meta says it remains committed to refining Llama 4 and exploring how it can be adapted to meet different demands in AI development.

#news


The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
