The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
Meta’s Llama 4 Maverick AI Falls Behind Rivals in Benchmark
Meta’s unmodified Maverick AI model ranks below OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro in the LM Arena chat benchmark.

Originally reported by TechCrunch
The unmodified release of Meta’s Llama 4 Maverick has fallen behind its competitors in the LM Arena chat benchmark, which measures AI performance based on human-rated conversations. Earlier this week, Meta faced criticism for using an experimental, unreleased version of Maverick to achieve high scores on LM Arena. In response, the LM Arena team revised its policies and recalculated scores using the unmodified version of Maverick, which has not performed well against rival models such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro.
As of April 11, 2025, Maverick’s unmodified version, “Llama-4-Maverick-17B-128E-Instruct,” ranked below these established models. The experimental version that Meta originally submitted had been specifically optimized for conversational interactions, which appeared to give it an advantage with LM Arena’s human raters. The unmodified version, designed for broader applications, did not match that performance in the standard evaluation.
Meta acknowledged the situation, explaining that the “Llama-4-Maverick-03-26-Experimental” version had been optimized to perform well in specific tasks, such as benchmarking with LM Arena. The company emphasized that it regularly experiments with different variants of its models to gather feedback and improve performance. Meta has now released the open-source version of Llama 4 and looks forward to seeing how developers customize it for diverse use cases.
Despite the lower ranking, Meta remains committed to refining Llama 4 and exploring how it can be adapted to meet different demands in AI development.