OpenAI announced the release of GPT-5.4 on Thursday, positioning it as “our most capable and efficient frontier model for professional work.” The new foundation model ships in a standard configuration alongside two specialized versions: GPT-5.4 Thinking, optimized for advanced reasoning, and GPT-5.4 Pro, designed for peak performance.
The API version of GPT-5.4 offers a context window of up to 1 million tokens, the largest capacity OpenAI has offered to date.
A significant improvement OpenAI highlights is token efficiency: GPT-5.4 resolves complex problems using substantially fewer tokens than its predecessor models.
The new model posts strong benchmark results, achieving record scores on computer-use evaluations such as OSWorld-Verified and WebArena Verified. It also attained a record-setting 83 percent on OpenAI’s internal GDPval test, which assesses performance on knowledge-work tasks.
According to Mercor CEO Brendan Foody, GPT-5.4 also took the top position on Mercor’s APEX-Agents benchmark, which evaluates professional proficiency in legal and financial domains.
Foody said in a statement, “[GPT-5.4] excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis,” adding that it delivers “top performance while running faster and at a lower cost than competitive frontier models.”
OpenAI also reports continued progress on hallucinations and factual inaccuracies with GPT-5.4. According to the company, the new model makes errors in individual claims 33 percent less often than GPT-5.2, and its overall responses are 18 percent less likely to contain an error.
As part of the launch, OpenAI has revamped the API version’s tool-calling mechanism with a new system named Tool Search. Previously, every available tool had to be defined up front in the system prompt, a process that could become token-intensive as the number of tools grew. Tool Search lets the model retrieve tool definitions only when they are needed, making requests faster and cheaper, particularly in environments with many integrated tools.
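The idea can be illustrated with a small sketch. This is not the actual OpenAI Tool Search API — the class names, request shape, and matching logic below are all illustrative assumptions — but it shows the general pattern: keep full tool definitions in a registry and attach only the relevant ones to a request, rather than serializing every tool into every prompt.

```python
# Hypothetical sketch of on-demand tool retrieval. Not the real
# OpenAI Tool Search API; all names and shapes here are illustrative.
import json


class ToolRegistry:
    """Holds full tool definitions; serves them only when matched."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, parameters):
        self._tools[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
        }

    def search(self, query):
        """Return definitions whose name or description mentions the query."""
        q = query.lower()
        return [
            t for t in self._tools.values()
            if q in t["name"].lower() or q in t["description"].lower()
        ]


def build_request(prompt, registry, query=None):
    """Attach only the tool definitions relevant to this request,
    instead of every registered tool (saving prompt tokens)."""
    tools = registry.search(query) if query else []
    return {"input": prompt, "tools": tools}


registry = ToolRegistry()
registry.register("get_weather", "Look up current weather", {"city": "string"})
registry.register("send_email", "Send an email message", {"to": "string"})

req = build_request("What's the weather in Oslo?", registry, query="weather")
# Only the weather tool's definition is serialized into the request.
print(json.dumps([t["name"] for t in req["tools"]]))
```

The cost saving scales with the tool count: a request with hundreds of registered tools still pays only for the handful of definitions that match.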
OpenAI has also introduced a new safety evaluation that scrutinizes its models’ chain-of-thought (CoT) — the internal commentary a model generates as it works through multi-step tasks. AI safety researchers have long worried that reasoning models could misrepresent their CoT, and prior testing has shown that such deception can occur under specific conditions.
The findings from OpenAI’s new evaluation indicate that deceptive behavior is less probable in the Thinking version of GPT-5.4, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.”
The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing.