OpenAI has introduced GPT-4.1, a new lineup of AI models optimized for coding and instruction-following tasks. This family consists of three variants: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all accessible through OpenAI’s API, but not via ChatGPT. With a remarkable 1-million-token context window, these models can process around 750,000 words in one go, surpassing even lengthy literary works like “War and Peace.”
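The 750,000-word figure follows from a common rule of thumb (used in OpenAI's own documentation) that one token corresponds to roughly 0.75 English words. A quick back-of-the-envelope sketch, with the ratio as a stated assumption:

```python
# Rough sanity check of the context-window claim.
# Assumes ~0.75 English words per token, a rule-of-thumb ratio
# that varies by language and tokenizer.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(1_000_000))  # -> 750000
```

For comparison, "War and Peace" runs to roughly 560,000 words in English translation, comfortably inside that budget.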
This launch comes as competitors such as Google and Anthropic intensify their efforts in advanced programming AI. Google recently launched Gemini 2.5 Pro, which also features a 1-million-token context window and has been performing well on coding benchmarks, while Anthropic's Claude 3.7 Sonnet and DeepSeek's upgraded V3 are strong contenders in the same space.
OpenAI aims to create sophisticated AI capable of executing comprehensive software engineering tasks. The company’s vision includes developing an “agentic software engineer,” capable of programming entire applications, managing quality assurance, bug testing, and producing documentation. GPT-4.1 represents a significant stride towards this ambition.
According to OpenAI, the updated models address real-world coding demands and improve several aspects of development, such as minimizing unnecessary edits and ensuring consistent tool usage. The full GPT-4.1 model reportedly outperforms previous iterations on coding benchmarks, while the mini and nano versions prioritize speed and cost efficiency at the expense of some accuracy. Pricing varies accordingly: GPT-4.1 costs $2 per million input tokens, while the nano model costs just $0.10.
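The input-token prices quoted above make per-request costs easy to estimate. A minimal sketch, using only the two input prices given here (output-token pricing, not covered in this piece, would add to the total; the model-name keys are illustrative labels):

```python
# Illustrative input-token cost comparison using the prices quoted above,
# in USD per million input tokens. Output-token costs are not included.
PRICE_PER_M_INPUT = {"gpt-4.1": 2.00, "gpt-4.1-nano": 0.10}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# Filling the full 1-million-token window once:
print(input_cost("gpt-4.1", 1_000_000))       # -> 2.0
print(input_cost("gpt-4.1-nano", 1_000_000))  # -> 0.1
```

At these rates, a single maximal-context call to the full model costs $2.00, versus $0.10 on nano, which is why the smaller variants are pitched at high-volume workloads.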
Although GPT-4.1 performs competitively on SWE-bench Verified, scoring between 52% and 54.6%, it still faces challenges, particularly in maintaining accuracy as input length grows. OpenAI's findings underline that even advanced models require precise prompts for optimal performance, highlighting ongoing challenges in AI reliability.