The artificial intelligence boom has been fundamentally rooted in the belief that larger models inherently possess greater power, and that the most powerful models ultimately achieve dominance. The industry is now on the cusp of discovering the implications should this core assumption begin to falter.
Already, escalating operational costs are compelling users to reconsider smaller, more economical AI models. This emerging trend of cost-conscious model selection is novel, and while its ultimate impact on the industry remains to be seen, a significant shift is widely anticipated.
One prominent prediction, articulated by Coinbase co-founder Brian Armstrong, suggests that this will lead to a substantial majority of tasks transitioning to more affordable models.
“Demand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12-18 months,” Armstrong wrote on X. “20% of workloads will still run on latest gen models where IQ maxing is important.”
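As a rough back-of-the-envelope check on those figures, the sketch below works out what happens to total spend if 80% of workloads move to models that are 99% cheaper. It assumes every workload costs the same before the shift and that total volume stays flat; neither assumption comes from Armstrong's post, and the function name is purely illustrative.

```python
def blended_cost_ratio(share_moved: float, discount: float) -> float:
    """Return new total spend as a fraction of the original spend.

    Assumes equal cost per workload before the shift and constant
    total volume. Both are simplifying assumptions, not sourced facts.
    """
    if not (0.0 <= share_moved <= 1.0 and 0.0 <= discount <= 1.0):
        raise ValueError("share_moved and discount must be between 0 and 1")
    stays = 1.0 - share_moved               # workloads left on frontier models
    moved = share_moved * (1.0 - discount)  # workloads on the cheaper models
    return stays + moved


ratio = blended_cost_ratio(share_moved=0.80, discount=0.99)
print(f"Spend after shift: {ratio:.1%} of original")  # 20.8%
print(f"Reduction: {1 - ratio:.1%}")                  # 79.2%
```

Under those assumptions, spend falls to about a fifth of its previous level. If demand grows as Armstrong expects, the real figure would be higher, so this is best read as a bound on per-workload savings rather than a revenue forecast.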
It is hard to overstate how much Armstrong’s prediction would transform the AI industry if it comes true.
Until now, most AI companies have competed primarily on quality, which typically meant defaulting to the most advanced model available. If the same tasks can be handled by cheaper models without a drop in performance, it would trigger a massive recalibration of AI economics. Crucially, a large share of those savings would come directly out of the revenues of the major research labs, delivering a financial blow to companies like OpenAI and Anthropic just as they prepare for their initial public offerings.
This represents a potentially seismic shift for the industry, resting on one fundamental question: Are companies prepared to transition to smaller models?
Early evaluations suggest that, with the right system setup, cheaper models can stand in without a loss of quality. In a recent trial, the legal AI tool Harvey cut its inference costs threefold while maintaining performance. The test, run in partnership with the inference platform Fireworks AI, combined Claude Opus with Fireworks’ GLM 5.1, reserving Opus for the most demanding tasks. The result was significantly less server time and lower overall spend.
“Quality comes first, and in legal it always will,” Harvey co-founder Gabe Pereyra told TechCrunch, referring to his startup’s AI legal services. He added, “However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.”
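To make the idea of mixing models concrete, here is a minimal sketch of a cost-aware router that sends only the hardest tasks to a large model. It is not Harvey's or Fireworks AI's actual system: the difficulty scores, the threshold, the task names, and the per-task prices are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-task prices in arbitrary units; not real vendor pricing.
LARGE_MODEL_COST = 10.0
SMALL_MODEL_COST = 1.0


@dataclass
class Task:
    name: str
    difficulty: float  # hypothetical score in [0, 1], e.g. from a classifier


def route(task: Task, threshold: float = 0.7) -> str:
    """Send hard tasks to the large model, everything else to the small one."""
    return "large" if task.difficulty >= threshold else "small"


def total_cost(tasks: list[Task], threshold: float = 0.7) -> float:
    """Sum the per-task cost implied by the routing decision."""
    cost = 0.0
    for task in tasks:
        if route(task, threshold) == "large":
            cost += LARGE_MODEL_COST
        else:
            cost += SMALL_MODEL_COST
    return cost


tasks = [
    Task("summarize filing", 0.2),
    Task("extract clause dates", 0.3),
    Task("draft cross-border merger memo", 0.9),
    Task("classify document type", 0.1),
]

routed = total_cost(tasks)
# A threshold of 0 sends every task to the large model.
baseline = total_cost(tasks, threshold=0.0)
print(f"Routed: {routed}, all-large baseline: {baseline}")
```

With these made-up numbers, routing costs 13 units against 40 for sending everything to the large model. In practice the savings depend on how many tasks truly need the larger model and on how reliably difficulty can be estimated, which is the part this sketch takes for granted.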
This trend is often characterized as a rivalry between the major labs and Chinese or open-weight models, but that framing misses the broader point. The real distinction is not between proprietary and open models; it is between large models and small ones. Switching from GPT-5.5 to DeepSeek’s V4 Flash can cut costs, but switching to GPT-5.4-mini works just as well.
An active price war is underway between the large labs’ in-house inference services and independent providers serving open-weight models. For the larger question of small versus large, though, it matters less which kind of smaller model ultimately wins out.
The principle of not using more compute than a task needs might seem obvious, but it directly contradicts the scaling-first approach that has dominated the industry until now. Inspired by "the bitter lesson," research labs have invested heavily in training the most compute-intensive models possible, pushing the boundaries of AI capabilities. With prices heavily subsidized by investors, clients previously had no incentive to choose anything other than the most advanced option.
As token prices climb and subsidies begin to recede, users are encountering cost pressures for the first time. It remains uncertain whether this new financial strain will genuinely compel enterprise users to adopt smaller models. They could just as easily economize by making fewer API calls, utilizing less contextual data, or simply abandoning less promising deployments.
However, if most deployments turn out to run just as well on a smaller model, it could significantly temper the growing demand for inference capacity, and raise new questions about whether the substantial cost of training frontier models is justified.
The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.