Microsoft's AI Offensive: Three Foundational Models Tackle Rivals

Originally reported by TechCrunch

Microsoft AI, the technology titan's dedicated research division, recently unveiled three groundbreaking foundational AI models capable of generating text, voice, and images. This significant announcement marks a pivotal moment in the company's strategic push to expand its proprietary multimodal AI ecosystem.

The release underscores Microsoft's ambition to contend with leading AI research institutions by developing its own comprehensive suite of models, even as it maintains its significant partnership with OpenAI. This move highlights Microsoft's dual strategy in the rapidly evolving artificial intelligence landscape.

Among the new offerings, MAI-Transcribe-1 provides speech-to-text transcription across 25 languages and is 2.5 times faster than Microsoft’s Azure Fast service, according to a company press release. MAI-Voice-1, an audio-generating model, can synthesize 60 seconds of audio in one second and lets users create custom voices. Rounding out the trio, MAI-Image-2 is an image-generating model.

MAI-Image-2 debuted on March 19 on MAI Playground, a new software tool for testing large language models. All three models are now being made available through Microsoft Foundry, and the transcription and voice models are also accessible in MAI Playground.

These advanced models were developed by Microsoft’s MAI Superintelligence team, an AI research group established and announced in November 2025. This dedicated team operates under the leadership of Mustafa Suleyman, who serves as the CEO of Microsoft AI.

“At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use,” Suleyman articulated in a recent blog post. He further added, “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences,” signaling future innovations.

In the burgeoning and highly competitive large language model (LLM) landscape, MAI aims to differentiate its models through competitive pricing. The company indicated in its blog post that these new offerings are designed to be more cost-effective than those from competitors such as Google and OpenAI.

Specifically, MAI-Transcribe-1 is priced starting at $0.36 per hour. MAI-Voice-1 begins at $22 per 1 million characters, while MAI-Image-2 starts at $5 for 1 million tokens for text input and $33 for 1 million tokens for image output.
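To make these rates concrete, the sketch below estimates a bill from the starting prices quoted above. It is only an illustration: the function names are hypothetical, the prices are the "starting at" figures from the article, and it assumes simple linear billing with no tiers, discounts, or minimums, which actual Foundry billing may not follow.

```python
"""Rough cost estimator based on the starting prices quoted above (USD)."""

# Starting prices as reported in the article; real invoices may differ.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_TEXT_INPUT_USD_PER_MILLION_TOKENS = 5.0
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 33.0

MILLION = 1_000_000


def transcribe_cost(audio_hours: float) -> float:
    """Cost of transcribing the given hours of audio (hypothetical helper)."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Cost of synthesizing speech for the given number of characters."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Cost of image generation: text-input tokens plus image-output tokens."""
    input_cost = text_input_tokens / MILLION * IMAGE_TEXT_INPUT_USD_PER_MILLION_TOKENS
    output_cost = image_output_tokens / MILLION * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload (made up for illustration): 100 hours of audio,
    # 500,000 characters of speech, 200,000 input and 1,000,000 output tokens.
    total = (
        transcribe_cost(100)
        + voice_cost(500_000)
        + image_cost(200_000, 1_000_000)
    )
    print(f"Estimated total: ${total:.2f}")
```

Under these assumptions, the example workload comes to roughly $36 for transcription, $11 for voice, and $34 for images, or about $81 in total.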

Despite the introduction of its own suite of models, Suleyman reiterated Microsoft's enduring commitment to its collaboration with OpenAI in an interview with VentureBeat. He also disclosed to The Verge that a recent renegotiation of their alliance had specifically paved the way for Microsoft to independently advance its superintelligence research initiatives.

Microsoft has invested more than $13 billion in OpenAI and integrates OpenAI's models across its products through a multi-year partnership. The company takes a similar two-track approach to chips, developing its own while also buying from external providers.


Editorial Staff, Editor

