Microsoft is making a bold move to establish its own identity in the generative AI landscape. Through its specialized research arm, Microsoft AI, the company has unveiled three new foundational models designed to handle text, audio, and video. This release marks a significant step in the tech giant’s effort to build a proprietary multimodal stack, positioning itself as a direct competitor to labs like Google and even its high-profile partner, OpenAI.
The New MAI Trio: Speed and Multimodality
The rollout includes three distinct tools optimized for efficiency and practical utility, now accessible via Microsoft Foundry and the MAI Playground testing environment.
MAI-Transcribe-1
This speech-to-text model supports 25 different languages. It is built for high-performance environments, operating 2.5 times faster than Microsoft’s previous Azure Fast offering. It enters the market with a competitive price point of $0.36 per hour.
MAI-Voice-1
Focused on audio generation, this model is remarkably fast, capable of generating 60 seconds of audio in a single second. It also offers users the ability to develop custom synthetic voices. Pricing is set at $22 per one million characters.
MAI-Image-2
While the name suggests still photography, this is actually a video-generation model. After a brief testing period in the MAI Playground, it is now seeing a wider release. Pricing for this model is split: $5 per million text input tokens and $33 per million image output tokens.
The Strategy of “Humanist AI”
These models are the first major output from the MAI Superintelligence team, a specialized research group formed in late 2025. The team is led by Mustafa Suleyman, the CEO of Microsoft AI and co-founder of DeepMind.
Suleyman describes the vision for these tools as “Humanist AI.” This philosophy prioritizes human-centric design, focusing on how people naturally communicate and ensuring the models are trained for practical, real-world applications rather than just theoretical benchmarks.
Balancing Independence and Partnerships
Despite these internal advancements, Microsoft is not abandoning its relationship with OpenAI. In recent discussions with The Verge, Suleyman reaffirmed the company’s commitment to the partnership, which involves over $13 billion in investment.
However, a recent renegotiation of that deal has cleared the path for Microsoft to pursue its own independent superintelligence research. This “dual-track” strategy mirrors Microsoft’s approach to hardware: the company continues to buy industry-standard components from partners while simultaneously developing its own in-house alternatives to ensure long-term flexibility and cost control.







