Building on our momentum
Just last week, we celebrated a major milestone: the public preview launch of three new first-party Microsoft AI models in Microsoft Foundry: MAI-Image-2, MAI-Voice-1, and MAI-Transcribe-1. Together, they form a comprehensive multimedia AI stack purpose-built for developers, spanning image generation, natural speech synthesis, and enterprise-grade transcription across 25 languages.
The response from the developer community has been incredible, and we're not slowing down.
Hot on the heels of that launch, we're thrilled to introduce the next addition to the MAI image generation family: MAI-Image-2-Efficient, or Image-2e for short. It's now available in public preview in Microsoft Foundry and MAI Playground.
What makes MAI-Image-2-Efficient unique?
MAI-Image-2-Efficient is built on the same architecture as MAI-Image-2, the model that debuted at #3 on the Arena.ai leaderboard for image model families. Based on customer feedback, we've re-engineered it for speed and efficiency.
It's up to 22% faster and up to 4x more efficient than MAI-Image-2 when normalized by latency and GPU usage¹. It also outpaces leading text-to-image models by 40% on average².
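To make those two claims concrete, here is a back-of-the-envelope sketch. The baseline latency and throughput figures below are placeholder assumptions for illustration, not published benchmark numbers; only the 22% and 4x multipliers come from the announcement.

```python
# Illustrative arithmetic only: the baseline figures are assumptions,
# not published MAI-Image-2 benchmarks.
baseline_latency_s = 10.0            # assumed per-image latency for MAI-Image-2
baseline_images_per_gpu_hour = 360   # assumed per-GPU throughput for MAI-Image-2

# "Up to 22% faster" lowers per-image latency.
efficient_latency_s = baseline_latency_s * (1 - 0.22)

# "4x more efficiency" multiplies images produced per GPU.
efficient_images_per_gpu_hour = baseline_images_per_gpu_hour * 4

print(f"Latency per image: {efficient_latency_s:.1f}s")         # 7.8s
print(f"Images per GPU-hour: {efficient_images_per_gpu_hour}")  # 1440
```

Under these assumed baselines, a workload that previously filled one GPU would fit on a quarter of one, which is where the "more output for less compute" framing below comes from.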
In short, MAI-Image-2-Efficient gives developers more output for less compute, unlocking a whole new category of use cases.
Who is MAI-Image-2-Efficient for?
MAI-Image-2-Efficient is designed for builders who need high-quality image generation at speed and scale. Here are the top use cases where MAI-Image-2-Efficient shines:
- High-volume production workflows: E-commerce platforms, media companies, and marketing teams often need to generate thousands of images per day for targeted advertisements, concept art, and mood boards. MAI-Image-2-Efficient's superior efficiency means larger batches at lower GPU cost, so your team can iterate as fast as it wants and reach the end product faster.
- Real-time and conversational experiences: When users expect images to appear mid-conversation (in a chatbot, a creative copilot, or an AI-powered design tool), every millisecond counts. Thanks to its lower latency, MAI-Image-2-Efficient serves as an excellent backbone for interactive applications that require fast response times.
- Rapid prototyping and creative iteration: MAI-Image-2-Efficient lets your team quickly and affordably test new pipelines, experiment with creative ideas, and refine prompts. You don't need the full-size model to validate a concept; what you need is speed, and that's exactly what MAI-Image-2-Efficient provides.
MAI-Image-2 vs. MAI-Image-2-Efficient — which should you use?
MAI-Image-2-Efficient and MAI-Image-2 are built for different strengths, so choosing the right model depends on the needs of your workflow.
MAI-Image-2-Efficient is the ideal choice for high-volume workflows where latency and speed are priorities. If your pipeline needs to generate images quickly and at scale, MAI-Image-2-Efficient delivers without compromise.
MAI-Image-2 is the recommended option when your images require precise, detailed text rendering, or when scenes demand the deepest photorealistic contrast and smoothness.
The two models also have distinct visual signatures:
- MAI-Image-2-Efficient renders with sharpness and defined lines, making it a strong choice for illustration, animation, and photoreal images designed to grab attention.
- MAI-Image-2 delivers smoother, more nuanced contrast, making it the go-to for photorealistic imagery that prioritizes depth and subtlety.
Try it today
MAI-Image-2-Efficient is available now in Microsoft Foundry and MAI Playground. For builders in Foundry, MAI-Image-2-Efficient starts at $5 USD per 1M text input tokens and $19.50 USD per 1M image output tokens.
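If you want to budget a workload against those preview prices, the math is straightforward. The per-request token counts in this sketch (prompt length, tokens per generated image) are illustrative assumptions, not documented values; only the two per-1M-token rates come from the pricing above.

```python
# Published preview rates (USD per 1M tokens); token counts below are assumed.
TEXT_INPUT_PER_1M = 5.00
IMAGE_OUTPUT_PER_1M = 19.50

def estimate_cost(n_images: int, prompt_tokens: int, image_tokens: int) -> float:
    """Estimate USD cost for a batch of n_images generations."""
    text_cost = n_images * prompt_tokens / 1_000_000 * TEXT_INPUT_PER_1M
    image_cost = n_images * image_tokens / 1_000_000 * IMAGE_OUTPUT_PER_1M
    return text_cost + image_cost

# Example: 1,000 images with 50-token prompts and an assumed
# 4,000 output tokens per image.
print(f"${estimate_cost(1000, 50, 4000):.2f}")  # $78.25
```

At these assumed token counts, output tokens dominate the bill, so batch-size and resolution choices matter far more than prompt length.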
And this is just the beginning. We have more exciting announcements lined up; stay tuned for what we're bringing to Microsoft Build 2026.
References:
1. As tested on April 13, 2026. Compared to MAI-Image-2 when normalized by latency and GPU usage. Throughput per GPU vs. MAI-Image-2 on NVIDIA H100 at 1024×1024; measured with optimized batch sizes and matched latency targets. Results vary with batch size, concurrency, and latency constraints.
2. As tested on April 13, 2026. Compared to Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image, and Gemini 3 Pro Image: measured at p50 latency via AI Studio API (1:1, 1K images; minimal reasoning unless noted; web search disabled). MAI-Image-2, MAI-Image-2-Efficient, GPT-Image-1.5-High: measured at p50 latency via Foundry API.