Microsoft Ramps Up AI Chip Race With Google and Amazon

Microsoft Maia AI chip

Microsoft on Monday (Jan. 26) introduced Maia 200, its second-generation custom artificial intelligence (AI) accelerator, as cloud providers increasingly focus on inference, the phase of AI where trained models generate responses for users. Microsoft said the chip is designed specifically to run large models efficiently at scale, a shift that reflects how inference has become the dominant source of ongoing AI compute costs.

    In its official announcement, Microsoft said Maia 200 delivers up to three times higher inference performance than competing chips from Amazon and Google on certain internal benchmarks. The company framed the launch as part of a broader effort to lower the cost of running AI services such as Copilot and large language models across Azure.

    The move places Microsoft more squarely in the race among hyperscalers to design proprietary silicon, a strategy aimed at improving efficiency, reducing dependence on third-party chip suppliers and gaining tighter control over AI economics as usage scales.

    What Maia 200 Is Designed to Do

    Maia 200 is built specifically for AI inference. Training typically requires large bursts of compute during development, while inference represents a continuous operational expense as models respond to user prompts in real time. Microsoft’s design choices reflect that distinction.

    Microsoft said Maia 200 is optimized for low-precision inference formats, particularly FP4 and FP8, which are widely used to reduce compute and energy requirements while maintaining acceptable output quality for many large models. The company said the chip delivers up to 10 petaFLOPS of FP4 performance, alongside improved memory bandwidth to keep data close to compute and reduce latency.
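
    To make the low-precision idea concrete, here is a minimal, illustrative Python sketch (not specific to Maia 200 or Azure) that stores a layer's weights in a narrower floating-point format and compares memory footprint and output error. NumPy has no native FP8 or FP4 types, so float16 stands in for the lower-precision copy; FP8 and FP4 would shrink the footprint by roughly a further half and three-quarters.

        import numpy as np

        rng = np.random.default_rng(0)
        weights_fp32 = rng.standard_normal((4096, 4096)).astype(np.float32)

        # Stand-in for a lower-precision copy of the weights; real FP8/FP4
        # inference needs hardware and library support not modeled here.
        weights_fp16 = weights_fp32.astype(np.float16)

        x = rng.standard_normal(4096).astype(np.float32)
        y_full = weights_fp32 @ x
        y_low = weights_fp16.astype(np.float32) @ x  # dequantize, then compute

        rel_error = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)
        print(f"FP32 weights: {weights_fp32.nbytes / 1e6:.0f} MB")
        print(f"FP16 weights: {weights_fp16.nbytes / 1e6:.0f} MB")
        print(f"Relative error in layer output: {rel_error:.2e}")

    Cutting the bytes per weight is what keeps more of a model close to the compute units, which is the memory-bandwidth argument behind low-precision inference hardware.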

    In simpler terms, the goal is to process more AI queries faster and at lower cost. Microsoft said Maia 200 offers roughly 30% better performance per dollar compared with the hardware it previously relied on for inference workloads in its data centers. Those savings become meaningful as AI usage grows across enterprise applications and consumer-facing services.
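
    For a sense of how a performance-per-dollar gain flows through to serving costs, the back-of-the-envelope sketch below uses purely hypothetical throughput figures (they are not Microsoft or Azure numbers) to show that 30% better performance per dollar works out to roughly 23% lower cost for the same volume of queries.

        # Hypothetical figures for illustration only; not Microsoft or Azure pricing.
        baseline_queries_per_dollar = 1_000
        maia_queries_per_dollar = baseline_queries_per_dollar * 1.30  # 30% better perf per dollar

        cost_baseline = 1_000_000 / baseline_queries_per_dollar  # dollars per million queries
        cost_new = 1_000_000 / maia_queries_per_dollar

        saving = 1 - cost_new / cost_baseline
        print(f"Prior hardware: ${cost_baseline:,.2f} per million queries")
        print(f"Maia 200:       ${cost_new:,.2f} per million queries ({saving:.0%} lower)")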

    Microsoft has deployed Maia 200 first in its U.S. Central data center near Des Moines, with subsequent rollouts planned for other regions, including Phoenix. The company has also released development tools that let software teams optimize workloads for the chip, signaling that Maia 200 is intended to support a growing portion of Microsoft’s AI infrastructure rather than remain a limited internal experiment.

    Performance Claims and the Inference Race

    Microsoft’s most attention-grabbing claim is that Maia 200 delivers up to three times higher inference performance than Amazon’s third-generation Trainium chip and higher FP8 performance than Google’s latest TPU. However, the scope of that comparison is narrow and warrants careful framing.

    As Microsoft stated, the comparisons are based on internal benchmarks focused on low-precision inference workloads. The company does not claim superiority across all AI tasks, nor does it extend the three-times figure to every competing chip or use case.

    Live Science attributed the performance comparisons directly to Microsoft and noted that the claims apply to specific inference scenarios rather than general-purpose AI performance.

    The Decoder also emphasized that Microsoft has not released third-party validation of the results. There are currently no standardized benchmark results, such as MLPerf submissions, that independently compare Maia 200 with Amazon or Google chips across real-world workloads.

    This distinction matters. Amazon and Google have spent years refining their own custom AI processors and continue to invest heavily in inference optimization. Performance can vary significantly depending on model architecture, precision format and deployment environment.

    The Maia 200 launch also reflects Microsoft’s broader vertical-integration strategy. As AI demand strains global chip supply and drives up costs, owning more of the hardware stack gives cloud providers greater control over pricing, performance and product road maps.

    For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.