Azure Infrastructure Blog

Mt Diablo - Disaggregated Power Fueling the Next Wave of AI Platforms

JasonAdrian
Oct 15, 2024

Authors:

Jason Adrian – General Manager, Azure Platform Architecture

Laurentiu Olariu – Power Architect, Azure Platform Architecture

Banha Sok – Power Engineer, System Design & Development

Hyperscale datacenters are continually evolving, and the rise of AI represents one of the most substantial shifts to date. AI systems have introduced novel challenges and disruptions to the infrastructure that supports hyperscale datacenters. While cloud compute and storage systems usually have rack power densities below 20kW, AI systems are pushing rack power to hundreds of kW. To adapt to this fast-changing segment, we began examining every layer of our infrastructure to optimize for these changes. Our solution is to separate the single rack into a server rack and a power rack, each optimized for its primary function.

Figure 1 – AI System Disaggregation

This modular methodology allows us to adjust the power in the disaggregated power rack according to the changing demands of different inferencing and training SKUs. Additionally, it facilitates the reuse of this validated design across a variety of silicon solutions.  

The Evolution of Power Delivery – Mt Diablo

Traditional rack solutions integrate the power and server infrastructure in a single rack, but with Mt. Diablo we are moving all the power conversion into a separate disaggregated power rack. There are several key reasons for adopting disaggregated power in the datacenter:

  • Space Optimization: Disaggregated power frees the entire server rack for AI accelerators and scale-up network switches, enabling larger pods. This optimization is crucial for performance and efficiency, enabling up to 35% more AI accelerators in each server rack.
  • Scalability and Future-Proofing: The need for scalability and future-proofing is driven by high-power server racks, which will exceed a few hundred kilowatts and are moving towards a megawatt. With this approach, we can right-size the power shelf count to each configuration’s unique needs.
  • Power Conversion Efficiency: Today’s power solutions convert AC input to 48Vdc for distribution to the server trays. To improve efficiency, we can instead convert to 400Vdc (High Voltage Direct Current, or HVDC), in either a monopolar or bipolar configuration, which better suits the needs of high-power server racks. With 400Vdc, we expect efficiency to keep improving through incremental evolution, as it has in the 48Vdc conversion space.
  • Modular Design: The modular design allows multiple developments to proceed in parallel. These include HVDC power shelves with specific power supply units (PSUs) that provide HVDC output to a dedicated busbar, cross-rack power distribution to the server rack, in-rack energy storage, and AC voltage distribution within the power rack.
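To make the efficiency argument concrete: for a fixed power P, the bus current is I = P / V, and conduction loss in the distribution path is I²R, so moving from 48Vdc to 400Vdc cuts current by about 8.3x and resistive loss by about 69x over the same conductors. The sketch below is a back-of-the-envelope illustration, not the Mt Diablo design. The 200 kW rack power, the 0.1 mOhm lumped path resistance, and the function names are hypothetical, and it models only distribution (I²R) loss, not PSU or DC-to-DC conversion losses.

```python
# Back-of-the-envelope comparison of DC bus current and conduction loss.
# All numbers are hypothetical and chosen only for illustration;
# PSU and DC-to-DC conversion losses are NOT modeled.

RACK_POWER_W = 200_000         # hypothetical 200 kW AI server rack
PATH_RESISTANCE_OHM = 0.0001   # hypothetical lumped busbar/cable resistance (0.1 mOhm)


def bus_current(power_w: float, voltage_v: float) -> float:
    """Current drawn from a DC bus delivering power_w at voltage_v (I = P / V)."""
    return power_w / voltage_v


def conduction_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path (I^2 * R), in watts."""
    current = bus_current(power_w, voltage_v)
    return current ** 2 * resistance_ohm


if __name__ == "__main__":
    # Same rack power and same conductor resistance, three bus voltages.
    for volts in (12, 48, 400):
        amps = bus_current(RACK_POWER_W, volts)
        loss = conduction_loss_w(RACK_POWER_W, volts, PATH_RESISTANCE_OHM)
        print(f"{volts:>4} Vdc: {amps:>8,.0f} A, conduction loss {loss / 1000:6.2f} kW")
```

Because loss scales as 1/V² for a fixed power and fixed conductors, the 48Vdc to 400Vdc step reduces it by (400/48)² ≈ 69x; in practice part of that margin can also be traded for smaller, lighter conductors.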

All of the benefits of disaggregated power highlighted above make this approach a forward-thinking strategy for datacenter infrastructure.

400Vdc & Industry Alignment

While the first disaggregated power racks will use the current 48Vdc ecosystem, the real enhancements come with 400Vdc power distribution. The high-level proposal for a 400Vdc disaggregated power rack improves on prior 12Vdc and 48Vdc solutions and aims to encourage industry alignment and commonality in several areas:

  • Connectivity Solutions: The 400Vdc connection solutions will differ significantly from the previous 12Vdc and 48Vdc solutions, highlighting the need for industry-wide standardization.
  • Power Rack Form Factor/Dimensions: Establishing common dimensions for power racks to ensure compatibility and ease of integration.
  • AC-to-DC PSU Topology: Addressing the differences between single-phase and three-phase input to create a unified approach.
  • DC-to-DC Modules in Server Rack: Standardizing the modules used within server racks to ensure consistency and reliability.
  • Redundancy: Defining redundancy configurations, such as single or dual feed, or N+x power module redundancy, to enhance system reliability.
  • Safety Standards: Developing safety standards for 400Vdc distribution and liquid-cooled bus solutions to ensure safe operation.
  • Data/Power Management Backplane: Creating a standardized backplane for data and power management, including communication protocols, firmware updates, power control, and failure management.
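The N+x redundancy item, together with the idea of right-sizing the power shelf count per configuration, can be illustrated with a simple sizing rule: pick the smallest N shelves whose combined capacity covers the rack load, then add x spare shelves. The sketch below assumes even load sharing across shelves and ignores single- versus dual-feed topology; the function names, the 300 kW rack, and the 33 kW shelf rating are hypothetical and are not taken from the Mt Diablo specification.

```python
import math


def shelves_required(rack_power_kw: float, shelf_capacity_kw: float, spares: int = 1) -> int:
    """Power shelves needed for an N+x configuration (hypothetical sizing rule).

    N is the smallest number of shelves whose combined capacity covers the
    rack load; `spares` (x) extra shelves let the rack ride through x shelf
    failures, assuming the remaining shelves share the load evenly.
    """
    if rack_power_kw <= 0 or shelf_capacity_kw <= 0:
        raise ValueError("rack power and shelf capacity must be positive")
    if spares < 0:
        raise ValueError("spares must be non-negative")
    # Small tolerance guards against float round-up (e.g. 1.1 / 0.1 -> 11.000000000000002).
    n = math.ceil(rack_power_kw / shelf_capacity_kw - 1e-9)
    return n + spares


def load_per_shelf_kw(rack_power_kw: float, active_shelves: int) -> float:
    """Load carried by each shelf when active_shelves share the rack evenly."""
    return rack_power_kw / active_shelves


if __name__ == "__main__":
    # Hypothetical 300 kW rack fed by hypothetical 33 kW shelves, N+1.
    total = shelves_required(300, 33, spares=1)
    print(total)                                        # N = 10, so 11 shelves
    print(round(load_per_shelf_kw(300, total), 2))      # all shelves healthy
    print(round(load_per_shelf_kw(300, total - 1), 2))  # one shelf failed
```

Because the shelves live in a separate power rack, changing the rack load or the spare count only changes how many shelves are installed, which is the flexibility the disaggregated approach is meant to provide.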

This alignment aims to streamline the development and deployment of disaggregated power solutions, making it easier for the industry to adopt and implement these new technologies, and it enables partnerships, such as the one between Microsoft and Meta, in support of this initiative.

Conclusion

The disaggregated power rack enables scalability and flexibility at a time when innovation and time to market are of paramount importance. To move fast and shift the industry to HVDC power distribution, it’s critical to foster a healthy ecosystem and partnerships that drive commonality. This is why we are excited to announce our upcoming contribution of this architectural specification to the OCP community in collaboration with Meta.

Updated Oct 11, 2024
Version 1.0

5 Comments

  • DavidYu_Eaton

    Hi Jason,

    This is David from Eaton. Really interesting topic on high-rated AI power supplies. One question on power conversion in server racks: with a 400Vdc/800Vdc feed to the server rack, where is the 400V/800Vdc-to-48Vdc power supply unit assembled? Is it assembled in a power shelf like before, in a new stack/shelf that feeds the server module, or in the server module as an on-board power supply unit? Thanks

  • tiffanyhongkong

    I'm writing to you as a postdoctoral researcher with a keen interest in your work on disaggregated power racks for AI. Attending the OCP presentation by Microsoft and Meta this year truly opened my eyes to the potential of 400Vdc architecture.

    My research explores the critical link between power delivery and AI accelerator performance, and your advancements in this area could significantly impact my findings. I'm eager to learn more about the anticipated launch date of the Mt. Diablo power rack and the underlying chip technology driving its development (e.g., Blackwell GB200?).

    Could you provide any materials or direct me to relevant resources to further my understanding on this topic? This information would be invaluable to my research.

  • mikec

    Jason - very interesting and I will watch your OCP presentation. Couple of questions:

    1. If the spec is not yet out I would assume we would start seeing these racks in mid 2026. Is that right?
    2. Can you provide a rough timeline on power per rack? For example I think in 2025 there is ORv3 HP at 155W. So in 2026 ?, 2027?, 2028?
    3. In what year do we get to 1MW per rack?
    4. Just doing basic math using NVIDIA NVL72, can I assume 35% more accelerators per rack means going from 72 to 96 accelerators, or is that too simple and should I think about this differently?
  • toddkazmirski

    Hi Jason,

    I enjoyed your presentation at the OCP 2024 Summit. I'm interested in some of the topology details of the +/-400Vdc system. Will you be publishing any additional information? 

    • JasonAdrian

      Yes - we are planning a Mt Diablo specification release to OCP in the coming months, which will have a lot more details and specifics.