[wp_tech_share]

Across hyperscalers and sovereign clouds alike, the race is shifting from model supremacy alone to infrastructure supremacy. The real differentiation is now in how efficiently GPUs can be interconnected and utilized. As AI clusters scale beyond anything traditional data center networking was built for, the question is no longer "how fast can you train?" but "can your network keep up?" This is where emerging architectures such as Optical Circuit Switches (OCS) and Optical Cross-Connects (OXC), technologies used in wide area networks for decades, enter the conversation.

The Network is the Computer for AI Clusters

The new age of AI reasoning is ushering in three new scaling laws—spanning pre-training, post-training, and test-time scaling—that together are driving an unprecedented surge in compute requirements. At GTC 2025, Jensen Huang stated that demand for compute is now 100× higher than what was predicted just a year ago. As a result, the size of AI clusters is exploding, even as the industry aggressively pursues efficiency breakthroughs—what many now refer to as the “DeepSeek moment” of AI deployment optimization.

As the chart illustrates, AI clusters are rapidly scaling from hundreds of thousands of GPUs to millions of GPUs. Over the next five years, about 124 gigawatts of capacity is expected to be brought online, equivalent to more than 70 million GPUs deployed. In this reality, the network will play a key role in connecting those GPUs as efficiently as possible. The network is the computer for AI clusters.
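As a rough sanity check on those figures, simple arithmetic gives the implied all-in power budget per GPU. The sketch below simply restates the projections above (assuming the 124 GW figure is facility-level capacity, including networking, cooling, and other overhead) rather than adding any new forecast:

```python
# Back-of-envelope: implied all-in facility power per GPU from the figures above.
capacity_gw = 124        # projected capacity to come online over five years
gpus_millions = 70       # projected GPU deployments over the same period

watts_per_gpu = (capacity_gw * 1e9) / (gpus_millions * 1e6)
print(f"~{watts_per_gpu:,.0f} W of facility capacity per GPU")  # ≈ 1,771 W
```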

 

Challenges in Operating Large Scale AI Clusters

As shown in the chart above, the number of interconnects grows far faster than linearly with the number of GPUs. This rapid increase drives significant cost, power consumption, and latency. It is not just the number of interconnects that is exploding; the speed requirements are rising just as aggressively. AI clusters are fundamentally network-bound, which means the network must operate at nearly 100 percent efficiency to fully utilize the extremely expensive GPU resources.
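To make that scaling concrete, here is a minimal sketch (my own simplification, not the model behind the chart) of link counts in a non-blocking Clos/fat-tree fabric: every additional switching tier adds roughly one link per GPU, and larger clusters need more tiers.

```python
# Illustrative estimate of link counts in a full-bisection Clos/fat-tree fabric.
# Assumption: switch radix R, each tier multiplying reach by roughly R/2, and
# roughly one link per GPU per tier in a non-blocking design.
import math

def fabric_links(num_gpus: int, radix: int = 64) -> tuple[int, int]:
    """Return (tiers, total_links) for a rough non-blocking fabric estimate."""
    tiers = max(1, math.ceil(math.log(num_gpus, radix // 2)))
    return tiers, num_gpus * tiers

for gpus in (1_000, 10_000, 100_000, 1_000_000):
    tiers, links = fabric_links(gpus)
    print(f"{gpus:>9,} GPUs -> {tiers} tiers, ~{links:>9,} links")
```

Doubling the cluster does more than double the links once an extra tier is required, which is why interconnect counts, and the optics behind them, dominate cost and power at scale.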

Another major factor is the refresh cadence. AI back-end networks are refreshed roughly every two years or less, compared to about five years in traditional front-end enterprise environments. As a result, speed transitions in AI data centers are happening at almost twice the pace of non-accelerated infrastructure.

Looking at switch port shipments in AI clusters, we expect the majority of ports in 2025 will be 800 Gbps. By 2027, the majority will have transitioned to 1.6 Tbps, and by 2030, most ports are expected to operate at 3.2 Tbps. This progression implies that the data center network’s electrical layer will need to be replaced at each new bandwidth generation—a far more aggressive upgrade cycle than what the industry has historically seen in front-end, non-accelerated infrastructure.

 

 

The Potential Role of OCS in AI Clusters

Optical Circuit Switches (OCS), also known as Optical Cross-Connects (OXC), are network devices that establish direct, light-based optical paths between endpoints, bypassing the traditional packet-switched routing pipeline to deliver connectivity with near-zero switching latency, independent of the data rate carried on each link. Google was the first major hyperscaler to deploy OCS at scale nearly a decade ago, using it to dynamically rewire its data center topology in response to shifting workload patterns and to reduce reliance on power-hungry electrical Ethernet fabrics.

A major advantage of OCS is that it is fundamentally speed-agnostic—because it operates entirely in the optical domain, it does not need to be upgraded each time the industry transitions from 400 Gbps to 800 Gbps to 1.6 Tbps or beyond. This stands in stark contrast to traditional electrical switching layers, which require constant refreshes as link speeds accelerate. OCS also eliminates the need for optical-electrical-optical (O-E-O) conversion, enabling pure optical forwarding that not only reduces latency but also dramatically lowers power consumption by avoiding the energy cost of repeatedly converting photons to electrons and back again.
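Conceptually, an OCS behaves like a remotely reconfigurable patch panel: it maps input ports to output ports and steers light between them without ever inspecting the payload. The minimal sketch below (an illustration of the concept, not any vendor's API) shows why the same cross-connect keeps working as link speeds move from 400 Gbps to 1.6 Tbps and beyond.

```python
# Conceptual model of an optical circuit switch: a mapping of input ports to
# output ports. The switch only steers light, so the cross-connect is agnostic
# to whether the links carry 400G, 800G, or 1.6T signals.

class OpticalCircuitSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.cross_connects: dict[int, int] = {}  # in_port -> out_port

    def connect(self, in_port: int, out_port: int) -> None:
        """Establish a light path; no packet inspection, no O-E-O conversion."""
        if out_port in self.cross_connects.values():
            raise ValueError(f"output port {out_port} already in use")
        self.cross_connects[in_port] = out_port

    def reconfigure(self, new_map: dict[int, int]) -> None:
        """Rewire the topology, e.g., to match a new training job's traffic."""
        self.cross_connects = dict(new_map)

# Example: connect the uplink on port 3 to the uplink on port 7.
ocs = OpticalCircuitSwitch(num_ports=320)
ocs.connect(in_port=3, out_port=7)
```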

The combined benefit is a scalable, future-proof, ultra-efficient interconnect fabric that is uniquely suited for AI and high-performance computing (HPC) back-end networks, where east-west traffic is unpredictable and bandwidth demand grows faster than Moore’s Law. As AI workload intensity surges, OCS is being explored as a way to optimize the network.

 

OCS is a Proven Technology

Using an OCS in a network is not new. It was, however, called by different names over the past three decades: OOO Switch, all-optical switch, optical switch, and optical cross-connect (OXC). Currently, the most popular term for these systems used in data centers is OCS.

OCS technology has been used in the wide area network (WAN) for many years to solve a similar problem set, and for many of the same reasons: tier-one operators worldwide, facing some of the strictest performance and reliability requirements, have relied on OCSs in their carrier networks for over a decade. Additionally, the underlying optical technologies, both MEMS and LCoS, have been widely deployed in carrier networks and have operated reliably for even longer. Stated another way, OCS is based on field-proven technology.

Whether used in a data center or to scale across data centers, an OCS offers several benefits that translate into lower costs over time.

To address the specific needs of AI data centers, companies have launched new OCS products. The following is a list of the products available in the market:

 

Final Thought

AI infrastructure is diverging from conventional data center design at an unprecedented pace, and the networks connecting GPUs must evolve even faster than the GPUs themselves. OCS is not an exotic research architecture; it is a proven technology that is ready to be explored and considered for use in AI networks as a way to differentiate and evolve them to meet the stringent requirements of large AI clusters.

[wp_tech_share]
From NVIDIA’s 800Vdc power architecture to the open Deschutes CDU standard, this year’s OCP Summit highlighted breakthroughs across the full spectrum of power, cooling, and rack technologies shaping AI data centers.

 

The Open Compute Project (OCP), founded in 2011 to promote open, efficient data center design, has become the leading forum shaping AI‑era infrastructure. Now a focal point for next‑generation discussions on power, cooling, and rack and server architecture, its annual Global Summit was held last week in San Jose, Calif., drawing more than 10,000 participants. The non‑profit’s reach continues to expand through new subprojects that broaden its scope across data center systems. The clearest signal of its growing influence came with the announcement that NVIDIA would join its board—a move underscoring how even the industry’s pace‑setter sees value in aligning more closely with the organization.

Among the most pivotal technological developments, NVIDIA provided deeper detail on its 800Vdc power distribution architecture for data centers, adding substance to a disruptive concept first hinted at in a May blog post. This triggered a wave of announcements from power and component suppliers: Vertiv previewed new products expected next year; Eaton introduced a new reference design; Flex expanded its AI infrastructure platform; Schneider Electric unveiled an 800Vdc sidecar rack; ABB announced new DC power products leveraging its solid‑state expertise; Legrand deepened its focus on OCP‑based power and rack solutions; and Texas Instruments introduced new power management chips.

Comparison between current (top) and proposed 800 Vdc power architecture (bottom) in May 2025 (Source: NVIDIA blog)

 

Comparison between current and proposed 800 Vdc power architectures in October 2025 (Source: NVIDIA blog)

 

After years of liquid cooling dominating headlines as the defining innovation in data center design, power distribution has now taken center stage. Roadmaps point to accelerated compute racks exceeding 500 kW per cabinet, introducing new challenges for delivering power efficiently to AI clusters. NVIDIA’s proposed solution marks a decisive break from conventional 415/480 V AC layouts, moving toward a higher-voltage DC (800 Vdc) bus spanning the whitespace and fed directly from a single step‑down switchgear integrated with a solid‑state transformer connected to utility and microgrid systems.
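A back-of-envelope calculation shows why distribution voltage is being pushed up as racks approach 500 kW: for a given power, the conductor current, and with it the I²R loss and copper cross-section, falls in proportion to voltage. The figures below assume unity power factor and lossless conversion, so treat them as directional only.

```python
# Directional comparison of conductor current at different distribution voltages
# for a ~500 kW rack. Assumptions: unity power factor, no conversion losses.
import math

RACK_POWER_W = 500_000  # roadmap-level accelerated rack

def current_amps(power_w: float, volts: float, three_phase_ac: bool = False) -> float:
    if three_phase_ac:
        return power_w / (math.sqrt(3) * volts)  # I = P / (sqrt(3) * V * pf), pf = 1
    return power_w / volts                        # I = P / V for DC

for label, volts, ac in [("54 Vdc in-rack busbar", 54, False),
                         ("415 Vac three-phase", 415, True),
                         ("800 Vdc bus", 800, False)]:
    print(f"{label:>22}: ~{current_amps(RACK_POWER_W, volts, ac):>7,.0f} A")
```

Moving the same 500 kW at 800 Vdc requires roughly an order of magnitude less current than a low-voltage in-rack busbar, which is the core argument for carrying the higher voltage across the whitespace and converting close to the chips.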

This transition represents a major architectural shift, though it will unfold gradually. Hybrid deployments bridging existing AC systems with 800 Vdc designs are expected to dominate in the coming years. These transitional architectures will rely on familiar 415/480 Vac power distribution feeding whitespace sidecar units, which rectify and step up the voltage to 800 Vdc to supply adjacent high-performance racks.

Despite speculation that UPS systems, PDUs, power shelves, and BBUs may become obsolete, these interim designs will continue to sustain demand for such equipment for the foreseeable future. Between now and 2027, when Rubin Ultra chips are expected to reach the market, greater clarity around the end-state architecture should emerge, and collaboration across the ecosystem will bring novel solutions to market. Significant progress is expected in the design and scalable manufacturing of solid-state transformers (SSTs), DC breakers, on-chip power conversion, and other solutions enabling purpose-built AI factories to fully capitalize on the efficiency of these new architectures.

Many of these technologies are already under development. ABB's DC circuit breaker portfolio and solid-state MV UPS offering, while rooted in industrial applications, provide a solid foundation but must evolve to meet the needs of a new customer segment. Vertiv and Schneider Electric, industry heavyweights whose announcements offered only high-level previews of future solutions, are accelerating product development to address these evolving requirements and still have ample time to do so. Eaton stood out as one of the few vendors demonstrating a functional power sidecar unit at OCP, showcasing tangible progress in this emerging architecture and reinforcing its position through expertise in SSTs gained from the acquisition of Resilient Power.

While suppliers are expected to adapt swiftly to new demands, regulatory bodies responsible for guiding the design and safe operation of power solutions, such as the NFPA, often move at a slower pace than the market. Codes and standards will need to evolve accordingly, and uncertainty in this area could become a key obstacle to the broader adoption of cutting-edge higher-voltage designs.

Although power has dominated recent discussions, liquid cooling sessions remained highly popular at OCP. I even found myself standing in a packed room for what I assumed would be a niche discussion on turbidity and electrical conductivity measurements in glycol fluids. Yet, the most significant development in this area was the introduction of the open‑standard Deschutes CDU. With the new specification expected to attract additional entrants to the market, our preliminary research—initially counting just over 40 CDU manufacturers—has quickly become outdated, with over 50 companies now in our mapping. However, new entrants continue facing the same challenges: while a CDU may appear to be just pipes, pumps, and filters, the true differentiation lies in system design expertise and intelligent controls—capabilities that remain difficult to replicate.

CDUs following Deschutes design showcased at OCP Global Summit’25 by Boyd and Envicool (Source: Dell’Oro Group)

 

These trends underscore OCP’s growing role as the launchpad for the next generation of data center design, bringing breakthrough technologies to the forefront. This year’s discussions—from higher-voltage DC power to open liquid cooling—are shaping the blueprint for the next generation of AI factories. These architectures point toward a new model for hyperscale infrastructure, the result of collaboration among hyperscalers themselves, chipmakers, infrastructure specialists, and system integrators. Much remains in flux, with further developments expected leading into SC25 and NVIDIA GTC 2026. Stay tuned, and connect with us at Dell’Oro Group to explore our latest research or discuss these trends defining the data center of the future.

[wp_tech_share]

With around 40 vendors rushing into coolant distribution units, liquid cooling is surging—but how many players can the market sustain?

The AI supercycle is not just accelerating compute demand—it’s transforming how we power and cool data centers. Modern AI accelerators have outgrown the limits of air cooling. The latest chips on the market—whether from NVIDIA, AMD, Google, Amazon, Cerebras, or Groq—all share one design assumption: they are built for liquid cooling. This shift has catalyzed a market transformation, unlocking new opportunities across the physical infrastructure stack.

While the concept of liquid cooling is not new—IBM was water-cooling System/360 mainframes in the 1960s—it is only now, in the era of hyperscale AI, that the technology is going truly mainstream. According to Dell’Oro Group’s latest research, the Data Center Direct Liquid Cooling (DLC) market surged 156 percent year-over-year in 2Q 2025 and is projected to reach close to $6 billion by 2029, fueled by the relentless growth of accelerated computing workloads.

As with any fast-growing market, this surge is attracting a flood of new entrants, each aiming to capture a piece of the action. Oil majors are introducing specialized cooling fluids, and thermal specialists from the PC gaming world are pivoting into cold plate solutions. But one product category in particular has become a hotbed of competition: coolant distribution units (CDUs).

 

What’s a CDU and Why Does It Matter?

CDUs act as the hydraulic heart of many liquid cooling systems.

Sitting between facility water and the cold plates embedded in IT systems, these units regulate flow, pressure, and temperature, while providing isolation, monitoring, and often redundancy.

As direct-to-chip liquid cooling becomes a design default for high-density racks, the CDU becomes a mission-critical mainstay for modern data centers.
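For a sense of scale, the coolant flow a CDU must deliver follows directly from the heat balance Q = ṁ·cp·ΔT. The sketch below uses assumed fluid properties for a water-glycol mix and an assumed 10 °C temperature rise, purely for illustration rather than as a sizing guideline.

```python
# Illustrative CDU flow sizing from the heat balance Q = m_dot * c_p * dT.
# Assumed values: ~25% glycol mix (c_p ≈ 3.8 kJ/kg·K, density ≈ 1.02 kg/L).

def required_flow_lpm(heat_kw: float, delta_t_c: float = 10.0,
                      cp_kj_per_kg_c: float = 3.8,
                      density_kg_per_l: float = 1.02) -> float:
    """Liters per minute of coolant needed to absorb heat_kw at a delta_t_c rise."""
    mass_flow_kg_s = heat_kw / (cp_kj_per_kg_c * delta_t_c)
    return mass_flow_kg_s / density_kg_per_l * 60

# Example: a 120 kW rack with a 10 °C coolant temperature rise.
print(f"~{required_flow_lpm(120):.0f} L/min")  # ≈ 186 L/min, i.e. ~1.5 L/min per kW
```

Multiply that across rows of 100 kW-plus racks and the need for tight flow, pressure, and temperature control, plus redundancy, becomes clear.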

 

At Dell’Oro, we have been tracking this market from its early stages, anticipating the shift of liquid cooling from niche to necessity. Our ongoing research has already identified around 40 companies with CDUs within their product portfolios, ranging from global powerhouses to nimble specialists. The sheer number of players raises an important question: is the CDU market becoming overcrowded?

 

Who is currently in the CDU market?

The CDU market is being shaped by players from a wide variety of backgrounds. Some excel in rack system integration, others in high-performance engineering, and others in manufacturing and scalability prowess. The variety of approaches reflects the diversity of the players themselves—each entering the market from a different starting point, with distinct technical DNA and go-to-market strategies.

Below is a snippet of our CDU supplier map—only a sample of our research to be featured in Dell’Oro’s upcoming Data Center Liquid Cooling Advanced Research Report, expected to be published in 4Q 2025. Our list of CDU vendors is constantly refreshed—it has only been three weeks since the latest launch by a major player, with Johnson Controls announcing its new Silent-Aire series of CDUs.

Not all companies in this list have arrived here organically. The momentum in the CDU market has also fueled a wave of M&A and strategic partnerships. Unsurprisingly, the largest moves have been led by physical infrastructure giants eager to secure a position, as was the case with Vertiv’s acquisition of CoolTera in December 2023 and Schneider Electric’s purchase of Motivair in October 2024.

Beyond these headline deals, several diversified players have taken stakes in thermal specialists—for example, Samsung’s acquisition of FläktGroup and Carrier’s investment in two-phase specialist Zutacore. Private equity has also entered the fray, most notably with KKR’s acquisition of CoolIT. Together, these moves underscore the growing strategic importance of CDU capabilities, even if not every partnership is directly tied to them.

 

Who will win in the CDU market?

Our growth projections are robust, and there is room for multiple vendors to thrive. In the short to medium term, we still expect to see new entrants. Innovators are likely to emerge, developing technologies to address the relentless thermal demands of AI workloads, while nimble players will be quick to capture share in underserved geographies and verticals. Established names such as Vertiv, CoolIT, or Boyd will need to maintain their edge as data center designs and market dynamics evolve.

By the end of the decade, we expect the supply landscape to consolidate as the market matures and capital shifts toward other growth segments. Consolidation and exits are inevitable. We expect fewer than 10 vendors to ultimately capture the lion’s share of the market, with the remainder assessing the minimum scale needed to operate sustainably while meeting shareholder expectations—or exiting altogether.

Who will win? There is no single path to success, as data center operators and their applications remain highly diverse. For instance, some had forecast the demise of the in-rack CDU as a subscale solution misaligned with soaring system capacity requirements. Many operators, however, continue to find value in this form factor. A slightly higher partial power usage effectiveness (pPUE) can be offset by advantages in modularity, ease of off-site rack integration and commissioning, and containment of faults and leaks.
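For context, pPUE divides the power drawn by the partial system (IT load plus the cooling equipment serving it) by the IT load alone, so values closer to 1.0 are better. The numbers below are hypothetical, intended only to illustrate the size of the trade-off being weighed against modularity benefits.

```python
# Hypothetical pPUE comparison (illustrative numbers, not measured data).
def ppue(it_kw: float, cooling_kw: float) -> float:
    """Partial PUE = (IT power + cooling power) / IT power."""
    return (it_kw + cooling_kw) / it_kw

# A notional 100 kW rack: a row-level CDU spending ~3 kW on pumps versus an
# in-rack unit spending ~5 kW on its own pumps and controls.
print(f"row-level CDU pPUE ≈ {ppue(100, 3):.2f}")  # ≈ 1.03
print(f"in-rack CDU pPUE  ≈ {ppue(100, 5):.2f}")   # ≈ 1.05
```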

Similarly, liquid-to-air (L2A) systems were often described as a transitional technology destined to be quickly superseded by more efficient liquid-to-liquid (L2L) solutions. Yet L2A CDUs have maintained a role even with large operators—ideal for retrofit projects in sites heavily constrained by legacy design choices, with accelerated computing racks operating alongside conventional workloads.

In-rack CDUs, L2A solutions, and other design variations will continue to play a role in a market that is rapidly evolving. GPU requirements are rising year after year, and liquid cooling systems are advancing in step with the capacity demands of next-generation AI clusters. Amid this market flux, several factors are emerging as critical for success.

First, CDUs are not standalone equipment: they are an integral element of a cooling system. Successful vendors take a system-level approach, anticipating challenges across the deployment and leveraging the CDU as hardware tightly integrated with multiple elements to ensure seamless operation. Vendors with proven track records and large installed bases—spanning multiple gigawatts—enjoy an advantage in this regard, as their experience positions them to function as a partner and advisor to their customers, rather than a mere vendor.

Second, success is not just about having the right product—it is about understanding the problem the customer needs solved and developing suitable solutions. Operators face diverse challenges, and a single fleet may need everything from small in-rack CDUs to customized L2A units or even fully skidded multi-megawatt systems. Breadth of portfolio helps hedge across deployment types, but it is not the only path to success. Vendors with a sharp edge in specific technologies can also capture meaningful share.

Lastly, scale and availability are often decisive. As builders race to deliver more compute capacity, short equipment lead times can create opportunities for nimble challengers. Availability goes beyond hardware—it also requires skilled teams to design, commission, and maintain CDUs across global sites, including remote locations outside traditional data center hubs.

As the market evolves, one key question looms: which vendors will adapt and emerge as leaders in this critical segment of the AI infrastructure stack? The answer will shape not just the CDU landscape, but the broader liquid cooling market. We will be following this closely in Dell’Oro’s upcoming Data Center Liquid Cooling Advanced Research Report, expected in 4Q 2025, in which we provide deeper analysis into these dynamics and the broader liquid cooling ecosystem.

[wp_tech_share]

NVIDIA recently introduced fully integrated systems, such as the GB200/300 NVL72, which combine Blackwell GPUs with Grace ARM CPUs and leverage NVLink for high-performance interconnects. These platforms showcase what’s possible when the CPU–GPU connection evolves in lockstep with NVIDIA’s accelerated roadmap. As a result, ARM achieved a 25 percent revenue share of the server CPU market in 2Q25, with NVIDIA representing a significant portion due to strong adoption by major cloud service providers.

However, adoption of such proprietary systems may not reach its full potential in the broader enterprise market, as many customers prefer the flexibility of the open ecosystem and established CPU vendors that the x86 architecture offers. Yet the performance of GPU-accelerated applications on x86 has long been constrained by the pace of the PCIe roadmap for both scale-up and scale-out connectivity. While GPUs continue to advance on an 18-month (or shorter) cycle, CPU-to-GPU communication over PCIe has progressed more slowly, often limiting system-level GPU connectivity.
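To put that gap in rough numbers, the sketch below compares commonly cited per-direction bandwidth figures for recent PCIe generations against NVLink per-GPU bandwidth; treat the values as approximate rather than authoritative.

```python
# Approximate per-direction host-to-GPU bandwidth (commonly cited figures).
links_gb_per_s = {
    "PCIe Gen4 x16": 32,            # ~32 GB/s per direction
    "PCIe Gen5 x16": 64,            # ~64 GB/s per direction
    "PCIe Gen6 x16": 128,           # ~128 GB/s per direction (emerging)
    "NVLink (Hopper gen)": 450,     # ~450 GB/s per direction per GPU
    "NVLink (Blackwell gen)": 900,  # ~900 GB/s per direction per GPU
}

baseline = links_gb_per_s["PCIe Gen5 x16"]
for name, bw in links_gb_per_s.items():
    print(f"{name:>23}: {bw:>4} GB/s  (~{bw / baseline:.1f}x PCIe Gen5 x16)")
```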

The new Intel–NVIDIA partnership is designed to close this gap. With NVLink Fusion available on Intel’s x86 platforms, enterprises can scale GPU clusters on familiar infrastructure while benefiting from NVLink’s higher bandwidth and lower latency. In practice, this brings x86 systems much closer to the scalability of NVIDIA’s own NVL-based rack designs, without requiring customers to fully commit to a proprietary stack.

For Intel, the agreement ensures continued relevance in the AI infrastructure market despite the lack of a competitive GPU portfolio. For server OEMs, it opens up new design opportunities: they can pair customized Intel x86 CPUs with NVIDIA GPUs in a wider range of configurations—creating more differentiated offerings from individual boards to full racks—while retaining flexibility for diverse workloads.

The beneficiaries of this development include:
  • NVIDIA, which extends NVLink adoption into the broader x86 ecosystem.
  • Intel, which can play a key role in the AI systems market despite lacking a competitive GPU portfolio, bolstered by NVIDIA’s $5 billion investment.
  • Server OEMs, which gain more freedom to innovate and differentiate x86 system designs.
At the same time, there are competitive implications:
  • AMD is unlikely to participate, as its CPUs compete with Intel’s and its GPUs compete with NVIDIA’s. The company continues to pursue its own interconnect strategy through UALink.
  • ARM may see reduced momentum for external enterprise AI workloads if x86 platforms can now support higher GPU scalability. That said, cloud providers may continue to use ARM for internal workloads and could explore custom ARM CPUs with NVLink Fusion.

Ultimately, NVLink Fusion on Intel x86 platforms narrows the gap between systems based on a mainstream architecture and NVIDIA’s proprietary designs. It aligns x86 and GPU roadmaps more closely, giving enterprises a more scalable path forward while preserving choice across CPUs, GPUs, and system architectures.

[wp_tech_share]

AWS’s In-Row Heat Exchanger (IRHX) is a custom-built liquid cooling system designed for its most powerful AI servers—a system that initially spooked infrastructure investors, but may ultimately strengthen the vendor ecosystem.

On July 9, 2025, Amazon Web Services (AWS) unveiled its in-house-engineered IRHX, a rack-level liquid-cooling platform built to support AWS’s highest-density AI training and inference instances based on NVIDIA’s Blackwell GPUs.

The IRHX comprises three building blocks—a water‑distribution cabinet, an integrated pumping unit, and in‑row fan‑coil modules. In industry shorthand, this configuration is a coolant distribution unit (CDU) flanked by liquid‑to‑air (L2A) sidecars. Direct liquid cooling (DLC) cold‑plates draw heat directly from the chips; the warmed coolant then flows through the coils of heat exchangers, where high‑velocity fans discharge the heat into the hot‑aisle containment before the loop recirculates.

IRHX: the AWS cooling solution supporting its NVIDIA Blackwell server deployments (Source: YouTube, https://youtu.be/u81NapG8yL0)

 

Data Center Physical Infrastructure vendors have offered DLC solutions with L2A sidecars for some time now, as a practical retrofit path for operators looking to deploy high-density racks in existing air-cooled environments with minimal disruption. Vertiv offers the CoolChip CDU 70, CoolIT provides a comprehensive line of AHx CDUs, Motivair brands its solution as the Heat Dissipation Unit (HDU™), Boyd sells in-row and in-rack L2A CDUs, and Delta also markets its own L2A options—just to name a few.

“When we looked at liquid cooling solutions based on what’s available in the market today, there were a few trade-offs,” says Dave Brown, VP of Compute & ML Services at AWS. Between long lead times for building greenfield sites and scalability issues with off-the-shelf solutions, AWS chose to develop its own system. “The IRHX was designed to allow us to scale fast by standardizing our equipment and supply chain, and has been built to spec for our standard rack dimensions to fit within our existing data centers.” (Source: YouTube, https://youtu.be/u81NapG8yL0)

While the AWS approach resembles other industry solutions, it introduces several thoughtful innovations laser-focused on its specific needs. Most off-the-shelf systems integrate the pump and heat exchanger coil in a single enclosure, delivering a self-contained L2A unit for one or more racks. In contrast, AWS separates the pumping unit from the fan-coil modules, allowing a single pumping system to support a large number of fan units. These modular fans can be added or removed as cooling requirements evolve, giving AWS flexibility to right-size the system per row and site.
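A quick air-side heat balance shows why the fan-coil count has to scale with the heat being rejected, and why a modular, right-sizable fan bank is attractive. The sketch below uses assumed sea-level air properties and an assumed 15 °C air temperature rise; it is illustrative, not an AWS specification.

```python
# Illustrative air-side sizing for a liquid-to-air (L2A) arrangement using
# Q = m_dot * c_p * dT for air. Assumed values, not AWS specifications.

AIR_CP_KJ_PER_KG_C = 1.005   # specific heat of air
AIR_DENSITY_KG_M3 = 1.2      # approximate density at sea level

def airflow_m3_per_s(heat_kw: float, delta_t_c: float = 15.0) -> float:
    """Airflow needed to reject heat_kw at an air temperature rise of delta_t_c."""
    mass_flow_kg_s = heat_kw / (AIR_CP_KJ_PER_KG_C * delta_t_c)
    return mass_flow_kg_s / AIR_DENSITY_KG_M3

for heat_kw in (50, 100, 150):
    flow = airflow_m3_per_s(heat_kw)
    print(f"{heat_kw} kW -> ~{flow:.1f} m^3/s (~{flow * 2119:,.0f} CFM)")
```

The same heat moves on the liquid side with a small fraction of the volumetric flow, which is also one reason liquid-to-liquid designs remain more efficient where facility water is available.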

AWS is no stranger to building its own infrastructure solutions. From custom server boards and silicon (Trainium, Inferentia) to rack architectures and networking gear, the scale of its operations justifies a highly vertical approach. The IRHX follows this same pattern: by tailoring CDU capacity and L2A module dimensions to its own rack and row standards, AWS ensures optimal fit, performance, and deployment speed. In this context, developing a proprietary cooling system isn’t just a strategic advantage—it’s a natural extension of AWS’s vertically integrated infrastructure stack.

Market Implications

What does this mean for Data Center Physical Infrastructure vendors? The market reaction was swift—shares of Vertiv (NYSE: VRT) and Munters (STO: MTRS) dropped the day following AWS’s announcement. We do not view Amazon’s move as a threat to vendors in this space, however.

First, it’s important to recognize that the Liquid Cooling market remains buoyant, with considerable room for growth across the ecosystem. Dell’Oro Group’s latest research showed 144% year-over-year growth in 1Q 2025, and our forecast for the liquid cooling segment remains strong. As long as the AI supercycle continues—and we see little risk of it slowing down—the market is expected to remain healthy.

Second, while AWS is an engineering powerhouse, it rarely develops these solutions in isolation. It typically partners with established vendors to co-design its proprietary systems, which are also manufactured by third parties. IRHX may carry the AWS name, but it is likely being built in the facilities of well-known cooling equipment suppliers. Rather than displacing revenue from infrastructure vendors, the IRHX is expected to reinforce it—these vendors are likely playing a key role in its production, and their topline performance should benefit as a result.

Finally, although Dave Brown has stated that the IRHX “can be deployed in existing data centers as well as new builds,” we don’t expect to see it widely used in greenfield facilities designed from the ground up for artificial intelligence. L2A solutions are ideal for retrofitting sites with available floor space and cooling capacity, offering minimal disruption and lower upfront capex. They remain less efficient than liquid-to-liquid (L2L) systems, however, which are likely to stay the architecture of choice in purpose-built AI campuses designed for maximum thermal efficiency and scale.