[wp_tech_share]

Last month was incredibly exciting, to say the least! We had the opportunity to attend two of the most impactful and prominent events in the industry: NVIDIA's GTC followed by OFC.

As discussed in my pre-OFC show blog, we anticipated that AI networks would be in the spotlight at OFC 2024 and would accelerate the development of innovative optical connectivity solutions. These solutions are tailored to address the explosive growth in bandwidth within AI clusters while tackling cost and power consumption challenges. GTC 2024 further intensified this focus. At GTC 2024, NVIDIA announced its latest Blackwell B200 Tensor Core GPU, designed to power trillion-parameter AI Large Language Models. The Blackwell B200 demands advanced 800 Gbps networking, aligning with the predictions outlined in our AI Networks for AI Workloads report. With an anticipated 10X traffic growth in AI workloads every two years, these AI workloads are expected to outpace traditional front-end networks by at least two speed upgrade cycles.
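To illustrate the scale of that trajectory, the short sketch below uses assumed, illustrative numbers (a 10X-every-two-years traffic multiplier and a doubling of link bandwidth per speed generation; these are not figures from the report) to compare cumulative traffic growth against the number of link-speed doublings needed to keep pace:

```python
import math

# Assumption (illustrative, not report data): AI traffic grows 10x every
# 2 years, and each optics/switch speed generation doubles link bandwidth.
traffic_multiplier = 10      # growth factor per 2-year period
years = 6                    # illustrative horizon
periods = years / 2          # number of 2-year periods

total_traffic_growth = traffic_multiplier ** periods   # 10^3 = 1000x over 6 years
doublings_needed = math.log2(total_traffic_growth)     # speed doublings to keep pace

print(f"Traffic growth over {years} years: {total_traffic_growth:.0f}x")
print(f"Equivalent link-speed doublings needed: {doublings_needed:.1f}")
```

Under these assumptions, a network would need roughly ten speed doublings over six years to match traffic, far more than the one or two upgrade cycles a traditional front-end network would see in the same window.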

While a multitude of topics and innovative solutions were discussed at OFC regarding inter-data center applications as well as compute interconnect for scaling up the number of accelerators within the same domain, this blog will primarily focus on intra-data center applications. Specifically, it will focus on scaling out the network needed to connect various accelerated nodes in large AI clusters with thousands of accelerators. This network is commonly referred to in the industry as the 'AI Back-end Network' (also referred to, by some vendors, as the network for East-West traffic). Some of the topics and solutions explored at the show are as follows:

1) Linear Drive Pluggable Optics vs. Linear Receive Optics vs. Co-Packaged Optics

Pluggable optics are expected to account for an increasingly significant portion of power consumption at a system level. This issue will be further amplified as Cloud SPs build their next-generation AI networks featuring a proliferation of high-speed optics.

At OFC 2023, the introduction of Linear Drive Pluggable Optics (LPOs), which promise significant cost and power savings through the removal of the DSP, initiated a flurry of testing activities. Fast forward to OFC 2024: we witnessed nearly 20 demonstrations, featuring key players including Amphenol, Eoptolink, HiSense, Innolight, and others. Conversations during the event revealed industry-wide enthusiasm for the high-quality 100G SerDes integrated into the latest 51.2 Tbps network switch chips, with many eager to capitalize on this advancement to remove the DSP from optical pluggable modules.

However, despite the excitement, the hesitancy from hyperscalers (with the exception of ByteDance and Tencent, who have announced plans to test the technology by the end of this year) suggests that LPOs may not be poised for mass adoption just yet. Interviews highlighted hyperscalers' reluctance to shoulder the responsibility of qualification and potential failure of LPOs. Instead, they express a preference for switch suppliers to handle those responsibilities.

In the interim, early deployments of 51.2 Tbps network chips are expected to continue leveraging pluggable optics, at least through the middle of next year. However, if LPOs can demonstrate safe deployment at mass scale while offering significant power savings for hyperscalers (enabling them to deploy more accelerators per rack), the temptation to adopt may prove irresistible. Ultimately, the decision hinges on whether LPOs can deliver on these promises.

Furthermore, Half-Retimed Linear Optics (HALO), also known as Linear Receive Optics (LROs), were discussed at the show. LRO integrates the DSP chip only on the transmitter side (as opposed to completely removing it, as in LPOs). Our interviews revealed that while LPOs may prove feasible at 100G-PAM4 SerDes, they may become challenging at 200G-PAM4 SerDes, and that's when LROs may be needed.

Meanwhile, Co-Packaged Optics (CPOs) remain in development, with large industry players such as Broadcom showcasing ongoing development and progress in the technology. While we believe current LPO and LRO solutions will certainly have a faster time to market with similar promises as CPOs, the latter may eventually become the sole solution capable of enabling higher speeds at some point in the future.

Before closing this section, let's not forget that, when possible, copper is a much better alternative to all of the optical connectivity options discussed above. Put simply: use copper when you can, use optics when you must. Interestingly, liquid cooling may facilitate the densification of accelerators within the rack, enabling increased use of copper for connecting various accelerator nodes within the same rack. The recent announcement of the NVIDIA GB200 NVL72 at GTC perfectly illustrates this trend.

2) Optical Circuit Switches

OFC 2024 brought some interesting Optical Circuit Switch (OCS) related announcements. OCS can bring many benefits, including high bandwidth and low network latency as well as significant capex savings. That is because OCS switches can lead to a significant reduction in the number of required electrical switches within the network, eliminating the expensive optical-electrical-optical conversions associated with electrical switches. Additionally, unlike electrical switches, OCS switches are speed agnostic and don't necessarily need to be upgraded when servers adopt next-generation optical transceivers.

However, OCS is a novel technology, and so far only Google, after many years of development, has been able to deploy it at scale in its data center networks. Additionally, OCS switches may require a change in the installed base of fiber. For that reason, we are still watching to see whether any other Cloud SP, besides Google, plans to follow suit and adopt OCS switches in the network.

3) The Path to 3.2 Tbps

At OFC 2023, numerous 1.6 Tbps optical components and transceivers based on 200G per lambda were introduced. At OFC 2024, we witnessed further technology demonstrations of such 1.6 Tbps optics. While we don't anticipate volume shipments of 1.6 Tbps until 2025/2026, the industry has already begun exploring various paths and options toward achieving 3.2 Tbps.

Given the complexity encountered in transitioning from 100G-PAM4 electrical lane speeds to 200G-PAM4, initial 3.2 Tbps solutions may utilize 16 lanes of 200G-PAM4 within an OSFP-XD form factor, instead of 8 lanes of 400G-PAMx. It's worth noting that OSFP-XD, which was initially explored and demonstrated two years ago at OFC 2022, may be brought back into action due to the urgency stemming from AI cluster deployments. 3.2 Tbps solutions in the OSFP-XD form factor offer superior faceplate density and cost savings compared to 1.6 Tbps. Ultimately, the industry is expected to find a way to enable 3.2 Tbps based on 8 lanes of 400G-PAMx SerDes, although it may take some time to reach that target.
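The lane arithmetic behind these two options is straightforward; the snippet below (with labels of our own, purely for illustration) confirms that both configurations land at the same 3.2 Tbps aggregate:

```python
# Candidate lane configurations for 3.2 Tbps modules (illustrative only).
# Each entry multiplies lane count by per-lane rate in Gbps.
configs = {
    "OSFP-XD, 16 lanes of 200G-PAM4": 16 * 200,   # likely near-term option
    "8 lanes of 400G-PAMx":           8 * 400,    # eventual industry target
}

for name, gbps in configs.items():
    print(f"{name}: {gbps / 1000:.1f} Tbps")
```

The trade-off is not in aggregate bandwidth but in lane count: the 16-lane option doubles the number of electrical lanes per module, which is what the OSFP-XD form factor was designed to accommodate.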

In summary, OFC 2024 showcased numerous potential solutions aimed at addressing common challenges: cost, power, and speed. We anticipate that different hyperscalers will make distinct choices, leading to market diversification. However, one of the key considerations will be time to market. It's important to note that the refresh cycle in the AI back-end network is typically around 18 to 24 months, significantly shorter than the 5 to 6 years seen in the traditional front-end networks used to connect general-purpose servers.

For more detailed views and insights on the Ethernet Switch—Data Center report or the AI Networks for AI Workloads report, please contact us at dgsales@delloro.com.


Market Overview

The worldwide data center capital expenditure (capex) grew by 4% in 2023, reaching $260 billion, with servers leading all technology areas in revenue (Figure 1). However, this growth rate marked a slowdown from the double-digit growth observed in the previous year. Despite lingering economic uncertainties, the market is poised for growth, driven by advancements in accelerated computing for AI applications and an expanding data center footprint.

The growth varied across different categories of data center technology areas.

  • IT infrastructure experienced a decline due to reduced investments in general-purpose servers and storage systems. This decline was attributed to supply issues that occurred in 2022, prompting enterprise customers and resellers to place excess orders, which led to inventory surges and subsequent corrections. Consequently, server shipments declined by 8% in 2023. The demand for general-purpose server and storage system components such as CPUs, memory, storage drives, and NICs, saw a sharp decline in 2023, as the major Cloud Service Providers (SPs) and server and storage system OEMs reduced component purchases in anticipation of weak system demand.
  • In contrast, there was a shift in capex towards accelerated computing. Spending on accelerators, such as GPUs and other custom accelerators, more than tripled in 2023, as the major Cloud SPs raced to deploy accelerated computing infrastructure that is optimized for AI use cases ranging from recommenders to generative AI. Accelerated servers, although comprising a small share of total server volume, command a significant average selling price (ASP) premium, contributing significantly to revenue.
  • Revenues for network infrastructure, consisting mostly of Ethernet switches, decelerated throughout 2023 as vendors fulfilled backlog. Modest growth rates were observed in the fourth quarter of 2023, reflecting a digestion cycle affecting various vendors and product segments.
  • While the data center physical infrastructure (DCPI) revenues experienced robust double-digit growth in 2023, the market also decelerated in the fourth quarter of 2023. This slowdown was attributed to the diminishing impact of pandemic-induced digitalization and limited price realization from price increases implemented in 2022. However, emerging deployments associated with AI workloads, particularly in retrofitting power distribution and thermal management in existing facilities, provided a marginal contribution to growth.

Data center capex growth varied among customer segments, with Colocation SPs leading in growth due to ongoing momentum in DCPI and global data center footprint expansion. In the Top 4 US Cloud SP segment, Microsoft and Google increased data center investments, particularly in AI infrastructure, while Amazon underwent a digestion cycle following pandemic-driven expansion. In contrast, the major Chinese Cloud SPs experienced declines in data center capex due to economic, regulatory, and demand challenges. Enterprise data center spending also declined modestly in 2023, reflecting weakening demand amid economic uncertainties and digestion.

 

Vendor Landscape

Below are some vendor highlights in the key technology areas we track:

  • In the Server market, Dell led in revenue share, followed by HPE and IEIT Systems. Excluding white box server vendors, revenue for original equipment manufacturers (OEMs) declined by 10% in 2023, with lower server unit volumes attributed to economic uncertainties and excess channel inventory. However, some vendors experienced revenue growth through shifts in product mix towards accelerated platforms or general-purpose servers with the latest CPUs from Intel and AMD.
  • The Storage System market witnessed a 7% decline in revenue in 2023, with Dell leading in revenue share, followed by Huawei and NetApp. Huawei was the only major vendor to achieve growth, driven by success in adopting the latest all-flash arrays among enterprise customers.
  • In the Ethernet Data Center Switch market, Arista surpassed Cisco in the fourth quarter, although Cisco maintained its position as the market leader for the entirety of 2023. Cisco’s sales were boosted by substantial backlogged shipments earlier in the year. However, demand tapered off later as both cloud service providers and enterprise customers underwent a period of digestion. Meanwhile, Arista experienced remarkable revenue growth, outpacing the market due to its robust presence at Meta and Microsoft, both of which demonstrated significant network spending throughout 2023.
  • In the DCPI market, Schneider Electric held onto the top market share ranking in 2023. Vertiv maintained the number two market position, but gained meaningful share and is now challenging Schneider Electric for the top market share position. Eaton rounds out the top three DCPI vendors. All three companies experienced double-digit revenue growth for the full year.

 

2024 Outlook

Looking ahead to 2024, the Dell’Oro Group forecasts a double-digit increase in worldwide data center capex, driven by increased server demand and average selling prices (Figure 2). Accelerated computing adoption is expected to continue, supported by new GPU platform releases from NVIDIA, AMD, and Intel. Growth in network infrastructure and DCPI revenues will depend on organic investments rather than supply chain-induced backlog or price increases. Recent recovery in server and storage component markets for CPUs, memory and storage drives is signaling the potential for increased system demand later this year. Dell’Oro Group projects moderate growth for the Top 4 US Cloud SPs in data center capex, while the Top 4 China-based cloud SPs are expected to undergo a cautious recovery. Additionally, enterprise and rest-of-cloud segments may be sensitive to macroeconomic conditions, with potential upside opportunities in AI-related investments.


Greetings! Prior to delving into an evaluation of our data center predictions for 2024, allow me to first revisit some of the prominent trends I emphasized in the 2023 predictions blog.

    • Data center capex growth in 2023 has decelerated noticeably, as projected after a surge of spending growth in 2022. The top 4 US cloud service providers (SPs) in aggregate slowed their capex significantly in 2023, with Amazon and Meta undergoing a digestion cycle, while Microsoft and Google are on track to increase their greenfield spending on accelerated computing deployments and data centers. The China Cloud SP market remains depressed as cloud demand remains soft amid macroeconomic and regulatory headwinds, although there are signs of a turnaround in second-half 2023 from AI-related investments. The enterprise server and storage system market performed worse than expected, as most of the OEMs are on track to experience a double-digit revenue decline in 2023 from a combination of inventory correction and lower end-demand given the economic uncertainties. However, network and physical infrastructure OEMs have fared better in 2023, because strong backlog shipments fulfilled in the first half of 2023 lifted revenue growth.
    • We underestimated the impact of accelerated computing investments to enable AI applications in 2023. During that year, we saw a pronounced shift in spending from general-purpose computing to accelerated computing and complementary network and physical infrastructure equipment. AI training models have become larger and more sophisticated, demanding the latest advances in accelerators such as GPUs and in network connectivity. The high cost of AI-related infrastructure that was deployed helped to offset the sharp decline in the general-purpose computing market. However, supplies of accelerators have remained tight, given strong demand from hyperscalers.
    • General-purpose computing has taken a backseat to accelerated computing in 2023, despite significant CPU refreshes from Intel and AMD with their fourth-generation processors. These new server platforms feature the latest in server interconnect technology, such as PCIe 5, DDR5, and more importantly CXL. CXL has the ability to aggregate memory usage across servers, improving overall utilization. However, general-purpose server demand has been soft, and the transition to the fourth-generation CPU platforms has been slower than expected (although AMD made significant progress in 3Q23). Furthermore, CXL adoption is limited to the hyperscale market, with limited use cases.
    • Server connectivity is advancing faster than we had expected a year ago. In particular, accelerated computing is on a speed transition cycle at least a generation ahead of the mainstream market. Currently, accelerated servers with NVIDIA H100 GPUs feature network adapters at up to 400 Gbps with 112 Gbps SerDes, and bandwidth will double in the next generation of GPUs a year from now. Furthermore, Smart NICs continue to gain adoption, though mostly in the hyperscale market. According to our Ethernet Adapter and Smart NIC report, Smart NIC revenues increased by more than 50% in 2023.
    • The edge computing market has been slow to materialize, and we reduced our forecast in the recent Telecom Server report, given that the ecosystem and more compelling use cases need to be developed, and that additional adopters beyond the early adopters have been limited.

According to our Data Center IT Capex report, we project data center capex to return to double-digit growth in 2024 as market conditions normalize. Accelerated computing will remain at the forefront of capex plans for the hyperscalers and enterprise market to enable AI-related and other domain specific workloads. Given the high cost of accelerated servers and their specialized networking and infrastructure requirements, the end-users will need to be more selective in their capex priorities. While deployments of general-purpose servers are expected to rebound in 2024, we believe greater emphasis will be made to increase server efficiency and utilization, while curtailing cost increases.

Below, we highlight key trends that can enhance the optimization of the overall server footprint and decrease the total cost of ownership for end-users:

Accelerated Computing Maintains Momentum

We estimate that 11% of server unit shipments in 2023 were accelerated, and we forecast accelerated server shipments to grow at a five-year compound annual growth rate approaching 30%. Accelerated servers contain accelerators such as GPUs, FPGAs, or custom ASICs, and are more efficient than general-purpose servers when matched to domain-specific workloads. GPUs will likely remain the primary choice for training large AI models, as well as for running inference applications. While NVIDIA currently has a dominant share of the GPU market, we anticipate that other vendors such as AMD and Intel will gain some share over time as customers seek greater vendor diversity. Greater choice in the supply chain could translate to much-needed supply availability and cost reduction, enabling sustainable growth of accelerated computing. Refer to our Data Center IT Semiconductors & Components report for more insights on the accelerator and server component market.
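To put that growth rate in perspective, a quick sketch of the cumulative effect of a ~30% compound annual growth rate (using the figure cited above; the calculation itself is purely illustrative):

```python
# Compound growth of accelerated server unit shipments at a ~30% CAGR.
cagr = 0.30
years = 5
unit_growth = (1 + cagr) ** years  # cumulative multiple after five years

print(f"Accelerated server units grow ~{unit_growth:.1f}x over {years} years")
```

In other words, a ~30% CAGR implies accelerated server unit volumes nearly quadruple over the five-year forecast window.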

Advancements in Next-Generation Server Platform

General-purpose servers have been increasing in compute density, as evolution in CPUs enables servers with more processor cores per CPU, more memory, and more bandwidth. Ampere Computing's Altra Max and AMD's Bergamo are offered with up to 128 cores per processor, and Intel's Granite Rapids (available later this year) will have a similar number of cores per processor. Less than seven years ago, Intel's Skylake CPUs were offered with a maximum of 28 cores. The latest generation of CPUs also contains onboard accelerators that are optimized for AI inference workloads.
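The pace of that core-count growth can be expressed as an implied annual rate. This is a rough calculation based on the figures above, treating the interval as exactly seven years:

```python
# Implied annual growth in CPU core counts: 28 cores (Skylake era) to
# 128 cores (Altra Max / Bergamo) over roughly seven years.
start_cores, end_cores, years = 28, 128, 7
annual_growth = (end_cores / start_cores) ** (1 / years)

print(f"Implied core-count growth: ~{(annual_growth - 1) * 100:.0f}% per year")
```

Roughly a quarter more cores per processor each year, which is the density gain that lets operators consolidate workloads onto fewer general-purpose servers.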

Lengthening of the Server Replacement Cycle

The hyperscale cloud SPs have lengthened the replacement cycle of general-purpose servers. This measure has the impact of reducing the replacement cost of general-purpose servers over time, enabling more capex to be allocated to accelerated systems.

Disaggregation of Compute, Memory, and Storage

Compute and storage have been disaggregated in recent years to improve server and storage system utilization. We believe that next-generation rack-scale architectures based on CXL will enable a greater degree of disaggregation, benefiting the utilization of compute cores, memory, and storage.


As we enter the new year, it’s a great opportunity to reflect on 2023 and assess what’s in store for 2024.

Looking back at our 2023 DCPI predictions, we anticipated that macroeconomic uncertainty would not lead to a DCPI recession in 2023. We also foresaw that power availability would challenge data centers to rethink energy storage and on-site power generation. Both proved to be true.

Through 3Q23, DCPI revenues have grown at double-digit rates, surpassing our expectation for 2023. Power availability also became a widespread topic of conversation, with battery energy storage systems (BESS), fuel cells, and small modular reactors (SMRs) all increasingly viewed as options to address future power availability challenges.

We also predicted a 10MW immersion cooling deployment from a top cloud service provider; however, this did not happen. Smaller scale deployments and proof of concepts occurred, but larger scale deployments require continued growth in ecosystem support, new environmentally friendly immersion fluids, and increased end-user operational readiness.

Yet, the most impactful and exciting development of 2023 in the data center industry should come as no surprise at this point: the proliferation of generative AI. This has set the stage for a profound transformation in the DCPI market. The impact will be felt for years to come, and we expect to see the following three trends this year:

  1. Normalizing order cycle to lead to slow start for DCPI market in 2024

After back-to-back years of double-digit growth, which has not been the norm over the past decade, DCPI revenue growth is forecast to moderate in 2024, most pronounced in the first half of the year. This moderation is attributed to supply chain constraints that delayed unit shipments in 2022, creating unseasonably strong growth in the first half of 2023. Not only does this create tough year-over-year comparisons for 1H24, but abated supply chain constraints mean end-users' ordering patterns are normalizing, shifting toward the second half of the year.

Additionally, while DCPI vendor backlogs haven't meaningfully declined, the contents of those backlogs have changed. Orders associated with traditional computing workloads have returned to more normal levels, while backlogs for AI-related DCPI deployments are growing. However, these AI-related DCPI deployments need additional time to materialize.

  2. Purpose-built AI facilities will begin to materialize in 2H24

After a slow first half of the year, growth is forecast to accelerate during the second half of 2024. We anticipate that this growth will be driven by new facilities purpose-built for AI workloads, starting to materialize from the top cloud service providers. These facilities are expected to demand 100s of MWs each, pushing rack power densities from 10 – 15 kW/rack today to 80 – 100 kW/rack to support power-hungry accelerated servers.

This requires significant investments in higher ampacity power distribution and thermal management, specifically liquid cooling. We expect the majority of this liquid cooling to materialize in the form of Direct Liquid Cooling and air-assisted Rear Door Heat Exchangers (RDHx). This is due to end-users' familiarity with deploying IT infrastructure in the vertical rack form factor and existing ecosystem support, alongside the performance and sustainability benefits. We plan to provide more detail on liquid cooling in our upcoming Advanced Research Report, 'Data Center Liquid Cooling,' scheduled to publish in 2Q24.
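A back-of-the-envelope sketch shows what the density figures above imply for rack counts (all values here are assumptions taken from the midpoints of the cited ranges, not report data):

```python
# Rack counts implied by facility power and per-rack density (illustrative).
facility_power_kw = 100 * 1000   # assume a 100 MW facility as a floor
legacy_density_kw = 12.5         # midpoint of the 10 - 15 kW/rack range
ai_density_kw = 90.0             # midpoint of the 80 - 100 kW/rack range

legacy_racks = facility_power_kw / legacy_density_kw
ai_racks = facility_power_kw / ai_density_kw

print(f"Legacy density: ~{legacy_racks:.0f} racks; AI density: ~{ai_racks:.0f} racks")
```

The same power envelope supports far fewer, far denser racks in the AI case, which is why the investment shifts toward per-rack power distribution and liquid cooling rather than floor space.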

  3. Changes in GHG Protocol accounting will add pressure to data center sustainability

The data center industry is on a rapid growth trajectory, a trend further accelerated by the growth of AI workloads. However, this surge has raised concerns about potentially alarming growth in greenhouse gas (GHG) emissions. This has drawn attention to the data center industry, which has responded with commitments to grow sustainably.

To help measure and assess progress here, many within the data center ecosystem report on carbon emissions following the GHG Protocol Corporate Accounting and Reporting Standard. The GHG Protocol recently began working on updates to the standards that may significantly impact data center Scope 2 emissions, or indirect emissions generated from the purchase of electricity. Historically, data center owners and operators have been able to limit these emissions through offsets such as power purchase agreements (PPAs) and renewable energy certificates (RECs). However, these offsets no longer have the shiny appeal they once did, because the burden a data center places on its local power grid and community may not align with the benefits from the offsets.

We expect the updates from the GHG Protocol to address this issue and to introduce more granularity and stringency into Scope 2 emissions accounting. This may make sustainability claims related to Scope 2 emissions more difficult to make, but much more meaningful. A draft version of these updates is expected in 2024, with the final standards slated for release in 2025. These changes will set the stage for the sustainability claims the data center industry can make in the second half of this decade.


Happy New Year! As usual, we’re excited to start the year by reflecting on the developments in the Ethernet data center switch market throughout 2023 and exploring the anticipated trends for 2024.

First, looking back at 2023, the market performed largely in line with our expectations as outlined in our 2023 prediction blog published in January of last year. As of January 2024, data center switch sales are set to achieve double-digit growth in 2023, based on the data collected through 3Q23. Shipments of 200/400 Gbps ports nearly doubled in 2023. While Google, Amazon, Microsoft, and Meta continue to dominate deployments, we observed a notable increase in 200/400 Gbps port shipments destined for Tier 2/3 Cloud Service Providers (SPs) and large enterprises. In the meantime, 800 Gbps deployments remained sluggish throughout 2023, with expectations for acceleration in 2024. Notably, 2023 marked a transformative moment in the history of AI with the emergence of generative AI applications, driving meaningful changes in modern data center networks.

Now as we look into 2024, below are our top 3 predictions for the year:

1. The Data Center Switch market to slow down in 2024

Following three consecutive years of double-digit growth, the Ethernet data center switch market is expected to slow down in 2024 and grow at less than half the rate of 2023. We expect 2024 sales performance to be suppressed by normalization of backlog, digestion of existing capacity, and optimization of spending caused either by macroeconomic conditions or a shift in focus to AI and budgets diverted away from traditional front-end networks used to connect general-purpose servers.

2. The 800 Gbps adoption to significantly accelerate in 2024

We predict 2024 to be a tremendous year for 800 Gbps deployments, as we expect a swift adoption of a second wave of 800 Gbps (based on 51.2 Tbps chips) from a couple of large Cloud SPs. The first wave of 800 Gbps (based on 25.6 Tbps chips) started back in 2022/2023 but has been slow as it has been adopted only by one Cloud SP. In the meantime, we expect 400 Gbps port shipments to continue to grow as 51.2 Tbps chips will also enable another wave of 400 Gbps adoption. We expect 400 Gbps/800 Gbps speeds to achieve more than 40% penetration by 2027 in terms of port volume.

3. AI workloads to drive new network requirements and to expand the market opportunity for both Ethernet and InfiniBand

The enormous appetite for AI is reshaping the data center switch market. Emerging generative AI applications deal with trillions of parameters, driving the need for thousands or even hundreds of thousands of accelerated nodes. To connect these accelerated nodes, a new fabric is needed, called the AI back-end network, which is different from the traditional front-end network mostly used to connect general-purpose servers. Currently, InfiniBand dominates AI back-end networks, but Ethernet is expected to gain significant share over the next five years. We provide more details about the AI back-end network market in our recently published Advanced Research Report, 'AI Networks for AI Workloads.' Among many other requirements, AI back-end networks will accelerate the migration to higher speeds. As noted in the chart below, the majority of switch ports in AI back-end networks are expected to be 800 Gbps by 2025 and 1600 Gbps by 2027.

Migration to High-speeds in AI Clusters (AI Back-end Networks)

For more detailed views and insights on the Ethernet Switch—Data Center report or the AI Networks for AI Workloads report, please contact us at dgsales@delloro.com.