Dell’Oro Group projects that spending on accelerated compute servers targeted at artificial intelligence (AI) workloads will grow at a double-digit rate over the next five years, outpacing other data center infrastructure. An accelerated compute server, equipped with accelerators such as GPUs, FPGAs, or custom ASICs, can generally handle AI workloads far more efficiently than a general-purpose server without accelerators. In unit terms, these servers still represent only a fraction of Cloud service providers’ overall server footprint. Yet, at ten or more times the cost of a general-purpose server, accelerated compute servers are becoming a substantial portion of data center capex.
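To see how a small unit share can translate into a large spending share, here is a back-of-the-envelope sketch. The 10x cost multiple comes from the paragraph above; the 5% and 10% unit shares are purely illustrative assumptions, not Dell’Oro figures, and the result covers server spend only rather than total data center capex.

```python
def accelerated_spend_share(unit_share: float, cost_multiple: float) -> float:
    """Return the fraction of server spend going to accelerated servers.

    unit_share    -- fraction of servers that are accelerated (0..1), an assumption
    cost_multiple -- cost of an accelerated server relative to a general-purpose one
    """
    if not 0.0 <= unit_share <= 1.0:
        raise ValueError("unit_share must be between 0 and 1")
    if cost_multiple <= 0:
        raise ValueError("cost_multiple must be positive")
    # Normalize the general-purpose server cost to 1.0.
    accelerated_spend = unit_share * cost_multiple
    general_spend = (1.0 - unit_share) * 1.0
    return accelerated_spend / (accelerated_spend + general_spend)


if __name__ == "__main__":
    # Hypothetical unit shares, chosen only to illustrate the effect.
    for share in (0.05, 0.10):
        spend = accelerated_spend_share(share, cost_multiple=10.0)
        print(f"{share:.0%} of units -> {spend:.0%} of server spend")
```

Under these assumptions, a 5% unit share works out to roughly a third of server spend, and a 10% unit share to slightly more than half.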
Tier 1 Cloud service providers are increasing their spending on new infrastructure tailored for AI workloads. In Facebook’s 3Q21 earnings call, the company announced plans to increase capex by more than 50% in 2022. These investments will be driven by AI and machine learning to improve ranking and recommendations across Facebook’s platform. In the longer term, as the company shifts its business model to the metaverse, capex investments will be driven by video and compute-intensive applications such as AR and VR. At the same time, other Tier 1 Cloud service providers, such as Amazon, Google, and Microsoft, also aim to increase spending on AI-focused infrastructure to enable their enterprise customers to deploy applications with enhanced intelligence and automation.
It has been a year since my last blog on AI data center infrastructure. Since that time, new architectures and solutions have emerged that could pave the way for the further proliferation of AI in the data center. Following are three innovations I’ll be watching closely:
New CPU Architectures
Intel is scheduled to launch its next-generation Sapphire Rapids processor next year. With its AMX (Advanced Matrix Extensions) instruction set, Sapphire Rapids is optimized for AI and ML workloads. CXL, which will be offered with Sapphire Rapids for the first time, will establish a memory-coherent, high-speed link over the PCIe Gen 5 interface between the host CPU and accelerators. This, in turn, will reduce system bottlenecks by enabling lower latencies and more efficient sharing of resources across devices. AMD will likely follow on the heels of Intel and offer CXL on EPYC Genoa. For Arm-based processors, competing coherent interfaces will also be offered, such as CCIX with Ampere’s Altra processor and NVLink on Nvidia’s upcoming Grace processor.
Faster Networks and Server Connectivity
AI applications are bandwidth hungry. For this reason, the fastest networks available need to be deployed to connect host servers to accelerated servers, facilitating the movement of large volumes of unstructured data and training models (a) between the host CPU and accelerators, and (b) among accelerators in a high-performance computing cluster. Some Tier 1 Cloud service providers are deploying 400 Gbps Ethernet networks and beyond. The network interface card (NIC) must also evolve to ensure that server connectivity does not become a bottleneck as data sets grow larger. 100 Gbps NICs have been the standard server access speed for most accelerated compute servers. More recently, however, 200 Gbps NICs are increasingly used with these high-end workloads, especially by Tier 1 Cloud service providers. Some vendors have added a further layer of performance by integrating accelerated compute servers with Smart NICs or Data Processing Units (DPUs). For instance, Nvidia’s DGX system can be configured with two BlueField-2 DPUs to facilitate packet processing of large datasets and provide multi-tenant isolation.
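A rough way to see why access speeds matter is to compute how long a dataset takes to cross a single link at line rate. The sketch below uses the 100, 200, and 400 Gbps speeds mentioned above; the 1 TB dataset size and the `efficiency` parameter are illustrative assumptions, and real transfers will be slower because of protocol overhead and contention.

```python
def transfer_seconds(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move data_bytes over one link.

    link_gbps  -- nominal link speed in gigabits per second (10^9 bits/s)
    efficiency -- fraction of line rate actually achieved (assumed, 0 < e <= 1)
    """
    if link_gbps <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    dataset_bytes = 1e12  # hypothetical 1 TB dataset (decimal terabyte)
    for gbps in (100, 200, 400):
        seconds = transfer_seconds(dataset_bytes, gbps)
        print(f"{gbps} Gbps: {seconds:.0f} s at line rate")
```

At ideal line rate, each doubling of link speed halves the transfer time, which is the basic reason server access speeds need to keep pace with dataset growth.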
Rack Infrastructure
Accelerated compute servers, generally equipped with four or more GPUs, tend to be power hungry. For example, an Nvidia DGX system with eight A100 GPUs has a maximum system power usage rated at 6.5 kW. Extra consideration is needed to ensure efficient thermal management. Today, air-based thermal management infrastructure is predominantly used. However, as rack power densities rise to support accelerated computing hardware, the efficiency and capacity limits of air cooling are being reached. Novel liquid-based thermal management solutions, including immersion cooling, are under development to further enhance the thermal efficiency of accelerated compute servers.
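To put the 6.5 kW figure in rack terms, the sketch below estimates how many such systems fit within a given rack power budget. The 6.5 kW maximum per system is taken from the text; the 15 kW and 30 kW rack budgets are hypothetical values chosen for illustration, and the calculation ignores switches, power distribution losses, and any derating a real deployment would apply.

```python
import math

DGX_A100_MAX_KW = 6.5  # maximum system power cited in the text


def max_systems_per_rack(rack_budget_kw: float, system_kw: float = DGX_A100_MAX_KW) -> int:
    """Number of systems that fit under a rack power budget (power only)."""
    if rack_budget_kw < 0 or system_kw <= 0:
        raise ValueError("rack_budget_kw must be >= 0 and system_kw must be > 0")
    return math.floor(rack_budget_kw / system_kw)


def rack_power_kw(num_systems: int, system_kw: float = DGX_A100_MAX_KW) -> float:
    """Worst-case power draw for num_systems at maximum rated power."""
    return num_systems * system_kw


if __name__ == "__main__":
    # Hypothetical rack power budgets, not figures from the text.
    for budget in (15.0, 30.0):
        n = max_systems_per_rack(budget)
        print(f"{budget:.0f} kW rack: {n} systems, {rack_power_kw(n):.1f} kW worst case")
```

Even a handful of these systems pushes a rack well past what a few general-purpose servers would draw, which is the density pressure behind the interest in liquid-based cooling.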
These technology trends will continue to evolve and drive the commercialization of specialized hardware for AI applications. Please stay tuned for more updates from the upcoming Data Center Capex reports.