The data center industry is estimated to have consumed 205 terawatt-hours (TWh) or ~1% of the world’s energy consumption in 2018. Other industry estimates peg that rate higher at up to ~2%. Despite these different estimates, one thing is clear: the decade-old fear of runaway growth in data center energy consumption has proved to be unfounded. Hyperscale cloud service providers (CSPs) have largely managed that concern, with the help of industry vendors, through IT virtualization and higher utilization of power and cooling infrastructure. At the same time, enterprises data center operations, while historically less efficient, have transitioned to CSPs.
However, these estimates were calculated before the global COVID-19 pandemic, which saw the world embrace virtual collaboration, remote learning, and accelerated automation through artificial intelligence (AI) and machine learning (ML). While these trends materialized throughout 2020, rendering the industry (barely) able to meet demand, questions resurfaced about managing future energy consumption. For this reason, data center sustainability has become the most pressing issue in the data center industry, one in which data center physical infrastructure vendors believe they can play a critical role.
As part of Dell’Oro Group’s upcoming Data Center Physical Infrastructure program, we will focus on technologies that enable sustainable data center growth. That’s why data center thermal management, which consumes 30% to 40% of a data center’s annual energy consumption, second only to compute, is the logical starting place. Today, air-based, thermal management infrastructure is predominantly used. However, as rack power densities are on the rise to support accelerated computing hardware (such as GPUs and FPGAs), air-cooling efficiencies and limits are being reached. Liquids are a much more effective and efficient medium for transferring heat. For this reason, the data center industry is exploring different ways to safely bring liquids into the data center.
That’s why, when I had an opportunity to see CGG’s High Performance Compute Center, I experienced a level of nervousness and excitement that I haven’t felt in some time prior to touring a data center. This was the first time I have been inside a liquid immersion-cooled facility, supported by Green Revolution Cooling’s (GRC) infrastructure. GRC is a known leader in immersion-cooling technology, in addition to Asperitas, Submer, and other vendors. Visiting my first immersion-cooled facility felt more like a trip to Mars than the type of data center I’ve spent my entire career getting to know.
Although the data center industry treats liquid cooling as though its use for computing is new, it has actually been around for decades. It dates back to the 1990s, when it was used to cool IBM mainframes. Immersion cooling seeks to solve a similar problem today – removing heat directly at the source – but through a different method. A coolant distribution unit (CDU) is used to pump a liquid – usually some kind of mineral oil – to a rack manifold, where it fills and circulates the liquid through the rack (sometimes referred to as a vat or tank). Servers, which require some modification, are then vertically immersed in the liquid to capture and remove 100% of the generated heat. Right now, the big question being asked by the data center industry is how different does immersion cooling makes my data center?
CGG Doubles Compute Capacity with Immersion Cooling
Walking into the CGG High Performance Compute Center, any notion that I was headed to Mars was quickly dispelled. It looked like a conventional data center with a raised floor and traditional infrastructure, from the UPS down to the rack power distribution units (rPDU). The big difference was the horizontal immersion racks as opposed to vertical ones. As I observed the room, I quickly noticed was how quiet it was. CDU pumps produced the only noise. Things were quiet enough to have a conversation with the person standing next to me. The horizontal immersion racks created an open feeling, allowing me to see around the entire room.
However, a friendlier operating environment isn’t what drove CGG to adopt immersion cooling. The company had reached its limits of space, power, and cooling. In order to expand computing capacity, CGG needed more space and power or a new thermal-management solution. And the new thermal management solution – immersion cooling – did not disappoint. In the same floor space and power footprint, CGG was able to double its computing capacity. Additionally, a significant portion of the existing infrastructure was utilized, while deploying immersion racks in scalable, 100 kW cooling-capacity increments. As a result, CGG had no downtime and only limited capital expenditures (CAPEX) during the transition to immersion cooling.
These benefits aren’t unique to CGG’s deployment of immersion cooling. In fact, they can be achieved by many players in the data center industry struggling with space, power, or cooling constraints. To quantify the benefits, CAPEX for construction of a new immersion-cooled data center relative to a traditional air-cooled build can be reduced by 20%. This is the result of eliminating certain infrastructure, such as chillers or air handlers, in addition to smaller-sized electrical infrastructure, such as UPSs, switch gears, and power distribution.
The case for immersion cooling becomes even more compelling when considering operational expenditures (OPEX). Immersion-cooling systems use less power as a result of removing server fans, air handling units, and chilled water systems. Lower-power consumption for thermal management means reduced annual energy costs. Additionally, with fewer moving parts in an immersion-cooling solution, maintenance costs are also reduced. In total, immersion cooling OPEX costs can decrease by up to 33% compared to traditional air-cooled data center builds. From a total cost of ownership (TCO) perspective over the 10-year life of a data center, it’s achievable for an immersion-cooled data center to cost half as much as a traditional air-cooled build.
Immersion Cooling Brings Small Changes to Data Center Operations
So, what’s the catch? The human element of operations in the mission-critical, data center industry can’t be overlooked. Data center uptime is measured by the number of nines (e.g., 99.9% v. 99.9999% uptime), as downtime can translate into hundreds of thousands of dollars – or even millions – in lost revenue. Historically, this had led to slow adoption of new technologies. Early adopters are often driven by need, as is the case with liquid cooling for HPC. But, with increased adoption of accelerated compute, many other companies are already struggling or are expected to struggle with the limits of air-cooling in the near future.
In my visit to CGG’s High Performance Compute Center, I was most eager to learn about the “quirks” of immersion cooling. The biggest difference from air-cooled builds is in server maintenance. Servers have to be pulled out of the oil by hand or using a small, overhead lift. They can then be laid across the tank while work is performed, either immediately or after a short period of drip drying. After maintenance is complete, the server is simply immersed back into the rack.
Other operational differences that data center owners and operators must consider are:
- Containment of the oil in which servers are immersed is top of mind. For CGG, this didn’t appear to be a problem. Different combinations of rack and row and room containment are used to manage any dripping when removing servers. It’s definitely handy to keep a roll of oil-absorbent towels around but no major spills have occurred.
- Stickers imprinted with a server’s serial number can come loose during immersion. This seemed to be the biggest potential headache. If a sticker comes loose, it doesn’t cause any damage to the immersion cooling system due to the filtration system. However, it’s possible for a missing sticker to impact asset management. Some immersion-ready servers already utilize a pull-tag system. This eliminates the issue. Development of oil-resistant stickers is also being explored.
- Cable management isn’t more complex for immersion cooling, just different. CGG utilizes multiple generations of GRC immersion racks, which reflect the evolution of rPDU and network switch placement. They have moved between dry space in the rack and mounted on the back of the tank. GRC’s latest immersion-cooling product, the ICEraQ 10, utilizes dry space in the top-rear of the rack for rPDUs with networking switches mounted on the front behind a panel.
- Lastly, beware of crickets. It turns out that crickets have a taste for the particular immersion oil GRC uses, so an open bay door may lead to an extra visitor. Just like a loose serial number sticker, there is no threat of damage – just an unexpected find when opening the rack lid.
Immersion Cooling Answers the Call for Sustainable Data Centers of the Future
The engineered benefits of immersion cooling can’t be denied – higher utilization of space and power, while achieving lower CAPEX and OPEX relative to a traditional air-cooled facility. However, I didn’t need to visit an immersion-cooled facility to understand the cost savings. My biggest takeaway was correction of my misconception that an immersion-cooled data center would be dramatically different from an air-cooled facility. It was familiar, like other data centers I have toured. The only difference in physical infrastructure was the rack itself. IT infrastructure is mounted vertically, as opposed to horizontally. Immersion-ready servers are available today with expanding partnerships between chip, server, and immersion vendors working on the next generation of compute. While planning for a few operational differences that need to occur, to my surprise, necessary adjustments are relatively minor. So can immersion cooling be a part of the solution that supports sustainable data centers of the future? After my visit to CGG’s High Performance Compute Center, I believe it just might be.
This November, Dell’Oro Group will launch a new Data Center Physical Infrastructure subscription program. As the program’s lead analyst, I will dig deeper into the market outlook, growth drivers, and the competitive landscape of the data center physical infrastructure market. I will quantify industry trends and developments, providing a timely, accurate, and detailed analysis. To learn more about Dell’Oro Group’s new Data Center Physical Infrastructure program, please contact us at dgsales@delloro.com.