NVIDIA’s latest AI servers can run on coolant warmer than a hot tub, and that counterintuitive choice is one of the biggest efficiency leaps in data center history.
Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA’s newest AI servers can run their cooling liquid even hotter: up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient.
The Rubin generation of NVIDIA AI infrastructure is the world’s first to achieve 100% liquid cooling: every chip and every networking component is cooled entirely by liquid in a closed loop, with no fans anywhere in the system. This liquid cooling methodology is outlined in the NVIDIA DSX AI factory reference design, a guide to best practices for designing, building and operating the entire AI factory infrastructure stack.
While each generation already offers significantly more computing power per watt, fully liquid-cooled AI compute infrastructure also enables data centers to dramatically reduce cooling energy consumption, making a meaningful difference to overall data center energy use at hyperscale.
“The NVIDIA DSX reference design for AI factories has zero water consumption. We have eliminated massive amounts of power usage and pretty much all water usage,” said Ali Heydari, director of data center cooling and infrastructure at NVIDIA. “With dry-cooler-based designs, it’s a closed-loop system with no evaporative water cooling, outside of maybe 1% of the year when we might need chillers in some climates.”
Historically, cooling alone has accounted for up to 40% of a data center’s electricity consumption, making it one of the most significant areas where efficiency improvements can drive down both operational expenses and energy demands. Industry estimates suggest that raising chiller plant temperatures by just one degree can cut cooling energy costs by about 4%.
At scale, those savings add up quickly. A 50-megawatt hyperscale facility can save over $4 million annually in cooling-related energy and water costs by moving to liquid-cooled infrastructure.
In favorable climates, NVIDIA’s 45-degree liquid-cooling architecture can enable chiller-less operation with dry coolers, reducing facility cooling water consumption from roughly 2.6 million gallons per megawatt per year for conventional cooling-tower-based systems to near zero, a reduction in water use of up to 100%.
The reason: traditional air-cooled data centers depend on large volumes of cooled air to remove heat from IT equipment, often requiring energy-intensive cooling infrastructure during hot weather. With NVIDIA’s 45-degree liquid cooling, heat is captured directly at the chip and transported through liquid loops operating at much higher temperatures, allowing outdoor dry coolers to reject heat efficiently for much of the year while significantly reducing mechanical cooling requirements and facility water consumption.
The data center ambient temperature is flexible, and warm summer air is fine, because nothing in the server depends on cool air. The liquid does all the work, and the same liquid can be recirculated in a closed loop so no new water is consumed to cool the chips.
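As a rough back-of-envelope check on the figures above, the following sketch scales the quoted cooling-tower water use to a facility of a given size and applies the roughly 4%-per-degree estimate over several degrees. The function names are hypothetical, and compounding the per-degree saving (rather than adding it) is an illustrative assumption, not a method stated in the article.

```python
# Back-of-envelope arithmetic using figures quoted in the article.
# Compounding the per-degree saving is an illustrative assumption.

GALLONS_PER_MW_YEAR = 2.6e6  # conventional cooling-tower water use (from the article)
SAVING_PER_DEGREE = 0.04     # ~4% cooling energy cost cut per degree (industry estimate)


def tower_water_gallons(it_load_mw: float) -> float:
    """Annual cooling-tower water use for a facility of the given size."""
    return it_load_mw * GALLONS_PER_MW_YEAR


def remaining_cooling_cost_fraction(degrees_raised: int) -> float:
    """Fraction of the original cooling energy cost left after raising the
    chiller plant temperature, assuming the ~4% saving compounds per degree."""
    return (1.0 - SAVING_PER_DEGREE) ** degrees_raised


if __name__ == "__main__":
    print(f"50 MW tower water use: {tower_water_gallons(50):,.0f} gallons/year")
    for degrees in (1, 5, 10):
        left = remaining_cooling_cost_fraction(degrees)
        print(f"+{degrees} degrees: {left:.1%} of original cooling energy cost")
```

For a 50-megawatt facility, the quoted water figure works out to 130 million gallons per year with conventional cooling towers, which is the consumption a chiller-less dry-cooler design would largely avoid.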
A New Standard for the Industry
The ecosystem is keeping pace. Motivair, the advanced cooling division of Schneider Electric, has worked alongside NVIDIA’s product roadmap for nearly a decade, and Richard Whitmore, its president and CEO, says the relationship only intensified as power densities crossed the threshold where air cooling was no longer a viable option.
“Once the watts per chip crossed a certain level, liquid cooling became mandatory,” said Whitmore.
Too Hot to Cool AI Infrastructure Is Hotter Than You’d Think
In reality, chips can sustain far warmer environments than that instinct suggests. Silicon processors generate enormous internal heat: coolant entering a fully liquid-cooled chip at 45 degrees Celsius exits at roughly 55 degrees, having absorbed that heat load across the chip surface. Yet performance doesn’t degrade. The processors continue to operate at full performance because liquid-cooled cold plates keep device temperatures within validated operating limits, even with coolant entering the rack at 45 degrees Celsius.
No Fans, No Cold Aisles: A Fundamentally Different Machine
The Rubin architecture changes the picture. Coolant, a mix of 75% water and 25% propylene glycol, flows through cold plates that sit directly on processors, pulling heat out at the source. Running that coolant at up to 45 degrees Celsius means that in many climates, the facility loop can reject heat without turning on mechanical chillers and noisy fans.
In an AI factory, coolant flows from a coolant distribution unit to the servers in a closed-loop cycle.
In the right geography, somewhere with reliably cool outdoor air, a liquid-cooled data center can reject its heat through coolant distribution units that capture heat directly at the source and transport it to outdoor dry coolers, essentially large radiator coils positioned outside the building. The loop is filled once and runs closed for the life of the facility.
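The 45-degree inlet and roughly 55-degree outlet quoted above imply a coolant flow rate through the steady-state heat balance Q = m_dot * c_p * dT. The sketch below estimates that flow. The fluid properties are assumed round values for a water and propylene glycol mix rather than vendor data, and the function names and the 100 kW rack load are hypothetical.

```python
# Steady-state heat balance for a liquid cooling loop: Q = m_dot * c_p * dT.
# Fluid properties are assumed round values for a ~25% propylene glycol mix,
# not vendor data; the 100 kW rack load below is hypothetical.

CP_J_PER_KG_K = 3900.0   # assumed specific heat of the coolant
DENSITY_KG_PER_L = 1.01  # assumed coolant density


def coolant_mass_flow_kg_s(heat_load_w: float, inlet_c: float, outlet_c: float) -> float:
    """Mass flow needed to carry heat_load_w at the given temperature rise."""
    delta_t = outlet_c - inlet_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    return heat_load_w / (CP_J_PER_KG_K * delta_t)


def coolant_volume_flow_l_min(heat_load_w: float, inlet_c: float, outlet_c: float) -> float:
    """Volumetric flow in litres per minute for the same heat balance."""
    mass_flow = coolant_mass_flow_kg_s(heat_load_w, inlet_c, outlet_c)
    return mass_flow / DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    rack_w = 100_000.0  # hypothetical rack heat load
    print(f"mass flow:   {coolant_mass_flow_kg_s(rack_w, 45.0, 55.0):.2f} kg/s")
    print(f"volume flow: {coolant_volume_flow_l_min(rack_w, 45.0, 55.0):.0f} L/min")
```

With these assumed properties, a hypothetical 100 kW rack at the 45-to-55-degree rise needs about 2.6 kg/s of coolant, and halving the temperature rise doubles the required flow.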
The closed-loop system also takes dramatically less space in the AI factory than traditional air-cooling infrastructure.
“In the right geographic location, with the right system design, you don’t need any refrigeration equipment,” Whitmore said. “You can just put big radiator coils outside and use the air temperature for all your cooling. It’s incredibly efficient.”
The geography caveat matters. A data center in the Scottish Highlands and one in Phoenix, Arizona, face very different realities. But even in warmer climates, the shift toward 45-degree-Celsius coolant moves operators significantly closer to that chiller-less ideal, where chillers may turn on just a few days a year when the outside air temperature demands it.
Another key benefit of this new model for AI factories is the potential for waste heat recovery, where residual heat from AI factory operations can be repurposed to heat commercial or residential buildings nearby.
The Engineering Problem Nobody Had Solved
NVIDIA’s thermal engineering team reworked how those components handle heat, designing cooling loops that route liquid to multiple high-power chips on the board through a single inlet and outlet, resulting in a cleaner tray-level cooling architecture.
One visible outcome: Rubin servers have clean, sealed front panels where air-cooled servers have perforated bezels. Another: fully liquid-cooled servers enable higher rack density than air-cooled servers, so a system that previously occupied six rack units now fits in two, with more compute, less space and less noise.
Liquid cooling infrastructure: overhead pipes route into powerful AI servers.
Without efficiency improvements in how that compute is cooled, the energy cost of running AI at scale would grow in lockstep with the hardware. Liquid cooling at up to 45 degrees Celsius, hotter than a hot tub and cooler for the planet, is one of the most important tools the industry has to close that gap.
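The geography caveat above can be made concrete with a simple rule of thumb: a dry cooler can only bring liquid down to the outdoor air temperature plus some approach margin, so a chiller is needed whenever that sum exceeds the coolant supply setpoint. The sketch below counts such hours in a temperature series. The 8-kelvin approach, the function names and the sample temperatures are illustrative assumptions, and a real design would also account for the heat exchanger inside the coolant distribution unit.

```python
# Rule-of-thumb check for chiller-less operation with dry coolers.
# The approach margin and sample temperatures are illustrative assumptions.
from typing import Sequence


def needs_chiller(ambient_c: float, supply_setpoint_c: float = 45.0,
                  approach_k: float = 8.0) -> bool:
    """True when outdoor air is too warm for a dry cooler alone to reach
    the coolant supply setpoint."""
    return ambient_c + approach_k > supply_setpoint_c


def chiller_fraction(hourly_ambient_c: Sequence[float],
                     supply_setpoint_c: float = 45.0,
                     approach_k: float = 8.0) -> float:
    """Fraction of the sampled hours in which a chiller would have to run."""
    if not hourly_ambient_c:
        raise ValueError("need at least one temperature sample")
    hours = sum(needs_chiller(t, supply_setpoint_c, approach_k)
                for t in hourly_ambient_c)
    return hours / len(hourly_ambient_c)


if __name__ == "__main__":
    sample_day_c = [24, 23, 22, 22, 23, 26, 29, 32, 35, 37, 39, 40]  # made-up data
    print(f"45 C setpoint: chiller needed {chiller_fraction(sample_day_c):.0%} of hours")
    print(f"20 C setpoint: chiller needed {chiller_fraction(sample_day_c, 20.0):.0%} of hours")
```

Under these assumptions, the same outdoor temperatures that force a chiller to run almost constantly at a low supply setpoint leave it idle most of the time at 45 degrees Celsius, which is the effect the article describes.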
Learn more about liquid cooling, the NVIDIA DSX platform for AI factories and NVIDIA’s approach to energy-efficient AI infrastructure.
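About Us
Beijing Hansen Fluid Technology Co., Ltd. (北京汉深流体技术有限公司) is a core distributor of Danfoss liquid cooling products, focused on liquid cooling solutions for data centers. We provide efficient cooling solutions for AI computing clusters and high-density server deployments. At the 2026 annual meeting we were honored again, retaining the Danfoss Power Solutions fiscal year 2025 "Performance Outstanding Award" for outstanding growth and breakthrough performance. This is the second time Hansen Fluid has won the award, after fiscal year 2024. It reflects Danfoss's recognition of the company's rapid growth and market development capability, and underlines Hansen Fluid's expertise and benchmark standing in data center liquid cooling connections. Hansen products include the FD83 full-flow double-interlock liquid cooling quick-disconnect coupling (interlocking ball valve); UQD and UQDB universal liquid cooling quick couplings; the OCP ORV3 blind-mate quick coupling BMQC; EHW194 EPDM liquid cooling hose; solenoid valves; and pressure and temperature sensors. At the intersection of the AI, national digital economy, East Data West Computing (东数西算), dual-carbon and new infrastructure strategies, the company is focused on building a highly qualified, experienced team of liquid cooling engineers to give customers excellent engineering design and strong customer service, and to support high-volume delivery worldwide.
The company's products cover: Danfoss universal zero-leak quick-disconnect couplings for liquid cooling, EPDM hoses, solenoid valves, pressure and temperature sensors, and manifolds.
Data center liquid cooling solutions: a one-stop solution for your needs.
In quick connectors and piping fittings, NVIDIA's official reference designs are mainly tied to the Danfoss product line. These core components must provide an absolute sealing guarantee in high-density, high-vibration environments:
1. Danfoss Hansen UQD/UQDB series: a dripless quick coupling built for direct liquid cooling in data centers. Its flow coefficient (Cv) exceeds the OCP industry standard by more than 25%, reducing piping-system power consumption while offering one-handed push-pull disconnect. Its surface is anodized aluminum or SS303/SS316 stainless steel, and it withstands harsh temperatures from -40°C to 150°C.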