In case there were any remaining doubts about the future of computing in the data center, those doubts were removed yesterday thanks to two major announcements highlighting the role of connectivity across computing elements. In a word, it’s heterogeneous.
The biggest news was the surprise announcement from Nvidia about its proposed $6.9 billion purchase of Israel-based Mellanox, a company best known for making high-performance networking and interconnect chips and cards for servers and other data center components. In a deal that’s largely seen as complementary and financially accretive for Nvidia, the GPU maker is looking to add Mellanox’s strength in networking for high-performance computing systems to its growing data center offerings.
Specifically, Mellanox is known as a critical enabler of what’s often called east-west networking across servers within data centers. The explosive demand for this kind of capability is being driven, in large part, by the growth of Docker-style containers, which are a critical enabling technology for advanced workloads such as AI training and advanced analytics. Many of these advanced applications spread their computing requirements across different servers through the use of containers, which enable larger applications to be split up into smaller software components. These applications often require large chunks of data to be shared across multiple servers and processed simultaneously via high-speed network connections. That’s exactly the kind of Mellanox technology that Nvidia wants to leverage for future data center business growth. (In fact, Nvidia already uses Mellanox components in its DGX line of general-purpose GPU (GPGPU)-powered servers and workstations.)
Nvidia’s interest is also being driven by how computing in the data center is evolving. The underlying principle is that advanced workloads like AI need to be architected in new ways, especially as Moore’s Law advancements have slowed, shrinking the speed increases that were previously available with each new generation of chips. In particular, to maintain performance advancements and provide the kind of computing power these workloads need, they will have to be split across multiple chips, multiple chip architectures, and multiple servers. In other words, they need a heterogeneous computing environment.
Those same principles are also what drove the other data center-related announcement yesterday from Intel and a number of other major data center players including Dell EMC, HPE, Microsoft, Cisco, Google, Facebook, Huawei, and Alibaba. Specifically, they announced the creation of the Compute Express Link (CXL) Consortium and the 1.0 release of the CXL specification for a high-speed interconnect between CPUs and accelerator chips. Leveraging the physical and electrical interconnect capabilities of the upcoming PCIe 5.0 spec, CXL consists of a protocol that allows for a cache-coherent, shared memory architecture that permits the shuttling of data between CPUs and various types of other chips, including TPUs, GPUs, FPGAs, and other types of AI accelerators. Again, the idea is that advanced data center workloads are becoming increasingly diverse and will require new types of chip architectures and computing models to achieve better performance over time.
At a basic level, the difference between the two announcements is that the Mellanox/Nvidia technology operates at a higher level between devices, whereas the CXL protocol works at the chip level within devices. In theory, the two could work together, with CXL-enabled servers communicating with each other over high-speed network links.
Though it’s early, the CXL announcement looks like it could have an important impact on the evolution of data center computing. But to be clear, a number of challenges remain. For one, CXL already faces a competitor in the CCIX standard (which Mellanox happens to be part of), and Nvidia offers its own NVLink standard for fast GPU-to-GPU connections. In addition, AMD’s Infinity Fabric seems to offer similar capabilities. If other CPU vendors like AMD and Arm (and its licensees) sign on to the CXL standard, however, that would clearly have a big impact on its adoption, so this will be interesting to watch.
On the potential Nvidia-Mellanox merger, the one big question (other than the necessary regulatory approvals across geographies and the potential geopolitical implications) is whether the purchase could drive Intel and other big players in the data center space to work more with other networking suppliers. Only time will tell on that front.
What’s also interesting about both announcements is that they clearly highlight the evolution of data center computing away from simply adding more, faster x86-based CPU cores to individual servers and toward a much more complex mesh of computing pieces connected together across the data center. It’s heterogeneous computing coming to life from two different perspectives, and yet both clearly point to an important evolution in how computing is starting to be done in data centers around the world.