Google launched a supercomputer with 26,000 H100s, accelerating the AI arms race

Cloud providers are assembling vast fleets of GPUs to deliver more AI firepower. At its annual Google I/O developer conference today, Google announced A3, an AI supercomputer with 26,000 GPUs. The launch is further evidence that Google is pouring more resources into its counterattack against Microsoft in the struggle for AI dominance.

This supercomputer has approximately 26,000 Nvidia H100 Hopper GPUs. For reference, the world's fastest public supercomputer, Frontier, has roughly 37,000 AMD Instinct MI250X GPUs.

"We can build an A3 supercomputer with up to 26,000 GPUs in a single cluster for our largest customers and are working to build multiple clusters in our largest regions," a Google spokesperson said in an email, adding, "Not all of our locations will scale to this size."

The system was announced at the Google I/O conference held in Mountain View, California. The developer conference has become a showcase for many of Google's artificial intelligence software and hardware capabilities. Google has accelerated its AI development following Microsoft's application of OpenAI's technology to Bing search and office productivity applications.

The supercomputer is aimed at customers who wish to train large language models. Google announced accompanying A3 virtual machine instances for companies that wish to use the supercomputer. Many cloud providers are now deploying H100 GPUs, and Nvidia launched its own DGX cloud service in March, which is expensive compared to renting the previous generation A100 GPUs.

Google stated that the A3 supercomputer is a significant upgrade to the computing resources provided by the existing A2 virtual machines with Nvidia A100 GPUs. Google is consolidating all A3 computing instances distributed across different geographical locations into one supercomputer.

"The scale of the A3 supercomputer can provide up to 26 exaflops of AI performance, which significantly reduces the time and cost of training large ML models," said Google's Director Roy Kim and Product Manager Chris Kleban in a blog post.

Companies use the exaflops metric to estimate the raw performance of AI computers, but critics remain skeptical of it. In Google's case, the criticism is that the figure is calculated in bfloat16 ("brain floating point"), a reduced-precision format targeted at ML, which makes it possible to reach "exaflops" much sooner than with the double-precision (FP64) floating-point math still used by most classical HPC applications.
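To make the precision gap concrete, here is a minimal Python sketch of what bfloat16 keeps from a number. It assumes the standard bfloat16 layout (the top 16 bits of an IEEE 754 float32: 1 sign bit, 8 exponent bits, 7 mantissa bits) and converts by simple truncation, which is a simplification, since real hardware typically rounds to nearest. The helper names are illustrative and not taken from any Google or Nvidia library.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 form."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip(x: float) -> float:
    """Value of x after being stored as (truncated) bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


if __name__ == "__main__":
    for x in (1.0, 3.14159265, 1.00390625, 1e38):
        print(f"{x!r:>14} -> bfloat16 -> {roundtrip(x)!r}")
```

With only 7 mantissa bits, 3.14159265 comes back as 3.140625 and 1.00390625 collapses to 1.0, while the 8-bit exponent still covers values as large as 1e38. FP64 carries 52 mantissa bits, which is why a flops figure counted in bfloat16 is not directly comparable with one counted in FP64.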

GPU count has become an important selling point for cloud providers promoting their AI computing services. Microsoft's AI supercomputer in Azure, built in cooperation with OpenAI, has 285,000 CPU cores and 10,000 GPUs. Microsoft has also announced a next-generation AI supercomputer equipped with more GPUs. Oracle's cloud service provides access to clusters of 512 GPUs, and the company is researching new technologies to speed up GPU communication.

Google has been heavily promoting its TPU v4 artificial intelligence chips, which are used to run internal AI applications with LLMs, such as Google's Bard product. Google's AI subsidiary, DeepMind, stated that fast TPUs are guiding AI development for general and scientific applications.

In comparison, Google's A3 supercomputer has a wide range of uses and can be adjusted for a broad spectrum of AI applications and Large Language Models (LLMs). Kim and Kleban stated in their blog post: "Given the demanding nature of these workloads, a one-size-fits-all approach is insufficient; you need infrastructure built specifically for AI."

As much as Google favors its Tensor Processing Units (TPUs), Nvidia's GPUs have become a necessity for cloud providers because customers are writing AI applications in CUDA, Nvidia's proprietary parallel programming model. The software toolkit delivers its fastest results by drawing on the acceleration provided by the H100's dedicated AI and graphics cores.

Customers can run AI applications on A3 virtual machines and use Google's AI development and management services through Vertex AI, Google Kubernetes Engine, and Google Compute Engine. Companies can rent GPUs on Google's A3 supercomputer for a one-time run to train large models, including large language models. New data can then be fed into the model to update it, without the need to retrain from scratch.

Google's A3 supercomputer combines a range of technologies to improve GPU-to-GPU communication and network performance. The A3 virtual machine is based on Intel's fourth-generation Xeon chips (codenamed Sapphire Rapids), which are paired with the H100 GPUs. It is currently unclear whether the virtual CPUs in the VM will support the inference accelerators built into the Sapphire Rapids chips. The VM comes with DDR5 memory.

Training models on the Nvidia H100 is faster and cheaper than on the previous-generation A100 GPUs widely used in the cloud. A study conducted by AI services company MosaicML found that the H100 is "30% more cost-effective and three times faster" on its 7 billion parameter MosaicGPT large language model.
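The two MosaicML numbers together imply something about relative pricing, and the short Python sketch below works it out. It rests on the identity that training cost equals hourly price times hours, and on an interpretation of "30% more cost-effective" that the quote itself does not pin down, so both readings are shown and both are assumptions. The function name is illustrative.

```python
def implied_price_ratio(speedup: float, cost_ratio: float) -> float:
    """Hourly price of the new GPU relative to the old one.

    Training cost = hourly price * hours, and hours shrink by `speedup`, so
    cost_new / cost_old = (price_new / price_old) / speedup.
    """
    return cost_ratio * speedup


SPEEDUP = 3.0  # "three times faster"

# Reading 1 (assumption): the same job costs 30% less on the H100.
lower_cost_reading = implied_price_ratio(SPEEDUP, cost_ratio=0.70)

# Reading 2 (assumption): each dollar buys 1.3x as much training on the H100.
per_dollar_reading = implied_price_ratio(SPEEDUP, cost_ratio=1 / 1.30)

print(f"Reading 1: H100 hour costs about {lower_cost_reading:.2f}x an A100 hour")
print(f"Reading 2: H100 hour costs about {per_dollar_reading:.2f}x an A100 hour")
```

Under either reading, an H100 hour could cost a little over twice as much as an A100 hour and still come out cheaper per training run, which is consistent with the claim that the H100 is both faster and cheaper to train on.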

The H100 is also capable of inference, but considering the processing power offered by the H100, this might be seen as overkill. Google Cloud offers Nvidia's L4 GPUs for inference, and Intel has inference accelerators in its Sapphire Rapids CPUs.

"The A3 VM is also well-suited for inference workloads, with up to 30 times the inference performance compared to the A100 GPU on our A2 VMs," said Google's Kim and Kleban.

The A3 virtual machine is the first VM to connect GPU instances through an infrastructure processing unit called Mount Evans, which was jointly developed by Google and Intel. The IPU allows the A3 virtual machine to offload network, storage management, and security functions, which are traditionally performed on virtual CPUs. The IPU enables data transfer at speeds of 200Gbps.
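For a sense of scale, the Python sketch below converts the 200Gbps figure into bytes per second and estimates an ideal transfer time. The payload (7 billion parameters at 2 bytes each) is a hypothetical example chosen for illustration, and the calculation assumes full line rate with no protocol overhead or congestion, so real transfers would take longer.

```python
def line_rate_gigabytes_per_second(link_gbps: float) -> float:
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return link_gbps / 8


def ideal_transfer_seconds(payload_bytes: float, link_gbps: float) -> float:
    """Time to move a payload at full line rate, ignoring all overhead."""
    return (payload_bytes * 8) / (link_gbps * 1e9)


IPU_GBPS = 200  # link rate quoted for the IPU

# Hypothetical payload: 7 billion parameters at 2 bytes each (14 GB).
payload_bytes = 7e9 * 2

rate = line_rate_gigabytes_per_second(IPU_GBPS)
seconds = ideal_transfer_seconds(payload_bytes, IPU_GBPS)

print(f"{IPU_GBPS} Gbps is {rate:.0f} GB/s")
print(f"Ideal transfer time for 14 GB: {seconds:.2f} s")
```

Passing a different `link_gbps` value gives the same estimate for other links; doubling the rate halves the ideal time.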

"A3 is the first GPU instance to use our custom-designed 200Gbps IPU, with GPU-to-GPU data transfers bypassing the CPU host and flowing through an interface separate from other VM network and data traffic. This provides a tenfold increase in network bandwidth compared to our A2 VMs, with low tail latency and high bandwidth stability," said Google executives in a blog post.IPU's throughput may soon be challenged by Microsoft, whose upcoming AI supercomputer equipped with Nvidia H100 GPUs will feature the chipmaker's Quantum-2 400Gbps networking capabilities. Microsoft has not yet disclosed the number of H100 GPUs in its next-generation AI supercomputer.

The A3 supercomputer is built on a backbone derived from the company's Jupiter data center network architecture, which connects geographically disparate GPU clusters through optical links.

"For almost every workload structure, we have achieved workload bandwidth indistinguishable from more expensive off-the-shelf non-blocking network structures," said Google.

Google also shared that each server in the A3 supercomputer will have eight H100 GPUs interconnected using Nvidia's proprietary switch and chip interconnect technology. The GPUs will be connected via NVSwitch and NVLink interconnects with a communication speed of approximately 3.6TBps. Azure offers the same speed on its AI supercomputer, and both companies have deployed Nvidia's circuit board designs.

"Each server uses NVLink and NVSwitch within the server to interconnect the 8 GPUs. To allow GPU servers to communicate with each other, we have used multiple IPUs on the Jupiter DC network architecture," said a Google spokesperson.

The setup is somewhat similar to Nvidia's DGX SuperPOD, which has a 127-node configuration, with each DGX node equipped with eight H100 GPUs.
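The comparison can be put in numbers with a few lines of Python. The sketch uses only figures quoted in this article (26,000 GPUs, eight GPUs per server, 127 SuperPOD nodes); because Google describes 26,000 as an upper bound, the results are rough upper estimates rather than confirmed deployment sizes.

```python
GPUS_PER_SERVER = 8      # eight H100 GPUs per A3 server and per DGX node
A3_GPUS = 26_000         # "up to 26,000 GPUs in a single cluster"
SUPERPOD_NODES = 127     # DGX SuperPOD configuration cited above

# Servers needed to host the full A3 cluster.
a3_servers = A3_GPUS // GPUS_PER_SERVER

# Total GPUs in one 127-node SuperPOD.
superpod_gpus = SUPERPOD_NODES * GPUS_PER_SERVER

# How many SuperPODs' worth of GPUs the A3 cluster represents.
superpod_equivalents = A3_GPUS / superpod_gpus

print(f"A3 servers at full scale: {a3_servers:,}")
print(f"GPUs in one SuperPOD: {superpod_gpus:,}")
print(f"A3 cluster in SuperPOD equivalents: {superpod_equivalents:.1f}")
```

At full scale, then, an A3 cluster would span about 3,250 eight-GPU servers, or roughly 25 SuperPODs of this configuration.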

Google Blog: A3 Supercomputer with NVIDIA H100 GPUs

Implementing state-of-the-art artificial intelligence (AI) and machine learning (ML) models requires substantial computing power, both for training the underlying models and for serving them after they have been trained. Considering the demands of these workloads, a one-size-fits-all approach is insufficient—you need infrastructure specifically built for AI.

Together with our partners, we offer a wide range of computing options for ML use cases, such as large language models (LLMs), generative AI, and diffusion models. Recently, we launched G2 VMs, becoming the first cloud to provide the new NVIDIA L4 Tensor Core GPU for serving generative AI workloads. Today, we expand this portfolio by introducing the private preview of the next-generation A3 GPU supercomputer. Google Cloud now offers a complete set of GPU options for training and inference of ML models.

Google Compute Engine A3 supercomputer is designed for training and serving the most demanding AI models, which power the innovations of today's generative AI and large language models. Our A3 VM combines NVIDIA H100 Tensor Core GPU with Google's leading networking technology, serving customers of all scales:

1. The A3 is the first GPU instance to utilize our custom-designed 200 Gbps IPU, where GPU-to-GPU data transfer bypasses the CPU host and flows through an interface distinct from the network and data traffic of other VMs. Compared to our A2 VM, this enables up to 10 times the network bandwidth with low tail latency and high bandwidth stability.

2. Our industry-unique intelligent Jupiter data center network architecture is scalable to tens of thousands of highly interconnected GPUs and allows for full-bandwidth reconfigurable optical links that can adjust the topology on demand. For almost every workload structure, the workload bandwidth we achieve is indistinguishable from more expensive off-the-shelf non-blocking network structures, thereby reducing the Total Cost of Ownership (TCO).

3. The A3 supercomputer scale offers up to 26 exaFlops of AI performance, significantly reducing the time and cost of training large ML models. As companies transition from training to serving ML models, the A3 VM is also well-suited for inference workloads, with inference performance up to 30 times greater compared to our A2 VM, which is powered by NVIDIA A100 Tensor Core GPUs*.
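A quick sanity check on the headline figure, sketched in Python: dividing the 26 exaFlops claim by the 26,000-GPU cluster size given by Google's spokesperson yields the average throughput each GPU would have to contribute. Both inputs are "up to" figures, so the result is an approximation and not a measured benchmark.

```python
EXA = 10**18
PETA = 10**15

TOTAL_AI_FLOPS = 26 * EXA  # "up to 26 exaFlops of AI performance"
GPU_COUNT = 26_000         # "up to 26,000 GPUs in a single cluster"


def per_gpu_flops(total_flops: float, gpus: int) -> float:
    """Average throughput each GPU must deliver to reach the total."""
    return total_flops / gpus


implied = per_gpu_flops(TOTAL_AI_FLOPS, GPU_COUNT)
print(f"Implied per-GPU throughput: {implied / PETA:.2f} petaflops")
```

The claim works out to about one petaflop per GPU, which helps explain why the precision at which those operations are counted matters so much to the total.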

The A3 GPU VM is specifically designed to provide the highest performance training for today's ML workloads, equipped with modern CPUs, improved host memory, next-generation NVIDIA GPUs, and major network upgrades. Here are the main features of the A3:

1. 8 H100 GPUs, leveraging NVIDIA's Hopper architecture, offering 3 times the computational throughput.

2. With NVIDIA NVSwitch and NVLink 4.0, the bisection bandwidth between A3's 8 GPUs is 3.6 TB/s.

3. Next-generation 4th Gen Intel Xeon Scalable processors.

4. 2TB of host memory, through 4800 MHz DDR5 DIMMs.

5. Increased network bandwidth by 10 times, supported by our hardware-assisted IPU, dedicated server-to-server GPU communication stack, and NCCL optimization.

The A3 GPU VM represents a leap forward for customers developing the most advanced ML models. By significantly accelerating the training and inference of ML models, A3 VM enables businesses to quickly train more complex ML models, creating opportunities for our customers to build Large Language Models (LLMs), generative AI, and diffusion models to help optimize operations and stay ahead in the competition.

This release is based on our partnership with NVIDIA, aimed at providing our customers with a comprehensive GPU option for training and inference of ML models.

Ian Buck, Vice President of NVIDIA's High Performance Computing and Supercomputing, said, "Google Cloud's A3 VM, powered by the next-generation NVIDIA H100 GPU, will accelerate the training and serving of generative AI applications. Following Google Cloud's recent launch of G2 instances, we are proud to continue our collaboration with Google Cloud to help transform global businesses with purpose-built AI infrastructure."

For customers who wish to develop complex ML models without the maintenance burden, A3 VMs can be deployed on Vertex AI, an end-to-end platform for building ML models on fully managed infrastructure built for low-latency serving and high-performance training. Today, at Google I/O 2023, we are excited to build on these offerings by opening up generative AI support in Vertex AI to more customers and introducing new features and foundation models.

Customers who wish to build their own custom software stack can also deploy A3 VMs on Google Kubernetes Engine (GKE) and Compute Engine, which lets them train and serve the latest foundation models while enjoying automatic scaling, workload orchestration, and automatic upgrades.

"Google Cloud's A3 VM instances provide us with the computational power and scale to meet our most demanding training and inference workloads. We look forward to leveraging their expertise in the AI field and their leadership in large-scale infrastructure to provide a powerful platform for our ML workloads." - Noam Shazeer, CEO of Character.AI

At Google Cloud, artificial intelligence is in our DNA. We apply decades of experience running global-scale computations for AI. We designed this infrastructure to scale and optimize for running a variety of AI workloads - now, we are making it available to you.
