Amazon Web Services (AWS) has announced the general availability of Amazon Elastic Compute Cloud (EC2) Capacity Blocks for machine learning (ML) workloads. This new offering allows customers to reserve high-performance compute capacity in Amazon's data centers for short durations, providing predictable access to the GPUs needed to power cutting-edge artificial intelligence development.
The launch comes amid booming demand for the compute-intensive workloads required to train and run AI models, especially large language models and generative AI applications. But the GPUs required for such workloads remain in short supply. Amazon's new capacity blocks aim to give organizations flexible access to NVIDIA GPU clusters for their critical AI projects without long-term commitments.
"Advancements in machine learning have unlocked opportunities for organizations of all sizes, but demand for GPUs has outpaced supply," said David Brown, Vice President of Compute at AWS. "With EC2 Capacity Blocks, customers can reserve the GPU capacity they need for short durations to run their ML workloads, without having to hold onto GPU capacity when not in use."
With EC2 Capacity Blocks, customers can reserve hundreds of NVIDIA GPUs in EC2 UltraClusters designed for high-performance ML tasks. These blocks can be used with P5 instances powered by NVIDIA H100 Tensor Core GPUs, and customers can specify cluster size, future start date, and duration. EC2 Capacity Blocks ensure uninterrupted access to the necessary GPU compute capacity, which is particularly beneficial for critical ML projects.
The blocks are deployed in EC2 UltraClusters, providing low-latency, high-throughput connectivity. Customers can reserve GPU instances for durations between one and 14 days, up to eight weeks in advance, and in cluster sizes ranging from one to 64 instances (512 GPUs in total). This flexibility allows customers to run a wide range of ML workloads and pay only for the GPU time they need.
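To make the reservation limits above concrete, here is a minimal sketch of a local validator for a proposed booking. It is a hypothetical helper, not part of the AWS SDK; the function name and structure are illustrative, and the one-to-64-instance, one-to-14-day, and eight-week limits are taken from the figures AWS published, assuming the P5 instance's eight GPUs per node.

```python
from datetime import datetime, timedelta

# Limits from the announcement (hypothetical local check, not an AWS API call):
# blocks run 1-14 days, start up to 8 weeks out, and span 1-64 P5 instances.
MIN_DAYS, MAX_DAYS = 1, 14
MAX_ADVANCE_WEEKS = 8
MIN_INSTANCES, MAX_INSTANCES = 1, 64
GPUS_PER_P5_INSTANCE = 8  # each P5 instance carries 8 NVIDIA H100 GPUs

def validate_capacity_block(start: datetime, duration_days: int,
                            instance_count: int, now: datetime) -> int:
    """Check a requested block against the published limits and return
    the total number of GPUs the reservation would cover."""
    if not MIN_DAYS <= duration_days <= MAX_DAYS:
        raise ValueError(f"duration must be {MIN_DAYS}-{MAX_DAYS} days")
    if start < now:
        raise ValueError("start date must be in the future")
    if start - now > timedelta(weeks=MAX_ADVANCE_WEEKS):
        raise ValueError(f"start must be within {MAX_ADVANCE_WEEKS} weeks")
    if not MIN_INSTANCES <= instance_count <= MAX_INSTANCES:
        raise ValueError(
            f"cluster size must be {MIN_INSTANCES}-{MAX_INSTANCES} instances")
    return instance_count * GPUS_PER_P5_INSTANCE
```

For example, a maximum-size cluster of 64 instances booked two weeks out for seven days passes validation and corresponds to 512 GPUs, matching the ceiling quoted above. The actual purchase flow goes through the EC2 API rather than a local check like this.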
EC2 Capacity Blocks are designed to support a variety of machine learning use cases that require specialized high-performance hardware:
- Training and fine-tuning large AI models
- Running experiments and prototypes
- Scaling to meet surges in demand for AI apps
For startups and enterprises building with generative AI, access to GPUs in the cloud can accelerate development and remove infrastructure barriers.
Several companies, including Amplify Partners, Canva, Leonardo.Ai, and OctoML, have expressed enthusiasm for using EC2 Capacity Blocks for ML. AWS and NVIDIA's long-standing collaboration has resulted in scalable, high-performance GPU solutions that empower companies to build transformative generative AI applications.
For instance, Canva, a visual communications platform, anticipates utilizing EC2 Capacity Blocks to predictably scale hundreds of GPUs for training larger models. Similarly, Leonardo.Ai, which provides a generative AI platform for creative production, sees the new offering as an opportunity to elastically access GPU capacity for training and experimentation.
"This is a game-changer in the current supply-constrained environment," said Mark LaRosa of venture firm Amplify Partners. "It will provide startups with the GPU capacity they need, when they need it, without long-term capital commitments."
The launch highlights how cloud providers like AWS are enabling access to advanced computing power in more flexible ways. For machine learning workloads that ebb and flow, customers pay only for the capacity blocks they book, rather than for idle reserved resources.
Amazon EC2 Capacity Blocks for ML are now available in the AWS US East (Ohio) Region, with plans to expand to additional AWS Regions and Local Zones. As AI takes center stage, cloud infrastructure providers continue to evolve to keep pace with emerging demands. By providing a flexible, scalable, and cost-effective solution, AWS ensures that organizations can continue to innovate and transform their operations through generative AI, regardless of their size or industry.