NVIDIA and Anyscale announced a collaboration to boost performance and efficiency for large language model (LLM) development using Anyscale's Ray framework. This integration aims to significantly expedite building, training, and deploying generative AI models.
The companies are bringing NVIDIA's AI software into Anyscale's offerings, including the Ray open source platform, Anyscale's fully-managed Platform, and the newly announced Anyscale Endpoints.
NVIDIA is contributing TensorRT-LLM, Triton Inference Server, and NeMo to Ray's open source ecosystem. For organizations that prioritize enterprise-grade security, stability, and support, the companies are certifying the NVIDIA AI Enterprise software suite for the Anyscale Platform.
According to Anyscale, integrating NVIDIA's performant software and hardware can accelerate computing speeds for end-to-end LLM development. Specifically, TensorRT-LLM maximizes efficiency and parallelism, enabling up to 8x faster inference on NVIDIA's H100 GPUs versus previous generations.
Anyscale Endpoints, also newly announced, provides developer APIs to easily integrate pre-tuned LLMs into applications. Endpoints promotes scalable, cost-efficient production deployments.
NVIDIA Triton Inference Server standardizes AI model deployment and maximizes performance across various platforms, including the cloud, data centers, edge computing, and embedded devices. These capabilities bring additional efficiency to developers deploying AI on Ray and the Anyscale Platform.
NVIDIA NeMo, a cloud-native framework, allows developers to build, customize, and deploy generative AI models. The framework includes training and inferencing frameworks, guardrailing toolkits, and data curation tools, offering businesses an easy and cost-effective way to adopt generative AI. When integrated with Ray and the Anyscale Platform, NeMo enables businesses to tailor large language models using their enterprise data.
This collaboration aims to make cutting-edge AI acceleration accessible to a broad base of developers through Anyscale's products spanning open source, fully-managed cloud, and simple ML application integration.
It's a promising sign that the AI industry is focusing on removing barriers to entry, offering more scalable, efficient, and customizable solutions for both developers and enterprises. The NVIDIA AI integrations with Anyscale are currently in development and are expected to be available in Q4. Interested developers can apply for early access.