
NVIDIA's LATTE3D Can Generate 3D from Text Prompts in Seconds

March 21, 2024 • 1 min read

Researchers from NVIDIA have unveiled LATTE3D, a new model that turns text prompts into high-quality 3D shapes in about a second. That speed could significantly streamline the creative process: designers could iterate on ideas as they come to mind, rather than starting from scratch or searching through asset libraries.

According to Sanja Fidler, vice president of AI research at NVIDIA, "A year ago, it took an hour for AI models to generate 3D visuals of this quality — and the current state of the art is now around 10 to 12 seconds. We can now produce results an order of magnitude faster, putting near-real-time text-to-3D generation within reach for creators across industries."

The model generates multiple 3D shape options for each text prompt, giving creators a range of choices. Selected objects can then be optimized for higher quality within minutes and exported into various graphics software applications or platforms like NVIDIA Omniverse.

While the researchers trained LATTE3D specifically on datasets of animals and everyday objects, the model architecture could be adapted to train on various other data types. For example, a version trained on 3D plants could aid landscape designers in quickly populating garden renderings, while one trained on household objects could generate items for 3D home simulations used in training personal assistant robots.

LATTE3D's training involved NVIDIA A100 Tensor Core GPUs and diverse text prompts generated using ChatGPT. This approach improved the model's ability to handle the various ways a user might describe a 3D object.

The paper also details techniques that make the model more robust, including 3D priors, shape regularization, and model initialization. Its two-stage pipeline, which involves volumetric and surface-based rendering, allows fast generation of detailed textured meshes.

With LATTE3D, NVIDIA is pushing the boundaries of generative AI, making it faster and more accessible for creators across industries to bring their ideas to life in 3D. As the technology continues to evolve, we can expect to see even more innovative applications and use cases emerge.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
