Meta Unveils Next-Generation AI Inference Accelerator: MTIA v2

Meta has revealed details about the next generation of its Meta Training and Inference Accelerator (MTIA), a family of custom-made chips designed to optimize the company's AI workloads. This latest version showcases significant performance improvements over its predecessor, MTIA v1, and plays a crucial role in powering Meta's ranking and recommendation models for ads.

The new MTIA chip is part of Meta's growing investment in AI infrastructure, aiming to complement existing and future AI systems to deliver improved user experiences across its products and services. As the compute requirements for AI models continue to increase alongside their sophistication, Meta recognizes the importance of developing efficient and scalable solutions to support generative AI (GenAI) products, recommendation systems, and advanced AI research.

Under the hood, the new MTIA chip features an 8x8 grid of processing elements (PEs) that provide a substantial boost in dense compute performance (3.5x over MTIA v1) and sparse compute performance (7x improvement). The chip's architecture focuses on achieving the optimal balance of compute, memory bandwidth, and memory capacity for serving ranking and recommendation models efficiently, even with relatively low batch sizes.
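
Taken together, those two figures suggest that how much a given model gains depends on its mix of dense and sparse kernels. As a rough illustration (an Amdahl-style estimate of our own construction, not a model Meta has published), the blended speedup for a workload spending a fraction s of its compute in sparse operations might look like:

```python
# Rough Amdahl-style blend of the dense (3.5x) and sparse (7x) gains
# quoted above. Illustrative arithmetic only, not Meta's data:
# `s` is the (hypothetical) fraction of compute spent in sparse ops.

def estimated_speedup(s: float, dense_gain: float = 3.5, sparse_gain: float = 7.0) -> float:
    """Harmonic blend: time in each regime shrinks by that regime's gain."""
    return 1.0 / ((1.0 - s) / dense_gain + s / sparse_gain)

for s in (0.0, 0.5, 1.0):
    print(f"sparse fraction {s:.0%}: ~{estimated_speedup(s):.2f}x")
# 0% -> 3.50x (all dense), 50% -> ~4.67x, 100% -> 7.00x (all sparse)
```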

To support the next-generation silicon, Meta has developed a large, rack-based system that accommodates up to 72 accelerators. The system clocks the chip at 1.35 GHz (up from 800 MHz) and runs it at 90 watts, delivering denser compute alongside higher memory bandwidth and memory capacity than the first-generation design.
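
A quick back-of-the-envelope using those figures (the per-rack total is our own multiplication and covers only the accelerators, not host CPUs, networking, or cooling):

```python
# Illustrative arithmetic from the rack-level figures quoted above.
clock_v2_ghz = 1.35   # MTIA v2 clock
clock_v1_ghz = 0.80   # MTIA v1 clock (800 MHz)
print(f"clock uplift: ~{clock_v2_ghz / clock_v1_ghz:.2f}x")  # ~1.69x

accelerators_per_rack = 72
watts_per_chip = 90
rack_accel_kw = accelerators_per_rack * watts_per_chip / 1000
print(f"accelerator power per rack: {rack_accel_kw:.2f} kW")  # 6.48 kW
```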

Software has been a key area of focus for Meta since the inception of its investment in MTIA. The MTIA stack is designed to fully integrate with PyTorch 2.0 and features like TorchDynamo and TorchInductor. Meta has also optimized the software stack by creating the Triton-MTIA compiler backend, which generates high-performance code for the MTIA hardware and improves developer productivity.
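
In practice, that integration means models reach the compiler backend through PyTorch 2.0's standard compilation path. Here is a minimal sketch of what that path looks like, using the default TorchInductor backend; the article does not publish the Triton-MTIA backend's registration name or API, so no MTIA-specific identifiers appear here, and the model architecture is hypothetical:

```python
import torch

# A small ranking-style MLP, standing in for the recommendation
# models the article describes (this architecture is illustrative).
class RankingMLP(torch.nn.Module):
    def __init__(self, in_features: int = 256, hidden: int = 512):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_features, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = RankingMLP()

# torch.compile captures the graph with TorchDynamo and hands it to a
# compiler backend (TorchInductor by default). A Triton-based backend
# such as Triton-MTIA would plug into this same path.
compiled = torch.compile(model)

x = torch.randn(8, 256)  # small batch, matching the low-batch-size serving emphasized above
scores = compiled(x)
print(scores.shape)  # torch.Size([8, 1])
```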

Early results show that the next-generation MTIA silicon delivers 3x the performance of the first-generation chip across the four key models evaluated. At the platform level, with double the number of devices and a powerful 2-socket CPU, Meta has achieved a 6x improvement in model serving throughput and a 1.5x improvement in performance per watt over the first-generation MTIA system.
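
Those platform figures decompose cleanly: doubling the device count on top of the 3x per-chip gain accounts for the quoted throughput (this breakdown is our inference from the quoted numbers, not one Meta provides):

```python
# Illustrative decomposition of the quoted platform-level speedup.
per_chip_speedup = 3.0   # next-gen silicon vs. MTIA v1, per the article
device_ratio = 2.0       # the new platform doubles the accelerator count
print(per_chip_speedup * device_ratio)  # 6.0 -> the quoted 6x serving throughput
```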

MTIA has already been deployed in Meta's data centers and is actively serving models in production. The chip is proving to be highly complementary to commercially available GPUs in delivering the optimal mix of performance and efficiency on Meta-specific workloads. As part of Meta's long-term roadmap, MTIA will continue to evolve and scale to support the company's ambitious AI goals, including support for GenAI workloads and investments in memory bandwidth, networking, and capacity.
