Google Cloud has announced a new partnership with European AI startup Mistral AI to bring the startup's flagship generative model to Google's Vertex AI platform.
Mistral's open-source foundation model, Mistral-7B, rose quickly to popularity after its release last month. Despite having just 7 billion parameters, it outperformed Meta's 13-billion-parameter Llama 2 model on key benchmarks and even approached the scores of Meta's 70-billion-parameter Llama 2 variant. Mistral-7B also distinguished itself by launching as an uncensored base model, an unusual choice among major model releases.
Mistral-7B is now natively available in Vertex AI Notebooks. The integration provides seamless access to test, fine-tune, and deploy the model on Google's managed AI service.
Incorporating techniques like Grouped-Query Attention (GQA) and Sliding Window Attention (SWA), Mistral's model balances fast inference with accuracy. GQA shares key/value heads across groups of query heads, shrinking the memory footprint of decoding, while SWA restricts each token's attention to a fixed window of recent tokens, letting the model handle longer sequences without a corresponding increase in compute and memory cost.
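To make these two ideas concrete, here is a minimal NumPy sketch, not Mistral's actual implementation: a sliding-window causal mask that caps each token's look-back, plus the KV-cache arithmetic behind GQA using the head counts published for Mistral-7B (32 query heads, 8 key/value heads).

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean attention mask: position i may attend to positions j
    with i - window < j <= i (causal, bounded look-back)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# With window=4, token 6 attends only to tokens 3..6, so per-token
# attention cost is O(window) instead of O(seq_len).
mask = sliding_window_mask(seq_len=8, window=4)
print(mask[6])

# Grouped-Query Attention: Mistral-7B pairs 32 query heads with only
# 8 key/value heads, so the KV cache is roughly 4x smaller than with
# full multi-head attention at the same model width.
n_query_heads, n_kv_heads = 32, 8
print(n_query_heads / n_kv_heads)
```

The mask could be added to attention logits (as `-inf` where `False`) before the softmax; the KV-cache reduction is what speeds up memory-bound autoregressive decoding.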
The integration supports Google Cloud's commitment to open-source AI development. Bringing Mistral into Vertex AI furthers the goal of enabling developers and researchers to build upon and contribute to improving AI models.
Users can deploy Mistral-7B with vLLM, an open-source optimized serving engine for large language models, and accelerate inference using Google Cloud's range of AI accelerators. Model workflows are streamlined through the Vertex AI Model Registry, a central repository for tracking model versions and deploying them directly to endpoints.
Open-source large language models like Llama 2 and Mistral-7B on Vertex AI offer businesses accessible paths to launch AI services.