At Microsoft Build 2024, Microsoft announced additions to its Phi-3 family of small, open models. The standout was Phi-3-vision, a multimodal model that combines language and vision capabilities. This 4.2-billion-parameter model can generate insights from charts and diagrams as well as from text.
Takeaway
Phi-3-vision: A multimodal model that brings together language and vision capabilities, enabling it to understand and generate insights from text and images, including charts and diagrams.
Phi-3-small and Phi-3-medium: These models, previously announced, are now available on Microsoft Azure, providing developers with powerful tools for building generative AI applications.
Phi-3-mini: The first model in the Phi-3 family, which is now also available through Azure AI's models-as-a-service offering, making it easier for users to get started.
Phi-3-vision is designed for tasks such as optical character recognition (OCR), chart analysis, and diagram understanding. It is built to process and reason over real-world images, which makes it useful to developers working with visual data.
The Phi-3 models offer significant performance and cost advantages over larger language models. Phi-3-small, for example, has just 7 billion parameters yet outperforms models twice its size, including GPT-3.5 Turbo. Phi-3-vision continues this trend, outperforming larger models such as Claude-3 Haiku and Gemini 1.0 Pro V in visual reasoning tasks.
The compact size of the Phi-3 models enables on-device deployment, making them well suited to low-latency AI experiences that do not require network connectivity. They are also more cost-effective: Sébastien Bubeck, Microsoft's VP of GenAI research, has described Phi-3 as "dramatically cheaper."
As the range of available models grows, choosing the right one depends on the specific use case and business needs. By expanding the Phi-3 family, Microsoft is giving developers a broader set of tools for building generative AI applications. The models' performance, cost-effectiveness, and versatility make them a compelling choice for many use cases and show what small language models can do.