Apple Intelligence Explained: The Cutting-Edge AI Technology Behind the Magic

June 11, 2024 • 4 min read

At its annual Worldwide Developers Conference (WWDC), Apple unveiled Apple Intelligence, its branded 'personal intelligence system' that will be deeply integrated into its platforms.

Apple Intelligence is built on a family of generative models created by Apple, including on-device and server foundation models, a diffusion model for image generation, and a coding model. Additionally, Apple Intelligence can tap into third-party models like ChatGPT if needed for more complex requests.

Here's a quick overview of what we know so far about the two foundation models and how Apple Intelligence works:

On-Device Model

Size: ~3 billion parameters
Vocab size: 49K
Optimization: Employs low-bit quantization and grouped-query-attention for speed and efficiency.
Performance: Achieves a time-to-first-token latency of 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second on iPhone 15 Pro.
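
To see what these two figures mean together, here is a back-of-the-envelope latency estimate in Python. It is a sketch under our own simplifying assumptions: prompt processing scales linearly at 0.6 ms per token, and decoding runs at a constant 30 tokens per second. It is not how Apple measures latency, and the function name is hypothetical.

```python
from typing import Tuple

# Figures quoted above for the on-device model on iPhone 15 Pro.
TTFT_MS_PER_PROMPT_TOKEN = 0.6    # prompt-processing cost, milliseconds per prompt token
GENERATION_TOKENS_PER_SEC = 30.0  # decoding rate once the first token is out


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds.

    Simplifying assumptions (ours, not Apple's): prompt processing scales
    linearly with prompt length, the first output token arrives at the end of
    that phase, and each later token costs 1 / GENERATION_TOKENS_PER_SEC seconds.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    ttft = prompt_tokens * TTFT_MS_PER_PROMPT_TOKEN / 1000.0
    decode = max(output_tokens - 1, 0) / GENERATION_TOKENS_PER_SEC
    return ttft, ttft + decode


# Example: summarizing a ~1,000-token email into a ~150-token summary.
ttft, total = estimate_latency_s(1000, 151)
print(f"first token after ~{ttft:.1f} s, full reply after ~{total:.1f} s")
```

Under these assumptions the first token appears after about 0.6 s and the full reply after about 5.6 s, so for longer outputs the decode rate, not prompt processing, dominates.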

Server-Based Model

Vocab size: 100K
Capabilities: Handles more complex tasks using Private Cloud Compute, ensuring privacy and security.
Optimization: Uses advanced techniques like speculative decoding and context pruning to enhance performance.
Security: Built on a hardened subset of iOS foundations, ensuring user data privacy with robust encryption and secure boot processes.
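
Apple only names speculative decoding here and has not detailed its implementation. To illustrate the general idea, here is a minimal sketch of the greedy variant, where `target` and `draft` are hypothetical toy next-token functions standing in for the large server model and a smaller, faster draft model:

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token sequence to its
# greedy (most likely) next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: call the large target model once per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: a cheap draft model proposes k tokens and
    the target keeps the longest prefix it agrees with. The output is identical
    to greedy_decode(target, ...); only the number of expensive steps changes."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each position. A real system scores all k
        #    positions in one batched forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)
            if tok != expected:
                break  # first disagreement: keep the target's token, drop the rest
        else:
            # Every draft token was accepted; the same pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + max_new]
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding exactly; the speed-up comes from the target validating several draft tokens per step. Systems that sample, rather than always picking the most likely token, replace the exact-match test with a probabilistic accept/reject rule.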

Apple’s models are trained on carefully curated datasets: a mix of licensed data, publicly available data collected by Apple's web crawler, AppleBot, and synthetic data. Apple stresses that it does not use users' private personal data or user interactions when training the foundation models. In post-training, Apple applies novel algorithms, including rejection sampling fine-tuning and reinforcement learning from human feedback (RLHF), to improve the models' instruction-following capabilities.
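
In its generic form, rejection sampling fine-tuning is best-of-n filtering: sample several responses per prompt, score them, and fine-tune on the winners. Below is a minimal sketch, where `generate` and `reward` are hypothetical stand-ins for the model being tuned and a reward model. The article does not describe Apple's exact procedure, so treat this as the textbook idea only.

```python
import random
from typing import Callable, List, Tuple


def rejection_sampling_dataset(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],  # hypothetical: samples one response
    reward: Callable[[str, str], float],            # hypothetical: scores (prompt, response)
    n_samples: int = 8,
    min_reward: float = 0.0,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Best-of-n filtering: for each prompt, sample n_samples responses, keep the
    highest-scoring one if it clears min_reward, and return (prompt, response)
    pairs to use as supervised fine-tuning data."""
    rng = random.Random(seed)
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt, rng) for _ in range(n_samples)]
        scored = [(reward(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= min_reward:
            dataset.append((prompt, best))
    return dataset


# Toy usage: the "model" appends random exclamation marks, and the "reward"
# prefers exactly three of them.
toy_generate = lambda p, rng: p + "!" * rng.randint(0, 5)
toy_reward = lambda p, r: -abs(r.count("!") - 3)
print(rejection_sampling_dataset(["hello"], toy_generate, toy_reward, min_reward=-1))
```

Repeating the loop with the updated model makes the method iterative; RLHF, by contrast, optimizes the policy against the reward signal directly rather than through filtered examples.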

The real magic, though, happens during the optimization phase. Apple has implemented a suite of cutting-edge techniques to ensure optimal performance and efficiency on mobile devices. By employing methods such as grouped-query-attention, shared embedding tables, low-bit palettization, and efficient key-value cache updates, Apple has created highly compressed models that maintain quality while meeting the memory, power, and performance constraints of mobile devices.
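
Palettization replaces each weight with a short index into a small table of shared values (the palette). Here is a minimal sketch that uses plain 1-D k-means to choose the palette. This is one common way to build a palette, not a description of Apple's actual compression pipeline, and the function names and the fp16 palette assumption are ours.

```python
from typing import List, Tuple


def _nearest(w: float, palette: List[float]) -> int:
    """Index of the palette entry closest to weight w."""
    return min(range(len(palette)), key=lambda j: abs(w - palette[j]))


def palettize(weights: List[float], bits: int, iters: int = 20) -> Tuple[List[float], List[int]]:
    """Compress weights to a palette of 2**bits shared values using 1-D k-means.

    Returns (palette, indices): each weight is stored as a `bits`-wide index
    into the palette instead of a full-precision float.
    """
    if bits < 1 or not weights:
        raise ValueError("need bits >= 1 and at least one weight")
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Deterministic start: k evenly spaced values spanning the weight range.
    palette = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        indices = [_nearest(w, palette) for w in weights]
        for j in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == j]
            if members:  # leave empty clusters where they are
                palette[j] = sum(members) / len(members)
    return palette, [_nearest(w, palette) for w in weights]


def depalettize(palette: List[float], indices: List[int]) -> List[float]:
    """Reconstruct approximate weights with a table lookup."""
    return [palette[i] for i in indices]


def storage_bits(n_weights: int, bits: int, palette_dtype_bits: int = 16) -> int:
    """Index storage plus the palette itself (assumed fp16 by default)."""
    return n_weights * bits + (2 ** bits) * palette_dtype_bits
```

For a layer with a million weights, a 4-bit palette needs about 4 Mbit instead of 16 Mbit at fp16, roughly a 4x reduction. How much quality survives depends on how well the palette fits the weight distribution.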

Illustration of the Apple Intelligence ecosystem

Unlike general-purpose models such as Google's Gemini Nano and Microsoft's Phi, Apple's models are fine-tuned for the everyday activities users perform on their devices, like summarization, mail replies, and proofreading. Apple achieves this with a technique called “Low-Rank Adaptation” (LoRA): small neural network modules (adapters) plugged into various layers of the pre-trained model. This lets the models adapt to different tasks while preserving their general knowledge. Importantly, these adapters can be dynamically loaded and swapped, allowing the foundation model to specialize on the fly for the task at hand.
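
Here is a minimal sketch of one LoRA-adapted linear layer with swappable adapters, using toy pure-Python matrices. The adapter names "summarize" and "proofread" are hypothetical. The alpha / r scaling follows the convention from the original LoRA paper; the article does not say which scaling or which layers Apple uses.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product (rows of m dotted with x)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """A frozen linear layer W (d_out x d_in) with swappable low-rank adapters.

    Adapter (A, B) has A: r x d_in and B: d_out x r, so with an adapter
    active the layer computes W x + (alpha / r) * B (A x). W never changes.
    """

    def __init__(self, weight: Matrix, alpha: float = 1.0):
        self.weight = weight
        self.alpha = alpha
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None

    def add_adapter(self, name: str, A: Matrix, B: Matrix) -> None:
        self.adapters[name] = (A, B)

    def set_adapter(self, name: Optional[str]) -> None:
        """Switch tasks; None runs the base layer alone."""
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def __call__(self, x: List[float]) -> List[float]:
        y = matvec(self.weight, x)
        if self.active is None:
            return y
        A, B = self.adapters[self.active]
        scale = self.alpha / len(A)  # len(A) is the rank r
        delta = matvec(B, matvec(A, x))
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("summarize", [[1.0, 1.0]], [[1.0], [0.0]])  # hypothetical task
layer.set_adapter("summarize")
print(layer([2.0, 3.0]))  # [7.0, 3.0]
```

Only A and B are trained, so an adapter adds r x (d_in + d_out) parameters per layer. For a hypothetical 4096 x 4096 projection with rank 16, that is 131,072 values versus 16,777,216 in W, under 1%. This is why many task-specific adapters can sit alongside one shared base model.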

This makes it tricky to compare Apple's models with the rest of the field. While the company has shared evaluations of both its feature-specific adapters and its foundation models, it has been decidedly deliberate in how it benchmarks them.

Fraction of preferred responses in side-by-side evaluation of Apple's foundation model against comparable models.

Unsurprisingly, Apple compared their models with open-source and commercial competitors and found that their models were preferred by human graders for safety and helpfulness. They say they prioritize human evaluation, as these results correlate highly with user experience.
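
Side-by-side preference results like these come from a simple tally: graders see two responses to the same prompt and either pick one or call it a tie. Here is a minimal sketch of that tally. For illustration, it assumes each verdict is recorded as "win", "tie", or "lose" from the evaluated model's point of view.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("win", "tie", "lose")  # verdicts from the evaluated model's point of view


def preference_fractions(verdicts: Iterable[str]) -> Dict[str, float]:
    """Convert side-by-side grader verdicts into win/tie/lose fractions."""
    counts = Counter(verdicts)
    unexpected = set(counts) - set(LABELS)
    if unexpected:
        raise ValueError(f"unexpected verdicts: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to tally")
    return {label: counts[label] / total for label in LABELS}


# Example: 10 prompts judged by human graders.
grades = ["win"] * 5 + ["tie"] * 3 + ["lose"] * 2
print(preference_fractions(grades))  # {'win': 0.5, 'tie': 0.3, 'lose': 0.2}
```

When reading such charts, the share of ties matters: a model can win more often than it loses even when many comparisons end in a tie.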

Writing ability on internal summarization and composition benchmarks (higher is better).

Apple says its models also demonstrated robust performance when faced with adversarial prompts, achieving lower violation rates for harmful content, sensitive topics, and factuality.

Overall, Apple's on-device model appears roughly on par with other small language models, while the server model is roughly in the GPT-3.5 class. For the limited use cases Apple Intelligence focuses on, this level of capability will likely be more than enough. That is also likely why, despite the impressive evaluation numbers it has shared, Apple has opted to partner with OpenAI to handle more complex requests.

Apple has gone to great lengths to ensure that their AI models not only perform well but also run efficiently on mobile devices like the iPhone 15 Pro. And while their approach to training and optimizing foundation models isn't revolutionary, by focusing on efficiency, performance, and scalability, they seem to have achieved remarkable results in delivering powerful and personalized AI experiences.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
