Google Unveils Gemini 2.0 Flash Thinking

By Chris McKay, December 19, 2024

Google has announced Gemini 2.0 Flash Thinking, an experimental AI model that uses test-time compute to "reason" and solve harder problems. Unlike OpenAI's o1, Google makes the internal reasoning process fully visible to users in real time, offering a window into how the model arrives at its responses.

Key Points

This is the first model to bring test-time compute to Google's Gemini 2.0 Flash architecture
It is available through Google AI Studio and the Gemini API
The model has a 32K token input limit and outputs text only

Unlike traditional language models, Gemini 2.0 Flash Thinking pauses to reason before it answers, considering related prompts and explaining its thinking before offering a solution. The release marks Google's entry into the growing field of "reasoning" AI models.

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts.

Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.

And we see promising results when we increase inference time…
— Jeff Dean (@JeffDean) December 19, 2024

Jeff Dean, chief scientist at Google DeepMind, explained on social media that the model is "trained to use thoughts to strengthen its reasoning," suggesting a deliberate approach to making AI decision-making more transparent and potentially more reliable.

In demos shared by Google, the model handles both visual and text-only challenges, working through problems that range from programming puzzles to physics equations.

Just when you thought it was over... we’re introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts.

The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more 🧵
— Logan Kilpatrick (@OfficialLoganK) December 19, 2024

Being able to see the reasoning traces of the model is a big deal. As noted by Andrej Karpathy (a founding member of OpenAI) on X, this brings significant value to both users and developers. This transparency not only enhances trust but also provides an educational component, allowing users to learn from the model's logical process and iterative thought. For developers, it opens avenues to analyze and improve the model’s decision-making, making it a more collaborative and insightful tool.

Still, Flash Thinking is very much an experimental model: it has a 32K token input limit, accepts only text and image inputs, and produces text-only outputs. Additionally, many of the built-in tools available in other models, such as search and code execution, are not supported.

If you want to explore Gemini 2.0 Flash Thinking for yourself, you have two ways to dive in. You can head to Google AI Studio, and simply select the Gemini 2.0 Flash Thinking Experimental model in the model drop-down menu in the Settings pane. There's a dedicated "Thoughts" panel that shows you exactly how the model reasons through problems. Or, if you prefer to work with code, you can access it through the Gemini API. When using the API, you'll find the model's thoughts as the first element in your response content – just specify either gemini-2.0-flash-thinking-exp or gemini-2.0-flash-thinking-exp-1219 as your model code.
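To make the "thoughts as the first element" idea concrete, here is a minimal Python sketch of how a developer might separate the reasoning from the final answer once a response has come back. It makes no API call. The function name split_thoughts, the plain list-of-strings shape for the response content, and the sample text are all assumptions made for illustration, not part of Google's SDK, so check the official Gemini API documentation for the actual response structure.

```python
from typing import List, Tuple

# One of the two model codes named in the article.
MODEL_CODE = "gemini-2.0-flash-thinking-exp"


def split_thoughts(parts: List[str]) -> Tuple[str, str]:
    """Split ordered response text parts into (thoughts, answer).

    Assumption: thoughts come first, as the article describes, and all
    remaining parts make up the answer. With fewer than two parts there
    is nothing to separate, so everything is treated as the answer.
    """
    if len(parts) < 2:
        return "", "".join(parts)
    return parts[0], "".join(parts[1:])


# Hypothetical response content, hand-written for illustration only.
example_parts = [
    "Compare 9.11 and 9.9 digit by digit after the decimal point.",
    "9.9 is larger than 9.11.",
]

thoughts, answer = split_thoughts(example_parts)
print("Model:", MODEL_CODE)
print("Thoughts:", thoughts)
print("Answer:", answer)
```

Treating a single-part response as an answer rather than a thought is a cautious choice in this sketch: it avoids showing a user raw reasoning in place of a result if the model returns no separate thoughts.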

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.