OpenAI Introduces 'Flex Processing': Lower Prices for Non-Urgent AI Workloads

OpenAI is rolling out a new way to save on API usage—if you’re willing to wait. The company has launched Flex processing, a new API service tier that slashes costs for AI model usage by offering slower response times and “occasional resource unavailability.”

Key Points:

  • Flex processing trades response time for lower per-token costs
  • Available for o3 and o4-mini models, ideal for non-urgent workloads
  • Users in lower spend tiers must verify identity to access o3 and key features
  • Streaming and reasoning summaries now gated behind verification

Targeted at background tasks like model evaluations, async workflows, and data enrichment, Flex is clearly not meant for real-time applications—but it may be a budget-saver for developers working at scale. It is available now in beta for the o3 and o4-mini models, and the pricing mirrors the discounted token rates of OpenAI’s Batch API.

To use it, developers simply set a service_tier="flex" parameter in their API calls. But there's a tradeoff: Flex jobs are more likely to time out (the default SDK timeout is 10 minutes) or return 429 errors when resources are constrained. OpenAI recommends raising the timeout to 15 minutes and using retry strategies such as exponential backoff, or falling back to the standard tier when timely responses matter.
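As a rough illustration, here is a minimal sketch of that pattern using the official Python SDK: the client timeout is raised to 15 minutes, and rate-limit or timeout errors trigger exponential backoff. The model choice, prompt, and retry count are illustrative assumptions, not values prescribed by OpenAI.

```python
import time

import openai
from openai import OpenAI

# Raise the client-wide timeout to 15 minutes (900s), per OpenAI's Flex guidance.
client = OpenAI(timeout=900.0)

def flex_completion(prompt: str, max_attempts: int = 5):
    """Call o3 on the Flex tier, retrying 429s and timeouts with backoff."""
    delay = 1.0
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="o3",  # o4-mini also supports Flex
                messages=[{"role": "user", "content": prompt}],
                service_tier="flex",  # opt in to the discounted Flex tier
            )
        except (openai.RateLimitError, openai.APITimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of retries
            time.sleep(delay)
            delay *= 2  # exponential backoff before the next attempt
```

A production version might catch the final failure and reissue the request with service_tier="default" instead, so latency-sensitive work still completes at the standard price.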

Flex isn’t the only update: OpenAI is also tightening access to some of its most capable offerings. While o4-mini is available to customers in spend tiers 1 through 5, the more powerful o3 model is now restricted to those in tiers 4 and 5—unless developers in lower tiers complete a new ID verification process. The same verification is required to unlock reasoning summaries and streaming support, two features important for building more interactive or explainable AI applications.

This pricing and access shift reflects OpenAI’s evolving business strategy. As competition heats up with Google, Anthropic, and others, the company is refining its offerings to cater to both high-value enterprise users and cost-sensitive developers—while establishing tighter controls over its most advanced systems.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
