In a strategic move to stay competitive in the rapidly evolving AI industry, OpenAI has unveiled a new API option called Flex processing. This offering provides a significantly cheaper alternative for developers willing to trade speed and guaranteed availability for lower prices — a particularly useful option for non-critical tasks.
What Is Flex Processing?
Flex processing is currently available in beta for OpenAI’s newest reasoning models, o3 and o4-mini. It’s designed for lower-priority and asynchronous workloads — such as data enrichment, model evaluation, and other non-production applications — where latency and uptime aren’t mission-critical.
According to OpenAI, Flex processing:
- Cuts API usage costs by 50%
- Offers variable performance (i.e., slower response times)
- May experience intermittent unavailability
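Opting into Flex is expected to be a per-request choice rather than a separate endpoint. The sketch below builds a chat-completion request body with a `service_tier` field set to `"flex"`; the exact field name and accepted values are an assumption here, so confirm them against OpenAI's current API reference before relying on this.

```python
import json


def flex_request_payload(model: str, prompt: str) -> dict:
    """Build a chat-completion request body that opts into Flex processing.

    NOTE: the "service_tier": "flex" field is an assumption about how the
    beta exposes Flex; verify against the current OpenAI API reference.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": "flex",  # request the cheaper, lower-priority tier
    }


# Example: a non-urgent data-enrichment prompt sent to o4-mini.
payload = flex_request_payload("o4-mini", "Classify this support ticket: ...")
print(json.dumps(payload, indent=2))
```

Because Flex requests may be slower or intermittently unavailable, callers would typically pair a payload like this with a generous timeout and a retry-later fallback to the standard tier.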
This move helps OpenAI address growing concerns over the cost of advanced AI models, which have increased significantly as compute demands have surged.

Pricing Breakdown
Here’s how Flex pricing compares with standard rates:
| Model | Standard Input | Flex Input | Standard Output | Flex Output |
| --- | --- | --- | --- | --- |
| o3 | $10/M tokens | $5/M tokens | $40/M tokens | $20/M tokens |
| o4-mini | $1.10/M tokens | $0.55/M tokens | $4.40/M tokens | $2.20/M tokens |
To put that in perspective, 1 million tokens equates to approximately 750,000 words — making these savings meaningful for developers running large-scale evaluations or pipelines.
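To make the table concrete, here is a small cost calculator (the `job_cost` helper is hypothetical; the per-million-token rates are taken directly from the table above):

```python
# Per-million-token rates in USD, (input, output), from the pricing table.
RATES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}


def job_cost(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a job, given token counts and the rate table."""
    in_rate, out_rate = RATES[model][tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example: an evaluation run on o3 with 10M input and 2M output tokens.
standard = job_cost("o3", "standard", 10_000_000, 2_000_000)
flex = job_cost("o3", "flex", 10_000_000, 2_000_000)
print(standard, flex)  # 180.0 90.0
```

At this scale the run costs $180 on the standard tier versus $90 on Flex, which is exactly the 50% discount OpenAI advertises.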
A Response to Market Pressure
The release of Flex processing coincides with competitive moves from rival tech giants. On the same day, Google launched Gemini 2.5 Flash, a budget-friendly reasoning model said to rival DeepSeek’s R1 in both performance and price.
With AI costs under scrutiny and many organizations looking to scale affordably, OpenAI’s Flex aims to capture use cases that don’t require real-time performance but still demand the sophistication of frontier models.
Verification Now Required
Alongside Flex, OpenAI also announced changes to access control for its most advanced models. In an email to developers, the company stated that:
- Users in tiers 1–3 (based on spend levels) must now complete ID verification to access models like o3.
- Reasoning summaries and streaming API features are also locked behind verification.
The move appears to be part of OpenAI’s broader strategy to reinforce responsible usage. According to the company, these checks are designed to prevent abuse and ensure compliance with its safety protocols.
“As AI becomes more powerful, we want to make sure it’s being used by verified users who follow our safety standards,” OpenAI said in its announcement.
A Balancing Act Between Power and Price
Flex processing is the latest example of OpenAI trying to walk the line between innovation and accessibility. While high-performance models remain expensive, Flex offers a middle ground — delivering advanced capabilities at a lower cost for users who don’t mind a bit of latency.
Whether this will attract more developers or shift industry dynamics remains to be seen. But as AI deployment continues to scale, options like Flex could become a vital tool for startups, researchers, and enterprises alike.