
OpenAI doubled prices while Nvidia cut inference 35x

GPT-5 launched at $1.25 per million input tokens. GPT-5.5 costs $5.00 today. 4x cumulative in 8 months while Blackwell Ultra cut inference 35x.

Ricardo Argüello


CEO & Founder

Business Strategy · 14 min read

Two numbers tell the story.

GPT-5 launched in August 2025 at $1.25 per million input tokens. GPT-5.5 launched today at $5.00 per million input tokens and $30 output. Exactly double GPT-5.4 from seven weeks ago, and 4x the launch price of GPT-5 from eight months ago. Andrew Curran confirmed the numbers within minutes of the announcement: “$5 per 1M input tokens and $30 per 1M output tokens, with a 1M context window.” It is the kind of pricing curve enterprise software last produced when Oracle and Microsoft owned the database and operating system markets, before either had real competition.

Software prices fall between versions. That is the rule, not the exception. The first Photoshop cost $895 in 1990; today’s Photoshop is $35 a month. The first iPhone cost $599 in 2007 on EDGE; the iPhone 17 costs $799 with 5G and three cameras. New versions do more and cost less per unit of capability. That ratio between capability and price is what made software an industry.

GPT-5.5 just broke it out loud. And the data point that completes the picture comes from Nvidia. Their recent SemiAnalysis collaboration documented that Blackwell Ultra cuts cost per million tokens of inference 35x versus Hopper, from $4.20 down to $0.12. Provider cost is collapsing. Developer cost is multiplying. The gap between those two curves is what finance calls margin expansion, and what every customer should call a strategic decision someone made about how much to extract from them.

Eight hours before the GPT-5.5 launch, on the same morning, Anthropic priced agent runtime at $0.08 per session-hour. By 1pm Pacific, OpenAI had set GPT-5.5 at $5 per million tokens. In a single morning, two frontier labs took opposite positions on how to monetize intelligence. That divergence is not minor. It is the first time the AI market has shown two business models running publicly in parallel, and the choice your company makes between them stops being a technical decision.

Price doubled. Inference fell 35x.

Laid out as a timeline, the pricing curve lands harder:

  • August 2025: GPT-5 launches at $1.25 / $10 per million tokens (input / output)
  • March 2026: GPT-5.4 hits $2.50 / $15 (2x the original input price, 1.5x the output)
  • April 2026: GPT-5.5 hits $5.00 / $30 (2x GPT-5.4; 4x the original input price, 3x the output)

4x cumulative in 8 months. Nvidia’s recent SemiAnalysis brief put inference cost reduction at 35x per million tokens between Hopper and Blackwell Ultra, dropping the unit from $4.20 to $0.12. The curve is real. It is documented in every benchmark from Artificial Analysis to ARC-AGI, and it is reflected in the wholesale prices Nvidia charges hyperscalers for compute capacity.
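The multiples and a sample monthly bill can be sanity-checked in a few lines. The prices are the ones quoted in this article; the workload figures are hypothetical, picked only to make the arithmetic concrete:

```python
# Illustrative check of the pricing multiples quoted above.
# All figures are USD per million tokens (input, output), as cited in this article.
prices = {
    "GPT-5 (Aug 2025)":   (1.25, 10.00),
    "GPT-5.4 (Mar 2026)": (2.50, 15.00),
    "GPT-5.5 (Apr 2026)": (5.00, 30.00),
}

base_in, base_out = prices["GPT-5 (Aug 2025)"]
multiples = {
    model: (p_in / base_in, p_out / base_out)
    for model, (p_in, p_out) in prices.items()
}

# Hypothetical monthly workload: 200M input and 50M output tokens.
tokens_in, tokens_out = 200, 50  # millions of tokens
costs = {
    model: tokens_in * p_in + tokens_out * p_out
    for model, (p_in, p_out) in prices.items()
}

for model in prices:
    m_in, m_out = multiples[model]
    print(f"{model}: {m_in:.1f}x input, {m_out:.1f}x output, ${costs[model]:,.0f}/month")
```

On that hypothetical workload, the same traffic that cost $750 a month at GPT-5 launch pricing costs $2,500 a month at GPT-5.5 pricing, with no change in usage.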

So where does the math close? Two readings, same outcome.

Either OpenAI watched its unit cost fall and chose not to pass any of it through to the customer, taking the difference as margin. Or OpenAI needs the margin because it is committed to an infrastructure capex that no API cash flow can sustain (Microsoft Azure billed OpenAI more than $13B in compute during 2025 alone) and chose to monetize the captive developer building on top.

Both readings produce the same conclusion for your enterprise. The price you pay is no longer a function of what the provider’s cost is. It is a function of what the provider believes it can extract from you. That changes how you build the business case, and more importantly, how you build the architecture.

OpenAI defends the hike with an efficiency argument: “GPT-5.5 uses significantly fewer tokens to complete the same Codex task.” The Decoder caught it in their coverage: the token savings cushion part of the hike if your workload is agentic-coding intensive. Cushion is not offset, and the cushion depends on your exact task. If your workload is long-form text generation, the doubled price lands in full.
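The cushion has a precise break-even. Holding the bill flat when the per-token price rises requires a token reduction of 1 − (old price / new price). A sketch of that arithmetic, using this article's input prices:

```python
# To keep spend flat when the per-token price rises:
#   tokens_new * price_new = tokens_old * price_old
#   => required token reduction = 1 - price_old / price_new
def breakeven_reduction(price_old: float, price_new: float) -> float:
    """Fraction of tokens the new model must save just to hold the bill flat."""
    return 1 - price_old / price_new

# GPT-5.4 -> GPT-5.5 input pricing from this article: $2.50 -> $5.00 per million.
print(f"{breakeven_reduction(2.50, 5.00):.0%}")  # a 2x price needs a 50% token cut
# GPT-5 -> GPT-5.5: $1.25 -> $5.00 per million.
print(f"{breakeven_reduction(1.25, 5.00):.0%}")  # a 4x price needs a 75% token cut
```

Unless GPT-5.5 genuinely halves token consumption on your specific tasks, "more efficient" still means "more expensive."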

This is not greed. It is the Microsoft playbook.

I have been watching this play out since 1990. I started in computing at 15, when Microsoft had just packaged Word, Excel, and PowerPoint as Office and was selling the bundle as a near-giveaway upgrade. Ten years later it had tripled the price. I watched Oracle do the same with databases through the 2000s. I watched SAP do it with ERP. I watched Adobe do it when Creative Suite moved to Cloud and the perpetual license died. Thirty-six years later, the pattern is identical. The only thing that changed is that the cycle compressed from five years to seven weeks, and the capture base shifted from mainframes to inference tokens.

Anyone who has spent five years in enterprise software recognizes the pattern. Microsoft perfected it between 1995 and 2010: ship the platform cheap, capture the developer base, then raise prices every cycle the competitive position allows. Office 365 went from $69.99 to $99.99 a year in 2024. SQL Server core licensing nearly tripled between 2012 and 2022. Azure repricing happens like clockwork every twelve months. The curve goes up, the customers are already inside, the customers do not move.

OpenAI is running the same playbook faster and at larger scale. The public numbers explain why the position allows it:

  • 900 million weekly active users on ChatGPT
  • 50 million subscribers paying $20 a month — close to $12B ARR from consumers alone
  • 9 million paying business customers
  • Microsoft, OpenAI’s largest distributor, generated more than $10B reselling OpenAI as Copilot

Those numbers give OpenAI what no AI competitor has: a consumer-driven revenue floor that does not depend on the API. The API is now a secondary monetization channel covering developers and enterprises that already chose to build on top. As a secondary channel, it can absorb price hikes without endangering the main consumer revenue line. That is the textbook condition under which a business raises prices.

The piece that completes the Microsoft analogy is the “superapp.” CNBC confirmed on March 19 the official plan: ChatGPT, Codex, and the Atlas browser merged into a single desktop superapp under Fidji Simo (CEO of Applications) with Greg Brockman backing it. The internal motivation, per the leaked memo: OpenAI watched Anthropic win one in four enterprise deals it used to own, and reacted by consolidating its own surface area. TechCrunch covered today’s launch with a headline that needs no interpretation: “OpenAI releases GPT-5.5, bringing company one step closer to an AI ‘super app’.”

The categories the eventual superapp is meant to absorb are the ones today described as “built on OpenAI.” Cursor, Lovable, Glean, Notion AI, the LLM-powered scraping tools, the third-party Codex variants, the AI browser extensions. All built on OpenAI’s API. All on the consolidation list.

The developer who pays $5 per million tokens today is, in plain terms, funding OpenAI to build the product that will eventually replace them. That sentence reads as melodrama until you do the math. Every query your agent sends to GPT-5.5 funds the next model improvement, which gets packaged into the next consumer product, which competes directly with your agent.

Why this price does not stay on your API bill

A non-technical CTO in Latin America might think “I do not buy tokens directly from OpenAI, this does not affect me.” That is the naive reading and the one that gets expensive next quarter. The GPT-5.5 price hike is going to land on your monthly invoice through four separate channels in the next 90 days, and only the first one is obvious:

  • Agents already built on GPT-5 that need migration. If your integrator or internal team built automations on GPT-5 six months ago at $1.25/M tokens, those agents are running on the older model. OpenAI offers support for a few more months, then deprecates. Migration to GPT-5.5 means rewriting prompts (every version changes token patterns and behavior) and paying up to 4x. If your integrator has no written migration plan before May, your operating cost just went up silently for the next quarter.
  • OpenAI embedded in your SaaS vendors. Your CRM, your help desk, your marketing automation platform, and your HR tooling almost certainly procure OpenAI through Azure OpenAI Service. The GPT-5.5 upgrade arrives as a “capability improvement” in the next product update and the vendor reprices its plan. The May conversation is “the AI module went from $X to $Y because we moved to the more capable model.” The real reason sits in OpenAI’s pricing page; the vendor’s note just relays it.
  • Developer tool subscriptions. Cursor, GitHub Copilot, Codex, ChatGPT Enterprise, Codeium. Every tool your engineering team relies on. ChatGPT Enterprise already costs $60 per seat per month; the next pricing cycle moves it up. Cursor went from $20 to $25 between April and August last year. That chain does not end in May.
  • “Pricing at vendor’s discretion” clauses you already signed. Re-read the TOS of every AI tool your company contracted in 2025. Most include a section reserving the right to adjust pricing on 30 days’ notice. You signed that clause when GPT-5 cost $1.25 per million tokens. It now applies to $5.00. If your annual fee scales with usage, the adjustment lands in the next billing cycle. If it scales with seats, it lands at renewal.

None of these four paths require your company to even know the difference between GPT-5 and GPT-5.5. The cost arrives regardless. The difference between the companies that absorb it without discussion and the companies that argue it before the bill arrives is having read the contracts before the hike.

What the people who already saw this say

The most honest reaction in the first hours came from Theo Browne (T3.gg), a developer with a serious audience and live agents on GPT-5 in production. His first post after the announcement: “$5 per mil in, $30 per mil out. GPT-5.5 is smart. I’ve been using it for a bit. It’s also weird, hard to wrangle, and too expensive IMO. Double the price of GPT-5.4. 20% more expensive than Opus 4.7.” That is the read from someone with no consultancy to sell and no stake in any lab: the model is better, but the price is not justified against what already exists.

The operational line that most companies have not yet had to write down came from Chen Avnery, who works on agent governance, in a reply to Aakash Gupta’s thread: “We built every governance layer as model-agnostic plain text. Twelve agents, switched inference providers twice this quarter. Zero constraint files touched. If your agent stack is coupled to one model, you do not have a stack. You have a dependency.”

That sentence is the operational counter-argument most companies have not made yet. The difference between “I use GPT-5.5” and “I use an agent that can run on GPT-5.5, Claude Opus 4.7, or Kimi K2.6” is the only difference that survives a doubled price.

The comment that lands closest to what most CTOs should be saying to themselves this week dropped on a different post entirely. Atif Manzoor commented on Ben Valentin’s Kimi K2.6 post: “The shift from model superiority to agent effectiveness is the real story. When intelligence becomes a commodity, execution becomes the only moat that matters.” That comment showed up on a thread about an open-source model and applies cleanly to today’s OpenAI announcement. That is more telling than any opinion column.

Three conversations to have this week

There are three conversations worth having before the end of April.

The first is internal. Build a dependency map: what percentage of your AI workflows runs exclusively on OpenAI? The exact question is not “what model do we use.” It is “if OpenAI doubles the price again in July (which the seven-week cycle now permits), how many workflows do we have to rewrite from scratch?” If the answer is above 30%, you have a vendor concentration risk your CFO probably does not know exists. That belongs on the audit committee agenda, not just on the engineering backlog.

The second is with your SaaS vendors. The letter worth writing before month-end is the direct question to the account manager: “Which model underlies the AI module we are paying for? What is your pass-through policy when the model provider raises prices?” A vendor that cannot answer in concrete terms is a vendor without governance over their own dependency. Your cost will rise when theirs does, with no useful notice. The answer also tells you how honest the roadmap is.

The third is with whoever owns agent design inside your company. If you built with prompts going directly to OpenAI without an abstraction layer, today is the day to add one. Not for fashion, for arithmetic. The only way today’s doubling and the next one do not land on you in full is for your agents to run on an interface that can swap the underlying model without rewriting the workflow. That is not over-engineering; that is the difference between absorbing the hike silently and pushing back with real ammunition.
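As a minimal sketch of what such a layer looks like, assuming hypothetical stub providers (the class names, environment variable, and methods below are illustrative, not any real vendor SDK):

```python
# Minimal provider abstraction sketch. The provider classes are stubs:
# in practice each complete() would wrap the vendor's real API client.
import os
from typing import Protocol


class Provider(Protocol):
    def complete(self, prompt: str) -> str: ...


class OpenAIProvider:
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # replace with a real API call


class AnthropicProvider:
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"  # replace with a real API call


PROVIDERS = {"openai": OpenAIProvider, "anthropic": AnthropicProvider}


def get_provider() -> Provider:
    # Switching vendors is an environment variable, not a rewrite.
    name = os.environ.get("LLM_PROVIDER", "openai")
    return PROVIDERS[name]()


def run_workflow(task: str) -> str:
    # The workflow only ever sees the interface, never the vendor.
    return get_provider().complete(task)
```

The prompts, retries, and governance logic live above the `Provider` interface; the vendor lives below it. When the next price hike lands, the negotiation happens with a config change in hand instead of a rewrite estimate.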

What we do at IQ Source about vendor concentration

This morning we published “The runtime is now a commodity. The moat is the workflow.” on the launch of Anthropic Managed Agents at $0.08 per hour. The two stories of the day read against each other. Anthropic pricing runtime cheap and OpenAI doubling the model price are two opposite ways to monetize the same intelligence. Choosing between them stops being a technical question.

In AI Operations work, the first workshop question already changed. It used to be “which workflow do you want to automate?” It is now “which workflows do you want to automate, and what level of single-vendor exposure are you willing to accept?” The answer changes the architecture. If the company can tolerate hard coupling to OpenAI, the design is one thing. If not, the design has to include a provider abstraction layer from day one.

In Tech Partner engagements, the difference is more visible. We build agents on an interchangeable enablement layer: the workflow lives on top, the model gets injected underneath. Switching from Claude Opus to GPT-5.5 to Kimi K2.6 means changing an environment variable, not rewriting prompts. That property costs roughly 15 to 20% more upfront work. This week, it absorbed the doubling of the underlying model's price.

What we do not do: recommend reactive provider switching. The right response to OpenAI’s hike is not “move to Anthropic.” It is “design so that the question of which provider you use becomes trivial.” A company that arrives at the next Claude price hike with the same exposure it had to GPT-5.5 ends up in the same uncomfortable conversation in six months, just with a different vendor name on the invoice.

This is not about OpenAI

The easy frame this week is going to be “OpenAI is a greedy vendor.” That frame feels good and changes nothing about your invoice next month. The right frame is different: vendor concentration in AI just moved from technical risk to P&L risk.

Twelve months ago the question was “which model gives the best results?” That had a technical answer and got refreshed each release cycle. Today the question is “what is my exposure to a pricing decision made by a vendor over which I have no influence?” That has a financial answer and gets refreshed every time the vendor raises prices.

Today’s doubling is not the last one. The Microsoft playbook has thirty years of history of raising prices whenever the position allows it. OpenAI has 900 million users and a superapp under construction. The position allows many more hikes, and the seven-week release cycle means the next one can land by mid-June.

The company that enters 2027 with high concentration on a single AI provider will have a difficult conversation with its CFO. The one that enters with model-agnostic architecture will have less drama. The difference between them is not budget. It is reading the contracts this week and adding an abstraction layer that costs little before it costs a lot.

