Opus 4.7 Just Raised Its Price 27% Without Changing the Price. GPT-5.5 Doubled Its Rate in the Open. I Run Six Routines on a $20/Day Cap — Here's the Harness That Ate Both.
In one week, three vendors raised the real cost of AI through three different mechanisms: OpenAI doubled GPT-5.5's token rate openly (+49–92% real cost), Anthropic shipped Opus 4.7's tokenizer change invisibly (+12–27%), and GitHub flips Copilot to token-metered billing on June 1. This is the operator's read: which mechanism hits your stack, the exact cache-block + model-router + spend-cap architecture that absorbs it, and the real per-call math from a six-routine production pipeline running on a hard $20/day cap.

Anthropic shipped Opus 4.7 around May 14 with the list price untouched – $5/M in, $25/M out, same as 4.6 – and a new tokenizer that emits 32–45% more native tokens for the same text. That is a 12–27% production price increase that never appeared on a pricing page. GPT-5.5 doubled its rate in the open the same week. My six routines barely moved, and the reason is architecture, not luck.
Three vendors, three mechanisms, one direction
In one week, three vendors raised the real cost of AI through three different levers, and the pricing pages tell you about exactly one of them.
OpenAI did it in the open. GPT-5.5 input went from $2.50 to $5.00 per million tokens, output from $15 to $30 per million – a clean 2x on both sides. OpenRouter's switcher cohort, the users who flipped from 5.4 to 5.5 on the same workloads, measured a real-cost lift of +49–92% depending on prompt-length distribution. That is the honest version of a price hike: a number changed, you can argue with procurement about it, you can route around it.
Anthropic did it invisibly. Opus 4.7's list price is identical to 4.6 – the pricing page didn't move a pixel. What moved was the tokenizer. The new model emits 32–45% more native tokens for the same input text, which Anthropic disclosed as a 1.0–1.35x inflation range and OpenRouter measured at ~45% under 2K tokens and ~32–34% at production scale above 10K. Net real cost: +12–27% on everything above 2K, slightly cheaper below. No announcement to react to, no churn event, no procurement trigger – the next monthly bill is just higher.
GitHub did it structurally. On June 1, Copilot's premium-request units become token-metered AI Credits at published API rates, cached tokens included. Base plan prices are unchanged. Agentic users on annual plans, the ones running long-context refactor loops, get hit hardest as model multipliers stack.
The point isn't which vendor is the villain. The point is that the pricing page stopped being a proxy for the bill. List price is now necessary, not sufficient – and any cost model that doesn't sample your own native-token output is fiction.
The tokenizer change is the one that should scare you
Of the three mechanisms this week, the Anthropic move is the one that quietly resets how operators have to think about AI cost.
The number itself is well corroborated now. OpenRouter's switcher analysis measured 32–45% more native tokens on real production traffic. Simon Willison ran the same prompts through Anthropic's tokenizer directly and got a ~1.46x inflation on system prompts. Anthropic's own 4.7 release note discloses a 1.0–1.35x range. Three independent measurements, one direction. This is real, not a sampling artifact, not a benchmark quirk.
What makes it operationally dangerous is the delivery vector. An explicit price hike is a calendar event – legal looks at the contract, finance reforecasts, engineering benches an alternative. A tokenizer change is a retroactive hike on everything you already shipped. Every prompt in your codebase, every system message you tuned over the last six months, every cached context block – all of it just got 32–45% heavier the moment your client hit the new model endpoint. There is no line-item to point a CFO at. The number on the invoice is just bigger than it was.
For a vendor, this is a much better instrument than an announced raise. No churn event. No competitive comparison page to lose. No "Anthropic raises prices 27%" headline cycle. It looks like a model upgrade. It is a model upgrade. The economics shift quietly underneath.
For an operator, it changes which lever matters. Negotiating list price was already a poor use of time. Now it is irrelevant – the rate card is no longer where pricing lives.
The harness that ate it – my actual architecture
I run six publisher routines through a single chokepoint: brief generation, drafting, voice-aware editing, fact-check, image direction, and a final QA pass. Two of them hit Opus on the upgrade day. The hard $20/day cap held. The $1 per-instance cap held. The per-article cost moved by single digits.
That isn't luck. Those are four levers I built before this week's news, because I covered the Agent SDK's separate-meter shift and the Claude Code weekly-limit bump and concluded that vendor-side cost volatility was now the default, not the exception. Different mechanisms, same harness.
Lever 1 – a stable cached prefix. Each routine prepends a markdown contract (~3500 tokens) and a category voice block (~1500 tokens) as a 1-hour ephemeral cached prefix. The 90% cache discount means the tokenizer inflation lands almost entirely on the small dynamic tail. OpenRouter's breakdown: cache absorbs 9% of the extra tokens at 10–25K prompts, 77% at 50–128K, 93% above 128K. My drafting routine sits at ~14K, so by that curve the cache only absorbs about 9% of the extra tokens – but the cached 5K of that 14K is stable text billed at the 90% discount, text that would otherwise have inflated at full rate. That structural decision is doing 70% of the work.
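The call shape is small. A minimal sketch assuming the Anthropic Python SDK's prompt-caching blocks – the model ID and file paths are placeholders, and on some API versions the 1-hour TTL needs an extended-cache-ttl beta header:

```python
# Minimal sketch of the cached-prefix shape (Anthropic Python SDK).
# CONTRACT_MD / VOICE_BLOCK stand in for the ~3500-token contract and
# ~1500-token voice block; the model ID is a placeholder.
import anthropic

client = anthropic.Anthropic()

CONTRACT_MD = open("contracts/drafting.md").read()  # stable across calls
VOICE_BLOCK = open("voice/ai.md").read()            # stable per category

def run_routine(dynamic_tail: str) -> str:
    resp = client.messages.create(
        model="claude-sonnet-latest",  # chosen by the router (lever 2)
        max_tokens=4096,
        # Only the last block of the stable prefix carries the marker;
        # everything up to and including it is served from cache at the
        # 90% read discount. The 1h ephemeral TTL keeps it warm across
        # routines (may require the extended-cache-ttl beta header).
        system=[
            {"type": "text", "text": CONTRACT_MD},
            {
                "type": "text",
                "text": VOICE_BLOCK,
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            },
        ],
        messages=[{"role": "user", "content": dynamic_tail}],
    )
    return resp.content[0].text
```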
Lever 2 – model routing. Drafting runs on Sonnet. Brief synthesis and category tagging run on Haiku. Opus is reserved for the rare deep-research pass. One vendor's price move can't move the whole bill, because the frontier model is not the default. The router is also the option-value layer: when GPT-5.5 doubled, I could shift any one routine to a competitor without rewriting the harness.
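The router itself is a lookup table, not a framework. A sketch with this pipeline's routine names; the tier-to-model IDs are config-driven assumptions:

```python
import os

# Tier -> concrete model ID, read from env so one config change
# re-routes a whole tier without touching routine code. IDs are
# placeholders for whatever your account exposes.
MODELS = {
    "small":    os.environ.get("MODEL_SMALL", "claude-haiku-latest"),
    "mid":      os.environ.get("MODEL_MID", "claude-sonnet-latest"),
    "frontier": os.environ.get("MODEL_FRONTIER", "claude-opus-latest"),
}

# The frontier tier is an explicit opt-in, never the default.
ROUTES = {
    "brief": "small",
    "tagging": "small",
    "draft": "mid",
    "edit": "mid",
    "factcheck": "mid",
    "image_direction": "mid",
    "qa": "mid",
    "deep_research": "frontier",
}

def model_for(routine: str) -> str:
    return MODELS[ROUTES[routine]]
```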
Lever 3 – prompt-length discipline. The 2K–10K bucket is the most punished band on Opus 4.7 (+27–69% real cost on switcher data). That happens to be where lazy prompt engineering lives: bloated few-shot examples, redundant role definitions, unpurged stale context. Compressing dead context isn't hygiene anymore – it is direct economic return.
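A sketch of the audit to run before compressing – chars/4 is a crude client-side token estimate (the count-tokens endpoint gives exact numbers), and the segment names are illustrative:

```python
# Rough audit of where a routine's prompt weight lives, so compression
# effort goes to the heaviest segment first.
def audit_prompt(segments: dict[str, str]) -> None:
    est = {name: len(text) // 4 for name, text in segments.items()}  # ~tokens
    for name, tokens in sorted(est.items(), key=lambda kv: -kv[1]):
        print(f"{name:>20}: ~{tokens:,} tokens")
    total = sum(est.values())
    if 2_000 <= total <= 10_000:
        print(f"total ~{total:,}: sitting in the most-punished band")

# Fill with your actual assembled-prompt segments:
audit_prompt({
    "role definition": "...",
    "few-shot examples": "...",
    "stale context": "...",
    "task": "...",
})
```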
Lever 4 – one spend chokepoint. Every paid model call routes through one function. Preflight check: is today's spend below $20? Post-flight check: did this instance cost under $1? If either fails, the routine halts and pages me. A vendor doubling token counts overnight changes my per-article cost by single digits, not 27%, because the cap is the backstop. The cap doesn't make the model cheaper; it makes the worst case bounded.
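The chokepoint is a page of code, not a platform. A sketch under stated assumptions: the ledger is in-memory here (a real one should persist so the daily total survives restarts) and page_me stands in for whatever alerting you run:

```python
from datetime import date

DAILY_CAP_USD = 20.00
INSTANCE_CAP_USD = 1.00

_ledger: dict[str, float] = {}  # ISO date -> spend so far (in-memory stand-in)

class SpendCapExceeded(RuntimeError):
    pass

def page_me(msg: str) -> None:
    print(f"PAGE: {msg}")  # stand-in for real alerting

def charged_call(call_fn, estimate_usd: float):
    """Every paid model call in every routine goes through here."""
    today = date.today().isoformat()
    spent = _ledger.get(today, 0.0)
    # Preflight: refuse to start a call that could breach the daily cap.
    if spent + estimate_usd > DAILY_CAP_USD:
        page_me(f"daily cap: {spent:.2f} + ~{estimate_usd:.2f} > {DAILY_CAP_USD}")
        raise SpendCapExceeded("daily cap")
    result, actual_usd = call_fn()  # call_fn returns (response, cost_usd)
    _ledger[today] = spent + actual_usd
    # Post-flight: one runaway instance halts the routine, not the budget.
    if actual_usd > INSTANCE_CAP_USD:
        page_me(f"instance cost {actual_usd:.2f} > {INSTANCE_CAP_USD}")
        raise SpendCapExceeded("instance cap")
    return result
```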
Honest part: the cached prefix is the load-bearing lever. The other three are insurance. If I had been doing ad hoc per-request prompt assembly the way most teams still do, the full 32–45% would have landed on the bill and I would have noticed in week two, not week zero.
The move this week
If you're on Opus 4.7, stop reading the pricing page and start reading your own token counts. Sample 100 production calls, run them through Anthropic's token-count endpoint on 4.6 and 4.7, and measure the actual delta on your prompts. Then restructure into a stable cached prefix this week – not next sprint, not next planning cycle. The cache is the only lever that scales with prompt length, and prompt length is exactly where the inflation is worst.
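The sampling pass is a dozen lines against the SDK's messages.count_tokens endpoint – the 4.6/4.7 model IDs below are placeholders for whatever your account exposes:

```python
import anthropic

client = anthropic.Anthropic()

def inflation(prompts, old="claude-opus-4-6", new="claude-opus-4-7"):
    """Native-token ratio of the new tokenizer over the old, on YOUR prompts."""
    old_total = new_total = 0
    for p in prompts:
        msgs = [{"role": "user", "content": p}]
        old_total += client.messages.count_tokens(model=old, messages=msgs).input_tokens
        new_total += client.messages.count_tokens(model=new, messages=msgs).input_tokens
    return new_total / old_total

# inflation(last_100_production_prompts) -> e.g. 1.34 means +34% on your mix
```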
If you're on GPT-5.5, do the switcher math before migrating off 5.4. The doubled rate is partially offset by a –19 to –34% reduction in completion length above 10K tokens, so the real lift depends on your output-to-input ratio. An output-heavy, long-context refactor workload lands toward the bottom of the +49–92% band; a chat-style product with short completions eats close to the full 2x.
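The switcher math is one function. The rates are the ones quoted above; the token counts and shrink factor in the examples are illustrative assumptions, not measurements:

```python
def cost_ratio(in_tok: float, out_tok: float, shrink: float) -> float:
    """5.5-over-5.4 real-cost ratio. shrink = your measured completion-length
    reduction on 5.5 (the -19 to -34% band above 10K)."""
    old = in_tok * 2.50 + out_tok * 15.00                  # 5.4 rates, $/M
    new = in_tok * 5.00 + out_tok * (1 - shrink) * 30.00   # 5.5 rates, $/M
    return new / old  # 1.0 = flat, 2.0 = the full doubling

# Output-heavy agentic call, 34% shorter completions:
print(cost_ratio(in_tok=8_000, out_tok=40_000, shrink=0.34))  # ~1.34x
# Chat-style call, short completions barely shrink:
print(cost_ratio(in_tok=3_000, out_tok=500, shrink=0.0))      # ~2.0x
```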
If you're on Copilot, pull the early-May usage preview before June 1. Base-plan users at single-prompt cadence won't notice. Agentic users on annual plans should price their long-context refactor loops against the published API rates now – that is where the model multipliers stack.
The bigger frame: AI is now the most volatile line in the tech budget. AWS doesn't double overnight. CDN bandwidth doesn't silently 1.4x its meter. Postgres doesn't change its tokenizer. The AI layer just did all three in a week. The harness – cache structure, model router, length discipline, hard spend cap – is no longer optional architecture for cost-sensitive teams. It is the price of running this category of infrastructure in production.
The vendors will keep doing this. Build the chokepoint, or absorb every decision they make for the next twelve months.
Key Takeaways
- Three vendors raised real AI cost in one week via three different mechanisms: GPT-5.5 doubled openly (+49–92%), Opus 4.7 inflated tokens invisibly (+12–27%), Copilot moves to token metering June 1.
- The tokenizer change is the most dangerous instrument because it is retroactive and has no line-item – the bill rises with no announcement to react to.
- A stable cached prefix is the load-bearing defense: 90% discount on cached tokens absorbs 9–93% of the inflation depending on prompt length.
- Model routing, prompt-length discipline, and a hard daily/per-instance spend cap turn vendor cost moves from existential into single-digit annoyances.
- List price is no longer a proxy for cost. Sample your own native-token output or build pricing on fiction.
May 16, 2026