Frontrow Technology

Applied AI · Cost economics

Azure OpenAI PTU vs Pay-As-You-Go: Cost Compared (Australia 2026)

A senior-practitioner breakdown of Azure OpenAI PTU vs pay-as-you-go pricing for Australian teams in 2026 — the per-token economics, the ~150-200M token crossover, and Australia East reservation discounts. AUD ex GST.

Daniel Brown · 16 June 2026 · 7 min read

If you are running anything beyond a pilot on Azure OpenAI, the question that eventually lands on the architect's desk is blunt: do we keep paying per token, or do we buy dedicated capacity? Azure gives you both — pay-as-you-go (standard, token-metered) and Provisioned Throughput Units (PTU, capacity-metered) — and the wrong choice in either direction quietly burns money. This is a plain-English breakdown of the two models, the numbers that actually drive the decision, and where the crossover sits for an Australian team in 2026.

Pay-as-you-go bills you per token, with input and output priced separately, and you pay nothing when no one is calling the model. For a GPT-4o-class model on Global Standard, the underlying list rate is roughly USD $2.50 per million input tokens and USD $10.00 per million output tokens; any AUD equivalent is indicative list pricing, so confirm at purchase. It is elastic, requires no commitment, and is the correct default for almost every workload that is new, spiky, or low-volume.
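To make the metering concrete, here is a minimal Python sketch of a pay-as-you-go bill using the indicative USD rates quoted above. The function name and the example volumes are illustrative assumptions, not part of any Azure SDK or billing API.

```python
# Indicative Global Standard list rates quoted in this article
# (USD per million tokens). Confirm current pricing before relying on them.
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 10.00


def payg_monthly_cost_usd(input_tokens_m: float, output_tokens_m: float) -> float:
    """Pay-as-you-go cost for a month, with volumes in millions of tokens."""
    return input_tokens_m * INPUT_USD_PER_M + output_tokens_m * OUTPUT_USD_PER_M


# Hypothetical month: 100M input tokens and 20M output tokens.
example_cost = payg_monthly_cost_usd(100, 20)
print(f"USD {example_cost:,.2f}")  # USD 450.00
```

Note that the 20M output tokens cost almost as much as the 100M input tokens, which is why the input/output split matters so much later on.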

PTU flips the meter. Instead of paying per token, you reserve a slice of dedicated throughput, measured in Provisioned Throughput Units, and pay an hourly rate per unit whether you push one token through it or a billion. The headline number people quote is around AUD $2,448 per month per unit on a monthly reservation (indicative AUD list; confirm at purchase). The pure hourly PTU rate sits near USD $1 per PTU per hour, which works out to roughly USD $744 over a 31-day month (744 hours) if you ran a single unit non-stop without any reservation discount.
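The hourly arithmetic is simple enough to sketch. The snippet below assumes the indicative USD $1 per PTU per hour figure and a unit that runs around the clock; the function name is a placeholder.

```python
HOURS_PER_DAY = 24


def hourly_ptu_monthly_cost(rate_per_hour: float, days_in_month: int = 31, units: int = 1) -> float:
    """Cost of running `units` PTUs non-stop for a month at a flat hourly rate."""
    return rate_per_hour * HOURS_PER_DAY * days_in_month * units


cost_31_days = hourly_ptu_monthly_cost(1.00)       # 744.0
cost_30_days = hourly_ptu_monthly_cost(1.00, 30)   # 720.0
print(cost_31_days, cost_30_days)
```

The USD $744 figure is therefore a 31-day month; a 30-day month comes out at USD $720.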

Teams buy PTU for two reasons, and cost is usually the second one. The first is performance. A provisioned deployment gives you reserved capacity, which means consistent, predictable latency and no exposure to the 429 'capacity' throttling that standard deployments can hit during regional demand spikes. For a customer-facing assistant or a real-time agent where p95 latency is a contractual or UX problem, that determinism is the headline benefit; the savings are a bonus.

The second reason is the per-token economics at scale. Once you are pushing serious sustained volume, the fixed PTU cost spread across a very large number of tokens produces a much lower effective per-token rate — Microsoft positions this as up to roughly 70% savings versus pay-as-you-go on sustained, high-utilisation workloads. The catch is in those two words: sustained and high-utilisation. A PTU sitting half-idle is just an expensive way to buy tokens you could have metered.
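A rough way to see why utilisation dominates: the fixed monthly cost is divided by the tokens you actually push through, so halving utilisation doubles the effective rate. The sketch below is illustrative only. The capacity figure is a hypothetical placeholder, since real per-unit throughput depends on model, version, and request shape.

```python
def effective_usd_per_m_tokens(fixed_monthly_usd: float,
                               capacity_m_tokens: float,
                               utilisation: float) -> float:
    """Effective cost per million tokens actually served by provisioned capacity."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return fixed_monthly_usd / (capacity_m_tokens * utilisation)


FIXED_USD = 744.0    # one unit at the indicative hourly rate over a 31-day month
CAPACITY_M = 400.0   # HYPOTHETICAL ceiling, in millions of tokens per month

busy = effective_usd_per_m_tokens(FIXED_USD, CAPACITY_M, 0.90)
half_idle = effective_usd_per_m_tokens(FIXED_USD, CAPACITY_M, 0.45)
print(f"busy: {busy:.2f} USD/M, half idle: {half_idle:.2f} USD/M")
```

Whatever the true capacity turns out to be, the ratio between the two results stays the same: the half-idle unit costs twice as much per token.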

For a GPT-4o-class model, the break-even between pay-as-you-go and a single provisioned unit is on the order of 150-200M tokens per month per unit, assuming you keep that unit reasonably busy. Below that volume, you are almost always better off metering tokens. Above it, and with utilisation comfortably past 50%, the provisioned rate starts winning, and the gap widens the busier you run.

  • Under ~150M tokens/month, or bursty traffic with long idle periods: stay on pay-as-you-go.
  • Sustained 150-200M+ tokens/month at 50%+ utilisation on one unit: PTU starts to pay for itself.
  • Latency-critical, always-on workloads: consider PTU even slightly below the cost crossover, because you are buying determinism, not just cheaper tokens.
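The three rules above can be written down as a first-pass triage function. This is only a sketch of the rule of thumb, with thresholds taken from the bullets; the function name and return strings are made up for illustration, and it is no substitute for modelling your own traffic.

```python
CROSSOVER_FLOOR_M = 150   # lower edge of the 150-200M rule of thumb (millions of tokens)
MIN_UTILISATION = 0.5     # "50%+ utilisation on one unit"


def first_pass_recommendation(monthly_tokens_m: float,
                              utilisation: float,
                              latency_critical: bool = False) -> str:
    """Apply the rule-of-thumb bullets in order of strength."""
    if monthly_tokens_m >= CROSSOVER_FLOOR_M and utilisation >= MIN_UTILISATION:
        return "PTU likely pays for itself"
    if latency_critical:
        # Below the cost crossover, but determinism may justify the spend.
        return "consider PTU for determinism"
    return "stay on pay-as-you-go"


print(first_pass_recommendation(80, 0.3))
print(first_pass_recommendation(180, 0.7))
print(first_pass_recommendation(120, 0.6, latency_critical=True))
```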

Treat that 150-200M figure as a sighting shot, not gospel. The real crossover shifts with your input-to-output token ratio (output is four times the price of input on pay-as-you-go, so output-heavy workloads tip toward PTU sooner) and with how much of the reserved capacity you genuinely use. Two teams with identical monthly token counts can land on opposite sides of the decision purely on the shape of their traffic.
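To see how the input/output mix moves the crossover, the sketch below computes the token volume at which a pay-as-you-go bill equals a fixed provisioned cost. It uses the indicative USD rates from this article and, as an assumption, the USD $744 hourly-mode monthly figure as the fixed cost. It deliberately ignores whether a single unit could actually serve that volume, which is a separate capacity question.

```python
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 10.00


def blended_usd_per_m(output_share: float) -> float:
    """Average pay-as-you-go rate per million tokens for a given output share (0 to 1)."""
    return (1 - output_share) * INPUT_USD_PER_M + output_share * OUTPUT_USD_PER_M


def break_even_m_tokens(fixed_monthly_usd: float, output_share: float) -> float:
    """Monthly volume (millions of tokens) where pay-as-you-go cost equals the fixed cost."""
    return fixed_monthly_usd / blended_usd_per_m(output_share)


input_heavy = break_even_m_tokens(744.0, output_share=0.2)
output_heavy = break_even_m_tokens(744.0, output_share=0.5)
print(f"20% output: {input_heavy:.0f}M tokens, 50% output: {output_heavy:.0f}M tokens")
```

With these illustrative inputs, a 20% output share breaks even at about 186M tokens, while a 50% output share breaks even at about 119M: the output-heavy workload reaches the crossover sooner.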

PTU has three commercial gears. Pure hourly (no commitment, delete after an hour) is the most expensive per unit and exists for testing and short bursts. A monthly reservation cuts the rate materially (Microsoft cites up to around 64% off the hourly rate), and a yearly reservation cuts it further (up to roughly 70%). The hourly rate is what makes a single idle PTU look extravagant; the reservation is what makes a committed, well-utilised PTU genuinely cheap. The longer the term, the larger the discount, and the larger the commitment.
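As a rough illustration of the three gears, the sketch below applies the quoted "up to" discounts to the indicative USD $1 per hour rate over a 31-day month. These are best-case ceilings rather than quotes, and the dictionary keys are just labels chosen for this example.

```python
HOURLY_USD = 1.00        # indicative hourly rate per PTU
HOURS_IN_MONTH = 744     # 31-day month

# "Up to" discounts off the hourly rate cited in this article (best case, not guaranteed).
MAX_DISCOUNT = {"hourly": 0.0, "monthly": 0.64, "yearly": 0.70}


def best_case_monthly_usd(mode: str) -> float:
    """Best-case monthly cost per PTU for a commercial mode."""
    return HOURLY_USD * HOURS_IN_MONTH * (1 - MAX_DISCOUNT[mode])


costs = {mode: round(best_case_monthly_usd(mode), 2) for mode in MAX_DISCOUNT}
for mode, cost in costs.items():
    print(f"{mode:>8}: USD {cost:,.2f} per PTU per month")
```

On these assumptions the hourly mode costs USD $744.00 per unit per month, against roughly USD $267.84 on a monthly reservation and USD $223.20 on a yearly one.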

All figures here are indicative AUD list — confirm at purchase. Reservations are also a true financial commitment booked against your Azure billing account, so they belong in a finance conversation, not just an architecture one.

Australia East is a supported PTU region, which matters for data-residency-sensitive clients who cannot route inference offshore. But two things bite Australian teams specifically. First, model availability for provisioned deployment varies by region — the model and version you want may have different PTU availability in Australia East than in a US region, so confirm before you design around it. Second, there is an FX exposure: the underlying rates are USD-denominated, so your AUD invoice drifts with the exchange rate unless you have agreed pricing under an Enterprise Agreement or Microsoft Customer Agreement. We always quote clients ex GST and flag the FX line item explicitly.
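The FX exposure is easy to quantify. The sketch below converts a USD-denominated bill into AUD at a few exchange rates; the rates and the bill size are hypothetical placeholders, not forecasts, and the result is ex GST.

```python
def aud_invoice(usd_amount: float, aud_per_usd: float) -> float:
    """Convert a USD-denominated charge to AUD (ex GST) at a given rate."""
    return usd_amount * aud_per_usd


USD_BILL = 1_000.0                        # hypothetical monthly charge
SCENARIO_RATES = (1.45, 1.50, 1.55)       # hypothetical AUD per USD

scenarios = {rate: aud_invoice(USD_BILL, rate) for rate in SCENARIO_RATES}
swing = max(scenarios.values()) - min(scenarios.values())

for rate, aud in scenarios.items():
    print(f"AUD/USD {rate:.2f}: AUD {aud:,.2f}")
print(f"swing across scenarios: AUD {swing:,.2f}")
```

Running the same calculation against a committed reservation shows how much of the budget variance is currency rather than usage.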

Don't buy PTU on a forecast. Run the workload on pay-as-you-go first and collect a fortnight of real telemetry — tokens per hour, the input/output split, and the daily utilisation curve. That data tells you three things at once: whether you are anywhere near the crossover, how spiky your traffic is, and how much of a provisioned unit you would actually keep busy.

  1. Instrument first. Capture per-hour token volume and the input/output ratio from live pay-as-you-go traffic before modelling anything.
  2. Size for the baseline, not the peak. Provision PTU to cover your steady floor, and let bursts spill over to a pay-as-you-go deployment; a hybrid almost always beats sizing PTU for your worst hour.
  3. Start hourly or monthly. Prove utilisation on a flexible PTU mode before stepping up to a yearly reservation; only annualise once the curve is genuinely stable.
  4. Re-price every quarter. Token rates, model versions, and your own traffic all move, so a PTU decision that was right in Q1 can be wrong by Q3.
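The "size for the baseline, not the peak" step can be tested against telemetry with a few lines of Python. The sketch below assumes you have hourly token counts exported from your own monitoring; the sample numbers and the capacity values are invented for illustration, and real PTU capacity is not a simple tokens-per-hour figure.

```python
def split_baseline_spillover(hourly_tokens: list[int], capacity_per_hour: int) -> tuple[int, int]:
    """Return (tokens served within provisioned capacity, tokens spilling to pay-as-you-go)."""
    served = sum(min(t, capacity_per_hour) for t in hourly_tokens)
    spill = sum(max(t - capacity_per_hour, 0) for t in hourly_tokens)
    return served, spill


def utilisation(hourly_tokens: list[int], capacity_per_hour: int) -> float:
    """Share of the provisioned capacity that the traffic actually uses."""
    served, _ = split_baseline_spillover(hourly_tokens, capacity_per_hour)
    return served / (capacity_per_hour * len(hourly_tokens))


# Hypothetical telemetry: thousands of tokens per hour, with one spike.
sample = [100, 120, 110, 400, 90, 105]

served, spill = split_baseline_spillover(sample, capacity_per_hour=120)
util_baseline = utilisation(sample, capacity_per_hour=120)   # sized for the steady floor
util_peak = utilisation(sample, capacity_per_hour=400)       # sized for the worst hour

print(f"baseline sizing: {util_baseline:.0%} utilised, {spill}k tokens spill over")
print(f"peak sizing:     {util_peak:.0%} utilised, nothing spills")
```

In this toy data set, sizing for the floor keeps the provisioned capacity about 90% busy, while sizing for the single spike leaves it under 40% utilised.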

The short version: pay-as-you-go is the right default and stays the right default for the long tail of workloads. PTU earns its keep in exactly two situations — sustained high volume past the ~150-200M token crossover, or latency-critical always-on traffic where determinism is the point. The expensive mistake is buying a reservation to feel in control of costs, then leaving the capacity half-idle. Let the telemetry, not the anxiety, make the call.

Every dollar figure in this piece is indicative AUD list ex GST — confirm at purchase against your own region and model, because both Azure's rates and your traffic shape move over time.

Frequently asked questions

Is Azure OpenAI PTU pricing the same in Australia East as in US regions?
Not exactly. The PTU model — an hourly rate per provisioned unit, with cheaper monthly and yearly reservations — is the same everywhere, but the actual dollar rate and which models are available for provisioned deployment vary by region. Australia East is a supported PTU region, but you should price your specific model in Australia East on the Azure pricing calculator rather than assuming US figures. There is also an FX angle: token rates are USD-denominated under the hood, so your AUD bill moves with the exchange rate unless you are on an EA/MCA with agreed pricing.
Roughly what monthly token volume does PTU need before it beats pay-as-you-go?
As a rule of thumb for a GPT-4o-class model in 2026, the crossover sits around 150-200M tokens per month per provisioned unit, assuming you keep that unit busy (50%+ sustained utilisation). Below that — or if your traffic is spiky and idle for long stretches — pay-as-you-go is almost always cheaper because you only pay for tokens you actually consume. The honest answer is that the crossover depends on your input/output token mix and your real utilisation, so model it on a fortnight of your own telemetry before committing.
Can we run PTU and pay-as-you-go at the same time?
Yes, and for most Australian teams that is the right answer. Provision enough PTU to cover your steady baseline load, then let overflow and bursty traffic spill over to a pay-as-you-go (standard) deployment. You get predictable latency and cost on the baseline, and you avoid paying reservation rates for capacity you only need a few hours a day. The trade-off is operational: you need routing logic and monitoring so requests fail over cleanly between the two deployments.
Does a PTU reservation lock us in for a year?
It can, but it does not have to. Provisioned throughput has three commercial modes: pure hourly (no commitment, delete any time after an hour), a monthly reservation, and a yearly reservation. The longer the term, the bigger the discount — but a yearly reservation is a genuine financial commitment, so we generally advise clients to prove the workload on hourly or monthly PTU first and only step up to annual once utilisation is demonstrably stable.
