Chapter 19: Cost Modelling and Optimisation
What will this actually cost, and how do we optimise?
Technical leaders don't ask "what does Cloudflare charge?" They ask "will this be cheaper than what we're doing now, and when does that change?" Answering requires understanding the economics behind the prices.
Cloudflare's pricing attaches dollar signs to architectural guidance, providing concrete feedback on design decisions. When something costs significantly more, reconsider the approach. Understanding why reveals when Cloudflare's model wins, when it doesn't, and how to design systems where cost efficiency and correctness align.
The economics behind the pricing
Every pricing decision reflects Cloudflare's underlying cost structure. Understanding that structure lets you predict how architectural changes affect your bill before deployment.
Why CPU time, not wall time
Workers bill for CPU time consumed, not elapsed time. A Worker waiting 500ms for a database response but using only 2ms of CPU pays for 2ms. This seems generous until you understand the model.
V8 isolates share processes. While your Worker waits for I/O, other customers' code runs on the same CPU. Waiting costs Cloudflare nothing, so it costs you nothing. Computing costs them resources, so it costs you money.
Design for I/O parallelism without guilt. Fan out requests, wait for multiple backends, aggregate responses. Patterns that are expensive under per-duration billing are cheap here. Conversely, CPU-intensive operations (complex parsing, cryptographic computation, heavy string manipulation) cost more than their wall-clock time suggests.
A Worker orchestrating five parallel API calls, spending 2 seconds waiting but only 10ms computing, costs the same as a Worker returning immediately after 10ms of computation. On Lambda, you'd pay for the full 2 seconds. This single difference makes orchestration layers, BFF patterns, and API facades 10–100× cheaper on Cloudflare than equivalent Lambda implementations.
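A minimal sketch of that orchestration pattern, assuming three hypothetical upstream endpoints (the URLs and response shapes are illustrative): the Worker spends nearly all of its wall time awaiting fetches, and only the few milliseconds of parsing and assembly are billable CPU.

```typescript
// Hypothetical BFF endpoint: fan out to three upstreams in parallel, then
// aggregate. Wall time is bounded by the slowest upstream; billable CPU time
// is only the JSON parsing and assembly work.
export default {
  async fetch(request: Request): Promise<Response> {
    const upstreams = [
      "https://api.example.com/user",            // illustrative URLs
      "https://api.example.com/orders",
      "https://api.example.com/recommendations",
    ];

    // All three requests are in flight at once; waiting for them costs nothing.
    const responses = await Promise.all(upstreams.map((url) => fetch(url)));
    const [user, orders, recommendations] = await Promise.all(
      responses.map((r) => r.json()),
    );

    return Response.json({ user, orders, recommendations });
  },
};
```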
Why KV writes cost ten times what reads cost
KV reads cost $0.50 per million. Writes cost $5.00 per million. This 10:1 ratio reflects the operational difference between reads and writes in a globally distributed cache.
A KV read hits the nearest edge location. If cached, it's served immediately from infrastructure Cloudflare already built for their CDN. A KV write must propagate to over three hundred locations worldwide, triggering replication across Cloudflare's entire network. The eventual consistency window represents global propagation time, not processing delay.
Writing to KV more than reading? Wrong tool. The pricing signals the intended use case. Configuration data, feature flags, cached API responses align with KV's economics. Counters, session updates, real-time state fight the pricing model. Use Durable Objects or D1 instead.
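A sketch of the read-heavy pattern KV's pricing rewards, assuming a namespace bound as CONFIG (the binding name, key, and flag names are illustrative): flags are written rarely and read on every request, with cacheTtl trading a few minutes of staleness for faster repeated reads.

```typescript
// Illustrative Env: a KV namespace bound as CONFIG in the Worker's configuration.
interface Env {
  CONFIG: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Written rarely (writes at $5/M barely register), read constantly ($0.50/M).
    const flags = await env.CONFIG.get<Record<string, boolean>>("feature-flags", {
      type: "json",
      cacheTtl: 300, // seconds of acceptable staleness at this edge location
    });

    return flags?.["new-checkout"]
      ? new Response("new checkout flow")
      : new Response("classic checkout flow");
  },
};
```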
Why D1 charges per row
D1 costs $0.001 per million rows read and $1.00 per million rows written. Not per query, per row. Query efficiency shows directly in your bill.
An unindexed query scanning a million rows costs $0.001 every time it runs: a dollar for every thousand executions, which compounds quickly on a hot path. A properly indexed query hitting ten rows costs a hundred thousand times less. No other feedback mechanism makes optimisation benefits this concrete. Slow queries are expensive; fast queries are cheap.
D1 runs on SQLite, implemented as a Durable Object underneath. Each row read is a storage operation against that infrastructure. Cloudflare's costs scale with rows accessed, so yours do too. This isn't a pricing decision; it's cost pass-through creating powerful incentives for good database design.
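A sketch of how row-based billing surfaces in code, assuming a D1 binding named DB and a hypothetical orders table: the query is billed for every row SQLite touches, so the index noted in the comment is the difference between scanning the table and reading only matching rows.

```typescript
interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const customerId = new URL(request.url).searchParams.get("customer") ?? "";

    // Without an index on orders.customer_id, SQLite scans every row and every
    // scanned row is billed. With this index (created once in a migration),
    // only matching rows are read:
    //   CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id);
    const { results, meta } = await env.DB.prepare(
      "SELECT id, total FROM orders WHERE customer_id = ?",
    )
      .bind(customerId)
      .all();

    // meta.rows_read is the number you are billed for; watch it, not just latency.
    console.log(`rows read: ${meta.rows_read}`);
    return Response.json(results);
  },
};
```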
Why egress is free
R2's zero egress pricing isn't charity; it's strategy. Cloudflare's business model differs fundamentally from hyperscalers.
AWS, Azure, and GCP make substantial revenue from egress charges at $0.05–0.09 per gigabyte. This creates incentives to keep data within their ecosystems and makes certain patterns (serving large assets globally, aggressive edge caching, media distribution) prohibitively expensive.
Cloudflare profits when traffic flows through their network. Every byte served from R2 reinforces their network's value. Charging for egress would contradict their business model. Patterns ruinously expensive on hyperscalers become viable. If your architecture involves significant egress, the savings can dwarf all other cost considerations.
Why Durable Objects bill for duration
Durable Objects charge $12.50 per million GB-seconds of duration. This seems to contradict the "CPU time not wall time" principle, but DOs have different operational characteristics.
An active Durable Object holds state in memory and maintains a globally-routable identity. Even idle within its activity window, it occupies memory and routing table entries. Cloudflare's costs scale with how long DOs remain active, so yours do too.
The crucial insight: inactive DOs cost nothing. A million user sessions where each user is active for 30 seconds daily costs far less than one always-on server, but only if the DOs actually sleep between interactions.
What keeps a DO awake?
- In-flight requests
- Open WebSocket connections that aren't hibernated (hibernation lets the DO be evicted while the connection stays open)
- Time remaining in the idle timeout (10 seconds after last activity by default)
- Pending alarms scheduled within the timeout window
What lets a DO sleep? Completing all requests, closing or hibernating WebSocket connections, and letting the idle timeout expire. After that, the DO costs nothing until the next request.
The common mistake: implementing presence detection or heartbeat patterns that ping DOs periodically to keep them "warm." This defeats the economic model. Pinging a DO every 5 seconds means paying for continuous activity on something designed for sparse access. Rethink the pattern: store presence in KV with TTL expiration, accept slightly stale presence data, or use WebSocket hibernation to maintain connections without continuous billing.
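One way to rethink the pattern, sketched assuming a KV namespace bound as PRESENCE (names and TTLs are illustrative): heartbeats write a short-lived KV key instead of waking a Durable Object, and presence checks read that key, accepting up to a minute of staleness.

```typescript
interface Env {
  PRESENCE: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const userId = url.searchParams.get("user") ?? "anonymous";

    if (url.pathname === "/heartbeat") {
      // One KV write per heartbeat; the key simply expires if the client
      // disappears. No Durable Object is kept awake.
      await env.PRESENCE.put(`online:${userId}`, "1", { expirationTtl: 60 });
      return new Response("ok");
    }

    // Eventually consistent and possibly a minute stale, which is the trade-off
    // this pattern accepts.
    const online = (await env.PRESENCE.get(`online:${userId}`)) !== null;
    return Response.json({ userId, online });
  },
};
```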
Storage choice as economic decision
Every storage primitive answers a different question. KV: what's the value for this key? D1: which rows match these conditions? Durable Objects: what happened to this specific entity? R2: what's in this file? Choosing wrong doesn't just cost more; it makes your code fight the abstraction.
Decision framework by access pattern
| If your pattern is... | Use | Because |
|---|---|---|
| Read-heavy, rarely updated | KV | $0.50/M reads, global caching |
| Write-heavy, needs queries | D1 | $1.00/M writes, SQL flexibility |
| Per-entity coordination | Durable Objects | Single-threaded consistency |
| Large binary data | R2 | Zero egress, S3 compatible |
| External PostgreSQL exists | Hyperdrive | Don't migrate, accelerate |
Cost comparison by operation type
| Operation | KV | D1 | Durable Objects |
|---|---|---|---|
| Single read | $0.0000005 | $0.000000001 per row | ~$0.000001 |
| Single write | $0.000005 | $0.000001 per row | ~$0.000001 |
| 1000 reads | $0.0005 | $0.000001 (if 1 row each) | ~$0.001 |
| 1000 writes | $0.005 | $0.001 (if 1 row each) | ~$0.001 |
KV writes are by far the most expensive operation here. D1's per-row prices look tiny, but costs depend entirely on rows touched: a query reading 1000 rows costs the same whether it's one query or a thousand single-row queries, and an unindexed scan can touch millions. Durable Objects have request costs plus duration; the comparison depends on how long objects stay active.
The pattern: use KV for read-heavy key-value access, D1 for write-heavy or query-complex workloads, and Durable Objects when you need coordination or per-entity state.
When Cloudflare wins on cost
The answer depends on workload characteristics, not volume alone.
High egress relative to compute
If your workload serves significant data to users, egress costs dominate on hyperscalers. A media application serving 10 TB monthly pays AWS roughly $900 in egress alone. The same egress from R2 costs nothing.
The crossover point is lower than most expect. Even 1 TB monthly costs $90 on AWS. If egress represents more than 20% of your hyperscaler bill, Cloudflare likely wins on total cost.
Spiky, unpredictable traffic
Hyperscaler cost efficiency requires capacity planning. Provisioned concurrency, reserved instances, committed use discounts trade flexibility for lower unit costs. If you can't predict traffic, you either overpay for unused capacity or face cold starts and throttling.
Cloudflare charges only for actual usage with no provisioning required. Traffic spikes cost more but only proportionally. For unpredictable patterns, this flexibility has concrete value that doesn't appear on comparative spreadsheets.
Request-heavy, compute-light workloads
The CPU time billing model dramatically favours I/O-bound workloads. An API gateway spending 5ms of CPU time across 500ms of wall time pays for 5ms on Cloudflare, 500ms on Lambda. For orchestration patterns, this difference is decisive.
When hyperscalers win
Cloudflare's model isn't universally superior.
Compute-intensive processing with predictable volume benefits from reserved capacity pricing. Heavy computation running hours daily on a predictable schedule favours EC2 reserved instances over per-millisecond billing.
Deep AWS service integration makes migration costly. If your architecture depends heavily on SQS, SNS, Step Functions, DynamoDB Streams, and IAM, the rewrite cost may exceed operational savings over any reasonable horizon.
GPU workloads for ML training have no Cloudflare equivalent. Workers AI provides inference, not training.
Large single databases beyond 10 GB require traditional managed databases. D1's horizontal model works for many applications, but some genuinely need monolithic relational stores.
Total cost of ownership
Comparing platforms requires accounting for expenses that don't appear on obvious line items.
Hidden hyperscaler costs
Technical leaders comparing Cloudflare to AWS routinely underestimate several categories.
NAT Gateway charges apply whenever private subnet resources access the internet. At $0.045 per hour plus $0.045 per GB processed, a moderately busy NAT Gateway costs $100-500 monthly, often more than the Lambda functions behind it.
Cross-AZ data transfer costs $0.01 per GB in each direction. Database in one AZ, compute in another: charges accumulate invisibly.
CloudWatch Logs charges for ingestion and storage. Verbose logging in high-volume applications generates surprising bills.
Provisioned concurrency for cold start mitigation costs whether invocations occur or not, accumulating continuously rather than per-use.
A worked comparison
Consider a SaaS API with these characteristics:
- 50 million requests monthly
- Average response time 200ms, average CPU time 15ms
- 500 GB stored data, 5 TB monthly egress
- Database averaging 50 rows read per request
- 70% cache hit rate (35M requests served from cache)
AWS Lambda + API Gateway + RDS + S3 + CloudFront:
| Component | Monthly Cost |
|---|---|
| API Gateway (50M requests) | $175 |
| Lambda (50M × 200ms × 512 MB) | $93 |
| RDS (db.t3.medium + 500 GB) | $106 |
| S3 (500 GB + operations) | $16 |
| CloudFront (5 TB egress) | $425 |
| NAT Gateway (estimated) | $150 |
| Total | ~$965 |
Cloudflare Workers + D1 + R2 + KV:
| Component | Monthly Cost |
|---|---|
| Workers (50M requests, 15ms CPU avg) | $30 |
| D1 rows (15M uncached × 50 rows) | $0.75 |
| D1 storage (500 GB) | $375 |
| KV reads (35M cached) | $17.50 |
| R2 (500 GB + operations) | $11 |
| R2 egress (5 TB) | $0 |
| Total | ~$434 |
The 55% savings comes from three sources: zero egress ($425), no NAT Gateway ($150), and CPU-time versus duration billing on compute.
How assumptions change the outcome
This comparison is sensitive to workload characteristics. Understanding the sensitivities matters more than memorising numbers.
Cache hit rate sensitivity:
| Cache Hit Rate | Cloudflare Cost | Savings vs AWS |
|---|---|---|
| 50% | $485 | 50% |
| 70% | $434 | 55% |
| 90% | $395 | 59% |
Higher cache rates reduce both KV reads and D1 queries, improving Cloudflare's position.
Egress sensitivity:
| Monthly Egress | AWS Cost | Cloudflare Cost | Savings |
|---|---|---|---|
| 1 TB | $625 | $434 | 31% |
| 5 TB | $965 | $434 | 55% |
| 20 TB | $2,240 | $434 | 81% |
Egress dominates the comparison at scale. If you're serving 20 TB monthly, almost nothing else matters.
CPU time sensitivity:
| Avg CPU Time | Cloudflare Workers Cost | Total Cloudflare |
|---|---|---|
| 5ms | $15 | $419 |
| 15ms | $30 | $434 |
| 50ms | $75 | $479 |
| 100ms | $150 | $554 |
CPU-heavy workloads erode Cloudflare's advantage but don't eliminate it in this scenario; egress savings still dominate.
Model your specific workload. The comparison that matters is yours, not a hypothetical.
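A minimal sketch of that modelling exercise, using the unit prices quoted in this chapter plus assumed Workers and R2 list prices (verify against the current pricing pages; the sketch ignores included allowances and the monthly base fee):

```typescript
// Rough monthly estimate for a Workers + KV + D1 + R2 stack.
// All rates are assumptions to be verified against current pricing.
interface Workload {
  requestsPerMonth: number;
  avgCpuMs: number;
  cacheHitRate: number;              // 0..1, fraction served from KV
  rowsReadPerUncachedRequest: number;
  d1StorageGb: number;
  r2StorageGb: number;
}

function estimateMonthlyCost(w: Workload): number {
  const workersRequests = (w.requestsPerMonth / 1e6) * 0.3;                  // $0.30/M requests
  const workersCpu = ((w.requestsPerMonth * w.avgCpuMs) / 1e6) * 0.02;       // $0.02/M CPU-ms
  const kvReads = ((w.requestsPerMonth * w.cacheHitRate) / 1e6) * 0.5;       // $0.50/M reads
  const uncached = w.requestsPerMonth * (1 - w.cacheHitRate);
  const d1Reads = ((uncached * w.rowsReadPerUncachedRequest) / 1e6) * 0.001; // $0.001/M rows
  const d1Storage = w.d1StorageGb * 0.75;                                    // $0.75/GB-month
  const r2Storage = w.r2StorageGb * 0.015;                                   // ~$0.015/GB-month
  // R2 egress is $0 by design, so it never appears here.
  return workersRequests + workersCpu + kvReads + d1Reads + d1Storage + r2Storage;
}

// The chapter's worked example, give or take rounding and included allowances:
console.log(
  estimateMonthlyCost({
    requestsPerMonth: 50e6,
    avgCpuMs: 15,
    cacheHitRate: 0.7,
    rowsReadPerUncachedRequest: 50,
    d1StorageGb: 500,
    r2StorageGb: 500,
  }).toFixed(2), // roughly $430
);
```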
AI inference costs
AI workloads can dominate bills quickly. The economics differ from traditional compute.
Cost structure
Workers AI charges per token processed, with rates varying by model size. Larger models cost roughly 8–12× as much as smaller models for the same token volume.
| Model Class | Input Cost (per M tokens) | Output Cost (per M tokens) |
|---|---|---|
| Small (7B params) | ~$0.10–0.20 | ~$0.40–0.80 |
| Medium (13–34B) | ~$0.30–0.50 | ~$1.00–2.00 |
| Large (70B+) | ~$1.00–2.00 | ~$4.00–8.00 |
Output tokens cost more than input tokens because generation requires sequential computation while input processing can be parallelised.
Worked example
A customer support chatbot processing 100,000 queries monthly:
| Metric | Value |
|---|---|
| Queries | 100,000 |
| Avg prompt length | 500 tokens |
| Avg response length | 300 tokens |
| Total input tokens | 50M |
| Total output tokens | 30M |
On a 7B parameter model:
- Input: 50M × $0.15/M = $7.50
- Output: 30M × $0.60/M = $18.00
- Total: $25.50/month
On a 70B parameter model:
- Input: 50M × $1.50/M = $75.00
- Output: 30M × $6.00/M = $180.00
- Total: $255/month
The 70B model costs 10x more. Is it 10x better for your use case? Often no. Customer support queries with clear intent and constrained response formats frequently work well with smaller models. Reserve large models for tasks requiring sophisticated reasoning or broad knowledge.
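The same arithmetic as a small sketch, treating the per-million-token rates as illustrative mid-points of the ranges above rather than quoted prices:

```typescript
interface ModelRates {
  inputPerMillion: number;   // dollars per million input tokens (assumed)
  outputPerMillion: number;  // dollars per million output tokens (assumed)
}

function monthlyInferenceCost(
  queries: number,
  avgPromptTokens: number,
  avgResponseTokens: number,
  rates: ModelRates,
): number {
  const inputTokens = queries * avgPromptTokens;
  const outputTokens = queries * avgResponseTokens;
  return (
    (inputTokens / 1e6) * rates.inputPerMillion +
    (outputTokens / 1e6) * rates.outputPerMillion
  );
}

// The chapter's chatbot: 100k queries, 500-token prompts, 300-token responses.
const small = monthlyInferenceCost(100_000, 500, 300, { inputPerMillion: 0.15, outputPerMillion: 0.6 });
const large = monthlyInferenceCost(100_000, 500, 300, { inputPerMillion: 1.5, outputPerMillion: 6.0 });
console.log(small.toFixed(2), large.toFixed(2)); // 25.50 and 255.00
```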
AI cost optimisation
Right-size models first. Don't default to the largest available. Test smaller models against your actual queries; quality differences may be negligible for your use case.
Set explicit max_tokens limits. Without limits, models can generate unexpectedly long responses, costing tokens you didn't intend to spend.
Cache repeated queries through AI Gateway. Similar questions? Cached responses eliminate inference costs entirely for duplicates.
Truncate context aggressively. Every prompt token costs money. Include only context the model needs. Summarise long documents rather than including full text.
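A sketch of three of these habits together (a right-sized model, an explicit output cap, trimmed context), assuming the Workers AI binding is exposed as env.AI and using a small instruct model as the example; verify model names and parameters against the current Workers AI documentation.

```typescript
interface Env {
  AI: Ai; // Workers AI binding type from @cloudflare/workers-types
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { question, history } = await request.json<{
      question: string;
      history: string[];
    }>();

    // Trim context: keep the last few turns, not the whole conversation.
    const recentHistory = history.slice(-4).join("\n");

    // Small model first; escalate to a larger one only if quality demands it.
    const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "Answer support questions concisely." },
        { role: "user", content: `${recentHistory}\n\n${question}` },
      ],
      max_tokens: 256, // explicit output cap: bounded spend per request
    });

    return Response.json(answer);
  },
};
```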
Cost as architectural feedback
On Cloudflare, high costs usually indicate architectural problems, not just scale. The alignment between cost efficiency and correctness isn't accidental; pricing reflects operational costs, which reflect resource consumption.
Warning thresholds
These ratios suggest architectural misalignment:
Cost ratio warning thresholds
| Warning Signal | Threshold | What It Indicates |
|---|---|---|
| D1 reads vs storage | Reads > 10x storage costs | Query inefficiency (missing indexes, broad queries, N+1 patterns) |
| KV writes vs total KV | Writes > 50% of KV costs | Misuse (data belongs in D1 or Durable Objects) |
| DO duration vs requests | Duration > 100x request costs | DOs not sleeping (heartbeats, polling, or WebSocket issues) |
| Workers CPU vs requests | CPU > 30% of request costs | Compute-heavy workloads (consider caching or Containers) |
| AI vs compute costs | AI > 5x compute (non-AI apps) | Prompt engineering issues or model over-provisioning |
D1 read costs exceeding 10x storage costs indicates query inefficiency. Storage at $0.75 per GB-month is a fixed function of data size; reads at $0.001 per million rows scale with query volume. Paying more for reads than storage? You're scanning too many rows. Check for missing indexes, overly broad queries, or N+1 patterns.
KV write costs exceeding 50% of total KV costs indicates misuse. KV's economics assume read-heavy workloads. If writes dominate, the data belongs in D1 or Durable Objects.
Durable Object duration charges exceeding 100x request charges suggests DOs aren't sleeping. A DO handling 1000 requests should cost roughly the same whether requests arrive in 10 seconds or 10 hours, but only if the DO sleeps between bursts. Look for heartbeat patterns, polling, or WebSocket connections preventing hibernation.
Workers CPU time exceeding 30% of Workers request costs indicates compute-heavy workloads. Profile for expensive operations. Could computation move client-side? Could results be cached? Would Containers (with higher resource limits) be more appropriate?
AI costs exceeding 5x your compute costs for non-AI-primary applications suggests prompt engineering problems or model over-provisioning. Review prompt lengths, response limits, and model selection.
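A sketch that turns these thresholds into a mechanical check, assuming a hypothetical cost breakdown pulled from your invoice or usage exports (the field names and structure are illustrative):

```typescript
// Monthly spend per line item, in dollars. Field names are hypothetical.
interface CostBreakdown {
  d1Reads: number;
  d1Storage: number;
  kvReads: number;
  kvWrites: number;
  doRequests: number;
  doDuration: number;
  workersRequests: number;
  workersCpu: number;
  ai: number;
}

function costWarnings(c: CostBreakdown): string[] {
  const warnings: string[] = [];
  if (c.d1Reads > 10 * c.d1Storage)
    warnings.push("D1 reads > 10x storage: check indexes, broad queries, N+1 patterns");
  if (c.kvWrites > 0.5 * (c.kvReads + c.kvWrites))
    warnings.push("KV writes > 50% of KV spend: data may belong in D1 or Durable Objects");
  if (c.doDuration > 100 * c.doRequests)
    warnings.push("DO duration > 100x request costs: objects are not sleeping");
  if (c.workersCpu > 0.3 * c.workersRequests)
    warnings.push("CPU > 30% of request costs: profile, cache, or consider Containers");
  if (c.ai > 5 * (c.workersRequests + c.workersCpu))
    warnings.push("AI > 5x compute: review prompts, token limits, model selection");
  return warnings;
}
```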
Common mistakes by workload type
SaaS APIs commonly store per-request state in D1 instead of Durable Objects. Each request queries and updates user state, causing row reads to dominate costs. Use Durable Objects for per-user or per-session state; reserve D1 for queries across entities.
Media applications commonly route asset delivery through Workers instead of serving directly from R2. The Worker adds CPU time to every asset request. Use R2 public buckets or presigned URLs for direct delivery; Workers should handle authentication and metadata, not byte streaming (a sketch follows below).
Real-time applications commonly keep Durable Objects alive with presence heartbeats. Every ping resets the idle timeout, causing continuous duration billing. Use WebSocket hibernation, store presence in KV with TTL, or accept eventually-consistent presence.
AI applications commonly include excessive context in prompts. Every historical message, every retrieved document, every system instruction: tokens accumulate. Summarise conversation history, limit retrieval results, cache repeated context.
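A sketch of the media-delivery fix described above, assuming an R2 bucket bound as ASSETS and a hypothetical public bucket domain: the Worker performs the cheap checks, then hands delivery to R2 rather than streaming bytes itself. (A public bucket URL is itself unauthenticated; presigned URLs are the alternative when assets must stay private.)

```typescript
interface Env {
  ASSETS: R2Bucket;
}

// Hypothetical custom domain configured for direct R2 delivery.
const PUBLIC_BASE = "https://media.example.com";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);

    // The Worker's job: authentication and metadata, not byte streaming.
    if (!request.headers.get("Authorization")) {
      return new Response("unauthorised", { status: 401 });
    }

    // head() checks existence and metadata without pulling the body through the Worker.
    const meta = await env.ASSETS.head(key);
    if (!meta) {
      return new Response("not found", { status: 404 });
    }

    // Redirect so R2 serves the bytes directly; no per-byte CPU time in the Worker.
    return Response.redirect(`${PUBLIC_BASE}/${key}`, 302);
  },
};
```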
The optimisation priority stack
When optimisation is warranted, prioritise by impact.
First: D1 query efficiency. The impact spans five orders of magnitude between indexed and unindexed queries. Run EXPLAIN QUERY PLAN on frequent queries (a sketch follows this stack). Ensure indexes exist for every column your WHERE clauses filter on.
Second: Storage choice alignment. Moving data from the wrong storage primitive to the right one can cut costs by 90%. Review whether your KV usage should be D1 or vice versa.
Third: Durable Object lifecycle. Do heartbeats or polling keep DOs active unnecessarily? Each avoidable 10-second extension has cost.
Fourth: AI model and prompt efficiency. Test smaller models. Truncate prompts. Cache responses. Set token limits.
Fifth: Workers CPU optimisation. This matters only at scale. Below 100 million requests monthly, CPU optimisation rarely justifies engineering investment. At scale, profile before optimising. Intuition about CPU consumption is often wrong.
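A sketch of that first check, reusing the hypothetical orders table from earlier and assuming D1 passes EXPLAIN QUERY PLAN through to SQLite: look for SCAN (every row read and billed) versus SEARCH ... USING INDEX in the plan output.

```typescript
interface Env {
  DB: D1Database;
}

// Diagnostic you might run from a maintenance route or a local script.
async function explainHotQuery(env: Env): Promise<void> {
  const plan = await env.DB.prepare(
    "EXPLAIN QUERY PLAN SELECT id, total FROM orders WHERE customer_id = ?",
  )
    .bind("customer-123") // illustrative value; the plan doesn't depend on it
    .all();

  // "SCAN orders" means a full table scan; "SEARCH orders USING INDEX ..." means
  // only matching rows are read. Billed row counts follow the same distinction.
  console.log(plan.results);
}
```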
When to optimise
Optimisation has costs: engineering time, added complexity, potential bugs. Not all optimisation is worthwhile.
The three-month rule
Optimisation investment should return within three months. If your D1 costs $200/month and you estimate 50% savings, optimisation yields $100/month, or $300 over three months. If the engineering work costs more than $300 in time, don't optimise yet. Wait until scale makes it worthwhile.
This calculation changes at scale. At $5,000/month, the same 50% optimisation yields $7,500 over three months. That justifies substantial engineering investment.
The free tier cliff
Applications on Workers' free tier (100,000 requests daily) can see surprising first bills when exceeding limits. The transition isn't gradual: you go from $0 to paying for all usage.
Model paid tier costs before you need them. If your free tier application handles 50,000 requests daily, model what 200,000 requests would cost. Understanding the trajectory prevents surprises.
Defensive CPU caps
The limits.cpu_ms configuration does double duty. Most documentation emphasises raising the limit from the default 30 seconds to accommodate compute-heavy workloads. The defensive use, lowering the limit, receives less attention but matters more for cost control.
If your Worker's P99 CPU time is 15ms, setting cpu_ms to 50 creates a ceiling with reasonable headroom. Bugs causing infinite recursion, edge cases triggering pathological regex behaviour, or denial-of-wallet attacks crafting requests to maximise compute costs: all terminate at 50ms instead of running for 30 seconds. The request fails, but your bill doesn't explode.
```toml
[limits]
cpu_ms = 50
```
The same limit applies to Durable Objects, Workflows, and Queue consumers. The question is the same: what's a reasonable maximum for legitimate requests, with enough headroom for variance but not enough to cause meaningful cost damage if something goes wrong?
This isn't premature optimisation. It's establishing a sensible boundary before you need it. Setting it wrong costs a few failed requests you'll notice and adjust. Not setting it means discovering problems through your invoice.
Balancing cost and reliability
Cost optimisation can conflict with reliability investment. Removing redundancy saves money but increases risk. Aggressive caching reduces costs but can serve stale data.
Frame this trade-off explicitly: if your error budget is being exceeded, stop optimising for cost. No savings matter if users are suffering. Well under your error budget? Optimise aggressively. You may be over-investing in reliability users don't perceive.
Monitoring and response
Cost visibility requires instrumentation specific to Cloudflare's model.
What to track
Cost per request normalises for traffic variation. Cost per request increasing while traffic grows slower? Something has changed architecturally.
D1 rows per query averaged across your application reveals query efficiency trends. Rising averages suggest degrading query patterns or growing data volumes outpacing index effectiveness.
Durable Object active duration per request reveals whether DOs are sleeping appropriately. Duration growing while requests stay flat means DOs are staying active longer than necessary.
KV write-to-read ratio tracks whether KV usage matches its intended pattern. Ratios approaching 1:1 suggest migrating to D1 would reduce costs.
Responding to anomalies
When costs spike unexpectedly, follow this diagnostic sequence.
First, correlate with traffic. Did request volume increase proportionally? Costs doubled and traffic doubled means the system is working correctly.
Second, examine per-request metrics. If cost per request increased, identify what changed. Check recent deployments, new features, logging changes.
Third, check D1 query patterns. A removed index or changed query can cause dramatic cost increases. Review recent schema or code changes affecting database access.
Fourth, verify caching behaviour. Cache hit rate drops increase backend costs. If caching degraded, investigate why: TTL changes, cache invalidation bugs, new uncacheable paths.
Cost spikes are symptoms. Diagnosis identifies the cause; budget adjustment doesn't fix anything.
Modelling for growth
Technical leaders must project costs at scale for informed platform decisions.
Growth scenario template
| Monthly Requests | Projected Cost | Per-Request Cost |
|---|---|---|
| 10M | $150 | $0.000015 |
| 50M | $520 | $0.0000104 |
| 100M | $980 | $0.0000098 |
| 500M | $4,200 | $0.0000084 |
Per-request costs decline at scale: cache hit rates improve, fixed costs amortise, operational efficiency increases. Model this curve for your specific workload rather than assuming linear scaling.
Presenting to leadership
Lead with total cost of ownership, not component costs. The comparison that matters is "what does it cost to run this workload?", including egress, networking, and operational overhead.
Acknowledge switching costs honestly. Migration requires engineering investment. A comparison showing $500/month savings is meaningless if migration costs $50,000 in engineering time: that's eight years to break even.
Provide ranges, not point estimates. "This workload will cost $400-600 monthly based on traffic projections" gives leadership a budget range while communicating inherent uncertainty.
What comes next
Understanding costs enables informed decisions about investment and optimisation. Chapter 20 covers observability: seeing what's happening in production so you can identify issues, debug problems, and verify architectural decisions produce expected results.
Cost and observability work together. You cannot optimise what you cannot measure. You cannot attribute costs without understanding traffic patterns. You cannot explain cost anomalies without visibility into system behaviour.