
Chapter 9: Containers: Beyond V8 Isolates

What do I do when Workers' constraints genuinely don't fit?


The pricing models of Workers and Containers reveal a fundamental difference: Workers charge for thinking while Containers charge for existing. Understanding this distinction determines which workloads are cheap on which platform, and misunderstanding it is the most expensive mistake when choosing between them.

A Worker making an API call and waiting 500ms for the response pays for milliseconds of actual computation, not the waiting (the I/O is free). A Container making the same call pays for all 500ms whether computing or idle, meaning the meter runs regardless of what your code does.

This cost inversion makes I/O-heavy workloads significantly more expensive on Containers, so the architectural question isn't "which is better" but rather "which matches your workload's profile." Getting this right can mean order-of-magnitude cost differences.

Containers exist for workloads that genuinely exceed Workers' constraints (more than 128 MB of memory, more than 5 minutes of CPU time, or runtimes V8 can't execute), so you should resist the temptation to reach for Containers whenever Workers feel limiting. Containers cost more, start slower, and add architectural complexity. The right question isn't "can Containers do this?" but rather "must Containers do this?"

Containers: Durable Objects with bigger bodies

Understanding Cloudflare's Container architecture reveals capabilities not obvious from documentation and explains why Containers don't need load balancers, service discovery, or orchestration complexity.

Every Container instance is backed by a Durable Object where the DO provides the brain (coordination, state, global addressability) and the Container provides the muscle (arbitrary runtimes, more memory, and longer computation). Requests flow through this chain: client → Worker → Durable Object → Container instance, with responses returning along the same path.

This architecture inherits everything from Chapter 6 because when you derive an ID from a user identifier, every request worldwide for that user routes to the same Durable Object and therefore the same Container instance. The globally-unique routing that makes DOs powerful for coordination makes Containers globally addressable without load balancers.

The DO maintains state that survives container restarts using the same SQLite storage and strong consistency guarantees as any other Durable Object, so when a Container sleeps or crashes, the DO persists and provides whatever state the Container needs to resume when it wakes. The Container is ephemeral compute; the Durable Object is durable coordination.

Your sharding strategy (one container per user, per session, or shared pools) is expressed through how you derive DO IDs. The same decision you'd make for pure Durable Objects, with the same trade-offs. The routing code is trivial; the strategy decision is everything.
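
The sketch below shows how little code the routing itself takes. The binding name CONTAINER and the X-User-Id header are hypothetical stand-ins, and the binding is assumed to point at a Container-enabled Durable Object class:

    // Worker: derive the DO ID from a user identifier so every request for that
    // user routes to the same Durable Object, and therefore the same Container.
    export default {
      async fetch(request: Request, env: { CONTAINER: DurableObjectNamespace }): Promise<Response> {
        const userId = request.headers.get("X-User-Id") ?? "anonymous";
        const id = env.CONTAINER.idFromName(`user:${userId}`); // the sharding strategy lives in this one line
        const stub = env.CONTAINER.get(id);
        return stub.fetch(request); // the DO forwards the request to its Container
      },
    };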

The real question: escape or restructure?

Before choosing Containers, you must ask yourself whether you can restructure your workload to fit Workers instead.

Many workloads that seem to exceed Workers' limits can be refactored: image processing that loads entire images into memory can stream instead, data transformations that buffer complete datasets can chunk and process incrementally, and batch operations that accumulate results can write intermediate state to R2 or D1.
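
As a rough illustration of the streaming refactor (the upstream URL and the per-chunk transform are placeholders), the difference is buffering the whole payload versus piping it through a TransformStream so memory stays bounded:

    // Buffered: the whole payload sits in memory and can blow past 128 MB.
    //   const data = await (await fetch(sourceUrl)).arrayBuffer();

    // Streamed: chunks are processed as they arrive, so memory use stays small.
    export async function streamTransform(sourceUrl: string): Promise<Response> {
      const upstream = await fetch(sourceUrl);
      const transform = new TransformStream<Uint8Array, Uint8Array>({
        transform(chunk, controller) {
          controller.enqueue(chunk); // replace with real per-chunk processing
        },
      });
      return new Response(upstream.body?.pipeThrough(transform), upstream);
    }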

Refactoring isn't free since it takes engineering time and adds complexity, but Containers aren't free either because they cost more per operation, start in seconds rather than milliseconds, and introduce a second compute model.

The decision framework is straightforward: if restructuring takes less than a week of engineering time and the workload runs more than 10,000 times per month, you should restructure since the ongoing cost savings will repay the investment within weeks. If restructuring requires architectural changes across multiple services or the workload runs rarely, accept Container complexity.

Restructure for Workers when:

  • Refactoring is straightforward (streaming instead of buffering, chunking instead of accumulating)
  • The workload runs frequently (cost savings compound rapidly)
  • Latency matters (Workers start in milliseconds; Containers in seconds)
  • You want architectural simplicity (one compute model, not two)

Accept Container complexity when:

  • Refactoring requires rewriting core logic or changing interfaces across services
  • You need a runtime Workers don't support (Go, Java, .NET, Rust without WASM)
  • The workload runs infrequently (Container overhead amortises over fewer invocations)
  • Existing containerised code works and rewriting provides no benefit

The worst outcome is reaching for Containers out of convenience, then discovering ongoing costs exceed what restructuring would have required, so do the analysis first.

Hard boundaries: when Containers are unavoidable

Some constraints can't be engineered around, and these represent cases where Containers are genuinely necessary, not preferences but actual requirements.

Memory beyond 128 MB: If your workload must hold more than 128 MB in memory simultaneously and the problem genuinely requires it, Workers can't help. Machine learning models that don't fit in 128 MB, image processing of very large images where streaming isn't possible, and in-memory computation for specialised workloads all need Containers, which provide up to 12 GB. If that's insufficient, use hyperscaler compute.

Non-JavaScript runtimes: Workers run JavaScript, TypeScript, Python, or WebAssembly, but Go, Rust without WASM compilation, Java, .NET, Ruby, and other runtimes require Containers. This includes existing containerised applications like internal tools, legacy services, and third-party software you can't or won't rewrite.

CPU time beyond 5 minutes: Workers limit CPU time to 5 minutes maximum while Containers have no such limit. Video transcoding, complex simulations, and batch data transformations that compute for hours all need Containers. Note the distinction: CPU time, not wall time. A Worker can wait for I/O indefinitely without consuming CPU time, but if your workload genuinely computes for more than 5 minutes, it needs Containers.
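
A minimal sketch of the distinction, with a hypothetical upstream URL: the awaited fetch consumes wall time but almost no CPU time, while the loop afterwards is what counts against the 5-minute CPU limit:

    export default {
      async fetch(request: Request): Promise<Response> {
        // Waiting on I/O: wall time passes, CPU time barely moves, and Workers don't charge for it.
        const upstream = await fetch("https://api.example.com/report");
        const rows = (await upstream.json()) as number[];

        // Actual computation: this is what the 5-minute CPU limit measures.
        let total = 0;
        for (const row of rows) total += row;

        return new Response(`sum: ${total}`);
      },
    };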

Filesystem requirements: Workers have no persistent filesystem, but Containers provide up to 20 GB disk per instance. Workloads writing temporary files, maintaining local caches, or running software expecting filesystem access need Containers.

Hard boundaries: when Containers can't help

Sometimes Workers don't fit but neither do Containers, so recognising these cases early saves considerable effort.

Resources beyond Container limits: Containers max out at 4 vCPU and 12 GB memory, so if you need 32 GB RAM for a large ML model or 8 cores for parallel processing, use hyperscaler compute (EC2, GCE, Azure VMs) where larger instance types exist.

Inbound TCP/UDP: Containers cannot accept inbound TCP or UDP connections from the internet because all traffic routes through Workers or Durable Objects via HTTP. Game servers expecting direct UDP, custom protocol servers, and IoT gateways using MQTT or CoAP directly cannot use Cloudflare Containers. These workloads need traditional cloud infrastructure with load balancers and public IP addresses.

Nested containers: Docker-in-Docker is not possible, so CI/CD systems spawning containers and container-based testing frameworks cannot run on Cloudflare.

Choosing a routing strategy

Your Durable Object ID strategy is your Container scaling strategy because per-user IDs mean per-user containers, pool IDs mean shared containers, and session IDs mean session-sticky containers. Each approach has different cost, isolation, and latency characteristics.

Per-user containers derive the DO ID from a user identifier, giving each user their own container instance and coordinating Durable Object. That provides strong isolation and simplifies state management, since all of a user's requests route to the same instance. The cost is potentially many containers with low individual utilisation: 10,000 active users might mean 10,000 container instances, most of them sleeping. Sleep is free, but each request to a sleeping container incurs cold-start latency. This pattern fits when users need isolated resources, per-user state is substantial, or security requires separation.

Shared pools derive the DO ID from a pool identifier, routing multiple users to the same container instances. Concentrated traffic keeps containers warm and reduces cold starts, and fewer containers mean lower costs when traffic is moderate. The trade-off is noisy neighbours: one user's expensive operation affects everyone sharing that container. This pattern fits stateless workloads, or workloads whose state lives in external storage; it requires careful attention to timeouts and resource limits within your container application.

Session-sticky routing derives the DO ID from a session identifier so requests within a session route to the same container while different sessions may route to different instances. This provides in-session state without per-user cost and containers sleep when sessions end, so economics fall between per-user and pooled approaches. Watch session duration carefully: long sessions mean long-running containers while very short sessions mean frequent cold starts.
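
A sketch of the three strategies side by side; only the derived name changes (the binding, identifiers, and pool size are hypothetical):

    // A small deterministic hash for pool sharding (illustrative only).
    function simpleHash(s: string): number {
      let h = 0;
      for (const c of s) h = (h * 31 + c.charCodeAt(0)) | 0;
      return Math.abs(h);
    }

    function deriveContainerIds(env: { CONTAINER: DurableObjectNamespace }, userId: string, sessionId: string) {
      // Per-user: strong isolation, one container (and one DO) per user.
      const perUser = env.CONTAINER.idFromName(`user:${userId}`);

      // Shared pool: spread users across a fixed number of warm containers.
      const POOL_SIZE = 8;
      const pooled = env.CONTAINER.idFromName(`pool:${simpleHash(userId) % POOL_SIZE}`);

      // Session-sticky: one container per session, released when the session ends.
      const bySession = env.CONTAINER.idFromName(`session:${sessionId}`);

      return { perUser, pooled, bySession };
    }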

Requirement            Strategy        Trade-off
Strong user isolation  Per-user        Higher cost, more cold starts
Cost efficiency        Shared pool     No isolation, noisy-neighbour risk
In-session state       Session-sticky  Moderate cost, session-duration sensitivity
Mixed workload         Hybrid          Complexity, but optimises each case

For hybrid approaches, route different request types differently. Stateless API calls go to shared pools for cost efficiency; user-specific heavy processing goes to per-user containers for isolation. The routing logic lives in your Worker; the Container receives requests without knowing how they were routed.
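
One way this might look in the Worker, with hypothetical paths and binding names:

    // Hybrid routing: stateless API calls share a pool; heavy per-user work gets isolation.
    async function route(request: Request, env: { CONTAINER: DurableObjectNamespace }): Promise<Response> {
      const url = new URL(request.url);
      const userId = request.headers.get("X-User-Id") ?? "anonymous";

      const name = url.pathname.startsWith("/process")
        ? `user:${userId}`             // per-user container for heavy, user-specific processing
        : `pool:${userId.length % 4}`; // crude pool assignment for everything else

      const stub = env.CONTAINER.get(env.CONTAINER.idFromName(name));
      return stub.fetch(request); // the Container never learns how the request was routed
    }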

Instance sizing

Containers come in predefined sizes ranging from lite (256 MB, 1/16 vCPU) to standard-4 (12 GB, 4 vCPU), and choosing correctly requires understanding your workload's actual resource consumption.

The sizing framework depends on your workload's dominant constraint:

Workload profile            Start with                               Upgrade signal
Memory-bound, low CPU       Minimum memory that fits, smallest vCPU  CPU exhaustion under concurrent load
CPU-bound, moderate memory  standard-1                               Memory pressure from parallel requests
Disk-intensive              Match disk to throughput needs           I/O wait showing in latency metrics
Unknown or variable         standard-1                               Observe for two weeks before optimising

Start with the smallest instance type you believe might work (usually standard-1 for real workloads, basic for lightweight services), monitor actual resource usage, and optimise after you have data rather than before.

Memory exhaustion manifests as OOM kills: your container dies mid-request and requests fail unpredictably under load. Verify you're not leaking memory first (a leak exhausts any instance size eventually); if memory usage is legitimate, increase the instance size.

CPU exhaustion manifests as slow responses and timeouts because your container is compute-bound and queuing work faster than it can complete. If CPU utilisation is consistently high and latency suffers, increase vCPU, but verify you're not spinning on inefficient code first.

Disk exhaustion manifests as write failures, so verify you're cleaning up temporary files properly before increasing disk allocation.

Container Resource Ratios

Cloudflare enforces minimum resource ratios: at least 3 GB memory per vCPU, at most 2 GB disk per 1 GB memory. These prevent pathological configurations. If you need 8 GB memory, you need at least standard-3. If you need 16 GB disk, you need at least 8 GB memory.

The cost model in practice

The cost inversion between Workers and Containers can mean order-of-magnitude differences in total cost.

Consider a workload processing 100,000 requests per day. Each request validates input (1ms CPU), calls an external API (200ms wait), transforms the response (2ms CPU), and returns. Total: 3ms computation, 200ms waiting.

On Workers, you pay for 3ms of CPU per request: 300 seconds of CPU time daily, roughly $0.20 monthly. The 200ms of waiting costs nothing.

Cost Model Inversion

Containers bill for wall time, not CPU time. A Container waiting 500ms for an API response pays for all 500 milliseconds. For I/O-heavy workloads, this can cost 100-500x more than Workers doing the same work.

On Containers, you pay for 203ms of wall time per request. Assuming requests don't overlap perfectly, that's roughly 6 hours of container time daily. A standard-1 instance costs approximately $100 monthly: 500 times more than Workers, because Containers charge for the waiting Workers provide free.
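
The back-of-envelope arithmetic behind those figures, expressed as a sketch (the dollar amounts above additionally depend on current Workers and Containers pricing, which isn't restated here):

    // Figures from the example above.
    const requestsPerDay = 100_000;
    const cpuMsPerRequest = 3;    // 1 ms validation + 2 ms transform
    const wallMsPerRequest = 203; // CPU plus the 200 ms API wait

    // Workers bill CPU time only.
    const workerCpuSecondsPerDay = (requestsPerDay * cpuMsPerRequest) / 1000;       // 300 s/day
    // Containers bill wall time, including the wait.
    const containerHoursPerDay = (requestsPerDay * wallMsPerRequest) / 1000 / 3600; // ≈ 5.6 h/day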

A workload that seems to need Containers because of occasional memory spikes might cost 500 times more than a restructured version within Workers' limits, so the engineering effort to restructure often repays itself within days.

Containers become cost-competitive when CPU utilisation is high and consistent; video thumbnail extraction (2 seconds I/O to download, 5 seconds CPU to process frames, 500ms I/O to upload) cannot run on Workers, so the comparison becomes Containers versus hyperscaler compute rather than Containers versus Workers. At roughly $400 monthly for 100,000 daily requests, Containers provide global distribution and automatic scaling while a comparable EC2 instance costs less but requires provisioning, scaling configuration, and regional deployment.

Designing for cold starts

You cannot eliminate cold starts; you can only choose who experiences them.

When a request arrives for a sleeping container, the Durable Object receives it in milliseconds, requests container start, and waits 2-10 seconds for the image to load and the process to initialise before forwarding the request. Total cold-start latency is typically 3-15 seconds, making this unacceptable for interactive applications.

Cold Start Reality

Container cold starts typically take 3-15 seconds. You cannot eliminate them, only mitigate them. For interactive applications requiring sub-second response times, this is often a disqualifying constraint.

The solution is architectural rather than optimisation: route interactive traffic to Workers and heavy computation to Containers so the user gets millisecond responses while the Container wakes in the background.

Design for graceful degradation. When a request requires Container processing and the Container is cold, return immediately with a "processing" acknowledgement. Deliver results via webhook, polling, or WebSocket notification. The user experiences fast feedback; the Container processes without blocking them.
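
A sketch of that acknowledgement pattern (binding, paths, and header names are hypothetical): the Worker responds with 202 immediately and hands the real work to the Container in the background:

    export default {
      async fetch(request: Request, env: { CONTAINER: DurableObjectNamespace }, ctx: ExecutionContext): Promise<Response> {
        const jobId = crypto.randomUUID();
        const stub = env.CONTAINER.get(env.CONTAINER.idFromName(`job:${jobId}`));

        // Forward the work without blocking the response; the cold start happens here, not in the user's path.
        ctx.waitUntil(
          stub.fetch(new Request("https://container/process", {
            method: "POST",
            headers: { "X-Job-Id": jobId },
            body: request.body,
          })),
        );

        // The client polls for the result, or receives it via webhook or WebSocket.
        return Response.json({ jobId, status: "processing" }, { status: 202 });
      },
    };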

Use hybrid routing: Quick validation, authentication, and request shaping happen in Workers while heavy processing happens in Containers that Workers invoke asynchronously, so the Worker responds immediately and the Container result arrives when ready.

Pre-warm on predictable traffic. If you know traffic is coming (user logged in, batch job scheduled, webhook expected), send a lightweight request to wake the container before the real work arrives. Shift cold-start latency from user-facing requests to background preparation.
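
A minimal pre-warming sketch, assuming a login event as the predictable signal and a hypothetical /warmup route the container answers cheaply:

    // Poke the user's container when we know work is coming, so the cold start
    // happens during background preparation instead of on the first real request.
    async function onLogin(userId: string, env: { CONTAINER: DurableObjectNamespace }, ctx: ExecutionContext): Promise<void> {
      const stub = env.CONTAINER.get(env.CONTAINER.idFromName(`user:${userId}`));
      ctx.waitUntil(stub.fetch("https://container/warmup")); // lightweight request; the response is ignored
    }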

Extend sleep timeout selectively. Configure longer sleepAfter values for containers with frequent but irregular traffic. A container sleeping after 30 minutes instead of 10 stays warm for more requests, at the cost of paying for idle time. This trade-off makes sense when cold-start latency matters more than cost.
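
If you use the @cloudflare/containers helper class, the idle timeout is a property on the class; treat the exact field name below (sleepAfter) as an assumption to verify against the package's current API:

    import { Container } from "@cloudflare/containers";

    export class ReportRenderer extends Container {
      defaultPort = 8080;  // port the containerised app listens on
      sleepAfter = "30m";  // longer idle window: fewer cold starts, more paid idle time
    }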

Optimise startup as a secondary measure: Minimise container image size, lazy-load dependencies, and defer initialisation not needed for the first request. These optimisations reduce cold starts from 10 seconds to 5 (still unacceptable for interactive traffic), so architectural approaches that avoid exposing users to cold starts matter more than shaving seconds from startup.

For batch processing, scheduled jobs, and asynchronous workflows, cold starts don't matter because a workflow step taking 30 seconds of processing doesn't suffer noticeably from 5 seconds of startup, so route interactive traffic away from cold-start-sensitive paths.

Observability: what container failures actually look like

Standard observability advice applies to Containers as to any compute, though what matters here is understanding Container-specific failure patterns and what they indicate.

Error rate spikes in the Container but not the DO indicate a Container application bug, not routing or coordination. The DO successfully received and forwarded requests; the Container failed to process them. Debug your application code, not your Cloudflare configuration.

Error rate spikes in both DO and Container simultaneously suggest resource exhaustion or infrastructure issues. The DO might be failing to start the Container, or the Container crashing on startup. Check resource limits, image validity, and Cloudflare status.

Latency increases without error rate changes have two common causes: if latency correlates with traffic, your Container is compute-bound and requests are queuing, and if latency increases randomly regardless of traffic, you're seeing cold starts. Distinguish by correlating request timestamps with container start events.

Intermittent failures under load usually indicate memory exhaustion because a Container running fine at low traffic OOMs when concurrent requests multiply memory usage. If memory climbs toward limits before failures, increase instance size or reduce concurrency.

Requests that succeed but return wrong results indicate an application bug. The platform delivered the request correctly; your code processed it incorrectly.

The diagnostic question is simple: where in the Worker-DO-Container chain did things go wrong? Worker errors mean routing problems; DO errors mean coordination or lifecycle problems; Container errors mean application problems. Trace IDs propagating across all three let you follow a request's path.
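
A sketch of trace-ID propagation from the Worker down the chain; the header name and binding are hypothetical:

    export default {
      async fetch(request: Request, env: { CONTAINER: DurableObjectNamespace }): Promise<Response> {
        const traceId = request.headers.get("X-Trace-Id") ?? crypto.randomUUID();
        console.log(JSON.stringify({ traceId, hop: "worker", path: new URL(request.url).pathname }));

        // Forward the same ID so the DO and the Container can log it too.
        const headers = new Headers(request.headers);
        headers.set("X-Trace-Id", traceId);
        const forwarded = new Request(request, { headers });

        const stub = env.CONTAINER.get(env.CONTAINER.idFromName("pool:0"));
        return stub.fetch(forwarded);
      },
    };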

When to choose hyperscalers instead

Cloudflare Containers optimise for global distribution and tight integration with Workers and Durable Objects, while hyperscaler alternatives optimise for different things that sometimes matter more.

Choose hyperscalers when you need deep VPC integration: Cloudflare Containers communicate via HTTP through Workers, and while Workers VPC Services (currently in beta) provides secure connectivity to private resources through Cloudflare Tunnel, hyperscaler containers offer native VPC attachment without intermediate layers. For workloads requiring extensive private network access or complex networking topologies, hyperscaler containers with native VPC integration remain simpler.

Choose hyperscalers when you need larger instances: Cloudflare maxes out at 4 vCPU and 12 GB memory, whereas AWS Fargate provides up to 4 vCPU and 30 GB memory and Google Cloud Run provides up to 8 vCPU and 32 GB memory. If your workload genuinely needs more resources per instance, hyperscalers are your only option.

Choose hyperscalers when you need deep integration with hyperscaler-specific services: If your architecture depends on SQS, DynamoDB, BigQuery, or Cosmos DB, running containers on the same platform simplifies authentication, reduces latency, and consolidates billing.

Choose Cloudflare when global distribution matters more than raw instance size: Cloudflare Containers deploy globally by default while hyperscaler containers require explicit multi-region configuration. If your users are worldwide and latency matters, Cloudflare's automatic distribution is valuable.

Choose Cloudflare when your architecture already uses Workers and Durable Objects: The DO coordination model backing Containers is the same model you're already using, so adding Containers extends your existing architecture whereas adopting hyperscaler containers introduces a separate system with separate deployment, monitoring, and operational patterns.

Choose Cloudflare when the DO coordination model simplifies your design: If your Container workload benefits from globally-unique routing, persistent coordination state, or Chapter 6's patterns, the DO-backed architecture provides these for free while building equivalent coordination on hyperscaler containers requires additional infrastructure.

Deciding factor        Cloudflare           Hyperscaler
Global distribution    Automatic            Manual multi-region
Maximum resources      4 vCPU, 12 GB        8 vCPU, 32 GB
Private networking     VPC Services (beta)  Native VPC integration
Coordination model     Durable Objects      Build your own
Ecosystem integration  Workers, R2, D1      Full hyperscaler suite

What comes next

The next chapter covers Cloudflare Realtime: audio and video streaming at the edge. Where Durable Objects with WebSockets handle text and data synchronisation, Realtime provides the WebRTC infrastructure for actual voice and video communication. The distinction matters: different protocols, different infrastructure, different use cases.

After Realtime, Part IV addresses data and storage: how to choose between D1, R2, KV, and Hyperdrive, and how to use each effectively. The stateful compute primitives from Part III store state in these storage systems. Understanding both compute and storage completes the architectural picture.