Chapter 1: The Cloudflare Developer Platform
What makes Cloudflare architecturally different, and why does it matter?
A Worker is a serverless function that handles HTTP requests. If you've used AWS Lambda or Azure Functions, the concept is familiar: you write code that receives a request and returns a response, and the platform handles everything else.
The first time you deploy a Worker, something feels wrong. You write a function, run a command, and seconds later it's live, not in one region, but everywhere. No servers to provision. No capacity planning. No cold start mitigation. No region selection dropdown. The instinct honed by years of working with AWS, Azure, or GCP insists that you must have missed a step.
You haven't. The absence of those steps is the point.
Cloudflare's Developer Platform is a fundamentally different approach to building applications. Understanding that difference at the level of mental models and architectural implications separates teams that succeed on this platform from those that fight against it. This chapter establishes the foundational concepts that the rest of this book builds on.
The V8 isolate model
Understanding why Workers behave differently requires understanding how they execute. Workers do not run in containers. They do not run in virtual machines. They run in V8 isolates, the same execution environment that powers Chrome's JavaScript engine, and this distinction shapes everything about how you design applications on Cloudflare.
What an isolate actually is
An isolate is a lightweight sandbox existing within an already-running process. V8, the JavaScript engine behind Chrome, can create thousands of isolates inside a single process, each with its own memory and execution context. Google built isolates because every browser tab runs untrusted code; they needed strong isolation without the overhead of a separate process per tab. Cloudflare applies this same mechanism to run code from different customers safely on shared infrastructure.
Isolates aren't containers or VMs. They're sandboxes within an existing V8 process, created in under 5ms versus 100 to 1,000ms for containers. This architectural choice enables single-digit-millisecond cold starts but constrains memory to 128 MB per isolate.
Creating a VM means booting an operating system, which requires seconds of startup time. Spawning a container means initialising a process and loading a runtime, taking hundreds of milliseconds at best. An isolate sidesteps both these constraints: it's created within an already-running V8 engine where the execution environment already exists. Creating one costs roughly a hundred times less than spawning a new Node.js process, and starting one is proportionally faster.
What happens when a request arrives
The request lifecycle explains why Workers avoid the cold-start patterns you expect from traditional serverless.
When a request arrives at a Cloudflare data centre, the runtime checks whether a warm isolate already exists for your Worker. If one does, the request routes directly to it with no cold start whatsoever. The isolate is already loaded in memory, your code has already been parsed and compiled by V8, and execution begins immediately.
If no warm isolate exists, the runtime creates one. This is where the isolate model pays off. The V8 engine is already running, shared across all Workers in that data centre. Creating your isolate means allocating memory inside that existing engine and loading your compiled code, not booting an operating system or starting a process. Cloudflare reports isolate creation under 5ms, with many cold starts under 3ms. By comparison, AWS Lambda cold starts are typically 100ms to over 1s depending on runtime and package size, and Azure Functions show similar behaviour.
Once your isolate exists, it can handle multiple concurrent requests. The runtime routes subsequent requests to warm isolates when available, creating new isolates only when existing ones are at capacity. Idle isolates are evicted after a short period to reclaim memory, but recreating one takes under 5ms, imperceptible to users, so this cycle rarely matters in practice.
The cold start mitigation strategies common on other platforms (provisioned concurrency, scheduled warming requests, keep-alive pings) become unnecessary overhead on Cloudflare. The platform's baseline behaviour already provides what those strategies attempt to achieve, eliminating an entire category of operational complexity.
Memory: 128 MB and what that means
Each isolate running your Worker can consume up to 128 MB of memory, encompassing the JavaScript heap and any WebAssembly linear memory. This is a hard, non-configurable limit that cannot be adjusted through pricing tiers or support requests. If your workload routinely requires more than 128 MB per request, Workers are the wrong tool for that workload.
The 128 MB limit cannot be increased. If your workload needs more memory, use Containers (up to 12 GB) instead of trying to optimise around this constraint.
The subtlety is that this limit applies per isolate, not per request. A single isolate can handle many concurrent requests, and those requests share the 128 MB pool. Memory-heavy workloads face compounding pressure under load, precisely when resource constraints hurt most.
Rather than fighting this constraint, design around it through streaming and incremental processing. Stream large payloads through R2 rather than buffering them in memory. Process data incrementally rather than loading entire datasets into memory simultaneously. If you genuinely need more memory per execution unit, Containers provide up to 12 GB, though you pay for that expanded capacity through cold start time, operational complexity, and higher cost.
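The pattern looks like this in practice. The sketch below streams an object from R2 directly to the client, assuming a hypothetical R2 bucket bound as env.FILES; the payload flows through the Worker without ever accumulating in the isolate's memory.

export default {
  async fetch(request: Request, env: { FILES: R2Bucket }): Promise<Response> {
    // Fetch the object from R2; only metadata and a stream handle are held here.
    const object = await env.FILES.get("reports/large-export.csv");
    if (object === null) {
      return new Response("Not found", { status: 404 });
    }
    // object.body is a ReadableStream: bytes flow from R2 to the client
    // without the full payload ever sitting in the isolate's 128 MB.
    return new Response(object.body, {
      headers: { "Content-Type": "text/csv" },
    });
  },
};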
CPU time versus wall time
Workers measure CPU time separately from wall time, and understanding this distinction fundamentally changes how you think about cost and design.
CPU time represents the actual execution of instructions on a processor. Wall time represents the total elapsed time from request arrival to response completion. This distinction matters because when your Worker makes a subrequest to an external API and waits for the response, wall time accumulates but CPU time does not. Similarly, when your Worker queries D1 and waits for results, wall time accumulates while CPU time remains zero.
The free tier limits CPU time to 10 milliseconds per request, sufficient for testing simple logic but not representative of production capabilities. Paid plans provide 30 seconds by default, extendable to 5 minutes through configuration. Wall time is unlimited on all plans. This means a Worker can orchestrate a complex operation that takes minutes of real time (making multiple API calls, waiting for webhooks, coordinating with external services) while consuming only milliseconds of billable CPU time.
This model inverts the economics of I/O-heavy workloads compared to traditional serverless platforms. In traditional serverless, you pay for wall time regardless of what your code is doing; an AWS Lambda function that spends 10 seconds waiting for a slow database query costs the same as one that spends 10 seconds computing. In Workers, waiting is free; only actual compute time on the processor triggers charges.
The strategic design implication is to prefer I/O operations over computation wherever the trade-off exists. Offload heavy computation to purpose-built services rather than implementing it in your Worker. Make subrequests to external services rather than reimplementing their logic locally. The platform's billing model rewards this design pattern economically.
Workers charge for CPU time, not wall time. Waiting for I/O is free. A Worker that waits 2 seconds for an API response while computing for 20ms costs the same as one that completes in 20ms with no I/O. This inverts traditional serverless economics where waiting time is billable.
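To make the distinction concrete, the sketch below calls a hypothetical slow upstream API. The seconds spent awaiting the fetch accrue as wall time, which is free; only the few milliseconds of JSON handling count as billable CPU time.

export default {
  async fetch(request: Request): Promise<Response> {
    // Waiting on this fetch may take seconds of wall time but consumes no CPU time.
    const upstream = await fetch("https://api.example.com/slow-report");
    const report = await upstream.json() as { items: unknown[] };
    // Only this parsing and serialisation consumes billable CPU time.
    return new Response(JSON.stringify({ count: report.items.length, items: report.items }), {
      headers: { "Content-Type": "application/json" },
    });
  },
};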
Resource constraints beyond memory and CPU
Paid plans default to 10,000 subrequests per invocation, configurable up to 10 million through the limits.subrequests setting in your Wrangler configuration. A subrequest is any outbound call your Worker makes: fetching from external APIs, reading from bound resources like D1 or KV, or invoking other Workers. Most workloads stay well under the default ceiling, but the configurability matters for long-running Workers handling open WebSockets on Durable Objects or executing extended Workflows, where the old fixed limit of 1,000 was a genuine constraint. You can also set a lower limit to protect against runaway code or unexpected costs, making subrequests a tuneable safety valve rather than a hard wall.
Free plans remain more constrained: 50 external subrequests and 1,000 subrequests to Cloudflare services per invocation.
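If you do need to adjust these ceilings, the configuration lives in your Wrangler file. A minimal sketch follows, assuming the limits.subrequests key described above alongside the cpu_ms key that raises CPU time; the project details are illustrative.

# Illustrative wrangler.toml; the limits block raises per-invocation ceilings.
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2026-02-01"

[limits]
cpu_ms = 300000       # extend CPU time to 5 minutes per invocation
subrequests = 100000  # raise the subrequest ceiling (key as described above)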
Single-threaded execution within requests
Each request to your Worker executes in a single-threaded context. There is no parallelism within a single request; you cannot spawn threads or use parallel processing primitives to speed up a single operation. If you need to perform three independent operations, you can await them concurrently with Promise.all(), which overlaps their I/O, but the JavaScript execution itself remains single-threaded.
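A sketch of that model, assuming hypothetical KV, D1, and external lookups: the three awaits overlap their I/O, while the JavaScript that assembles the results still runs on one thread.

async function loadDashboard(env: { CACHE: KVNamespace; DB: D1Database }) {
  // All three operations are issued together; their I/O waits overlap.
  const [settings, widgets, status] = await Promise.all([
    env.CACHE.get("settings", "json"),
    env.DB.prepare("SELECT * FROM widgets").all(),
    fetch("https://status.example.com/summary").then((r) => r.json()),
  ]);
  // Assembling the result is ordinary single-threaded JavaScript.
  return { settings, widgets: widgets.results, status };
}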
For parallelism across different requests, the platform scales automatically without any configuration. A million simultaneous requests will execute across many isolates on many machines in many data centres globally. You need not configure scaling, think about capacity planning, or manage resources. The platform handles all of this transparently.
For parallelism within a single request when you need concurrent computation, you can dispatch work to multiple Workers using service bindings. Each Worker executes independently on potentially different machines, and results are combined by the calling Worker. This approach introduces more overhead than threading but enables work to happen across the distributed system rather than being confined to one isolate.
Global by default
When you deploy a Worker, it deploys to every Cloudflare data centre simultaneously. As of February 2026, this means over 300 cities in more than 100 countries. There is no region selector because there is no region selection. Your code runs everywhere.
This isn't marketing abstraction or optimistic documentation. A user in São Paulo hits Cloudflare's São Paulo data centre; a user in Singapore hits Singapore. The same code runs in different places simultaneously with no configuration or coordination required from you.
Why this changes everything
The traditional cloud model assumes a fundamental trade-off that Cloudflare eliminates: you must choose where to run your code, and users far from that location experience latency. Multi-region deployment mitigates the latency but introduces complexity: you need to manage replication, handle failover, reason about consistency across regions, and pay for capacity in locations where you might not need it.
Cloudflare's model eliminates this trade-off entirely for stateless workloads. A Worker serving API responses runs equally well in Tokyo and Toronto with no manual region selection required. A Worker transforming requests runs at the edge closest to each user, automatically and without configuration. You have no region selection dropdown, no multi-region configuration to manage, and no cross-region consistency concerns because there's no region to select.
The architectural opportunity is to push logic to the edge that previously required centralised servers. Authentication checks that add hundreds of milliseconds when performed in a distant data centre become single-digit-millisecond operations when performed at the edge. Personalisation logic that requires a round-trip to origin can execute alongside the user's request. API composition that normally requires orchestrating calls from a single location can happen at the network edge, cutting latency to each backend service.
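As an illustration, the sketch below performs a session check at the edge before forwarding traffic to an origin, assuming a hypothetical KV namespace of session tokens bound as env.SESSIONS and a Worker deployed on a route in front of that origin. Unauthenticated requests are rejected milliseconds from the user rather than after a round-trip to a distant server.

export default {
  async fetch(request: Request, env: { SESSIONS: KVNamespace }): Promise<Response> {
    const token = request.headers.get("Authorization")?.replace("Bearer ", "");
    // The session lookup happens in the data centre closest to the user.
    const session = token ? await env.SESSIONS.get(token) : null;
    if (!session) {
      return new Response("Unauthorized", { status: 401 });
    }
    // Forward authenticated traffic to the origin behind this route.
    return fetch(request);
  },
};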
Placement control: when global isn't optimal
Not every workload benefits from running close to users. A Worker making seven calls to a PostgreSQL database in Frankfurt spends most of its time waiting for round-trips. Running it in Sydney for an Australian user means seven intercontinental round-trips to Frankfurt. Running it in Frankfurt means a single long round-trip from the user plus seven fast local calls.
Cloudflare offers two approaches to placement optimisation. Smart Placement analyses your Worker's traffic patterns and automatically places execution close to your backend infrastructure. Explicit placement hints let you specify exactly where your Worker should run when you know your backend's location.
Smart Placement runs Workers near backends instead of near users when that reduces total latency. Enable it when your Worker makes multiple calls to geographically concentrated services. For known, fixed backend locations, explicit placement hints targeting specific cloud regions (like aws:us-east-1) provide more precise control.
Smart Placement's decision isn't static. Cloudflare continuously monitors your Worker's behaviour and adjusts placement as patterns change. If your backend traffic shifts from predominantly European to predominantly Asian destinations, placement adapts accordingly. You don't manage this; the platform handles it.
Explicit placement hints work differently: you tell Cloudflare exactly where to run by specifying a cloud region identifier or by exposing your infrastructure to placement probes. This is useful when you know your backend won't move and want deterministic placement without waiting for Smart Placement to learn your traffic patterns.
The decision of which to use follows a clear heuristic: if your Worker calls a single backend in a known, fixed location, explicit hints provide immediate optimal placement. If you have multiple backends, backends that might move, or you're unsure of optimal placement, Smart Placement adapts automatically. If your Worker is computationally focused with minimal backend calls, or if your backends are themselves globally distributed, the default user-proximity placement is probably better.
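Enabling Smart Placement is a one-line configuration change. The sketch below shows the mode = "smart" switch in wrangler.toml; the surrounding project settings are illustrative.

# Illustrative wrangler.toml enabling Smart Placement for this Worker.
name = "api-aggregator"
main = "src/index.ts"
compatibility_date = "2026-02-01"

[placement]
mode = "smart"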
Jurisdictional restrictions
Global deployment creates compliance challenges. GDPR requires certain data to stay within the EU. Other regulations impose similar geographic restrictions. Cloudflare addresses this through jurisdictional controls that constrain where your code executes and where data persists.
For Durable Objects, Cloudflare's stateful compute primitive, you can restrict both execution and storage to specific jurisdictions. A Durable Object created with a jurisdiction of eu will only run in Cloudflare's EU data centres and will only store data in EU locations. The object remains globally accessible (a user in Japan can still interact with it) but all processing and storage happens within the EU.
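In code, the restriction is applied when the object's ID is created. The sketch below assumes a Durable Object namespace bound as env.USER_DATA; passing a jurisdiction when minting the ID confines execution and storage to EU data centres while the object remains reachable from anywhere.

export default {
  async fetch(request: Request, env: { USER_DATA: DurableObjectNamespace }): Promise<Response> {
    // The jurisdiction option pins this object's execution and storage to the EU.
    const id = env.USER_DATA.newUniqueId({ jurisdiction: "eu" });
    const stub = env.USER_DATA.get(id);
    // The object is globally addressable but EU-resident.
    return stub.fetch(request);
  },
};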
For Workers without persistent state, Regional Services restricts which data centres will process requests, ensuring that request data isn't processed outside specified regions.
These controls add latency for users outside the specified jurisdiction, but they provide the compliance guarantees that make Cloudflare viable for regulated workloads.
The binding model
Resources in Cloudflare connect to your code through bindings: named references configured in your project that become properties on the environment object passed to your Worker.
export default {
  async fetch(request: Request, env: Env) {
    // env.DB is a D1 database binding
    const result = await env.DB.prepare("SELECT * FROM users").all();
    // env.STORAGE is an R2 bucket binding
    const file = await env.STORAGE.get("config.json");
    // env.CACHE is a KV namespace binding
    const cached = await env.CACHE.get("recent-queries");
    return new Response(JSON.stringify(result));
  }
};
This model differs fundamentally from connection strings, environment variables containing credentials, or SDK initialisation patterns common in other platforms.
Configuration, not code
A binding is a contract that env.DB will be a D1 database at runtime. Which specific database that refers to depends entirely on configuration, not on your code itself. In production, your wrangler.toml (the configuration file for Wrangler, Cloudflare's CLI tool) specifies your production database ID. In local development, Wrangler provides a local SQLite file or connects to a remote staging database, depending on your configuration. In tests, you can bind a mock implementation. Regardless of all these variations, the code (env.DB.prepare(...)) never needs to change.
This separation enables environment portability that other platforms achieve only through careful discipline, and discipline doesn't always hold. You avoid juggling environment variables for different contexts. You eliminate conditional client initialisation based on the execution environment. You remove the risk of accidentally shipping code with staging credentials embedded. The binding abstracts the resource identity, and configuration determines what that binding refers to, keeping concerns separate.
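The configuration side of that contract looks roughly like this in wrangler.toml, with illustrative names and IDs. The binding name DB is all the code ever sees; which database it points at is decided here, per environment.

# Production binding: env.DB resolves to the production database.
[[d1_databases]]
binding = "DB"
database_name = "app-production"
database_id = "11111111-1111-1111-1111-111111111111"

# Staging environment: the same binding name, a different database.
[[env.staging.d1_databases]]
binding = "DB"
database_name = "app-staging"
database_id = "22222222-2222-2222-2222-222222222222"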
No network overhead for bound resources
When your Worker calls a bound resource (a D1 database, a KV namespace, a Durable Object), that call bypasses the public internet entirely. Typically it stays within the same machine; at worst, it routes through Cloudflare's internal network within the same data centre.
This matters for performance, but it matters more profoundly for how you reason about your application's architecture. In traditional cloud environments, every service call implies network latency, failure modes, and security considerations that shape design. You batch calls to amortise overhead. You add caching layers to avoid round-trips. You design defensively for network partitions.
On Cloudflare, calling a bound resource is closer to calling a local function than calling a remote service over a network. The failure modes, latency characteristics, and cost model differ substantially. You can make many small calls rather than batching them into few large calls. You can query iteratively rather than constructing complex single queries upfront. Efficiency still matters, but the cost calculus differs from what your hyperscaler instincts may suggest.
SSRF immunity through design
Bindings eliminate an entire category of vulnerability that plagues traditional architectures. Server-Side Request Forgery attacks trick a server into making requests to internal services on the attacker's behalf. Imagine a social media application where users can set their avatar by providing a URL. The server fetches that URL and stores the image. If an attacker provides a URL like https://internal-auth-service/admin/users, the server might fetch internal data and expose it through the avatar response.
Bindings prevent SSRF attacks architecturally, not through policy. Internal services accessed via env.SERVICE aren't addressable through URLs that attackers could provide to fetch(). The attack surface doesn't exist.
On traditional platforms, defending against SSRF requires careful input validation, URL allowlisting, and network segmentation. On Workers, the traditional SSRF attack surface doesn't exist in the same way. Internal services are accessed through bindings (env.AUTH_SERVICE.fetch()) rather than through URLs passed to the global fetch() function. There is no URL an attacker can provide that reaches your internal services, because those services aren't addressable by URL from within your Worker. The global fetch() function reaches the public internet, while bindings reach bound resources. The two communication paths don't intersect.
This protection extends to private network access through VPC Services. When a Worker connects to resources in your VPC through a binding, requests route only to the specific configured service. The binding provides access to that service without exposing the rest of your private network. An attacker who compromises request handling in your Worker still cannot pivot to arbitrary internal hosts.
The separation is architectural, not policy-based. The model itself prevents the attack.
Service bindings for composition
Workers can bind to other Workers through service bindings, creating a type-safe, RPC-style interface without the overhead of HTTP serialisation or the latency of network traversal. When both Workers are co-located in the same data centre, service binding calls typically complete in under a millisecond, often in the 0.1 to 0.5 millisecond range. Compare this to the 50 to 200 milliseconds typical for an equivalent external HTTP call traversing the public internet.
// In your main Worker
export default {
  async fetch(request: Request, env: Env) {
    // env.AUTH is a service binding to another Worker
    const user = await env.AUTH.authenticate(request);
    // The items to price arrive in the request body
    const { items } = await request.json() as { items: string[] };
    // env.PRICING is a service binding to another Worker
    const prices = await env.PRICING.getQuote(user.tier, items);
    return new Response(JSON.stringify({ user, prices }));
  }
};
When both Workers deploy to the same Cloudflare location, the service binding call stays within that location, often within the same process without leaving the machine. When they deploy to different locations (perhaps due to Smart Placement), Cloudflare routes the call through its internal network appropriately. Your code remains unchanged regardless of this placement.
This low-latency model enables decomposition without the costs that normally accompany microservice architectures. You can split a monolithic Worker into focused components, each independently deployable and testable, without introducing the inter-service latency costs or the operational complexity of service meshes.
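For completeness, the Worker on the other side of such a binding might look like the sketch below, using the WorkerEntrypoint class from the cloudflare:workers module; the authenticate method and its return shape are hypothetical, chosen to match the earlier snippet.

import { WorkerEntrypoint } from "cloudflare:workers";

// The Worker bound as env.AUTH in the previous example could be built like this.
export default class AuthService extends WorkerEntrypoint {
  async authenticate(request: Request): Promise<{ id: string; tier: string }> {
    const token = request.headers.get("Authorization") ?? "";
    // Token verification elided; return the caller's identity and plan tier.
    return { id: "user_123", tier: "pro" };
  }
}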
Platform philosophy
Cloudflare's Developer Platform embodies specific design principles that shape what the platform excels at and what it deliberately avoids. Understanding the trajectory helps contextualise these principles.
How the platform evolved
Workers launched in 2017 as a way to run JavaScript at Cloudflare's edge, initially for request transformation and simple logic. The platform has expanded methodically in the years since, with each addition addressing real limitations that users encountered in production. KV arrived in 2018 for edge-native caching. Durable Objects launched in 2020 to solve coordination problems that stateless Workers couldn't address. D1 brought relational data to the edge in 2022. R2 eliminated egress fees entirely for object storage. Queues, Hyperdrive, and Workers AI followed in 2023. Workflows added durable execution in 2024.
The pattern is consistent throughout the platform's evolution: offering primitives that compose rather than monolithic services that prescribe patterns to users. Each addition solves a specific category of problem while remaining independent and composable with the others. This consistency isn't accidental; it reflects a deliberate architectural philosophy that rewards understanding the underlying model.
Horizontal scaling as first principle
The entire platform is architected for horizontal scaling from the start. D1 databases are limited to 10 GB each, but you can have 50,000 of them per account for multiple independent workloads. Durable Objects are designed for millions of small objects to distribute state, not a few large ones. Workers scale automatically without configuration to handle any request volume.
The 10 GB limit isn't a technical constraint Cloudflare couldn't engineer around; it's a deliberate design choice that makes good architecture the easy architecture. A SaaS application doesn't create one monolithic database and struggle when it eventually outgrows 10 GB. Instead, it creates a database per tenant and never hits the limit because each tenant's database stays well under it. A real-time application doesn't create one Durable Object and struggle with coordination bottlenecks. It creates an object per user, per document, per game session, and lets the platform handle distribution across its global infrastructure.
Think of D1 databases like rows, not like servers. Creating thousands is the intended pattern.
The mental shift is from "how do I make this resource big enough" to "how do I shard this workload correctly." The latter question leads to architectures that scale naturally with usage.
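The sketch below illustrates that sharding mindset with a hypothetical Durable Object namespace bound as env.DOCUMENT: each document name maps deterministically to its own small object, rather than funnelling every document through one coordinator.

export default {
  async fetch(request: Request, env: { DOCUMENT: DurableObjectNamespace }): Promise<Response> {
    const url = new URL(request.url);
    // e.g. /doc-42 -> one Durable Object dedicated to that document.
    const docName = url.pathname.slice(1) || "default";
    const stub = env.DOCUMENT.get(env.DOCUMENT.idFromName(docName));
    // Millions of small objects, one per document, distributed by the platform.
    return stub.fetch(request);
  },
};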
Primitives over managed services
Cloudflare deliberately provides primitives (Workers, Durable Objects, D1, R2, Queues) rather than high-level managed services that abstract those primitives. You don't get a managed GraphQL API layer or a pre-built authentication service or a fully-featured CMS system. Instead, you get compute, storage, and coordination primitives that you compose together into solutions tailored to your specific needs.
This approach has a name in distributed systems literature: unbundling. Traditional databases bundle many concerns into one monolithic system: storage, indexing, caching, query execution, transaction management, and replication all together. Cloudflare deliberately unbundles these capabilities. D1 provides relational storage, KV provides edge caching, Durable Objects provide coordination, R2 provides object storage, and Queues provide asynchronous messaging. Each primitive does one thing well and composes with the others.
The trade-off is genuine and worth understanding. With a traditional database, creating an index is a single command, and the database handles keeping it synchronised automatically. With unbundled primitives, maintaining derived data requires explicit code. You implement authentication rather than merely configuring it. You build your API layer rather than selecting from pre-built options.
The benefit is flexibility and composability that monolithic systems cannot provide. You can combine primitives in ways that an integrated system wouldn't support because its designers didn't anticipate your use case. The primitives don't constrain you to patterns their creators anticipated. For organisations with strong engineering capability and specific requirements, primitives often fit better than managed services that almost but not quite match your needs. For organisations seeking rapid deployment of standard patterns with minimal customisation, the hyperscaler managed services may provide faster initial time to value.
Edge-native, not edge-adapted
The platform was built from the ground up for edge deployment, not adapted after the fact from centralised architectures. This native design shows in decisions that wouldn't make sense for traditional centrally-deployed systems.
The execution model assumes your code runs in hundreds of locations simultaneously from the moment you deploy. Durable Objects provide global coordination without requiring you to think about multi-region replication or consistency. The cost model charges for actual compute rather than provisioned capacity sitting idle. Deployment is instantaneous and global because the architecture doesn't require propagating container images to regional clusters.
Edge-adapted services represent traditional cloud offerings extended to edge locations after initial design, and they carry the assumptions of their centralised origins throughout their design. They often require you to manually select edge locations, actively manage edge-to-origin relationships, and consciously handle eventual consistency between edge caches and authoritative stores.
Cloudflare's edge-native approach eliminates these operational concerns entirely for workloads that fit its model, but it also means that architectural patterns assuming centralised infrastructure need fundamental rethinking.
Mental model shifts
Coming from AWS, Azure, or GCP, certain assumptions embedded in those platforms need explicit reconsideration on Cloudflare because the mental models differ fundamentally.
From region selection to automatic placement
The fundamental hyperscaler question, "which region should I deploy to?", has no Cloudflare equivalent because code doesn't deploy to a region. Code deploys everywhere. Data services may have placement implications when you need to control where state lives, but compute placement happens automatically.
This approach eliminates the multi-region complexity that enterprises spend significant effort managing on hyperscalers. It also eliminates the multi-region control that some enterprises require for specific reasons. If you need to ensure code runs only in specific jurisdictions for regulatory reasons, Cloudflare provides that control through jurisdictional restrictions, but it's opt-in restriction, not default selection.
From capacity planning to automatic scaling
Another hyperscaler question, "how many instances do I need?", has no Cloudflare equivalent. Workers scale automatically to match demand without configuration. You need not specify minimum instances, maximum instances, target utilisation levels, or scaling policies.
This automatic approach eliminates both over-provisioning waste and under-provisioning risk simultaneously. It also eliminates the ability to constrain costs artificially through capacity limits, which is both a feature and a constraint. If your Worker processes a million requests, you pay for a million requests; there's no "stop at 100,000" circuit breaker configuration. Cost control must happen through architectural design choices and continuous usage monitoring, not through capacity constraints.
From connection pools to bindings
The typical hyperscaler pattern of initialising connection pools, managing credentials, and handling connection lifecycle throughout the process has no Cloudflare equivalent for Cloudflare-native resources. Bindings abstract away this entire category of infrastructure concern.
For external resources (databases not on Cloudflare, third-party APIs), you still manage connections yourself. Hyperdrive provides connection pooling and query caching for external PostgreSQL and MySQL databases. For everything else, you handle it in code, with the understanding that Workers are ephemeral and short-lived; you shouldn't hold persistent connections across requests.
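A sketch of that pattern, assuming a Hyperdrive binding named HYPERDRIVE and the postgres.js driver: the binding supplies a connection string that routes through Hyperdrive's pooling rather than opening a fresh connection from every short-lived isolate.

import postgres from "postgres";

export default {
  async fetch(request: Request, env: { HYPERDRIVE: Hyperdrive }, ctx: ExecutionContext): Promise<Response> {
    // The connection string points at Hyperdrive, which pools connections
    // to the external database on the Worker's behalf.
    const sql = postgres(env.HYPERDRIVE.connectionString);
    const users = await sql`SELECT id, email FROM users LIMIT 10`;
    // Close the driver's connection after the response is sent.
    ctx.waitUntil(sql.end());
    return new Response(JSON.stringify(users), {
      headers: { "Content-Type": "application/json" },
    });
  },
};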
From cold start mitigation to default behaviour
The hyperscaler practice of provisioned concurrency, scheduled warming requests, and architecture specifically designed to avoid cold starts has no Cloudflare equivalent because these techniques are entirely unnecessary. Cold starts are sub-5-millisecond by default on Cloudflare.
This approach eliminates an entire category of operational complexity and cost that hyperscaler users accept as normal. It also means that certain architectural patterns designed specifically around cold-start avoidance (keeping Lambda functions warm with scheduled pings, using provisioned concurrency for latency-sensitive paths) become unnecessary overhead on Cloudflare and should be abandoned.
When the model fits
The platform excels at request/response patterns (API endpoints, web applications, webhook processors) where each request handles independently and benefits from global deployment without configuration. It's equally strong for coordination-heavy workloads such as real-time collaboration, game state, rate limiting, and session management. If you need single-threaded, strongly consistent state accessible from anywhere in the world, Durable Objects are essentially the only primitive on any cloud platform that provides it without significant operational complexity.
I/O-heavy workloads that spend most of their time waiting on external services benefit significantly from the CPU-time billing model. This makes API aggregation and webhook orchestration economically attractive compared to wall-time billing. Latency-sensitive global applications see immediate improvements in end-user experience without multi-region management overhead.
When it doesn't fit
The platform struggles when your workload fundamentally contradicts its underlying assumptions.
Memory-intensive operations requiring more than 128 MB per execution unit cannot run in Workers and will fail. Large file processing, complex data transformations, and certain ML inference tasks need Containers with their longer cold starts or external compute.
CPU-intensive operations requiring more than 5 minutes of continuous computation cannot complete in a single Worker invocation. Video transcoding, complex simulations, and large-scale data processing need architectures designed differently, either using Containers or distributing work across multiple Workers.
Workloads requiring specific geographic placement (not for compliance reasons but purely for latency to specific backends) need Smart Placement configuration and may not benefit from global deployment at all, contradicting the model's assumptions.
Traditional database patterns expecting a single large database with complex relational queries spanning the entire dataset may fight D1's 10 GB-per-database model persistently. However, Hyperdrive connects Workers to external PostgreSQL and MySQL databases with connection pooling and query caching at the edge, providing a production-ready alternative for workloads that don't fit D1's horizontal scaling model.
What comes next
Chapter 2 addresses strategic assessment at the organisational level, asking whether your organisation should adopt Cloudflare and what the full implications are beyond the technical. That chapter provides evaluation frameworks, migration playbooks, and honest discussion of when hyperscalers remain the better choice.
Chapters 3 through 5 cover Workers in depth, including the compute model, full-stack application patterns, and the local development and testing workflows that determine whether your team will be productive on this platform.
Chapters 6 through 9 address stateful systems, covering Durable Objects for coordination, Workflows for durable execution, Queues for asynchronous processing, and Containers for escaping isolate constraints.
The platform rewards understanding. Teams that grasp the isolate model design better applications. Teams that understand binding semantics write more portable code. Teams that internalise horizontal scaling assumptions build architectures that grow gracefully.
The goal of this book is to provide that understanding, not as documentation of features, but as the architectural insight that Cloudflare's own solutions architects wish existed when helping enterprise customers succeed.