
Chapter 22: Architectural Patterns and Reference Designs

What patterns should I follow for common architectural challenges?


Part VI established operational foundations: cost management, observability, and secure deployment. Part VII applies everything from this book to real architectural challenges.

Patterns encode judgment. The code is the easy part. Knowing when to apply which pattern separates architecture from implementation. This chapter presents architectural patterns adapted for Cloudflare's edge platform, not as recipes to follow blindly, but as decision frameworks for your specific constraints.

Each pattern exists because it solves a real problem. But patterns have costs: complexity, latency, operational overhead. The goal isn't applying patterns because they're elegant. It's applying them because they're appropriate. Sometimes the right answer is no pattern at all.

The virtue of boring

Before exploring patterns, internalise this principle: boring is good. Software systems should be predictable, doing what they're supposed to do without surprises, puzzles, or heroics. The lack of excitement and suspense is actually a desirable property of source code, unlike a detective story.

This principle guides pattern selection. The right pattern isn't the cleverest or most elegant; it's the one that makes your system boring. Predictable. Understandable. Debuggable by someone at 3am who's never seen the code.

When evaluating patterns, ask whether they make the system easier or harder to understand, introduce complexity that will confuse future maintainers, or solve problems you actually have.

Clever architectures become legacy nightmares while simple architectures endure. When in doubt, choose boring.

Establishing your latency budget

Before selecting patterns, establish your latency budget. A checkout flow might allow two seconds total; a real-time game might require sub-100ms response. Work backwards from your budget to determine which patterns you can afford.

Latency costs on Cloudflare: service binding to a colocated Worker adds under 1ms. Service binding to a Worker in another location adds 5-15ms depending on distance. Durable Object call adds 5-10ms in the same region as the user, 100-200ms cross-continental. External API call varies wildly: 50ms to a nearby service, 300ms or more to a distant one, plus processing time.

If your budget is 200ms and your backend database call takes 150ms, you have 50ms for everything else. A gateway Worker with rate limiting might consume 15ms, while a cross-continental Durable Object call for user session data would blow the budget entirely. These numbers should drive your architecture, not the other way around.

The patterns in this chapter have different latency profiles: gateway patterns add a hop but enable caching that can eliminate hops entirely; event-driven patterns trade immediate latency for eventual processing; collaboration patterns accept cross-continental latency as the cost of strong consistency. Know your budget before choosing your pattern.

Choosing your pattern

Before diving into individual patterns, consider what problem you're solving.

Problem | Pattern | Cloudflare Primitive | When to Avoid
Centralised auth, routing, rate limiting | API Gateway | Worker + service bindings | Simple apps with single backend
Client-specific API shapes | Backend for Frontend | Worker per client type | Small teams, stable API
Real-time synchronisation | Collaboration | Durable Objects | Low-frequency updates, tolerance for polling
Multi-step reliable processes | Saga | Workflows | Simple operations that can retry atomically
Decoupled async processing | Event-driven | Queues | Synchronous requirements, ordering guarantees

The patterns build on primitives covered earlier: Workers for compute, Durable Objects for coordination, Workflows for durability, Queues for decoupling. If you haven't internalised those primitives, the patterns will feel arbitrary. If you have, they'll feel inevitable.

API gateway pattern

Every Worker is already a gateway. It receives requests, processes them, returns responses. The pattern question isn't whether to have a gateway; it's whether to make it explicit.

Why the pattern exists

Traditional API gateway concerns (authentication, rate limiting, routing, transformation) don't disappear on Cloudflare. They concentrate. A dedicated gateway Worker provides a single point for cross-cutting concerns, consistent policy enforcement, and a clear boundary between external traffic and internal services.

The architectural question is where trust lives. In a gateway pattern, backends trust the gateway completely; they don't re-validate authentication. This is the centralisation payoff. If backends must re-validate anyway, the gateway is just an expensive proxy adding latency without value.

What's different at the edge

Gateway patterns on traditional infrastructure carry latency costs because each hop (request to gateway, gateway to backend, backend response, gateway transformation) adds milliseconds. Architects balance centralisation against latency, often accepting inconsistent auth implementations to avoid the gateway tax.

Cloudflare changes this calculation fundamentally. Workers have near-zero cold starts, so the gateway doesn't add startup latency. Service bindings to backend Workers bypass the public internet entirely; when services are colocated, a binding call adds under 1ms. The latency argument against gateway patterns largely disappears.

The decision framework

Make the gateway explicit when: multiple backend services need consistent authentication, rate limiting must be globally consistent, you need request/response transformation, or observability requires a single instrumentation point. Keep routing implicit when: single backend service, auth handled externally through Cloudflare Access, the gateway would be pass-through with no logic, or every millisecond matters and you've measured the gateway's cost.

Complexity threshold matters too. One backend? Let the Worker be the gateway. Two or three backends with shared auth? Consider a gateway. Five or more backends with different auth and rate-limit policies? Definitely build a gateway.

Implementation essence

A gateway reduces to three concerns: authenticate, decide, route. The code is less interesting than the architectural commitment it represents.

src/gateway.ts
export default {
  async fetch(request: Request, env: Env) {
    const user = await authenticate(request, env);
    if (!user && requiresAuth(request)) {
      return new Response("Unauthorized", { status: 401 });
    }

    if (!await checkRateLimit(user?.id || getClientIP(request), env)) {
      return new Response("Too Many Requests", { status: 429 });
    }

    return route(request, user, env);
  }
};

The authenticate function validates tokens at the edge, no round-trip to an auth server. The checkRateLimit function calls a Durable Object maintaining per-key counters with global consistency. The route function dispatches to backend services via bindings, injecting user context as headers that backends trust implicitly.
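A minimal sketch of that routing, assuming two service bindings (ORDERS and USERS) and a hypothetical X-User-Id header convention:

Illustrative route helper
async function route(request: Request, user: User | null, env: Env): Promise<Response> {
  const url = new URL(request.url);

  // Inject user context; backends trust these headers instead of re-validating
  const headers = new Headers(request.headers);
  if (user) headers.set("X-User-Id", user.id);
  const upstream = new Request(request, { headers });

  if (url.pathname.startsWith("/orders")) return env.ORDERS.fetch(upstream);
  if (url.pathname.startsWith("/users")) return env.USERS.fetch(upstream);
  return new Response("Not Found", { status: 404 });
}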

Backend services don't re-validate authentication. They trust the gateway. This trust relationship is the architecture. Not comfortable with that trust? You don't want this pattern.

Trade-offs to accept

Gateway centralisation means gateway outages affect everything. Your gateway Worker becomes critical infrastructure. Test thoroughly, monitor obsessively, accept that Cloudflare issues take down your gateway with everything else.

The gateway becomes a coordination point for deployments. Changing auth logic or rate limits requires gateway deployment, not backend deployment. For some teams, this enables centralised policy management; for others, it's friction slowing backend teams.

Rate limiting at the edge

Rate limiting deserves special attention because it illustrates the edge advantage most clearly, and because getting it wrong creates either security vulnerabilities or latency problems.

The centralisation problem

Traditional rate limiters require a centralised counter. Every request must atomically increment and check a counter somewhere. On AWS, this typically means ElastiCache Redis in a single region or DynamoDB with conditional writes. Both create a latency hotspot. A user in Sydney rate-limited against a Redis cluster in us-east-1 adds 200ms of network latency to every request, just for the rate limit check. The rate limiter becomes slower than the operation it's protecting.

Multi-region Redis provides read replicas, but writes still go to the primary. During the replication window, rate limits leak; requests slip through because the replica hasn't seen the latest count. Accept the leakage or pay the latency tax. No third option.

How Durable Objects change the calculus

Durable Objects provide atomic counters with automatic global routing. Each rate limit key (user ID, API key, IP address) maps to a Durable Object. When a request arrives, regardless of which edge location receives it, the platform routes to that key's Durable Object automatically. The object might live in Frankfurt, Sydney, or Virginia; routing is transparent.

The architectural difference is fundamental: hyperscaler rate limiting routes to a central store; Cloudflare rate limiting routes to a distributed actor. The former creates a global bottleneck; the latter eliminates it.

For a user in Sydney hitting their own rate limit object (likely near Sydney because that's where they first made requests), the check adds perhaps 10ms. Same user hitting an object in London: 150ms. But with proper key design, users usually hit objects near them. A per-user rate limit object gets created when that user first requests, in the region nearest to them. Subsequent requests route to that same nearby object.
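A fixed-window counter is enough to show the shape. A minimal sketch, with illustrative window size and limit:

Illustrative rate limiter Durable Object
export class RateLimiter {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const windowMs = 60_000; // one-minute window
    const limit = 100;       // requests allowed per window
    const now = Date.now();

    // Single-threaded execution makes this read-modify-write atomic
    let window = await this.state.storage.get<{ start: number; count: number }>("window");
    if (!window || now - window.start >= windowMs) {
      window = { start: now, count: 0 };
    }
    window.count++;
    await this.state.storage.put("window", window);

    return Response.json({ allowed: window.count <= limit });
  }
}

The gateway's checkRateLimit helper would resolve a stub with env.RATE_LIMITER.idFromName(key) and forward the request; what goes into that key is the design question discussed below.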

When this matters

The difference between 10ms and 200ms for a rate limit check determines whether rate limiting is invisible or dominates your latency budget. For APIs serving global users with per-user rate limits, Durable Objects make rate limiting essentially free. For APIs where all users share a single global rate limit, you're back to the centralisation problem: all requests route to one object, and distant users pay the latency cost.

Design rate limit keys accordingly. Per-user limits work beautifully. Per-endpoint global limits create hotspots. Per-user-per-endpoint limits give you the best of both worlds but require more Durable Objects.

Backend for frontend

Different clients need different API shapes. Mobile apps on cellular connections need minimal payloads. Web applications need richer data. Internal tools need everything. The Backend for Frontend pattern provides client-specific facades that aggregate and transform data for each client's needs.

Why the pattern exists

Generic APIs serve the lowest common denominator, returning too much data (wasting bandwidth for mobile clients) or too little (forcing multiple round-trips from web clients). BFF acknowledges that different clients have legitimately different needs and provides APIs shaped for each.

But here's what's often left unsaid: BFF is fundamentally an organisational pattern disguised as a technical one. It works when you have separate teams for separate clients (a mobile team owning the mobile BFF, a web team owning the web BFF), but without that team structure, you're just adding API surfaces to maintain.

What's different at the edge

Traditional BFF implementations run in your data centre, close to backends but far from users. The BFF aggregates data efficiently because backends are nearby, but still ships the full response across the internet to the client.

Edge BFF inverts this model. Your BFF Worker runs close to the user, potentially far from backends, which seems worse (doesn't it add latency to backend calls?), but often improves overall latency.

A mobile client in São Paulo making four sequential calls to backends in US-East experiences 4 × 180ms = 720ms minimum, assuming instant backend response. Same client calling a BFF in São Paulo, which makes four parallel calls to US-East, experiences 180ms for the slowest parallel fetch plus perhaps 20ms for the local response. Total: 200ms versus 720ms. The BFF is further from backends, but the client is closer to the BFF, and parallelisation eliminates the sequential penalty.

This advantage scales with backend calls and distance between client and backend. For clients on good connections close to backends, the advantage shrinks. For mobile clients on poor connections far from backends, edge BFF is transformative.

The decision framework

Use BFF when clients have genuinely different data needs, not just preferences but fundamentally different requirements driven by bandwidth constraints, screen size, or update cycles. Use it when mobile performance is critical and payloads must be minimal. Use it when you can staff separate teams for separate BFFs.

Avoid BFF when a single API shape serves all clients adequately, when one small team maintains everything, when client needs are similar enough that field selection suffices, or when GraphQL already provides the flexibility you need. GraphQL provides client-specific queries without client-specific servers, potentially what you need without the operational overhead of multiple BFFs.

Implementation essence

A BFF Worker is an aggregation point. It knows what its client needs and fetches exactly that.

src/bff-handlers.ts
async function getMobileDashboard(userId: string, env: Env) {
  // Mobile needs summary only: single query, minimal payload
  const summary = await env.DB.prepare(`
    SELECT COUNT(*) as orders, SUM(total) as revenue
    FROM orders WHERE user_id = ?
  `).bind(userId).first();

  return Response.json({ summary });
}

async function getWebDashboard(userId: string, env: Env) {
  // Web needs detail: parallel fetches, rich payload
  const [summary, recentOrders, topProducts, notifications] = await Promise.all([
    getSummary(env, userId),
    getRecentOrders(env, userId),
    getTopProducts(env, userId),
    getNotifications(env, userId)
  ]);

  return Response.json({ summary, recentOrders, topProducts, notifications });
}

The mobile endpoint makes one database call and returns kilobytes. The web endpoint makes four parallel calls and returns tens of kilobytes. Same underlying data, radically different shapes. Each BFF knows its client intimately; when client needs change, the BFF changes with them.

A/B testing at the edge

A/B testing traditionally requires client-side JavaScript that flickers as variants load, or server-side infrastructure that adds latency to every request. Edge A/B testing eliminates both problems: the Worker intercepts the request, assigns the user to a variant, and routes to the appropriate content before the response begins.

Why edge a/b testing

The latency advantage is significant. Traditional server-side A/B testing requires a round-trip to a central assignment service before routing can occur. Edge A/B testing makes the assignment decision in the same location handling the request, adding sub-millisecond overhead rather than tens of milliseconds.

Same-URL testing becomes straightforward. Both control and test variants serve from identical URLs. The Worker routes to appropriate origins based on user assignment, eliminating client-side variant detection and the associated flicker.

The assignment pattern

The core pattern uses cookies for assignment persistence and KV for experiment configuration:

Edge A/B testing assignment
export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    const experiment = await env.EXPERIMENTS.get(url.pathname, "json");
    if (!experiment) return fetch(request); // No experiment for this path

    // Check for existing assignment
    const cookies = parseCookies(request.headers.get("Cookie") || "");
    let variant = cookies[`exp_${experiment.id}`];

    if (!variant) {
      // Assign based on configured weights (weights sum to 1)
      const rand = Math.random();
      let cumulative = 0;
      for (const [v, weight] of Object.entries(experiment.variants)) {
        cumulative += weight;
        if (rand < cumulative) { variant = v; break; }
      }
    }

    // Route to variant's origin
    const origin = experiment.origins[variant];
    const response = await fetch(new Request(origin + url.pathname, request));

    // Set assignment cookie if new
    if (!cookies[`exp_${experiment.id}`]) {
      const headers = new Headers(response.headers);
      headers.append("Set-Cookie",
        `exp_${experiment.id}=${variant}; Path=/; Max-Age=${experiment.duration}`);
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers
      });
    }

    return response;
  }
};

Configuration-code separation

Store experiment configuration in KV, completely decoupled from Worker deployments. Traffic allocation changes, experiment activation, and targeting rule updates propagate globally without redeploying code.

This separation enables marketing and product teams to manage experiments without engineering involvement for routine changes. The Worker code handles mechanics; KV holds business logic.
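The configuration itself stays small. One possible shape for the KV value the Worker reads, matching the fields used in the assignment code above (the structure is an assumption, not a platform requirement):

Illustrative experiment config shape
interface ExperimentConfig {
  id: string;                        // used in the exp_<id> assignment cookie
  variants: Record<string, number>;  // variant name -> traffic weight (weights sum to 1)
  origins: Record<string, string>;   // variant name -> origin base URL
  duration: number;                  // cookie Max-Age in seconds
}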

When edge A/B testing fits

Use edge A/B testing when every request must be assigned quickly and you control the serving infrastructure. It excels at page-level experiments (different landing pages), feature flag evaluation, and traffic splitting for canary deployments.

Consider alternatives when you need sophisticated analytics integration (most analytics platforms have their own A/B testing), when experiments span multiple domains or platforms, or when you're already invested in a feature flag service that handles assignment.

Real-time collaboration

Real-time features (live cursors, collaborative editing, multiplayer games, presence indicators) require persistent connections and consistent state. Durable Objects eliminate traditional WebSocket architecture complexity by providing both.

Why traditional WebSocket architecture is complex

WebSocket connections are stateful, but traditional servers are stateless. This fundamental mismatch creates operational complexity: you need sticky sessions or a pub/sub layer to route messages to the right server; scaling requires managing connection distribution; failover means reconnecting clients to new servers while preserving state. Most teams either accept the complexity or reach for managed services like Pusher or Ably.

Durable Objects collapse this complexity entirely. Each collaboration session maps to a single Durable Object holding authoritative state and managing all connections, with no routing layer (the platform handles that) and no sticky session configuration (each object is inherently sticky as a globally-unique instance).
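Connection handling inside the object stays correspondingly small. A minimal sketch using the hibernation-friendly accept (session metadata and error handling omitted):

Accepting a WebSocket in the collaboration object
async fetch(request: Request): Promise<Response> {
  if (request.headers.get("Upgrade") !== "websocket") {
    return new Response("Expected WebSocket", { status: 426 });
  }

  const { 0: client, 1: server } = new WebSocketPair();
  // acceptWebSocket() lets the runtime hibernate the object between messages
  this.state.acceptWebSocket(server);

  return new Response(null, { status: 101, webSocket: client });
}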

The output gating guarantee

Output gating (Chapter 6) matters profoundly for real-time applications. When a Durable Object writes to storage, all outbound communication blocks until the write is durable. For collaboration, a message isn't sent to other participants until the state change it represents is persisted.

Traditional WebSocket architectures achieve this through complex acknowledgement protocols: server writes, waits for confirmation, then broadcasts. Durable Objects make durability-before-broadcast the default:

src/collaboration.ts
async handleEdit(userId: string, edit: Edit) {
  // Output gating: broadcast blocks until this write is durable
  this.document = applyEdit(this.document, edit);
  await this.state.storage.put("document", this.document);
  this.broadcast({ type: "edit", userId, edit });
}

The broadcast doesn't happen until the storage write is confirmed durable. You inherit this guarantee.
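The broadcast helper itself can be a loop over the object's accepted sockets. A sketch, assuming connections were accepted as shown earlier:

Broadcasting to connected participants
private broadcast(message: object) {
  const payload = JSON.stringify(message);
  for (const ws of this.state.getWebSockets()) {
    ws.send(payload);
  }
}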

Geographic latency reality

Geographic latency exists and must be accounted for. If your collaboration room's Durable Object lives in London and a user connects from Tokyo, that user experiences 150-200ms round-trip latency for every interaction. For collaboration tools where users type and see results, this is usually acceptable (200ms feels slightly laggy but usable), but for real-time games requiring sub-50ms response, it's potentially disqualifying.

The placement algorithm helps when users are geographically clustered. The object is created where the first request originates, so if created by a user in Singapore with most participants in Asia, the object lives in Asia and everyone gets good latency. With globally distributed participants, someone pays the latency cost. Physics always wins.

The decision framework

Use Durable Objects for real-time when: authoritative server-side state needed, consistency matters more than raw throughput, you're already on Cloudflare and want platform integration, WebSocket hibernation can manage costs.

Consider managed services (Pusher, Ably) when: features beyond basic pub/sub needed (presence, history, channel management), team lacks distributed systems experience, larger ecosystem of client SDKs required.

Cost calculation deserves attention. A hibernating WebSocket connection costs very little; you pay when messages flow, not when connections idle. For applications with many connections but bursty traffic (collaboration tools, chat applications), hibernation makes Durable Objects cost-competitive with managed services. Constant high-frequency messages? Run the numbers carefully.

Trade-offs to accept

Durable Objects are single-threaded. One object handles all connections for one session. For most collaboration scenarios (dozens or hundreds of concurrent editors), fine. For massive fan-out with thousands of subscribers to a single channel, you need a distribution tree: one root object coordinates child objects that each handle a subset of connections.

Conflict resolution is your problem. The example above uses last-write-wins, losing concurrent edits. For true collaborative editing, you need CRDTs or operational transformation. Deep topics beyond this book's scope; study them separately. The platform provides primitives: single-threaded access, durable storage, output gating. The conflict resolution algorithm is your responsibility.

Event-driven architecture

Events decouple producers from consumers. Services emit events when state changes; other services react asynchronously. The pattern enables independent scaling, fault isolation, and system evolution without tight coordination.

Why events matter

Synchronous request-response creates coupling: when Service A calls Service B directly, A waits for B, fails if B fails, and must know where B lives. For simple systems, this coupling is fine; it's not inherently bad. But for complex systems with many services, coupling becomes brittle.

Events are for consequences, not requirements. If the caller needs to know whether an operation succeeded, that's a synchronous call. If the caller just needs to announce that something happened (order created, payment received, user signed up), that's an event. The caller moves on; interested parties react when ready.

What's different at the edge

Queue consumers on Cloudflare escape Workers' tightest constraints. Request-handling Workers cap at 30 seconds CPU time; queue consumers run up to 15 minutes wall-time. This difference isn't incidental; it's why Queues answer heavy processing problems. If your workload doesn't fit in a request handler, it likely fits in a queue consumer.

Cloudflare Queues provide at-least-once delivery, a weaker guarantee than exactly-once but the honest one. Truly exactly-once delivery in distributed systems is either impossible or prohibitively expensive, so you must design consumers to be idempotent: processing the same event twice produces the same result as processing once.

The idempotency imperative

At-least-once delivery means events may arrive multiple times. Without idempotency, you might send two confirmation emails or decrement inventory twice. The pattern: check whether you've processed this event; skip if yes; process and record if no.

src/event-processor.ts
async function processEvent(event: Event, env: Env) {
  const processed = await env.DB.prepare(
    "SELECT 1 FROM processed_events WHERE event_id = ?"
  ).bind(event.eventId).first();

  if (processed) return; // Already handled

  await handleEvent(event);
  await env.DB.prepare(
    "INSERT INTO processed_events (event_id) VALUES (?)"
  ).bind(event.eventId).run();
}
Order Matters for Crash Recovery

Handle first, record second. Record before handling and a crash after recording means the event never gets processed. Handle before recording and a crash after handling means the event gets processed twice, but your idempotent handler makes that safe.
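Wiring the processor into a queue consumer takes a few lines. A sketch, with explicit acknowledgements shown for clarity:

Illustrative queue consumer wiring
export default {
  async queue(batch: MessageBatch<Event>, env: Env) {
    for (const message of batch.messages) {
      try {
        await processEvent(message.body, env);
        message.ack();
      } catch {
        message.retry(); // redelivered later; the idempotent handler makes retries safe
      }
    }
  }
};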

Queues vs Workflows

Both handle asynchronous work but solve different problems. Queues process independent items where order doesn't matter and items don't depend on each other. Workflows orchestrate dependent steps where step 2 needs step 1's result and failures require compensation.

Processing a batch of independent tasks (uploaded files, notification deliveries, data imports)? Use Queues. Coordinating a multi-step process (order fulfilment, account provisioning, data migration)? Use Workflows. The saga pattern shows how Workflows handle coordination.

Saga pattern for distributed transactions

When operations span multiple services and must all succeed or all roll back, the saga pattern coordinates through compensating actions. Workflows provide the durable execution sagas need.

src/workflows/order-saga.ts
import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from "cloudflare:workers";

export class OrderSaga extends WorkflowEntrypoint<Env, OrderCreated> {
  async run(event: WorkflowEvent<OrderCreated>, step: WorkflowStep) {
    const order = event.payload;

    const reservation = await step.do("reserve-inventory", async () => {
      return await reserveInventory(order.items);
    });

    try {
      await step.do("process-payment", async () => {
        return await processPayment(order.customerId, order.total);
      });
    } catch (error) {
      // Payment failed: compensate by releasing inventory
      await step.do("release-inventory", async () => {
        return await releaseInventory(reservation.id);
      });
      throw error;
    }

    await step.do("complete-order", async () => {
      return await markOrderComplete(order.orderId);
    });
  }
}

A Workflow is a saga that survives your Worker crashing. The platform remembers which step you were on. Crash after reserving inventory but before processing payment? The Workflow resumes from the payment step when infrastructure recovers. Payment fails? The compensation logic runs. The saga completes correctly regardless of infrastructure failures.

Trade-offs to accept

Event-driven systems are harder to debug. Tracing a request through synchronous calls is straightforward: follow the stack. Tracing events through queues requires correlation IDs and distributed tracing. Invest in observability before you need it.

Eventual consistency is the default. When you emit an event, consumers process it eventually, typically within seconds, but not instantly. If downstream systems need immediate consistency, events are the wrong pattern.

Ordering isn't guaranteed. Queues process in approximate order, but you cannot rely on strict ordering. If order matters, use Workflows (execute steps sequentially) or design events to be order-independent.

Service composition

Complex applications comprise multiple services. How those services communicate affects latency, reliability, and operational complexity. Choices here ripple through your architecture.

The three-layer model

When decomposing a system, the three-layer architecture provides useful vocabulary: the interaction layer handles user-facing concerns (API gateways, BFFs, authentication); the control layer contains business logic and orchestration, deciding what to do based on requests and coordinating components; the mechanism layer provides capabilities (storage, external APIs, notification services).

A gateway Worker is interaction; a Workflow coordinating an order is control; D1 and R2 are mechanism. This framing helps decide where new functionality belongs and prevents mixing concerns (a gateway that also contains business logic, or a storage abstraction that also makes policy decisions). Unsure where code belongs? Ask which layer it serves.

Service bindings

Service bindings are why microservices on Cloudflare don't carry the latency tax they do elsewhere. Decompose freely. The latency argument against microservices (network calls between services add unacceptable overhead) doesn't apply when calls stay on Cloudflare's network and skip TLS negotiation entirely.

Service bindings aren't just faster than HTTP; they're a different kind of call. No public internet traversal. No DNS resolution, no connection pooling, no TLS negotiation. The cost is coupling: bound services must be Cloudflare Workers, sharing a deployment context in the sense that a binding targets a specific service name.

Configure bindings in wrangler.toml and use them like local services:

src/service-call.ts
const user = await env.USER_SERVICE.fetch(
  new Request(`http://internal/users/${userId}`)
).then(r => r.json());

The http://internal URL is convention; the request never hits the internet. The binding routes directly to the target Worker.
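In the Worker's types the binding surfaces as a Fetcher; the property name must match the binding declared in wrangler.toml. A sketch:

Illustrative Env declaration for the binding
interface Env {
  USER_SERVICE: Fetcher; // service binding to the target Worker
}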

External APIs

Calls to external APIs traverse the public internet and need different handling. Use aggressive timeouts; a Worker waiting on a slow external API still consumes resources. Have fallback strategies for when external services are slow or unavailable.
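A sketch of the timeout-and-fallback shape, with an illustrative two-second budget:

Illustrative external call with a hard deadline
async function fetchExternal(url: string, timeoutMs = 2000): Promise<Response | null> {
  try {
    return await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  } catch {
    return null; // caller falls back to cached or degraded data
  }
}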

Circuit breakers add value when: external API has frequent partial outages, timeouts are expensive, you have a meaningful fallback (cached data, degraded functionality). If you can't do anything useful when the circuit is open, a circuit breaker just changes when you return an error, not whether you return one. Consider whether the complexity is worth it.

Composition patterns that combine

Patterns rarely exist in isolation. Common combinations: gateway plus rate limiting (almost every production API); BFF plus caching (BFF knows what data its client needs and can cache accordingly); event-driven plus saga (operations spanning services needing reliable completion); collaboration plus events (persist changes and maintain audit trail alongside real-time updates).

When combining patterns, watch for complexity accumulation. Each pattern solves a problem but adds operational surface. A system with five patterns is harder to debug than one with two. Add patterns when they solve problems you actually have, not problems you might theoretically encounter.

Anti-patterns to avoid

Patterns have failure modes. These anti-patterns show what happens when patterns are applied without judgment.

The everything gateway

A gateway containing business logic becomes a monolith at the edge. Gateways should handle cross-cutting concerns: authentication, rate limiting, routing. When business logic creeps in (validation rules, data transformation, conditional workflows), the gateway becomes a development bottleneck. Every feature change requires gateway deployment. Teams can't work independently.

The test: could you replace this gateway with a different implementation handling the same cross-cutting concerns? If business logic is embedded, you can't.

Premature event-driven

Using queues for operations that should be synchronous adds complexity without benefit. Not every operation needs decoupling. Caller needs to know whether operation succeeded? Operation is fast and reliable? No benefit to processing later? Use a synchronous call.

The smell: a queue consumer immediately processing every message with no batching benefit, no retry benefit, no decoupling benefit. You've added a queue for architectural purity rather than solving a real problem.

Durable Objects for everything

Durable Objects are powerful but not the only tool. Using them when KV or D1 would suffice means paying for coordination you don't need. KV handles configuration and caching. D1 handles relational data. Durable Objects handle coordination.

The test: does this use case require single-threaded access to mutable state? Just reading configuration? Use KV. Querying related data? Use D1. Need atomic read-modify-write cycles or real-time coordination? Use Durable Objects.

Saga without compensation

Implementing a saga that reserves resources without defining how to release them is a recipe for leaked state. Every saga step that acquires something (reserves inventory, charges a card, allocates capacity) needs a corresponding compensation step that releases it.

danger

Design compensations before acquisitions. If you can't figure out how to undo a step, reconsider whether it belongs in a saga.

Architecture decision records

Patterns represent judgment crystallised into reusable form. Architecture Decision Records capture the judgment behind specific choices: why you chose this pattern over that one, what trade-offs you accepted, what would make you reconsider.

ADRs matter because architectural decisions outlive the people who made them. New team member asks why you have separate databases per tenant? The ADR explains. Circumstances change and you wonder whether to revisit a decision? The ADR provides context.

ADR structure

A useful ADR answers five questions: What context prompted this decision? What alternatives did we consider? What did we choose and why? What trade-offs did we accept? What would make us revisit this?

Keep ADRs concise. A page is usually enough. Capture judgment, not documentation.

ADR: multi-database vs single database for tenants

Context: Multi-tenant SaaS application. Tenant data must be isolated; regulatory requirements prohibit any data leakage between tenants. We need to decide how to structure D1 databases.

Options considered:

Single database with tenant_id column: every table includes tenant identifier, every query filters by tenant. Application logic enforces isolation. Simpler operationally: one database to manage, cross-tenant queries possible for analytics. But query mistakes can leak data, noisy neighbours affect everyone, 10 GB limit applies to all tenants combined.

Separate database per tenant: each tenant gets their own D1 database, application routes based on authenticated tenant. Complete isolation: impossible to accidentally query wrong tenant. 10 GB limit per tenant, delete a tenant by deleting their database. But schema migrations must apply to all databases, no cross-tenant queries without application-level aggregation, more databases to manage.

External PostgreSQL with row-level security: Hyperdrive to managed PostgreSQL, row-level security policies enforcing isolation. Mature tooling, larger storage limits, familiar ecosystem. But adds database call latency, increases operational complexity, row-level security is notoriously complex and error-prone.

Decision: Separate D1 databases per tenant.

Rationale: Regulatory requirements make data isolation non-negotiable. Row-level security provides isolation in theory, but it's complex and a policy mistake leaks data silently. Separate databases make leakage impossible: no row to leak because rows exist in different databases.

The 10 GB per-tenant limit aligns with expected data volumes. Largest tenants might reach 5 GB; most under 1 GB. Effectively unlimited for our use case.

Schema migration complexity manageable through automation. Build tooling to apply migrations across all tenant databases with rollback capability.

Consequences accepted: No cross-tenant analytics without aggregation layer. Tenant onboarding requires database provisioning. Schema changes require careful rollout coordination.

Reconsideration triggers: Tenant needs more than 10 GB: partition their data or move to external PostgreSQL. Cross-tenant analytics becomes critical: need aggregation strategy.

ADR: Durable Objects vs external Redis for rate limiting

Context: Need globally consistent counters for rate limiting. Every API request must check and increment a counter. Counter must be consistent; can't allow requests to slip through because of eventual consistency.

Options considered:

Durable Objects: each rate limit key maps to a Durable Object maintaining the counter, handling increment/check atomically. Global consistency guaranteed, no external service to manage, co-location with Workers. Team needs to learn Durable Objects patterns; solution is Cloudflare-specific.

External Redis (Upstash or ElastiCache): Redis INCR for atomic increment. Team has Redis experience, rich data structures beyond counters, potentially cheaper at very high throughput. But adds external dependency with additional latency; global consistency requires either single region (penalises distant users) or multi-region with eventual consistency (defeats the purpose).

Decision: Durable Objects for rate limiting.

Rationale: Global consistency is the hard requirement. External Redis provides consistency within a region, but global consistency requires either single region (penalising distant users) or multi-region with eventual consistency (requests slip through during replication windows).

Durable Objects provide exactly what we need: atomic operations with global routing. Counter for user X always goes to the same object, regardless of edge location. User's rate limit object typically lives near them because created when they first requested from their location.

Consequences accepted: Team needs to learn Durable Objects patterns. Rate limit counters Cloudflare-specific, harder to reuse outside platform. No access to Redis data structures like sorted sets.

Reconsideration triggers: More complex rate limiting needed (sliding windows, leaky bucket): Redis data structures might be worth the consistency trade-off. Already using external Redis for other features: consolidation might simplify operations.

ADR: Queues vs Workflows for background processing

Context: Need to process uploaded files (validate, transform, store, notify). Processing takes 10-30 seconds per file. Users upload 1000-10000 files daily in bursts.

Options considered:

Queues: upload handler enqueues file reference, queue consumer processes files, failed files retry automatically. Simple mental model, automatic parallelism, lower cost per operation. But no visibility into processing state, hard to implement multi-step processing with partial rollback.

Workflows: upload handler creates Workflow instance with steps for validate, transform, store, notify. Each step durable and retried independently, dashboard visibility into state, compensation logic straightforward. But more expensive per operation, sequential by default.

Decision: Queues for this workload.

Rationale: File processing is fundamentally independent. Each file processes without reference to others. Partial failure doesn't affect other files. Sweet spot for Queues.

Workflows would add cost and complexity without benefit. Don't need per-file state visibility; aggregate success/failure metrics suffice. Don't need compensation; failed file simply doesn't appear in processed set, users can re-upload.

Consequences accepted: No per-file processing dashboard, only aggregate monitoring. Failed files require re-upload rather than automatic retry with compensation. Processing order not guaranteed.

Reconsideration triggers: Processing pipelines where step 2 depends on step 1's result. Users need visibility into individual file processing state. Failures require complex compensation with partial cleanup and notifications.

What comes next

Patterns provide reusable solutions. Decision records capture the judgment behind specific choices. Together, they form the architectural vocabulary for building on Cloudflare.

Chapter 23 focuses on multi-tenant and platform architectures: building systems that serve many customers with proper isolation, custom domains, and usage tracking. Multi-tenancy is where patterns combine (gateway for routing, separate databases for isolation, Durable Objects for per-tenant state, Workflows for tenant onboarding).

Chapter 24 provides honest assessment of when Cloudflare isn't the right choice. Patterns help you build well on Cloudflare; knowing when not to use Cloudflare helps you build well, period.