
Chapter 5: Local Development, Testing, and Debugging

How do I develop, test, and debug Workers effectively before deploying?


The promise of serverless is reduced operational burden. The reality is that the burden shifts from managing servers to managing the gap between local development and production behaviour. This gap exists on every serverless platform, but Cloudflare's architecture creates a distinctive version. Understanding where simulation ends and reality begins determines whether you develop with confidence or get surprised in production.

Local development works well for testing business logic; validating integration behaviour requires real, deployed services. Accept this early, and you'll have fewer production incidents.

The simulation boundary

Cloudflare built Miniflare because the Workers runtime can be simulated: V8 with a specific set of Web APIs, something a local process can replicate faithfully. But the infrastructure around Workers cannot be meaningfully simulated on your laptop. The global network, the edge cache, and the geographic distribution are the point. This boundary tells you when to trust local development and when not to.

What Miniflare actually simulates

Miniflare runs the actual V8 JavaScript engine powering production Workers. Your code executes in the same environment with the same Web APIs: fetch, Request, Response, Headers, crypto, streams. When Miniflare says your code works, the JavaScript genuinely works as it will in production.

Binding interfaces match production. KV operations use local SQLite storage with the same API semantics. D1 queries run against actual SQLite with the same SQL dialect. R2 operations write to local files with the same interface. Durable Objects run with the same storage API and single-threaded execution guarantee.

Resource limits are enforced: CPU time constraints, memory limits, subrequest counts. Miniflare applies the same boundaries you'll hit in production.

For business logic, data transformation, and request handling, local simulation is high-fidelity.

Where simulation breaks down

The gap between local and production isn't a tooling failure. It's a fundamental property of distributed systems. You can simulate the runtime; you cannot simulate the network. This gap produces recurring bugs that deserve explicit names.

The Simulation Gap

Local D1 queries return in microseconds; production queries take 10-50ms due to network traversal. Code making 20 sequential queries seems instant locally but adds 200-1000ms in production. This timing assumption violation is the #1 source of "works on my machine" failures.

Local-production divergence is the broad failure mode where code works perfectly in wrangler dev but fails at the edge.

Timing assumption violation is the most common form of divergence. Local D1 queries return in microseconds because there is no network traversal; production queries cross networks, typically 10-50ms to the nearest D1 instance. Twenty sequential database queries feel instant locally but can add up to a full second of latency in production. Tests pass because local execution is fast; production fails because global distribution adds network latency. No amount of local testing reveals this divergence.

Cache behaviour cannot be meaningfully tested locally. Cloudflare's edge cache spans 300+ locations with complex invalidation semantics you cannot replicate. Locally, the Cache API works (you can store and retrieve values) but you're testing the API only, not the cache. Production cache hits, misses, and invalidation patterns only emerge under real traffic.

Durable Object placement cannot be tested locally. In production, Durable Objects are globally distributed, placed near their first request, accessed across networks. Locally, every Durable Object runs in the same process with zero network latency. Tests verifying Durable Object coordination pass locally and fail in production because latency characteristics differ fundamentally.

Geographic distribution doesn't exist locally. Your Worker runs in one place: your machine. In production, it runs in 300+ locations simultaneously. Smart Placement, jurisdictional restrictions, and regional affinity cannot be tested locally; they're emergent properties of global deployment, not of code execution.

External services see different clients in production. Third-party APIs may behave differently when called from Cloudflare's IP ranges versus your development machine. Rate limits, geographic restrictions, and security rules vary by source IP.

The practical implication

This boundary creates a clear development strategy: local simulation for logic, production infrastructure for integration. Testing integration behaviour locally wastes time; you're testing a fiction. Iterating on logic against remote services wastes time; you're waiting for networks when you could have instant feedback.

Developers who struggle with Workers development usually distrust local simulation too much (running everything remotely, slow iteration) or trust it too much (never testing against real services until production).

Development modes and when to use them

Wrangler provides two modes: local simulation and remote development. The choice isn't preference. It's fitness for purpose.

Local development

wrangler dev runs your Worker locally with simulated bindings. D1 uses local SQLite files. KV and R2 use local storage. Durable Objects run in-process without network traversal. Changes reflect instantly; the feedback loop is sub-second.

Use local development when iterating on request handling, routing, or business logic; developing features that don't depend on specific data; running unit tests; working with sensitive production data you shouldn't touch; or when iteration speed matters more than integration accuracy.

The mental model: you're testing your code, not your infrastructure.

Remote development

wrangler dev --remote runs your code locally but connects bindings to real, deployed Cloudflare services. D1 queries hit real databases. KV reads hit real namespaces. R2 operations touch real buckets. Network latency is real and observable.

Use remote development when validating database queries against production schemas, testing with realistic data volumes, debugging integration issues that local simulation can't reproduce, verifying latency characteristics, or when accuracy matters more than speed.

The mental model: you're testing your infrastructure, not just your code.

The environment strategy

Neither mode should touch production data. Configure separate environments:

wrangler.toml
[[d1_databases]]
binding = "DB"
database_name = "myapp-dev"
database_id = "dev-database-id"

[env.staging]
[[env.staging.d1_databases]]
binding = "DB"
database_name = "myapp-staging"
database_id = "staging-database-id"

[env.production]
[[env.production.d1_databases]]
binding = "DB"
database_name = "myapp-production"
database_id = "production-database-id"
Essential Safety Practice

Configure dev/staging/production environments before writing code. The five minutes spent prevents the production incident you'll otherwise have.

Local development uses default bindings that are isolated, safe, and disposable. Remote development uses wrangler dev --remote --env staging: real services, test data. Production deployment uses wrangler deploy --env production: real services, real users.

Staging environment design

Staging should mirror production structure without production data.

Separate resources, same schemas. Your staging D1 database should have identical schema to production. Your staging KV namespace should have the same key patterns. Structural differences defeat the purpose of staging.

Realistic data, not production data. Seed staging with synthetic data exercising your code paths. If production has millions of records, staging needs enough to reveal pagination bugs and query performance issues; thousands, not millions, but not empty.

Same binding names, different resources. Code references env.DB everywhere. Environment configuration determines which database that resolves to. Never branch on environment in application code; let configuration handle it (sketched below).

Regular refresh from production schema. When you migrate production, migrate staging. Schema drift causes false confidence: tests pass in staging, fail in production because the schema differs.
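
A minimal sketch of what "same binding names, different resources" looks like in code, assuming the DB binding configured above; the users table and query are illustrative. The handler never knows or cares which environment it runs in:

src/index.ts
// The Env interface mirrors the bindings declared in wrangler.toml.
// Whether DB resolves to myapp-dev, myapp-staging, or myapp-production
// is decided entirely by the --env flag at dev/deploy time.
interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Same code path in every environment; only the bound database differs.
    const { results } = await env.DB.prepare("SELECT id, email FROM users LIMIT ?")
      .bind(10)
      .all();
    return Response.json(results);
  },
};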

The Vite alternative

Wrangler is the standard development tool for Workers, but the Cloudflare Vite plugin provides an alternative for teams already using Vite.

Vite is a frontend build tool that's become the default for React, Vue, Svelte, and other modern frameworks. If your project already uses Vite (a React SPA with an API backend, a full-stack framework like React Router or SvelteKit), the Vite plugin integrates Workers development into your existing workflow rather than requiring a separate tool.

What the plugin provides

The plugin runs your Worker code within Vite's development server, providing hot module replacement. Change your Worker code and see results immediately without restarting the server. For frontend-heavy projects with Workers backends, this unified experience is significantly smoother than running Wrangler and Vite simultaneously.

TypeScript types for your bindings generate automatically. The plugin reads your Wrangler configuration and produces type definitions, ensuring accurate types for KV, D1, R2, and other bindings without manual maintenance.

Local simulation works identically to Wrangler. The same Miniflare runtime powers both. The difference is developer experience, not simulation fidelity.

When to choose Vite over Wrangler

Use the Vite plugin when building frontend-centric applications where Vite is already your build tool. React Router v7 officially supports the Vite plugin for full-stack SSR development. If your framework documentation recommends it, follow that guidance.

Use the Vite plugin when hot module replacement matters. Wrangler restarts on file changes; Vite updates modules in place. For rapid frontend iteration, HMR provides faster feedback.

Use Wrangler when building API-only Workers without frontend components; it's simpler for Workers that don't need Vite's frontend capabilities.

Use Wrangler when you need remote development. Wrangler's --remote flag connects to real Cloudflare services; the Vite plugin doesn't currently support this.

Use Wrangler for production deployment regardless of development tool choice. The Vite plugin is for development; wrangler deploy handles production.

Configuration

Configure the plugin in your Vite configuration file:

vite.config.ts
import { cloudflare } from "@cloudflare/vite-plugin";
import { defineConfig } from "vite";

export default defineConfig({
  plugins: [
    cloudflare({
      configPath: "./wrangler.toml",
      // Optional: override or extend Wrangler config
      config: {
        // Programmatic configuration options
      },
    }),
  ],
});

The plugin reads your existing wrangler.toml for bindings, environment variables, and other configuration; you have one configuration file for both development and deployment.

The choice between Wrangler and Vite is about workflow fit, not capability. Both provide accurate local simulation. Teams using Vite for frontend development benefit from the unified experience; teams building backend-only Workers gain nothing from adding Vite to their toolchain.

Testing stateful edge systems

The testing pyramid applies to Workers, but edge-specific considerations change what matters at each level.

Why edge testing differs from Lambda testing

Hyperscaler serverless development emphasises cold start testing because cold starts can dominate user experience. Lambda functions can take seconds to cold start; Azure Functions show similar latencies. Developers spend significant effort testing warm versus cold behaviour, optimising initialisation, and managing provisioned concurrency.

None of this applies to Workers because cold starts are sub-millisecond by default, a property of the V8 isolate model rather than something you optimise for. You cannot meaningfully test "cold" versus "warm" behaviour because the difference is imperceptible to users. This frees testing effort from cold start concerns, allowing focus on what actually matters: integration behaviour and coordination.

Hyperscaler testing also emphasises region-specific behaviour: same-region versus cross-region latency, regional failover, availability zone concerns. Workers eliminates this testing category entirely because deployment is global by default. You cannot deploy to the wrong region; there is no region selection dropdown.

What remains is testing integration with Cloudflare services and coordination in stateful systems.

Testing Durable Objects

Testing Durable Objects for coordination is like testing a lock for thread safety: the interesting bugs only appear under concurrent load.

Durable Objects' value lies in their guarantees: single-threaded execution within an object, strong consistency, serialised access to state. Unit tests verify that methods compute correct results. Integration tests verify that guarantees hold under concurrent access patterns.

Unit testing Durable Object methods involves testing method logic in isolation by instantiating your class directly with a mock storage API. Verify that given inputs produce expected outputs, storage operations occur in the right order, error conditions are handled correctly. This catches logic bugs but cannot verify coordination guarantees.
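
A minimal sketch of this approach using Vitest; the Counter class, its storage contract, and the increment method are hypothetical stand-ins for your own Durable Object logic:

counter.test.ts
import { describe, expect, it } from "vitest";

// Minimal storage contract: just the two operations increment() uses.
type CounterStorage = {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
};

// Hypothetical Durable Object logic, written so the storage dependency can be injected.
class Counter {
  constructor(private storage: CounterStorage) {}

  async increment(): Promise<number> {
    const current = (await this.storage.get("count")) ?? 0;
    const next = current + 1;
    await this.storage.put("count", next);
    return next;
  }
}

// Hand-rolled mock: a Map standing in for Durable Object storage.
function mockStorage(): CounterStorage {
  const data = new Map<string, number>();
  return {
    get: async (key) => data.get(key),
    put: async (key, value) => { data.set(key, value); },
  };
}

describe("Counter.increment", () => {
  it("adds one to the stored value", async () => {
    const counter = new Counter(mockStorage());
    expect(await counter.increment()).toBe(1);
    expect(await counter.increment()).toBe(2);
  });
});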

Integration testing coordination guarantees is necessary because the guarantees that make Durable Objects valuable (serialised execution, output gating) only exist in the real runtime. Testing them requires deploying and making concurrent requests.

Consider a counter Durable Object. Unit tests verify that the increment method reads the current value, adds one, writes the result. But the interesting bug (two concurrent increments reading the same value, both writing the same incremented value, losing one increment) cannot occur in a unit test because there's no concurrency. Only integration tests with actual concurrent requests reveal whether coordination works correctly.

The test pattern:

  1. Deploy to staging
  2. Make N concurrent requests that should conflict
  3. Verify the final state reflects all N operations

Ten concurrent increment requests resulting in a count of ten means coordination works. Fewer than ten means a race condition that unit tests could never find.
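
A sketch of the pattern with Vitest running against a staging deployment; the URL and endpoint paths are hypothetical:

counter.integration.test.ts
import { expect, it } from "vitest";

// Hypothetical staging URL for the counter Durable Object described above.
const BASE = "https://myapp-staging.example.workers.dev/counter/test-1";

it("serialises concurrent increments", async () => {
  // Start from a known state, then fire N conflicting requests at once.
  await fetch(`${BASE}/reset`, { method: "POST" });

  const N = 10;
  await Promise.all(
    Array.from({ length: N }, () => fetch(`${BASE}/increment`, { method: "POST" }))
  );

  // If serialised execution holds, no increment was lost.
  const count = Number(await (await fetch(`${BASE}/count`)).text());
  expect(count).toBe(N);
});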

Testing state persistence across requests matters because Durable Objects persist state across requests, but local simulation doesn't persist across test runs. Test against real services:

  1. Make a request that creates state
  2. Stop and restart your Worker (or wait for the DO to sleep)
  3. Make a request that reads state
  4. Verify the state survived

This catches bugs where state is held in memory but never written to storage, a failure mode local testing often misses because the object never sleeps.

Output gating cannot be directly tested by inspection. Output gating (the guarantee that external effects are delayed until writes are durable) is invisible to your code; you can only trust the platform's guarantee. What you can test: that your code doesn't depend on side effects completing before durability is confirmed.

Choosing test granularity

Every test has a cost (time to run, infrastructure to maintain) and a benefit (bugs it catches, confidence it provides). The ratio should guide your strategy.

| Test Type | Cost | Catches | Use When |
| --- | --- | --- | --- |
| Unit tests (mocked bindings) | Milliseconds | Logic bugs | Business logic, data transformation |
| Unit tests (Miniflare) | Seconds | Binding API misuse | Complex binding interactions |
| Integration tests | 10+ seconds | Schema mismatches, query bugs | Database code, critical paths |
| E2E tests | Minutes | Deployment configuration | Critical user journeys |

Trivial binding interactions (simple get, put, delete operations): unit test with hand-rolled mocks. The binding API is stable; you're testing your logic, not the platform.

Complex binding interactions (SQL queries, transactions, Durable Object coordination): integration test against real, deployed services. Mocking complex interactions is error-prone; you'll spend more time debugging mocks than improving application code.

Critical code (authentication, payment, data integrity): integration test regardless of complexity. Some code is too important to trust to simulation.

Writing more mock code than application code? Step back and reconsider. Either simplify your binding interactions or accept the cost of integration testing.

Binding mock mismatch deserves special attention because your mock behaves the way you think the real service behaves, which may differ from reality. D1's transaction semantics, KV's eventual consistency, R2's conditional operations all have subtle behaviour. Mocks often implement the happy path while omitting edge cases production surfaces. The more complex the interaction, the more likely your mock diverges in ways that matter.

Debugging distributed edge systems

When something breaks in production, the debugging approach differs from traditional server applications because failure modes differ. Understanding how failures propagate helps you locate problems faster.

The debugging mental model

A request arrives at Cloudflare's edge, executes your Worker, potentially calls bindings or external services, returns a response. Failures can occur at any point:

Worker execution failures crash the request with an exception. These appear in logs with stack traces; they're usually straightforward to diagnose and fix.

Binding failures occur when Cloudflare services encounter errors: D1 query syntax errors, KV key not found, R2 permission issues. These surface as exceptions if you don't catch them, or as unexpected return values if you do catch them.

External service failures occur when third-party APIs time out, return errors, or behave unexpectedly. Hardest to diagnose because the failure is outside Cloudflare's observability.

Coordination failures in Durable Objects manifest as unexpected state rather than crashes: wrong results, not errors. Two requests that should have been serialised weren't, or state that should have persisted didn't.

When debugging, first determine the category. Stack traces indicate Worker or binding failures. Unexpected state with no errors indicates coordination failures. Intermittent issues correlating with specific external services indicate external failures.

Tracing requests across services

When a request touches multiple Workers or Durable Objects, you need a trace ID to correlate logs. Generate one at the entry point and propagate it through service bindings:

Trace ID propagation
const traceId = request.headers.get("x-trace-id") ?? crypto.randomUUID();
// Include traceId in all log statements
// Pass traceId header to service binding calls

Without trace IDs, debugging distributed requests becomes archaeology: matching timestamps, guessing at causation, hoping logs align across services. With trace IDs, you search for one ID and see the entire request journey across all services.
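
Expanded into a handler, the pattern looks roughly like this; AUTH is a hypothetical service binding and the log shape is illustrative:

Entry-point Worker with trace ID propagation
export default {
  async fetch(request: Request, env: { AUTH: Fetcher }): Promise<Response> {
    // Reuse an incoming trace ID, or mint one at the entry point.
    const traceId = request.headers.get("x-trace-id") ?? crypto.randomUUID();
    console.log(JSON.stringify({ traceId, event: "request.start", url: request.url }));

    // Forward the trace ID so the downstream Worker logs under the same ID.
    const verified = await env.AUTH.fetch("https://auth/verify", {
      method: "POST",
      headers: {
        "x-trace-id": traceId,
        authorization: request.headers.get("authorization") ?? "",
      },
    });

    console.log(JSON.stringify({ traceId, event: "auth.done", status: verified.status }));
    return verified;
  },
};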

Log sampling and its implications

Under high traffic, Cloudflare samples logs; you won't see every request logged. Usually this is fine; you don't need every successful request. But rare errors may not appear in logs at all.

For critical operations where you must capture every failure, log to an external service (via Logpush or explicit logging) rather than relying on console output. Console logs are sampled; Logpush captures everything.

The diagnostic toolkit

wrangler tail streams logs in real-time with filtering. Use for active debugging: something is wrong now, you need to see what. Filter by status code, URL path, or IP.

Dashboard analytics show aggregated patterns. Use for retrospective analysis: errors spiked yesterday at 3pm. What changed? Correlate error rate spikes with deployment times, traffic spikes, or geographic patterns.

Logpush sends logs to external storage: R2, S3, or observability platforms. Configure it before you need it. Post-incident analysis requires logs; without them, you're guessing.

Common failure patterns and their signatures

Certain failures recur across Workers applications. Naming them creates shared vocabulary for design discussions and post-incident analysis.

Sequential database query latency

Symptom: Requests work locally but time out or perform poorly in production. Wall time is high; CPU time is low.

Cause: Many sequential database queries. Each adds 10-50ms of network latency. Twenty queries means 200-1000ms that doesn't exist locally.

Diagnosis: Look for loops containing await on database operations. High wall time with low CPU time indicates waiting on I/O.

Fix: Batch queries where possible. Use D1.batch() for multiple independent queries. Restructure code for fewer round trips. Consider denormalising data to reduce query count.
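
A sketch of the batching fix, assuming a D1 binding named DB; the tables and queries are illustrative:

Batching independent queries with D1.batch()
interface Env {
  DB: D1Database;
}

// Hypothetical profile loader: three independent queries for one request.
async function loadProfile(env: Env, userId: string) {
  // Sequential version: three round trips, each paying 10-50ms of network latency.
  //   const user = await env.DB.prepare("SELECT * FROM users WHERE id = ?").bind(userId).first();
  //   ...repeated for orders and prefs...

  // Batched version: one round trip carrying all three statements.
  const [users, orders, prefs] = await env.DB.batch([
    env.DB.prepare("SELECT * FROM users WHERE id = ?").bind(userId),
    env.DB.prepare("SELECT * FROM orders WHERE user_id = ?").bind(userId),
    env.DB.prepare("SELECT * FROM prefs WHERE user_id = ?").bind(userId),
  ]);

  return { user: users.results[0], orders: orders.results, prefs: prefs.results };
}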

This timing pattern catches nearly every developer new to Workers at least once.

Subrequest limit exhaustion

Symptom: Requests fail after many fetch calls. Error references subrequest limits.

Cause: Workers default to 10,000 subrequests per invocation on paid plans (50 external and 1,000 to Cloudflare services on free plans). Fan-out patterns, long-lived WebSocket connections, and extended Workflows can hit this limit.

Diagnosis: Count fetch calls, including bindings and external services. Each binding operation counts as a subrequest. Check your configured limit in wrangler.jsonc under limits.subrequests; if unset, the default of 10,000 applies.

Fix: First, check whether raising the limit solves the problem. Paid plans support up to 10 million subrequests per invocation through Wrangler configuration. If your workload legitimately requires high fan-out, increasing the limit is the correct first response. For workloads where the subrequest count is unpredictable or unbounded, batch operations where APIs support it, use continuation tokens rather than fetching all pages at once, or use Queues to distribute work across multiple Worker invocations. You can also set a lower limit alongside cpu_ms to protect against runaway code.
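
If you do adjust the limit, the setting sits alongside cpu_ms in your Wrangler configuration. A sketch assuming the limits.subrequests key described above; the values are illustrative:

wrangler.jsonc
{
  "name": "myapp",
  "main": "src/index.ts",
  "limits": {
    // Illustrative caps per invocation: CPU time and subrequest count.
    "cpu_ms": 30000,
    "subrequests": 50000
  }
}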

CPU time exhaustion

Symptom: Requests fail with CPU time limit errors. High CPU time in analytics.

Cause: Computation-heavy operations exceed the 30-second default limit on paid plans (or 10ms on the free plan).

Diagnosis: Profile to identify expensive operations. JSON parsing of large payloads, complex string manipulation, and cryptographic operations are common culprits.

Fix: Stream instead of buffering; parse JSON incrementally and process data in chunks. Offload heavy computation to Containers. Use Queues to distribute work. For legitimate heavy computation that must happen synchronously, request increased CPU limits (up to 5 minutes on paid plans).
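
A rough sketch of the Queues approach: the producer accepts the request quickly and defers the heavy work to a consumer running in separate invocations. MY_QUEUE and the job payload are hypothetical:

Deferring heavy work to a queue consumer
interface Env {
  MY_QUEUE: Queue;
}

export default {
  // Producer: accept the request quickly and enqueue the heavy work.
  async fetch(request: Request, env: Env): Promise<Response> {
    const job = await request.json();
    await env.MY_QUEUE.send({ type: "generate-report", job });
    return new Response("accepted", { status: 202 });
  },

  // Consumer: each batch runs in its own invocation with its own CPU budget.
  async queue(batch: MessageBatch, env: Env): Promise<void> {
    for (const message of batch.messages) {
      // ...heavy computation per message...
      message.ack();
    }
  },
};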

CPU Time Limits by Plan
| Plan | Default CPU Time | Maximum with Configuration |
| --- | --- | --- |
| Free | 10ms | 10ms |
| Paid (Standard) | 30 seconds | 5 minutes |

Note: Legacy "Bundled" plans (no longer available to new accounts) have a 50ms CPU limit. If you're on a Bundled plan from before March 2024, this limit still applies.

Memory pressure

Symptom: Requests fail mysteriously without clear error messages, sometimes crashing outright, sometimes returning incomplete responses.

Cause: Large payloads or accumulated state exhaust the 128 MB isolate limit.

Diagnosis: Memory issues are hard to diagnose because the failure mode is a crash, not a catchable exception. Look for patterns: failures correlating with large request bodies, data accumulating in loops without releasing references, unbounded collection growth.

Fix: Stream large data instead of buffering; use Request and Response bodies as streams. Release references early. Prefer processing and discarding over collecting and batch-processing.
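
A minimal sketch of streaming rather than buffering; the upstream URL is illustrative:

Streaming a response body instead of buffering it
export default {
  async fetch(): Promise<Response> {
    // Hypothetical large upstream resource.
    const upstream = await fetch("https://example.com/large-export.csv");

    // Buffering would copy the whole body into memory:
    //   const text = await upstream.text();  // risks the 128 MB limit
    // Streaming forwards bytes as they arrive instead.
    return new Response(upstream.body, {
      status: upstream.status,
      headers: upstream.headers,
    });
  },
};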

Unhandled promise rejections

Symptom: Requests fail with unhandled rejection errors. Stack traces may be unhelpful.

Cause: Async operations fail without catch handlers.

Diagnosis: Look for await calls without try/catch, or .then() chains without .catch().

Fix: Wrap your entire handler in try/catch. Handle errors from every awaited operation explicitly. Use TypeScript's strict mode to catch potential undefined results.
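
A sketch of the whole-handler wrapper; the inner handle function and error shape are illustrative:

Catching every rejection at the handler boundary
export default {
  async fetch(request: Request): Promise<Response> {
    try {
      // `return await` keeps the rejection inside the try block.
      return await handle(request);
    } catch (err) {
      console.error("unhandled error", err instanceof Error ? err.stack : err);
      return Response.json({ error: "internal error" }, { status: 500 });
    }
  },
};

// Hypothetical inner handler; individual awaits can still add their own handling.
async function handle(request: Request): Promise<Response> {
  const body = await request.json();
  return Response.json({ ok: true, received: body });
}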

Edge-specific failures

Some failures only manifest at the edge because they depend on properties that don't exist locally.

Geographic distribution mismatch occurs when code assumes consistent behaviour across locations. User A in London writes data; User B in Sydney reads immediately and gets stale results. This isn't wrong code per se; global distribution introduces propagation delays that don't exist locally. KV is particularly susceptible: eventual consistency means writes propagate globally within roughly 60 seconds, but "within 60 seconds" includes "not yet" for some locations.

External service IP filtering causes mysterious failures where code that worked locally fails in production. Many third-party APIs rate-limit or block requests from cloud provider IP ranges. Your local machine has a residential IP; Cloudflare's edge has well-known IP ranges that security systems treat differently.

Request routing variance surfaces when Smart Placement or custom placement hints route requests unexpectedly. A Durable Object placed near its first user in Tokyo serves that user well but adds latency for users in New York. This is working as designed, but "working as designed" and "meeting expectations" aren't the same.

Hyperscaler comparison: development experience

How Workers development differs from Lambda or Azure Functions:

| Aspect | Workers | Lambda | Azure Functions |
| --- | --- | --- | --- |
| Local simulation fidelity | High (V8 runtime matches production) | Low (SAM Local approximates) | Medium (Azurite simulates storage) |
| Cold start testing | Unnecessary (sub-millisecond) | Critical (seconds of latency) | Important (hundreds of milliseconds) |
| Integration test speed | Seconds (instant deployment) | Minutes (container build and deploy) | Minutes (similar to Lambda) |
| Deployment time | Seconds | 30 seconds to 2 minutes | 30 seconds to 2 minutes |
| Rollback time | Seconds | Minutes | Minutes |
| Region testing | Unnecessary (global by default) | Required (regional deployment) | Required (regional deployment) |
| Environment complexity | Low (wrangler.toml) | Medium (SAM templates, IAM) | Medium (ARM templates, configuration) |

The net effect: Workers development cycles are significantly faster than hyperscaler equivalents. Changes deploy in seconds. Rollbacks are instant. The low cost of trying something in production encourages rapid experimentation.

The risk is that same deployment speed: bugs deploy globally in seconds. Hyperscaler deployments are slow enough to catch mistakes mid-deploy. Workers deployments complete before you finish reading the error message from your build.

danger

A bug in your Worker deploys to 300+ locations before your CI pipeline finishes printing test output. Pre-deployment checks aren't optional; they're your only defence against instant global incidents.

Deployment and rollback

Workers deploy in seconds because there's nothing to provision: no containers, no infrastructure, no cold start. This speed is a feature until it's a bug shipped to 300+ locations simultaneously.

The speed trap

Fast deployment encourages rapid iteration, which is good for development. It also enables rapid incident creation, which is bad for production. A Lambda bug takes minutes to deploy; you might notice failing tests before deployment completes. A Worker bug deploys before the test output renders in your CI system.

Mitigate with:

Pre-deployment checks that block. Type checking and unit tests should complete before deployment begins. Configure CI to fail fast on errors.

Automatic rollback triggers. Monitor error rates after each deployment. If the rate exceeds a threshold (say, 5% when the baseline is 0.1%), roll back automatically. Cloudflare's API makes this straightforward to implement.

Gradual rollout for critical changes. Workers doesn't have built-in canary deployments, but you can implement them with a routing Worker directing a percentage of traffic to a new version. This is overkill for most applications; essential for high-traffic critical systems.
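
A rough sketch of such a routing Worker, assuming two service bindings, STABLE and CANARY, pointing at the two versions; the 5% split is illustrative:

Canary routing Worker
interface Env {
  STABLE: Fetcher; // current production version
  CANARY: Fetcher; // new version under evaluation
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Hash the client IP so a given client consistently sees one version.
    const ip = request.headers.get("cf-connecting-ip") ?? "";
    const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(ip));
    const bucket = new Uint8Array(digest)[0] % 100;

    // Send roughly 5% of traffic to the canary.
    const target = bucket < 5 ? env.CANARY : env.STABLE;
    return target.fetch(request);
  },
};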

Rollback mechanics

wrangler rollback reverts to the previous version in seconds. Know this command well. Practice it before you need it under incident pressure. In an incident, roll back first, investigate second. Debugging is easier without the pressure of affected production traffic.

Cloudflare retains recent versions automatically; you can rollback to any recent deployment, not just the immediately previous one.

Preview deployments

wrangler deploy --env preview creates a separate deployment at a preview URL. Use for testing changes against real infrastructure without affecting production, sharing work-in-progress with colleagues, or running integration tests against a deployed version.

Preview deployments use real Cloudflare services (real D1, real KV, real R2) but are accessible only at their preview URL; they're production-like without being production.

What comes next

This chapter covered the development lifecycle: understanding the simulation boundary, choosing between local and remote development, testing stateful systems, debugging production issues. Edge development differs from hyperscaler serverless in specific, predictable ways: faster iteration, different failure modes, coordination concerns that don't exist in stateless systems.

Chapter 6 introduces Durable Objects fully. The testing patterns here, especially coordination testing, become central. Durable Objects are Workers' most distinctive feature, requiring the mental models this chapter established: understanding what simulation can and cannot provide, testing coordination under real concurrency, debugging distributed state.

The remaining chapters assume you can develop, test, and deploy effectively. The practices here (staging environments, structured logging, trace IDs, rapid rollback) underlie everything that follows.