Chapter 8: Queues: Asynchronous Processing
How do I handle work that doesn't need an immediate response?
Not every operation belongs in a request/response cycle; sending a welcome email shouldn't block user registration, processing an uploaded image shouldn't delay the upload confirmation, and syncing data to a third-party system shouldn't slow down the operation that triggered it.
Queues decouple producers from consumers by allowing a Worker to write a message and return immediately while another Worker (or the same one in a different context) processes that message later. The user gets a fast response, and the background work completes eventually.
Queues offer a simpler model than Workflows: no orchestration, no step dependencies, and no durable state spanning operations. Messages go in, messages come out, and work gets done. For many background processing needs, that simplicity is exactly right, though knowing where queues stop being the right abstraction matters more than mastering their features.
What's different from SQS and Service Bus
If you're coming from AWS or Azure, the biggest difference isn't features but rather the operational model. SQS is a standalone service configured separately from Lambda, with IAM roles connecting them, visibility timeouts that interact badly with Lambda timeouts, and a separate deployment pipeline. Cloudflare Queues, by contrast, is a binding within your Worker project where queue and consumer share the same codebase, the same deployment, and the same configuration file.
Fewer moving parts means fewer failure modes: you won't debug production incidents caused by mismatched visibility timeout and Lambda timeout settings because those settings don't exist separately, and you won't trace IAM permission errors across services since there are no cross-service permissions to configure.
The tradeoff is capability; SQS FIFO provides ordering guarantees and exactly-once processing semantics that Cloudflare Queues doesn't attempt, and Azure Service Bus offers sessions, larger messages, and sophisticated routing. If you need those capabilities, use those services even if the rest of your stack is on Cloudflare. There's no shame in using the right tool, and pretending Cloudflare Queues competes with enterprise messaging platforms on features would be dishonest.
The honest comparison is that Cloudflare Queues is simpler to operate and natively integrated with Workers, though this comes at the cost of advanced messaging features you may or may not need. If your requirements are simple (process this later, retry on failure, alert if it keeps failing), Cloudflare Queues handles that with less operational surface than hyperscaler alternatives, but if you need strict ordering, exactly-once semantics, or messages larger than 128 KB without indirection, you should look elsewhere.
The mental model
Queues are promises to do something later: the producer gets a fast response while the consumer fulfils the promise eventually, and everything else (batching, retries, dead letters) is infrastructure for keeping that promise.
Queues answer "do this later"; Workflows answer "do these things in order"; Durable Objects answer "coordinate this now". Choose based on what your problem actually is, not which tool you know best.
Think of a queue as a time-shifted function call. Instead of calling processOrder(order) synchronously and waiting, you write a message describing the call and let the system invoke it later. The queue handles the mechanics: persisting the intent, distributing it to available workers, retrying on failure, routing failures somewhere visible.
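A minimal sketch of that time-shifted call, assuming a queue binding named ORDER_QUEUE and an illustrative message shape:

interface OrderMessage {
  type: "process-order";
  orderId: string;
}

export default {
  async fetch(request: Request, env: { ORDER_QUEUE: Queue<OrderMessage> }) {
    const { orderId } = await request.json<{ orderId: string }>();

    // Instead of `await processOrder(orderId)` (synchronous, slow),
    // describe the call and let a consumer invoke it later.
    await env.ORDER_QUEUE.send({ type: "process-order", orderId });

    // The user gets an immediate acknowledgment; the work happens later.
    return Response.json({ accepted: true }, { status: 202 });
  },
};

The 202 status signals acceptance rather than completion; the caller learns about the outcome through side effects, not through the queue.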
Queues excel at deferring work, distributing load, and surviving failures, but they struggle with anything requiring coordination between messages, visibility into progress, or guaranteed ordering. If you find yourself building those capabilities on top of a queue, you've chosen the wrong primitive.
The producer-consumer relationship is deliberately loose because producers don't know which consumer will handle their message, when, or whether it will succeed on the first attempt, and consumers don't know how many producers exist or what rate messages will arrive. This loose coupling allows the system to scale, retry, and recover without coordination, but queues can't answer "what happened to my message?" or "how far along is this process?" The moment you need those answers, you've outgrown queues.
When to choose Queues
Use queues deliberately, not by default.
Queues excel when tasks are independent (each message processes without knowledge of others), order doesn't matter (or can be handled at the consumer level through timestamps), fire-and-forget is acceptable (tracking completion through side effects rather than queue state), you need parallel processing (multiple consumers handling messages concurrently), and latency is tolerable (seconds or minutes of delay between send and processing).
Queues struggle when steps depend on previous results, you need visibility into progress, failed steps require compensation logic, or order matters within a process. Any of these signals suggests Workflows, and all of them together make it certain.
Use waitUntil() when work is truly fire-and-forget: no retry needed, no confirmation needed, and failure is acceptable. The work is nice-to-have, and not worth even the minimal overhead of a queue dispatch.
Use Durable Objects when you need real-time coordination (rate limiting, presence detection, live updates) where state must be immediately consistent and multiple requests must see that consistent state.
Use Workflows when steps depend on each other, when you need visibility into progress, when failure requires compensation, or when the process spans hours or days and must survive infrastructure failures.
Warning signs emerge gradually as you start querying "what's the status of message X?" and building tracking systems, as business logic requires message inspection before processing, and as you add ordering logic, compensation logic, and status tracking. At that point, you've reimplemented Workflows badly. The earlier you recognise orchestration rather than queuing, the cheaper the correction.
Delivery guarantees and their implications
Queues provide specific delivery guarantees that shape how you design consumers, and understanding these guarantees prevents architectural mistakes that surface only under failure conditions.
At-least-once delivery
Queues guarantee at-least-once delivery: every message will reach a consumer at least once, though the guarantee is "at least" rather than "exactly" because failures cause redelivery. If a consumer crashes while processing, the queue redelivers to another consumer, and if a consumer processes successfully but crashes before acknowledging, the queue doesn't know it succeeded and redelivers anyway.
At-least-once is the only honest guarantee a distributed queue can make; exactly-once delivery is a distributed systems myth that requires coordination that defeats the purpose of a queue. What you can achieve is at-least-once delivery with exactly-once processing through idempotent consumer code.
The idempotency requirement
Queues provide at-least-once delivery: a consumer crash, network failure, or acknowledgment timeout causes redelivery even when processing succeeded. Design every consumer to handle duplicate messages correctly from day one; retrofitting idempotency after production incidents is expensive and error-prone.
Message acknowledgment transfers ownership. The queue keeps the message safe and redelivers if needed until the consumer calls ack(). That call transfers responsibility to your code. If your code crashes after acknowledgment, the message is gone. Acknowledge only after processing completes successfully.
Idempotent processing means the same message can be processed multiple times with the same outcome. The simplest pattern checks whether work is already done before doing it:
const alreadySent = await env.DB.prepare(
  "SELECT 1 FROM sent_emails WHERE message_id = ?"
).bind(task.messageId).first();

if (!alreadySent) {
  await sendEmail(task);
  await env.DB.prepare(
    "INSERT INTO sent_emails (message_id) VALUES (?)"
  ).bind(task.messageId).run();
}

message.ack();
The check-then-act pattern works for most cases. Naturally idempotent operations (setting a value rather than incrementing, replacing a record rather than appending) need no explicit check. The danger lies in operations that seem idempotent but aren't: sending emails, charging credit cards, incrementing counters, appending to logs. Design for redelivery from the start.
Why no ordering guarantee
Queues do not guarantee message ordering. Messages sent in sequence may arrive out of sequence. This isn't a bug; it's a fundamental consequence of how queues achieve throughput and availability.
Don't build elaborate schemes to preserve ordering atop an unordered queue. If you need ordering, you need Workflows or Durable Objects, not a queue with ordering logic bolted on; the moment you're tracking sequence numbers and requeueing out-of-order messages, you've chosen the wrong primitive.
Guaranteeing order requires serialisation. If message B must wait for message A, the system processes one message at a time. Ensuring A completes before B starts requires coordination that eliminates the parallelism that makes queues useful.
Cloudflare could offer a FIFO option like SQS, but FIFO queues are a different product with different tradeoffs: they sacrifice throughput for ordering, require careful partition key design, and cost more. If you need ordering, you're usually describing orchestration rather than queuing: a sequence of dependent operations rather than independent tasks. That's what Workflows provides.
For most background tasks, ordering doesn't matter. Processing user signups, sending notifications, and syncing records are independent operations. When ordering matters within a specific context, route related messages to a single Durable Object that processes them sequentially, or recognise you need Workflows.
Designing reliable producers
Any Worker can write messages to a queue through a binding. Call send() with a JSON-serialisable object up to 128 KB. For larger payloads, store in R2 and send a reference. But mechanics aren't where production systems fail.
The architectural questions matter more: When should you batch sends versus sending individually? How do you handle queue unavailability? How do you evolve message schemas?
Batching sends reduces overhead when you have multiple messages ready simultaneously (up to 100 messages or 256 KB per batch). But batching changes failure semantics. If a batch send fails, do you retry the whole batch? If some messages are more critical than others, should they be batched together? For independent messages of equal importance, batch aggressively. For messages with different criticality, send individually or implement batch-level retry logic.
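A sketch of aggressive batching for independent messages of equal importance, assuming a TASK_QUEUE binding and an illustrative Task shape:

interface Task {
  id: string;
}

async function sendAll(env: { TASK_QUEUE: Queue<Task> }, tasks: Task[]): Promise<void> {
  // Chunk into groups of 100 to stay within the per-batch message limit.
  for (let i = 0; i < tasks.length; i += 100) {
    const chunk = tasks.slice(i, i + 100).map((body) => ({ body }));
    await env.TASK_QUEUE.sendBatch(chunk);
  }
}

Chunking by count keeps the sketch simple; a production producer would also track cumulative payload size against the 256 KB batch limit.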
Queue unavailability is rare but possible. The send() call can fail. The options mirror Chapter 7's retry patterns: retry with backoff for transient failures, fail fast and alert for persistent unavailability, or buffer locally and retry later for critical messages. For most applications, simple retry with backoff suffices. For high-criticality messages, consider writing to D1 as a fallback.
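A sketch combining retry-with-backoff and a D1 fallback for critical messages; the queued_fallback table and the retry limits are illustrative:

async function sendWithFallback(
  env: { TASK_QUEUE: Queue<object>; DB: D1Database },
  task: object
): Promise<void> {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      await env.TASK_QUEUE.send(task);
      return;
    } catch {
      // Transient unavailability: back off briefly and try again.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 250));
    }
  }
  // Persistent unavailability: buffer the message in D1 so a scheduled
  // Worker can replay it once the queue recovers.
  await env.DB.prepare("INSERT INTO queued_fallback (payload) VALUES (?)")
    .bind(JSON.stringify(task))
    .run();
}

The fallback table turns "the queue is down" into "some messages are delayed", which is usually the better failure mode for critical work.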
Message schema evolution is the concern most producers ignore until an incident. You add a field, but old messages in the queue lack it. You rename a field, but consumers see both names during transition. You remove a field that consumers still expect.
The safest approach is additive-only changes with consumer tolerance for missing fields. New fields get default values; removed fields are ignored rather than causing errors. For breaking changes, version your message schema explicitly and have consumers handle multiple versions during transitions.
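A sketch of explicit versioning with consumer tolerance; the field names and versions are illustrative:

type TaskV1 = { version: 1; email: string };
type TaskV2 = { version: 2; email: string; locale: string };
type Task = TaskV1 | TaskV2;

function normalise(task: Task): TaskV2 {
  // Old messages still in the queue are upgraded to the current shape
  // with a default value rather than rejected.
  if (task.version === 1) {
    return { version: 2, email: task.email, locale: "en" };
  }
  return task;
}

Producers can start emitting version 2 before every old message drains, as long as normalise() ships in consumers first.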
Messages can be delayed up to 12 hours before becoming visible to consumers; for longer delays, use Workflows with step.sleep(). Delay is useful for simple scheduling (send a reminder in an hour, retry after a cooldown), not as a general-purpose scheduler.
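A sketch of a delayed send, assuming a hypothetical REMINDERS binding; delaySeconds accepts values up to 43200 (12 hours):

async function scheduleReminder(
  env: { REMINDERS: Queue<{ type: string; userId: string }> },
  userId: string
): Promise<void> {
  await env.REMINDERS.send(
    { type: "trial-ending", userId },
    { delaySeconds: 3600 } // becomes visible to consumers in one hour
  );
}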
Configuring consumers
Consumers receive messages in batches, process each one, and acknowledge completion. Unacknowledged messages retry after a visibility timeout. The interesting decisions are in the configuration.
[[queues.consumers]]
queue = "my-queue"
max_batch_size = 10
max_batch_timeout = 5
max_retries = 3
dead_letter_queue = "my-dlq"
max_concurrency = 10
These settings interact in ways that matter for your workload. Consider three profiles representing common patterns.
High-throughput, latency-tolerant processing suits analytics pipelines, log processing, or bulk data synchronisation where processing millions of messages efficiently matters more than individual message latency.
max_batch_size = 100
max_batch_timeout = 30
max_concurrency = 20
Large batches amortise per-invocation overhead. Long timeouts allow batches to fill during variable load. High concurrency processes backlogs quickly. The tradeoff: failed processing retries 100 messages rather than 10, and individual message latency can stretch to 30 seconds waiting for a batch to fill.
Low-latency notifications suits user-facing features where a message means a user is waiting: password reset emails, real-time alerts, webhook deliveries.
max_batch_size = 5
max_batch_timeout = 1
max_concurrency = 10
Small batches and short timeouts minimise latency. Concurrency stays high enough to handle bursts without overwhelming downstream services. The tradeoff: more consumer invocations, higher cost per message, less efficiency during sustained load.
Unreliable downstream processing suits consumers calling external APIs prone to rate limiting, temporary unavailability, or slow responses: third-party integrations, legacy systems, services with aggressive throttling.
max_batch_size = 10
max_batch_timeout = 10
max_retries = 10
max_concurrency = 5
Moderate batch size limits blast radius when downstream fails. More retries with longer backoff give transient issues time to resolve. Conservative concurrency avoids overwhelming a struggling downstream. The tradeoff: backlogs grow faster during outages, recovery takes longer.
Questions for choosing your profile: What's your message arrival pattern, steady or bursty? What's acceptable processing latency? How reliable is your downstream? How expensive is each message to process?
Retry strategy
Default retry behaviour (three attempts with exponential backoff) works for transient failures against generally reliable systems. Adjust based on what your consumer calls.
As Chapter 7 discussed, exponential backoff reflects hope that a transient condition will clear; immediate acknowledgment of permanent failures reflects certainty that retrying won't help. The critical decision is distinguishing between them:
try {
  await processTask(message.body, env);
  message.ack();
} catch (error) {
  if (isTransient(error)) {
    message.retry({ delaySeconds: 60 });
  } else {
    await logPermanentFailure(message.body, error);
    message.ack(); // Stop retrying
  }
}
Transient failures (network timeouts, rate limits, temporary unavailability) should retry with increasing delays. Permanent failures (malformed data, business rule violations, missing dependencies) should not retry. Acknowledging a permanently failed message prevents infinite retry loops; logging ensures visibility.
The isTransient() function encodes your understanding of downstream failure modes. HTTP 429 (rate limited) is transient; HTTP 400 (bad request) is permanent; HTTP 500 might be either. Connection timeouts are usually transient; validation errors are always permanent. Getting this classification wrong wastes resources on hopeless retries or drops recoverable messages.
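One possible classification, with an assumed HttpError wrapper thrown by your own processing code when a downstream call fails:

class HttpError extends Error {
  constructor(public status: number) {
    super(`HTTP ${status}`);
  }
}

function isTransient(error: unknown): boolean {
  if (error instanceof HttpError) {
    if (error.status === 429) return true;  // rate limited: worth retrying
    if (error.status >= 500) return true;   // server error: often transient
    return false;                           // other 4xx: permanent
  }
  // Network-level failures (timeouts, connection resets) surface as
  // TypeErrors from fetch() and are usually transient.
  return error instanceof TypeError;
}

Keeping the classification in one function means that when you learn a new downstream failure mode, you change one place.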
Per-message acknowledgement for ETL pipelines
For ETL pipelines where operations aren't idempotent, individual message acknowledgement prevents reprocessing of successfully handled items when subsequent messages fail. Rather than acknowledging the entire batch at the end, acknowledge each message immediately after its work completes:
export default {
  async queue(batch: MessageBatch<ETLTask>, env: Env) {
    for (const message of batch.messages) {
      try {
        await processRecord(message.body, env);
        message.ack(); // Acknowledge immediately on success
      } catch (error) {
        if (isTransient(error)) {
          message.retry({ delaySeconds: Math.pow(2, message.attempts) * 10 });
        } else {
          await logPermanentFailure(message.body, error, env);
          message.ack(); // Stop retrying permanent failures
        }
      }
    }
  }
};
This pattern is essential when downstream writes aren't idempotent. If you write to an external API that doesn't support idempotent operations, batch-level retry would resend all records, duplicating successful writes. Per-message acknowledgement ensures only failed records retry.
The trade-off is overhead. Per-message acknowledgement means more operations than batch acknowledgement. For high-volume pipelines with idempotent operations, batch acknowledgement is more efficient. For pipelines with non-idempotent operations or expensive processing, per-message acknowledgement prevents costly reprocessing.
Pull-based consumers
Everything discussed so far assumes push-based consumption: Cloudflare invokes your Worker when messages arrive. Queues also support pull-based consumption, where any HTTP client fetches messages on its own schedule. A queue supports one or the other, not both.
Pull consumers exist for when the processor cannot be a Worker: message processing runs on Kubernetes, your consumer is a Go binary that won't transpile to JavaScript, or your legacy system needs queue integration but can't adopt Workers yet. Pull consumers bridge Cloudflare's queue infrastructure to wherever your processing runs.
The mental model shifts from "Cloudflare calls you" to "you call Cloudflare." Your consumer polls the queue's HTTP endpoint, receives a batch of messages, processes them, then acknowledges through another HTTP call. The queue doesn't care what makes these calls: a Lambda function, a container in ECS, a cron job on a VM, or a developer's laptop during debugging.
# Pull up to 100 messages with a 30-second visibility timeout
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/queues/${QUEUE_ID}/messages/pull" \
-H "Authorization: Bearer ${API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"batch_size": 100, "visibility_timeout": 30000}'
# Acknowledge processed messages; mark others for retry
curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/queues/${QUEUE_ID}/messages/ack" \
-H "Authorization: Bearer ${API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"acks": [{"lease_id": "..."}], "retries": [{"lease_id": "..."}]}'
The visibility timeout matters more here than with push consumers. When you pull a message, other consumers can't see it until the timeout expires or you explicitly acknowledge or retry. Set the timeout longer than expected processing time but short enough that stuck messages don't block indefinitely. Thirty seconds works for most workloads.
Pull consumers use short polling. If no messages exist, the API returns immediately with an empty response. Your consumer must implement its own polling loop with appropriate backoff during quiet periods. Poll every 100 milliseconds when messages flow, back off to several seconds when the queue is empty.
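A sketch of that polling loop, reusing the endpoints from the curl examples above; the response field names (result.messages, lease_id, body) are assumptions to verify against the Queues HTTP API reference:

declare const ACCOUNT_ID: string, QUEUE_ID: string, API_TOKEN: string;
declare function processRecord(record: unknown): Promise<void>;

interface PulledMessage { lease_id: string; body: string }

const BASE = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/queues/${QUEUE_ID}/messages`;
const HEADERS = { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" };

async function pollLoop(): Promise<void> {
  let idleDelayMs = 100;
  while (true) {
    const res = await fetch(`${BASE}/pull`, {
      method: "POST",
      headers: HEADERS,
      body: JSON.stringify({ batch_size: 100, visibility_timeout: 30000 }),
    });
    const data = (await res.json()) as { result?: { messages?: PulledMessage[] } };
    const messages = data.result?.messages ?? [];

    if (messages.length === 0) {
      // Quiet period: back off towards a few seconds between polls.
      await new Promise((resolve) => setTimeout(resolve, idleDelayMs));
      idleDelayMs = Math.min(idleDelayMs * 2, 5000);
      continue;
    }
    idleDelayMs = 100; // messages are flowing; poll promptly again

    const acks: { lease_id: string }[] = [];
    const retries: { lease_id: string }[] = [];
    for (const m of messages) {
      try {
        await processRecord(JSON.parse(m.body));
        acks.push({ lease_id: m.lease_id });
      } catch {
        retries.push({ lease_id: m.lease_id }); // redeliver after the visibility timeout
      }
    }
    await fetch(`${BASE}/ack`, {
      method: "POST",
      headers: HEADERS,
      body: JSON.stringify({ acks, retries }),
    });
  }
}

This loop runs wherever your consumer lives; it is the part of the pull model you own that push consumers get for free.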
When to use pull consumers
Default to push consumers. They're simpler to operate, scale automatically, and integrate naturally with your Cloudflare architecture.
Choose pull consumers when Workers genuinely cannot handle your processing: the consumer needs capabilities Workers don't provide (arbitrary binaries, GPU access, massive memory, hours-long processing), must run in a specific location for compliance or latency, or integrates with existing infrastructure that isn't moving to Cloudflare. These are legitimate constraints, not preferences.
Pull consumers also suit controlled consumption rates. If your downstream handles only 10 requests per second and you need precise enforcement, pull consumers let you control exactly when and how fast you consume. Push consumers offer concurrency limits; pull consumers give complete control over consumption timing.
The hybrid pattern works well during migrations: produce to Cloudflare Queues from Workers, consume from existing infrastructure via pull, then migrate consumers to Workers once producers are stable. The queue decouples producer migration from consumer migration.
What pull consumers sacrifice
Pull consumers give up automatic scaling. Push consumers scale with queue depth. Pull consumers scale only if you build that scaling yourself.
Pull consumers give up Cloudflare's retry orchestration. Push consumers use message.retry() with platform-managed delays. Pull consumers must implement their own retry logic or rely on the visibility timeout.
Pull consumers require API token management. Push consumers authenticate implicitly through bindings; pull consumers need tokens with queue read and write permissions, rotated and secured like any other credential.
If Workers can process your messages, push consumers' operational simplicity outweighs pull's theoretical flexibility.
Dead letter queues
Messages that fail repeatedly need somewhere to go. Without a dead letter queue, they're deleted after exhausting retries.
dead_letter_queue = "orders-dlq"
The dead letter queue is just another queue with its own consumer. That consumer might alert on-call engineers, log details for investigation, attempt processing with different logic, or queue for manual review.
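A sketch of such a consumer, logging each failure to D1 and alerting once per batch; the dead_letters table and ALERT_WEBHOOK variable are illustrative:

export default {
  async queue(
    batch: MessageBatch<unknown>,
    env: { DB: D1Database; ALERT_WEBHOOK: string }
  ) {
    for (const message of batch.messages) {
      // Persist enough context to investigate the failure later.
      await env.DB.prepare(
        "INSERT INTO dead_letters (message_id, body, failed_at) VALUES (?, ?, ?)"
      )
        .bind(message.id, JSON.stringify(message.body), new Date().toISOString())
        .run();
      message.ack(); // never retry from the DLQ consumer itself
    }
    // One alert per batch keeps the on-call signal readable.
    await fetch(env.ALERT_WEBHOOK, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ queue: "orders-dlq", count: batch.messages.length }),
    });
  },
};

The ack() here is unconditional because the DLQ is the end of the line; retrying from the DLQ would recreate the loop the DLQ exists to break.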
Messages in a DLQ signal something is wrong with your consumer code, upstream data, or downstream dependencies. A growing DLQ indicates a problem that won't fix itself. Monitor DLQ depth and alert when messages accumulate; don't let it become a graveyard of ignored failures.
Failure modes
Every queue failure mode traces back to the same root: the producer-consumer relationship is deliberately loose. That looseness enables scaling but prevents answering questions about individual messages.
Poison message loop occurs when a malformed message repeatedly fails. With per-message acknowledgment, poison messages eventually route to the DLQ while healthy messages continue processing. With batch acknowledgment, a single poison message causes the entire batch to retry indefinitely, trapping healthy messages in the loop. Prefer per-message acknowledgment unless all messages in a batch succeed or fail together. The fix: distinguish permanent failures from transient ones and acknowledge messages that will never succeed.
Partial batch failure manifests when some messages in a batch succeed while others fail. If you acknowledge the entire batch only when all succeed, successful messages reprocess on retry, potentially causing duplicate side effects if the idempotency check and the side effect aren't atomic. Per-message acknowledgment means successful messages don't reprocess, but it adds the complexity of partial completion. The safest pattern: per-message acknowledgment with truly idempotent processing, acknowledging each message immediately after success.
Consumer starvation happens when a slow message blocks batch processing. If your batch contains nine fast messages and one slow one, all nine complete quickly but the consumer invocation doesn't finish until the slow message completes. High concurrency helps (other invocations process other batches), but within a single batch, slow messages create head-of-line blocking.
Solutions depend on why messages are slow. If processing time varies predictably by message type, route slow types to a separate queue. If slowness is unpredictable, smaller batches reduce blocking impact. If slowness indicates a downstream problem, circuit breakers prevent one slow dependency from blocking all processing.
Backlog runaway happens when production outpaces consumption for long enough that the queue never drains. Some backlog during an outage is expected, since queues buffer work when consumers can't keep up, but unbounded accumulation indicates a fundamental mismatch between production rate and consumption capacity. Monitor backlog depth and alert on sustained growth: a backlog that grows during an outage and drains after recovery is healthy; indefinite growth indicates architectural problems.
Monitoring queue health
Debugging queue problems requires visibility into the full pipeline.
Essential metrics: backlog depth (messages waiting), consumer error rate (failures per message), processing latency (time from send to acknowledgment), and DLQ inflow rate (messages failing to dead letter queue per period).
Backlog depth trending upward means insufficient consumer capacity: not enough concurrency, processing too slow, or too many retries. Error rate spikes indicate bad messages or downstream problems. Processing latency degradation suggests consumer slowness or batch configuration issues. DLQ inflow indicates persistent failures needing investigation.
Set alerts at thresholds that give time to respond. A backlog that drains in an hour isn't urgent; a backlog growing faster than it drains is. Error rates above your retry budget mean messages are hitting the DLQ; rates below mean the system is self-healing. DLQ depth above zero always warrants investigation.
Cloudflare provides basic analytics through the dashboard. For production systems, supplement with your own logging: capture message types, processing duration, failure reasons, and retry counts.
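A sketch of that supplemental logging as a wrapper around the processing call; the event names are illustrative and assume console output is captured by your logging pipeline:

interface Task { type: string }
declare function processTask(task: Task, env: unknown): Promise<void>;

async function processWithLogging(message: Message<Task>, env: unknown): Promise<void> {
  const start = Date.now();
  try {
    await processTask(message.body, env);
    console.log(JSON.stringify({
      event: "queue_message_processed",
      type: message.body.type,
      attempts: message.attempts,
      durationMs: Date.now() - start,
    }));
    message.ack();
  } catch (error) {
    console.log(JSON.stringify({
      event: "queue_message_failed",
      type: message.body.type,
      attempts: message.attempts,
      durationMs: Date.now() - start,
      reason: error instanceof Error ? error.message : String(error),
    }));
    throw error; // let the caller decide between retry() and ack()
  }
}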
Cost considerations
Queues are available on both paid and free Workers plans. The free plan includes 10,000 operations per day across reads, writes, and deletes, with all features available including event subscriptions. The meaningful restriction is retention: free plan messages expire after 24 hours rather than the 14 days available on paid plans. For prototyping, learning, and low-volume production workloads, the free tier is genuinely usable rather than merely decorative.
Queue costs on paid plans scale with operations: writes, reads, and deletes. Each message typically incurs three operations minimum (write from producer, read by consumer, delete on acknowledgment). Retries add reads; DLQ routing adds writes and reads.
At low volumes, queue costs are negligible. Don't optimise.
At moderate volumes (millions of messages per day), costs become meaningful but are still dominated by what consumers do, not queue operations.
At high volumes, architecture matters more than configuration. Batch size doesn't reduce queue operations (10 messages still means 10 writes, 10 reads, 10 deletes), but batching reduces consumer invocations, lowering Workers costs if your consumer has significant per-invocation overhead. The optimisation that matters most is reducing message volume: aggregate events into single messages where semantics allow, use waitUntil() for fire-and-forget work that doesn't need retry guarantees, and question whether every event actually needs to be a separate message.
Scalability boundaries
Queues have limits that matter for architecture decisions.
Message size caps at 128 KB. Many payloads exceed this. Store large data in R2 and send a reference. Design for indirection from the start if your payloads might grow.
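A sketch of that indirection, assuming hypothetical PAYLOADS (R2) and TASK_QUEUE (queue) bindings; the consumer fetches the object by key and deletes it once processing succeeds:

async function enqueueLargePayload(
  env: { PAYLOADS: R2Bucket; TASK_QUEUE: Queue<{ r2Key: string }> },
  payload: object
): Promise<void> {
  const key = `payloads/${crypto.randomUUID()}.json`;
  // Store the full payload in R2; the queue message carries only a reference.
  await env.PAYLOADS.put(key, JSON.stringify(payload));
  await env.TASK_QUEUE.send({ r2Key: key });
}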
Throughput caps at approximately 5,000 messages per second per queue. For higher throughput, shard across multiple queues. Most applications never approach this limit, but high-volume event streaming or IoT scenarios might require sharding.
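A sketch of producer-side sharding, assuming four queue bindings named EVENTS_0 through EVENTS_3; each shard still needs its own consumer configuration:

interface DeviceEvent { deviceId: string; payload: unknown }
interface Env {
  EVENTS_0: Queue<DeviceEvent>;
  EVENTS_1: Queue<DeviceEvent>;
  EVENTS_2: Queue<DeviceEvent>;
  EVENTS_3: Queue<DeviceEvent>;
}

function shardFor(env: Env, key: string): Queue<DeviceEvent> {
  const shards = [env.EVENTS_0, env.EVENTS_1, env.EVENTS_2, env.EVENTS_3];
  // Simple stable hash so events for the same device land on the same shard.
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return shards[hash % shards.length];
}

async function sendEvent(env: Env, event: DeviceEvent): Promise<void> {
  await shardFor(env, event.deviceId).send(event);
}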
Consumer wall time caps at 15 minutes; CPU time at 30 seconds by default, extendable to 5 minutes on paid plans. Most queue consumers complete in milliseconds or seconds. If yours routinely approaches these limits, you might need Containers, covered in Chapter 9.
When Queues aren't enough
The evolution from simple queues to more sophisticated patterns follows a predictable arc. You start with fire-and-forget background work. Then you add a status table to track completion. Then sequence numbers for ordering. Then per-message-type monitoring for visibility into stuck messages. Then correlation IDs and state machines for coordination.
At some point, you've rebuilt Workflows badly. Recognition usually comes too late.
Warning signs you've outgrown queues: building status tracking, adding sequence numbers, implementing compensating actions. Any single signal suggests evaluating Workflows. Multiple signals make it certain.
Choosing between Cloudflare Queues and hyperscaler alternatives
Most background processing doesn't require advanced messaging features. If your requirements are "process this later, retry on failure, dead-letter if it keeps failing," Cloudflare Queues handles that with less operational complexity than SQS or Service Bus.
But some requirements do demand those features:
If you need strict ordering within a partition, use SQS FIFO. Cloudflare Queues has no ordering primitive; building ordering on top of an unordered queue is fragile and expensive.
If you need exactly-once processing at the queue level, use SQS FIFO with deduplication. Cloudflare Queues requires consumer-side idempotency, putting the complexity in your code rather than infrastructure.
If you need messages larger than 128 KB without indirection, use Azure Service Bus (up to 100 MB on premium tiers). The R2 indirection pattern works, but native support is cleaner if large messages are common.
If you need sophisticated routing, dead-letter policies, or message sessions, Azure Service Bus offers capabilities Cloudflare Queues doesn't attempt.
If your requirements are simpler (independent tasks, at-least-once is fine, ordering doesn't matter, messages are small), Cloudflare Queues integrates natively with Workers, deploys as part of your Worker project, and operates with less configuration surface.
Hybrid approaches work. Use Cloudflare Queues for simple async tasks within your Cloudflare stack; use SQS or Service Bus for workloads needing advanced features. Design producers and consumers to be queue-agnostic where practical.
What comes next
Queues handle asynchronous work within Workers' constraints. But some workloads exceed those constraints, requiring more memory, longer processing, or arbitrary runtimes.
Chapter 9 covers Containers: escaping the V8 isolate model when necessary. A queue consumer needing 4 GB of memory or running a Go binary can't be a Worker. Containers fill that gap, accessible from Workers and Durable Objects through a coordination layer that maintains Cloudflare's routing model while breaking free of isolate limitations.
The decision framework (queues for independent tasks, Workflows for dependent steps, Durable Objects for coordination) recurs throughout Cloudflare architecture. Understanding which primitive fits which problem matters more than mastering any single primitive's features.