Chapter 14: KV and Hyperdrive
How do I cache effectively and connect to existing databases?
Not all storage needs relational queries or object semantics. Sometimes you need fast key-value lookups for configuration, cached responses, or session data; sometimes you already have a PostgreSQL or MySQL database and migration isn't practical.
KV and Hyperdrive serve these needs, but they require shifting mental models. Think of KV not as a database but as a globally distributed cache you can write to. Think of Hyperdrive not as a database proxy but as a way of paying the geography tax once so your queries don't have to pay it on every request.
Workers KV: a CDN you can write to
Think of KV as a CDN for data you control. Same global caching semantics, same eventual consistency, same sweet spot: content read far more often than it changes.
When you write a value to KV, Cloudflare stores it and begins propagating it to edge locations worldwide; reads then serve from the nearest edge cache, typically at 2-10 milliseconds. Like any cache, what you read might not reflect recent writes, which is eventual consistency and not a bug but the fundamental characteristic that makes global sub-10ms reads possible.
KV trades consistency for geography, delivering sub-10ms reads everywhere precisely because nowhere has to agree with anywhere else first.
The API reflects this simplicity:
// Read with type coercion and cache tuning
const flags = await env.CONFIG.get("feature-flags", { type: "json", cacheTtl: 300 });
// Write with expiration
await env.CONFIG.put("session:abc123", JSON.stringify(session), { expirationTtl: 3600 });
The cacheTtl parameter controls how long an edge location caches a value before checking the coordination layer for updates; the default of 60 seconds suits most use cases, and lower values trade read performance for freshness by checking more often. The expirationTtl controls when the value itself expires and is deleted. Confusing the two is a common source of unexpected behaviour.
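A short sketch of the distinction, reusing the CONFIG binding from above (key name and values are illustrative):

// expirationTtl (set on put): when KV deletes the value entirely.
await env.CONFIG.put("feature-flags", JSON.stringify({ newCheckout: true }), {
  expirationTtl: 86400, // the key disappears after 24 hours
});

// cacheTtl (set on get): how long an edge location may serve its cached copy
// before re-checking the coordination layer for a newer version.
const flags = await env.CONFIG.get("feature-flags", {
  type: "json",
  cacheTtl: 300, // accept up to roughly 5 minutes of staleness per edge
});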
What eventual consistency actually means
Eventual consistency is often described as "writes take up to 60 seconds to propagate," which undersells the architectural implications. Eventual consistency isn't a delay; it's a design constraint. KV cannot coordinate, cannot count, and cannot guarantee read-your-writes. If you need any of those capabilities, you need a different primitive.
KV provides eventual consistency without monotonic reads; replicas converge given sufficient time without new writes, but a client might see older data after seeing newer data if requests route to different edge locations. User A in London might read value "2", and their next request, routed to Paris, might return an older cached value "1". That regression isn't a sign that anything went wrong; KV simply makes no ordering guarantees across edge locations.
Propagation timing varies by geography and value size: one to five seconds for nearby edges, five to fifteen seconds same-continent, up to sixty seconds globally. But you cannot query "is this value consistent everywhere?" The answer is always "eventually."
Any pattern requiring two requests to agree on a current value is impossible with KV. Counters fail because two requests might both read "5", both write "6", and you lose an increment. Inventory checks fail because two requests might both see "1 remaining" and both decrement. Rate limiting fails because concurrent requests can't coordinate counts. For coordination, counters, or distributed agreement, use Durable Objects.
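To make the lost-update problem concrete, here is the anti-pattern spelled out, with a hypothetical COUNTERS namespace:

// DON'T do this with KV: two concurrent requests can both read "5",
// both write "6", and one increment is silently lost. There is no
// compare-and-swap or transaction to prevent it.
const current = parseInt((await env.COUNTERS.get("page-views")) ?? "0", 10);
await env.COUNTERS.put("page-views", String(current + 1));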
KV failure modes
Three failure modes recur frequently enough to deserve explicit names.
Stale read after write confuses teams regularly. You write a value, then immediately read it, and the read might return the old value because it was served from an edge cache that hasn't received the write yet. This isn't a race condition; it's expected. The write went to the coordination layer whilst the read came from edge cache; they don't talk to each other.
If you need to display what you just wrote, return the written value directly rather than re-reading it. If you need another request to see the write, consider whether D1 or Durable Objects better fits your consistency requirements. Adding delays is fragile because propagation timing isn't guaranteed.
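A minimal sketch of the first workaround, assuming a SESSIONS binding (the variable names are illustrative); the response is built from the value you already hold, never from a re-read:

// Write the session, then respond with the object in hand rather than
// reading it back through an edge cache that may not have it yet.
const session = { userId, createdAt: Date.now() };
await env.SESSIONS.put(`session:${token}`, JSON.stringify(session), {
  expirationTtl: 3600,
});
return Response.json(session); // never re-read immediately after a write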
Negative caching trap catches teams who check for key existence. When you read a key that doesn't exist, KV caches that non-existence at the edge; if you then create the key, some edges continue returning "not found" until their negative cache expires. There's no way to explicitly clear a negative cache entry, so the workaround is architectural: don't rely on key existence checks for correctness.
Read skew across locations can violate intuitions about consistency. User A in Sydney might see the new value whilst User B in Singapore sees the old value at the same moment, with each edge location serving its local cache state. For data where this inconsistency causes user-visible problems, consider D1 or Durable Objects.
Write failures and retry semantics
KV write operations can fail, and the failure semantics matter.
Writes are acknowledged when the coordination layer accepts them, not when propagation completes, so a successful write means "we've recorded this and will propagate it." A failed write returns an error immediately; writes never silently fail.
Transient failures (network issues, temporary overload) deserve retry with exponential backoff. Permanent failures (value too large, namespace doesn't exist) won't succeed on retry. Transient failures manifest as network errors or 5xx responses; permanent failures return specific error codes.
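One reasonable shape for that retry logic, with the backoff schedule and attempt count as assumptions rather than anything KV prescribes:

async function putWithRetry(
  ns: KVNamespace,
  key: string,
  value: string,
  attempts = 3,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await ns.put(key, value);
      return;
    } catch (err) {
      // Out of attempts: surface the error to the caller.
      if (i === attempts - 1) throw err;
      // Exponential backoff: 100ms, 200ms, 400ms...
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** i));
    }
  }
}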
For critical writes, consider writing to both KV and D1. KV provides speed; D1 provides certainty. If the KV write fails, accept slightly slower reads from D1. If it succeeds, subsequent reads benefit from edge caching.
What goes wrong: immediate consistency expectations
A team builds a user settings feature where users update preferences through a settings page that writes to KV, and the confirmation page reads from KV to display the new settings. In development, it works; in production, users report their changes "don't save."
The settings write goes to KV's coordination layer whilst the settings read comes from edge cache, which hasn't received the write yet. Users see their old settings, assume the save failed, and try again, with support tickets accumulating.
The team tries solutions that don't help: shorter cache TTLs (which don't affect propagation), explicit cache invalidation (which doesn't exist in KV), or read-after-write delays (which help sometimes but not reliably). The real fix is placing user-visible data requiring immediate consistency in D1 or Durable Objects.
The correct architecture stores user settings in D1, which provides read-your-writes consistency. If sub-10ms global reads matter, cache settings in KV as a read-through cache, accepting staleness. For a settings page that users visit occasionally, the difference between 10ms and 50ms is imperceptible.
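A sketch of that write path, assuming a DB binding for D1 and a SETTINGS_CACHE binding for KV (names and fields are illustrative):

// D1 is the source of truth: the write is durable and read-your-writes.
await env.DB.prepare("UPDATE settings SET theme = ? WHERE user_id = ?")
  .bind(theme, userId)
  .run();

// Best-effort cache update; readers tolerate staleness until it propagates.
await env.SETTINGS_CACHE.put(`settings:${userId}`, JSON.stringify({ theme }), {
  expirationTtl: 300,
});

// The confirmation page renders from the values just written, not a re-read.
return Response.json({ theme });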
When KV is the right choice
KV excels at data written rarely, read constantly, where 60 seconds of staleness causes no harm.
Configuration and feature flags fit perfectly. A feature flag checked on every request benefits from sub-10ms reads. When you change the flag, 60-second propagation is acceptable; gradual rollout across edge locations is often desirable.
Cached API responses work well. External API results that don't change frequently can live in KV. The staleness window is explicit and controllable through TTL.
Session tokens are a good fit. You read the token on every authenticated request; you write it only on login and logout. Eventual consistency rarely matters because session changes are infrequent and users typically interact with one edge location.
Static content metadata (types, permissions, version numbers) changes rarely and needs fast global reads.
When KV is the wrong choice
The inverse pattern (frequent writes, consistency requirements, coordination needs) makes KV actively harmful.
Counters and coordination are impossible. Eventual consistency breaks any pattern requiring atomic read-modify-write cycles. If you're incrementing counters, tracking inventory, or implementing accurate rate limits, use Durable Objects.
User-visible data that changes frequently creates bugs. Shopping carts, user preferences, or profile updates that users expect to see immediately will disappoint when served from stale caches. Users add an item, refresh, don't see it, add it again. Now they have duplicates.
Write-heavy workloads pay a premium. KV writes cost ten times more than reads: $5.00 per million writes versus $0.50 per million reads. If writes exceed 20% of your operations, calculate both KV and D1 costs before committing.
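To make the ratio concrete, a back-of-the-envelope using the list prices above (the traffic split is illustrative):

const readPricePerMillion = 0.5; // USD, KV reads
const writePricePerMillion = 5.0; // USD, KV writes
const kvCost = (readsM: number, writesM: number) =>
  readsM * readPricePerMillion + writesM * writePricePerMillion;

kvCost(90, 10); // $45 + $50 = $95: writes are 10% of operations but over half the bill
kvCost(80, 20); // $40 + $100 = $140: at a 20% write share, writes dominate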
Large values defeat the purpose. KV accepts values up to 25 MB, but large values undermine caching benefits and slow propagation. Use R2 for substantial objects. KV shines with small values typically under 100 KB.
The namespace strategy
You get 100 namespaces per account. Use them. A namespace per concern (configuration, sessions, cache, feature flags) makes limits comprehensible and billing attributable.
The alternative (one namespace with key prefixes) works but obscures. Prefixes provide logical separation but not operational separation: you can't set different TTL defaults, monitor usage separately, or delete a concern's data without scanning all keys.
For multi-tenant applications, consider a namespace per tenant if you have fewer than 100. The isolation simplifies tenant offboarding (delete the namespace) and prevents one tenant's usage from affecting another's limits.
Cache invalidation: the hard truth
Cache invalidation is famously hard. With KV, the problem has a specific shape: you cannot reliably invalidate.
When you delete or update a key, the change propagates eventually. During propagation, some edges serve old values, some serve new values, some serve "not found." You cannot force immediate global invalidation. You cannot query whether invalidation has completed. You can only wait.
KV caching strategies must embrace TTL-based expiration rather than explicit invalidation. Set a TTL that represents your staleness tolerance. Design your application so this staleness causes no harm.
Write-through caching (updating the cache when you update the source) provides an illusion of freshness. You update your database and immediately update KV. But the KV update propagates on its own schedule, potentially slower than your database replication. Users in Tokyo might see fresh data before users in London, despite London being closer to your origin database.
The safest pattern is cache-aside with short TTLs. Read from cache; on miss, read from source and populate cache. Accept that updates take TTL-plus-propagation to appear globally. Don't try to be clever about invalidation; eventual consistency defeats cleverness.
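A sketch of cache-aside in that spirit; the loader callback stands in for whatever your source of truth is:

async function getCached<T>(
  cache: KVNamespace,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  // Serve from the edge when the key is present.
  const cached = (await cache.get(key, { type: "json" })) as T | null;
  if (cached !== null) return cached;

  // On miss, read the source of truth and repopulate with a short TTL.
  const fresh = await load();
  await cache.put(key, JSON.stringify(fresh), { expirationTtl: ttlSeconds });
  return fresh;
}

Called as getCached(env.CACHE, "rates:usd", 300, () => fetchRates()), updates become visible everywhere within the TTL plus propagation, which is exactly the staleness budget this pattern asks you to accept.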
TTL selection: a framework
TTL selection is a tradeoff between freshness and efficiency. Four questions guide the decision.
What's the maximum staleness your application can tolerate? This is the hard ceiling. If users must see updates within five minutes, your TTL cannot exceed five minutes minus propagation time.
What's your cache miss cost? Every miss means a slower request and load on your source. If your source is D1, misses cost 20-50ms. If your source is an external API with rate limits, misses cost both latency and quota. Higher miss costs justify longer TTLs.
How often does the source data actually change? A TTL shorter than your update frequency wastes cache capacity. If you deploy configuration changes daily, a 24-hour TTL aligns with reality.
What's the cost of serving stale data? Sometimes stale data is merely suboptimal (yesterday's exchange rate). Sometimes stale data breaks functionality (showing a user as logged in after logout). The severity determines your tolerance.
Match your TTL to the most restrictive answer. Configuration deserves TTLs matching deployment cadence. Session data deserves TTLs matching session lifetime. When uncertain, err toward shorter TTLs. The cost of a cache miss is a single slower request; the cost of serving stale data might be a confused user.
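As an illustration only, staleness budgets matched to those questions might look like this:

// Illustrative TTLs (in seconds), matched to the four questions above.
const TTL = {
  featureFlags: 60 * 60 * 24, // changes at deploy cadence; staleness harmless
  exchangeRates: 60 * 15,     // stale data is merely suboptimal
  sessionState: 60 * 5,       // stale data is user-visible; keep it short
};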
KV vs hyperscaler caching
For teams comparing KV to AWS, Azure, or GCP caching solutions, the architectural differences matter more than feature lists.
| Aspect | Workers KV | ElastiCache (Redis) | DAX | Azure Cache for Redis |
|---|---|---|---|---|
| Architecture | Global edge cache | Regional cluster | DynamoDB-specific | Regional cluster |
| Read latency | 2-10ms globally | 300-500μs same-AZ, higher cross-region | Microseconds (same region) | 300-500μs same-AZ |
| Consistency | Eventual (60s propagation) | Strong (single cluster) | Eventual | Strong (single cluster) |
| Max value size | 25 MB | 512 MB | 400 KB | 512 MB |
| Pricing model | Per-operation | Instance hours | Instance hours | Instance hours |
| Global distribution | Automatic (300+ PoPs) | Manual (Global Datastore) | None (regional only) | Manual (Geo-replication) |
| Management overhead | Zero | Cluster sizing, failover, patching | Cluster sizing | Cluster sizing, failover, patching |
The fundamental difference: ElastiCache, DAX, and Azure Cache provide sub-millisecond latency within a region but require explicit multi-region configuration for global access. KV provides consistent 2-10ms latency everywhere with zero configuration. If your users are concentrated in one region, ElastiCache is faster. If your users are global, KV is simpler and often faster for distant users.
When ElastiCache wins: Applications requiring sub-millisecond cache access within a region. Complex data structures (Redis sorted sets, lists, pub/sub). Strong consistency requirements where eventual consistency is unacceptable.
When KV wins: Global applications where configuration simplicity matters. Use cases where 2-10ms is acceptable. Teams wanting zero infrastructure management. Cost-sensitive projects where per-operation pricing beats always-on clusters.
DAX is specialised: DAX exists specifically to accelerate DynamoDB reads. If you're using DynamoDB and need microsecond reads, DAX is purpose-built. It's not a general-purpose cache and doesn't compare directly to KV.
Hyperdrive: paying the geography tax once
Every database connection from a Worker pays a geography tax. A Worker in Sydney connecting to PostgreSQL in us-east-1 must complete TCP handshake (one round-trip), TLS negotiation (two round-trips minimum), and PostgreSQL authentication (one or more round-trips) before executing a single query. Each round-trip from Sydney to us-east-1 costs 150-200 milliseconds, so a simple query that takes 5ms to execute might take 600ms to complete, with 99% of that time spent on connection overhead.
Connection overhead isn't latency; it's tax. Hyperdrive doesn't make your database faster; it eliminates the distance penalty.
Workers compound this problem, since each request might run in a different isolate and potentially a different location, with no persistent connection pool across requests. Every query risks paying the full connection tax.
Hyperdrive maintains connection pools near your database. Your Worker connects to the nearest Hyperdrive node (fast, because it's nearby) and Hyperdrive uses an existing pooled connection to your database. The connection setup cost, paid once when the pool was established, is amortised across thousands of queries.
That 600ms query becomes 50ms not because the query got faster but because the ceremony disappeared.
The developer experience
Hyperdrive presents itself as a connection string. Your code doesn't know it's using Hyperdrive:
import postgres from "postgres";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // The binding's connectionString points at Hyperdrive, which holds
    // pooled connections near the database.
    const sql = postgres(env.HYPERDRIVE.connectionString);
    const users = await sql`SELECT * FROM users WHERE active = true`;
    // Return the connection to the pool after the response has been sent.
    ctx.waitUntil(sql.end());
    return Response.json(users);
  },
};
The binding provides a connection string routing through Hyperdrive's infrastructure. Your database driver connects to what appears to be a local endpoint. Hyperdrive handles the geographic complexity invisibly.
Existing database code works unchanged. Don't refactor queries, don't learn new APIs, don't adapt to a proprietary interface. Point your connection string at Hyperdrive; everything else stays the same.
Quantifying the improvement
Improvement magnitude depends on geography. Users close to your database see modest gains; users far from it see transformative ones.
For a database in us-east-1 with a 5ms query execution time: a cold connection from New York adds 50-100ms overhead, reduced by Hyperdrive to 30-50ms total (2-3x improvement). From Sydney: 400-600ms overhead reduced to 50-80ms (10x improvement). From São Paulo: 300-500ms reduced to 40-70ms. From Frankfurt: 150-250ms reduced to 30-50ms.
Hyperdrive's value scales with geographic distance. If your users are concentrated near your database, the benefit is modest. If your users are global, the benefit is dramatic.
How connection pooling changes architecture
Without Hyperdrive, architectural decisions bend around latency; you might colocate Workers with your database using placement hints or Smart Placement (sacrificing global distribution for acceptable latency), implement aggressive caching, or accept that certain features are slow for distant users.
Hyperdrive removes this constraint by allowing Workers to run at the edge near users while still querying your database without prohibitive latency, making features that seemed impractical become feasible.
Latency doesn't disappear; your database is still in one region and a query from Sydney still travels to us-east-1 and back. However, the overhead is gone, so you pay actual query latency rather than connection ceremony latency.
Transaction integrity
Connection pooling raises an obvious question: if connections are shared, what happens to transactions? Hyperdrive handles this correctly. When you begin a transaction, that connection is dedicated to your request until the transaction commits or rolls back, so transaction isolation is maintained.
Long-running transactions hold connections. Under high load with long transactions, you might exhaust the pool. Keep transactions short (seconds, not minutes) and you'll stay within capacity.
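A sketch with the postgres.js driver shown earlier; sql.begin dedicates one pooled connection for the duration of the callback, so keep the callback small (table and variable names are illustrative):

// Everything inside the callback runs on a single dedicated connection and
// commits when the callback resolves (or rolls back if it throws).
const order = await sql.begin(async (tx) => {
  const [created] = await tx`INSERT INTO orders (user_id) VALUES (${userId}) RETURNING id`;
  await tx`INSERT INTO order_items (order_id, sku) VALUES (${created.id}, ${sku})`;
  return created;
});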
Query caching: trading freshness for speed
Hyperdrive can cache query results. Identical queries with identical parameters return cached results without hitting your database.
Not all queries benefit equally:
| Query Pattern | Caching Value | Recommendation |
|---|---|---|
| Lookup by immutable ID | High | Enable, TTL of hours to days |
| Reference data (countries, currencies) | High | Enable, TTL matching update frequency |
| User-specific data queried repeatedly | Medium | Enable with short TTL if staleness acceptable |
| Aggregations for dashboards | Medium | Enable, TTL matching reporting cadence |
| Data with temporal conditions | Low | Disable; conditions change results |
| Write-then-read patterns | Negative | Disable; cache prevents seeing writes |
The fundamental tradeoff mirrors KV in that caching trades freshness for speed; writes through Hyperdrive don't automatically invalidate cached reads, so if you write a row and immediately read it, you might get the cached old value.
For read-heavy workloads with repeating queries and tolerance for staleness, enable caching. For write-heavy workloads or queries requiring current data, disable it. There's no middle ground where caching is "smart enough" to know when to invalidate.
Hyperdrive failure modes
Hyperdrive has characteristic failure modes worth naming.
Origin unreachable is the straightforward case: the database is down, the network is partitioned, or the credentials are invalid. Queries fail immediately with connection errors, so implement retry logic for transient failures and surface permanent failures to users.
Pool exhaustion under load manifests subtly when all pooled connections are in use and new queries must wait. You'll see rising latency before you see errors, with a query that normally takes 50ms potentially taking 500ms whilst waiting for a connection.
Detection requires monitoring query duration, not just error rates. If p99 latency spikes while p50 remains stable, you likely have pool pressure. If long transactions hold connections, shorten them. Hyperdrive doesn't expose pool metrics directly, so you infer pool state from query timing.
Stale cache after write is the caching consistency trap: you write a row and then read it, and if caching is enabled the read might return the cached pre-write value. Disable caching for those queries if correctness depends on reading your own writes.
Credential rotation race occurs when you rotate database credentials and the database starts rejecting the old credentials before Hyperdrive has fully adopted the new ones, causing connections to fail. The safe rotation sequence is to add the new credentials to your database, update the Hyperdrive configuration, verify the new credentials work, and only then remove the old ones.
Origin timeout cascade compounds under load when your database becomes slow and queries time out, with each timeout holding a pooled connection. As the pool fills with timing-out queries, new queries wait and then also timeout. The mitigation is aggressive timeouts and circuit breaking, because if your database is struggling, failing fast protects the system better than waiting hopefully.
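A sketch of the fail-fast side using a plain Promise.race timeout; the 2-second threshold and the query are assumptions to tune against your own latency profile:

async function withTimeout<T>(work: PromiseLike<T>, ms = 2000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`query exceeded ${ms}ms`)), ms);
  });
  try {
    // Note: losing the race abandons the result; the query may keep running
    // on the database until its own timeout fires.
    return await Promise.race([Promise.resolve(work), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Fail fast rather than letting a struggling database hold the pool hostage.
const rows = await withTimeout(sql`SELECT * FROM orders WHERE user_id = ${userId}`, 2000);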
Multi-region database topologies
If your database has read replicas in multiple regions, Hyperdrive's interaction with that topology deserves consideration.
Hyperdrive connects to a single database endpoint. It doesn't automatically route to the nearest replica. For read-heavy workloads with existing multi-region replicas, consider multiple Hyperdrive configurations, one per replica region. Route read queries to the geographically appropriate instance; writes go to the primary.
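A sketch of that routing, with two Hyperdrive bindings whose names (HYPERDRIVE_PRIMARY, HYPERDRIVE_REPLICA_APAC) are illustrative:

import postgres from "postgres";

interface Env {
  HYPERDRIVE_PRIMARY: Hyperdrive;      // configured against the primary
  HYPERDRIVE_REPLICA_APAC: Hyperdrive; // configured against an APAC read replica
}

function connectionFor(env: Env, readOnly: boolean, continent?: string) {
  // Reads from nearby users go to the nearby replica; everything else,
  // and all writes, go to the primary.
  if (readOnly && continent === "OC") return env.HYPERDRIVE_REPLICA_APAC;
  return env.HYPERDRIVE_PRIMARY;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    const continent = request.cf?.continent; // e.g. "OC" for Oceania
    const hd = connectionFor(env, request.method === "GET", continent);
    const sql = postgres(hd.connectionString);
    const rows = await sql`SELECT * FROM products LIMIT 20`;
    ctx.waitUntil(sql.end());
    return Response.json(rows);
  },
};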
If you're building new infrastructure, consider whether D1 with read replicas better fits your needs. D1 handles global distribution natively; Hyperdrive accelerates access to existing infrastructure but doesn't create geographic distribution.
Supported databases
Hyperdrive supports PostgreSQL and MySQL. PostgreSQL receives primary focus with full feature support. MySQL is supported, but verify your specific variant works because Aurora MySQL, PlanetScale, and vanilla MySQL behave differently in edge cases.
Database connections require TLS. Hyperdrive refuses unencrypted connections. If your database doesn't support TLS, you cannot use Hyperdrive.
Hyperdrive vs RDS Proxy and alternatives
Teams on AWS often compare Hyperdrive to RDS Proxy. The services solve similar problems but with different architectural assumptions.
| Aspect | Hyperdrive | RDS Proxy | PgBouncer (self-managed) |
|---|---|---|---|
| Primary purpose | Reduce edge-to-database latency | Manage Lambda connection storms | Connection pooling |
| Geographic distribution | Pools near database, edge access | Same region as database | Where you deploy it |
| Cold connection overhead | Eliminated (pool maintains connections) | Eliminated | Eliminated |
| Query caching | Built-in (optional) | None | None |
| Latency reduction | 10x for distant users | Minimal (same-region only) | None |
| Management | Fully managed | Fully managed | Self-managed |
| Pricing | Free (included with Workers) | Per-vCPU-hour (~$0.015) | Infrastructure costs |
| Supported databases | PostgreSQL, MySQL | PostgreSQL, MySQL, MariaDB | PostgreSQL |
The architectural distinction: RDS Proxy solves the "too many connections" problem that Lambda creates, where hundreds of concurrent functions each open database connections. Hyperdrive solves that problem plus the geography problem, eliminating repeated connection overhead for edge-deployed Workers worldwide. RDS Proxy assumes your compute runs in the same region as your database; Hyperdrive assumes your compute runs globally.
When RDS Proxy suffices: Compute and database are in the same AWS region. Lambda functions connecting to RDS without edge deployment. Existing AWS infrastructure where adding another service creates operational overhead.
When Hyperdrive wins: Workers deployed globally need database access without prohibitive latency. The query caching feature provides additional performance benefits. You want the same solution for connection pooling and geographic acceleration.
An important note: If you're using RDS or Aurora with Lambda in the same region, RDS Proxy is the natural choice. If you're using Workers with any PostgreSQL or MySQL database, Hyperdrive provides more value because it solves both the connection pooling and geography problems simultaneously.
Choosing between KV, D1, and Hyperdrive
The first question isn't "how fast?" but rather "how stale?" Once you've answered that, the storage choice often makes itself.
Start with consistency
If you need strong consistency (reads always return the latest write), eliminate KV immediately. Choose between D1 (Cloudflare-native, global with read replicas) and Hyperdrive (your existing database, accelerated).
If eventual consistency is acceptable, KV's speed and simplicity win. Don't over-engineer with D1 what KV handles elegantly.
Consider data shape
Relational data with queries across rows belongs in D1 or Hyperdrive. KV is key-value only; you can't query by attributes, join, or filter. If you need "all users in London" or "orders above 100," KV is structurally incapable.
Simple lookups by known key are KV's strength: session by token, configuration by name, cached response by request hash.
Evaluate write frequency
Write-heavy workloads favour D1. KV's 10:1 write-to-read cost ratio makes frequent writes expensive. At current pricing, a workload with 90% reads and 10% writes costs roughly the same in KV and D1. As write percentage increases, D1 becomes cheaper.
Account for existing infrastructure
If you have PostgreSQL or MySQL running elsewhere, Hyperdrive provides immediate value without migration. You keep your existing database, backups, and operational knowledge.
If you're starting fresh, D1 eliminates external dependencies. No database to manage outside Cloudflare, no connection strings to secure, no pool exhaustion to monitor.
Hyperdrive as architecture, not just bridge
The framing of Hyperdrive as a "bridge" undersells its value. Many production systems run permanently with Workers connecting to external databases via Hyperdrive. This isn't a compromise or transition state. It's a valid architectural choice.
Hyperdrive fits as a permanent solution when your database requires PostgreSQL or MySQL capabilities D1 lacks (extensions, stored procedures, specific data types), when your database serves multiple applications beyond Workers, when your team has established operational practices around your existing database, or when migration risk exceeds any benefit D1 would provide. The connection pooling and query caching make external databases perform well from the edge; you're not accepting poor performance as the price of compatibility.
D1 fits better when you're starting fresh without external database dependencies, when you want global read distribution through replicas, when you prefer fully managed operations within Cloudflare's ecosystem, or when your data model aligns with D1's horizontal scaling patterns.
The hybrid approach works well for incremental adoption: new features built on D1, existing features continuing to use PostgreSQL via Hyperdrive. This lets you evaluate D1 with real workloads while maintaining continuity for proven systems. Migration can happen feature by feature, or the hybrid state can persist indefinitely if that's what serves your architecture best.
The decision matrix
| Your Situation | Choose | Because |
|---|---|---|
| Read-heavy, staleness acceptable | KV | Sub-10ms global reads, simple model |
| Need relational queries, new project | D1 | Native integration, no external dependencies |
| Have PostgreSQL/MySQL with established operations | Hyperdrive | Accelerates existing infrastructure permanently |
| Need coordination or counters | Durable Objects | KV can't coordinate; D1 is too slow |
| Require PostgreSQL-specific features | Hyperdrive | D1 is SQLite-based; keep your extensions |
| Database serves multiple applications | Hyperdrive | Shared data layer across Workers and other systems |
| Configuration and feature flags | KV | Perfect match for the pattern |
| User profiles with search | D1 | Need relational queries |
| Session tokens | KV | Read-heavy, TTL matches sessions |
| Write-heavy with consistency needs | D1 or Hyperdrive | KV pricing penalises writes; either relational option works |
| Uncertain about D1 migration | Hyperdrive | Start with existing database, evaluate D1 later |
Operational monitoring
Neither KV nor Hyperdrive exposes detailed internal metrics, but both reveal health through observable behaviour.
For KV, monitor cache hit rates through your application. If hit rates drop unexpectedly, investigate whether write frequency increased, TTLs are misconfigured, or negative caching is trapping you.
For Hyperdrive, monitor query duration percentiles. Stable p50 with rising p99 suggests pool pressure. Rising p50 suggests database issues. Track error rates separately from latency; connection failures and query failures have different causes.
Both services integrate with Cloudflare's analytics, but instrument your application for production alerting. The metrics that matter are application-level: did this request get the data it needed, and how long did it take?
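A sketch of that application-level instrumentation; the label scheme and the console.log destination are placeholders for whatever logging pipeline you already have:

async function timed<T>(label: string, work: () => PromiseLike<T>): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    // Emit one structured line per operation; aggregate p50/p99 downstream.
    console.log(JSON.stringify({ label, durationMs: Date.now() - start }));
  }
}

// Usage: stable p50 with a rising p99 here points at pool pressure.
const users = await timed("users.active", () => sql`SELECT * FROM users WHERE active = true`);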
What comes next
This chapter completes Part IV: Data & Storage.
D1 provides relational storage with strong consistency. R2 provides object storage with S3 compatibility and zero egress fees. KV provides global key-value caching for configuration and read-heavy data where staleness is acceptable. Hyperdrive accelerates access to external databases. Durable Objects storage (Chapter 6) provides per-entity state with strong consistency and coordination.
Each primitive has a purpose. Choosing correctly isn't about capability but about matching data characteristics to storage characteristics.
Part V explores AI capabilities: Workers AI for inference at the edge, Vectorize for embeddings, AI Search for managed RAG pipelines, and the Agents SDK for AI applications with state and tool use. These services integrate with the storage options covered here: embeddings in Vectorize, documents in R2, metadata in D1, conversation state in Durable Objects.