Chapter 11: Choosing the Right Storage

Where should I store this data, and what trade-offs am I making?


Part III covered compute: Workers, Durable Objects, Workflows, Queues, and Containers. Part IV turns to where that compute stores its data.

Cloudflare doesn't offer a 100 GB database option; this isn't a missing feature or a temporary limitation but a design philosophy. D1 databases cap at 10 GB each because Cloudflare wants you to have ten databases, not one large one. Understanding why reveals how to think about all storage choices on the platform.

Why five storage options

Traditional databases try to be everything: relational queries, key-value access, blob storage, real-time subscriptions. This works in a single location but fails at the edge, where "single location" is the opposite of the point. No single storage abstraction performs well for all access patterns globally. Global reads require different architecture than relational queries; large objects need different handling than coordination state.

Cloudflare's answer: specialisation. KV for globally cached reads, D1 for relational queries, R2 for objects and files, Durable Objects for coordination, Hyperdrive for existing external databases. Match the primitive to your access pattern, or pay the tax of fighting the platform.

The five primitives

Each storage option makes different trade-offs. Understanding them lets you predict behaviour in novel situations and reason about any storage question without consulting documentation.

KV trades write speed and consistency for global read performance. Data replicates to edge locations worldwide, enabling sub-10ms reads from anywhere. Writes propagate asynchronously over roughly 60 seconds. Tolerate that staleness window and KV is extraordinarily fast. Require fresher data and KV will cause bugs that only appear under load.

Key Architectural Insight

D1 trades global distribution for relational capabilities and strong consistency. D1 is a Durable Object fronted by a Worker proxy. Each database runs on a single-threaded SQLite instance with optional read replicas. You get SQL queries, joins, indexes, and immediate consistency, but reads from distant locations pay latency to reach the primary. The 10 GB limit isn't arbitrary; it's a consequence of this architecture and a signal that horizontal scaling is the intended pattern.

R2 trades query capabilities for object handling and economics. It stores files and blobs with an S3-compatible API. There are no queries or relations, but objects can be 5 TB each and egress is free. For read-heavy file storage, R2's economics are transformative.

Durable Objects storage trades throughput for coordination guarantees. Each object has private SQLite storage accessible only through single-threaded execution. Perfect consistency within each object, but throughput is limited to one object per entity. You choose Durable Objects for the execution model; the storage comes with it.

Hyperdrive trades native integration for compatibility. It accelerates existing PostgreSQL and MySQL databases without migration: connection pooling at the edge, query caching for repeated queries. This is a valid long-term architecture, not merely a bridge. Many production workloads run permanently with Hyperdrive connecting Workers to external databases, particularly when those databases require PostgreSQL-specific features, exist within established operational practices, or serve as a shared data layer across multiple applications.
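As a sketch of what this looks like from a Worker, assuming a Hyperdrive binding named HYPERDRIVE and the postgres.js driver (names are illustrative, not prescriptive):

```typescript
// Querying an existing PostgreSQL database through Hyperdrive's edge pool.
// Assumes a Hyperdrive binding named HYPERDRIVE and the "postgres" npm driver.
import postgres from "postgres";

interface Env {
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Hyperdrive exposes a connection string that routes through its pooler.
    const sql = postgres(env.HYPERDRIVE.connectionString, { max: 5 });

    const users = await sql`SELECT id, email FROM users LIMIT 10`;

    // Return the pooled connection without blocking the response.
    ctx.waitUntil(sql.end());
    return Response.json(users);
  },
};
```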

The horizontal scaling philosophy

Every Cloudflare storage option is designed for horizontal scaling. Fighting it creates friction; working with it unlocks patterns that would be expensive elsewhere.

D1 databases are limited to 10 GB each because Cloudflare wants you to have multiple databases. KV stores individual keys, not collections; billions of keys, each independent. R2 stores individual objects with no cross-object queries. Durable Objects give you one object per entity, storage scoped to each.

Mental Model Shift

The pattern is many small things, not one big thing. This inverts the traditional mentality of "scale up until you can't, then shard." On Cloudflare, you start sharded. Your application logic handles distribution from day one.

A multi-tenant SaaS application creates a database per tenant. A user-facing application distributes sessions across KV keys. A file storage system stores objects independently in R2.

If you're asking "how do I make D1 store more than 10 GB?", you're asking the wrong question. Ask instead: "how do I partition my data across multiple databases?" Chapter 12 covers these patterns.

Mental model shifts

Coming from AWS, Azure, or GCP, some assumptions need adjustment.

If you think... → Cloudflare thinks...
"I need one big database" → "I need many small databases"
"Caching is an optimisation I add later" → "KV is the primary store for read-heavy data"
"I'll pay egress costs on storage" → "R2 eliminates them entirely"
"Connection pooling is a backend concern" → "Hyperdrive pools at the edge"
"I'll add coordination logic around my database" → "Durable Objects provide coordination natively"
"Eventual consistency is a fallback" → "KV's eventual consistency enables global performance"

The shift from "one big database" to "many small databases" matters most. Traditional cloud architectures centralise data and distribute compute. Cloudflare distributes both. Your data architecture must accommodate this from the start.

The decision framework

Three questions determine which storage fits your data: access pattern, consistency requirements, and coordination needs. Work through them in order. The first clear answer is usually correct.

Access pattern

Start with how the data will be read and written.

Read-heavy workloads with rare changes belong in KV. Data propagates to edge locations worldwide. Reads complete in under 10ms from anywhere. Writes are slow (60 seconds to propagate), but for data that changes hourly or daily, this is irrelevant. Configuration, feature flags, and cached responses fit perfectly.

Read-heavy workloads with frequent changes need D1, potentially with read replicas. User profiles, product catalogues, and content that changes throughout the day fit here. Without replicas, reads hit the primary with strong consistency but variable latency by geography. With replicas, reads are fast globally but may be slightly stale; the sessions API ensures users see their own recent writes while tolerating staleness from others.
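Concretely, a profile read through the sessions API might look like the sketch below. The DB binding, the users table, and the header used to carry the bookmark between requests are assumptions for illustration.

```typescript
// D1 sessions API with read replicas: reads may hit a nearby replica, but a
// returned bookmark lets the same user's next request see their own writes.
interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Continue an existing session if the client sent a bookmark back.
    const bookmark = request.headers.get("x-d1-bookmark") ?? "first-unconstrained";
    const session = env.DB.withSession(bookmark);

    const profile = await session
      .prepare("SELECT id, name FROM users WHERE id = ?")
      .bind("user-123") // illustrative ID
      .first();

    // Hand the bookmark back so the next request observes at least this state.
    const response = Response.json(profile);
    response.headers.set("x-d1-bookmark", session.getBookmark() ?? "");
    return response;
  },
};
```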

Write-heavy workloads need D1 or Durable Objects. KV's write propagation makes write-heavy workloads expensive and eventually consistent in problematic ways. D1 handles writes efficiently with strong consistency. Durable Objects handle writes with coordination guarantees when multiple requests might conflict.

Never Store Files in D1

Large objects belong in R2 regardless of access pattern. Files, images, documents, exports: anything measured in megabytes goes to R2. SQLite handles blobs poorly, and you'll hit row size limits.

Ask your team: What percentage of your current database operations are reads versus writes? If reads dominate by 10:1 or more, KV may handle more of your workload than you expect.

Consistency requirements

The trade-offs here reflect a fundamental tension in distributed systems: safety versus liveness. Safety guarantees nothing bad happens; no inconsistent reads, no lost writes. Liveness guarantees something good eventually happens; requests complete, systems make progress. You cannot maximise both. KV prioritises liveness: requests always succeed, even if stale. Durable Objects prioritise safety: requests may wait, but never see inconsistent state. Neither is better; they serve different needs.

KV Consistency Behaviour

KV provides eventual consistency. Writes acknowledge immediately but haven't propagated everywhere. For up to 60 seconds, different edge locations may return different values. KV doesn't guarantee monotonic reads: a user routed to Sydney might see a new value, then their next request routed to Singapore might see an older one.

Safe for data where brief staleness doesn't cause bugs: configuration, cached API responses, feature flags. Dangerous for data where reads must reflect recent writes: counters, balances, inventory levels, anything where concurrent operations must coordinate.

D1 provides strong consistency within each database. Writes are immediately visible to subsequent reads on the primary. Read replicas introduce eventual consistency: replicas may lag slightly behind the primary, though the sessions API ensures read-your-writes consistency for individual users.

Durable Objects provide linearisability per object: the strongest consistency guarantee. Operations appear to occur instantaneously at some point between invocation and response. Output gating ensures external visibility only after durability is confirmed. But this consistency is scoped to individual objects; operations on different objects are independent.

Ask your team: Where do you rely on reading data immediately after writing it? Those paths need D1 or Durable Objects, not KV.

Coordination requirements

Many workloads need more than storage; they need coordination. "Has this user exceeded their rate limit?" requires reading a counter, comparing to a limit, and incrementing; atomically, with no race condition. "What's in this user's shopping cart?" requires immediate consistency as items change. "Who's currently in this chat room?" requires tracking connections and broadcasting messages.

Race Condition

D1 can store this data but can't coordinate access. Two Workers handling concurrent requests could both read a counter from D1, both see "99", both allow the request, and both write "100"; violating a limit of 100. Transactions help but don't eliminate the problem; you'd need pessimistic locking, which D1 doesn't provide.

Durable Objects eliminate this class of bug. Each object processes requests serially. Two requests to the same rate limiter cannot interleave; the first completes entirely before the second begins. No locks, no transactions, no race conditions. Just sequential execution.

The trade-off is throughput. A single Durable Object handles one request at a time. Higher throughput requires more objects, which means partitioning data across them. For rate limiting, one object per user works naturally. For a global counter, partition across multiple counter objects and aggregate. Chapter 6 covers these patterns.
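Here is a minimal sketch of the one-object-per-user pattern, assuming a Durable Object binding named RATE_LIMITER and a limit of 100 per fixed window; resetting the window (via an alarm, say) is omitted for brevity.

```typescript
// A rate limiter as a Durable Object: requests to one object run serially,
// so read-compare-increment cannot interleave.
import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  async allow(limit: number): Promise<boolean> {
    const count = (await this.ctx.storage.get<number>("count")) ?? 0;
    if (count >= limit) return false;
    await this.ctx.storage.put("count", count + 1);
    return true;
  }
}

interface Env {
  RATE_LIMITER: DurableObjectNamespace<RateLimiter>;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = request.headers.get("x-user-id") ?? "anonymous";
    // One object per user: the same user always reaches the same instance.
    const stub = env.RATE_LIMITER.get(env.RATE_LIMITER.idFromName(userId));
    const allowed = await stub.allow(100);
    return allowed
      ? new Response("ok")
      : new Response("rate limited", { status: 429 });
  },
};
```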

Ask your team: Where do you use locks, transactions with retries, or optimistic concurrency with version numbers? Those are coordination problems Durable Objects solve more cleanly.

Durable Objects storage versus D1

When building with Durable Objects, should the object store state internally (using private SQLite storage) or externally (in D1)?

Use internal storage when data belongs exclusively to that object and doesn't need cross-entity queries. A rate limiter's counter, a chat room's message history, a user session's state; these are scoped to a single object and never queried across objects. Internal storage keeps coordination and data together, simplifying architecture.

Use D1 when you need cross-entity queries or when data outlives the coordination pattern. "Find all chat rooms with more than 100 messages" or "list all sessions for this user" require D1's query capabilities. If you want historical data queryable after real-time coordination ends, flush state to D1 periodically.

The hybrid pattern works well: Durable Objects hold authoritative live state with internal storage, periodically flushing to D1 for queryability and historical analysis. The object is the source of truth for now; D1 is the source of truth for before.
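A rough sketch of the hybrid pattern, assuming a hypothetical chat room object, a D1 binding named DB, and a messages table:

```typescript
// Live state in the object's private storage, flushed periodically to D1.
import { DurableObject } from "cloudflare:workers";

interface Env {
  DB: D1Database;
}

type Msg = { author: string; text: string; at: number };

export class ChatRoom extends DurableObject<Env> {
  async post(author: string, text: string): Promise<void> {
    // Source of truth for "now": the object's own storage.
    const pending = (await this.ctx.storage.get<Msg[]>("pending")) ?? [];
    pending.push({ author, text, at: Date.now() });
    await this.ctx.storage.put("pending", pending);

    // Schedule a flush if one isn't already pending.
    if ((await this.ctx.storage.getAlarm()) === null) {
      await this.ctx.storage.setAlarm(Date.now() + 60_000);
    }
  }

  // Source of truth for "before": move accumulated messages into D1.
  async alarm(): Promise<void> {
    const pending = (await this.ctx.storage.get<Msg[]>("pending")) ?? [];
    if (pending.length === 0) return;

    await this.env.DB.batch(
      pending.map((m) =>
        this.env.DB
          .prepare("INSERT INTO messages (author, text, at) VALUES (?, ?, ?)")
          .bind(m.author, m.text, m.at),
      ),
    );
    await this.ctx.storage.delete("pending");
  }
}
```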

Geographic considerations

Durable Objects have different geographic characteristics than Workers. Workers run in over 300 cities worldwide. Durable Objects require heavier infrastructure for storage and coordination guarantees, so they're deployed to fewer locations.

For users in North America, Europe, and much of Asia, this rarely matters; Durable Object locations are dense enough that latency stays low. For users in parts of Africa, South America, and other regions with sparser infrastructure, Durable Objects may add noticeable latency.

If your users concentrate in well-served regions, Durable Objects work well for latency-sensitive coordination. If you're serving users globally including underserved regions, verify latency characteristics before committing to a Durable Objects-heavy architecture.

D1 and KV don't share this constraint. KV replicates to all edge locations. D1's primary may be distant, but read replicas can be placed closer to users.

Common patterns

Abstract frameworks clarify; concrete examples confirm. These patterns show the decision framework applied to real scenarios.

Configuration and feature flags

Configuration changes rarely (perhaps daily at most), but every request might read it. This is the purest read-heavy, rarely-written pattern, which makes KV the right choice. Edge caching delivers sub-10ms reads globally. The 60-second propagation delay is irrelevant; nobody expects a feature flag change to take effect instantly worldwide.

Using D1 for configuration would execute a database query on every request, paying for latency and operations when a cached read would suffice. D1's capabilities are wasted on essentially static data. At scale, millions of KV reads cost far less than millions of D1 queries.
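In code, the KV version is a single cached read. The CONFIG_KV binding, the flags:global key, and the flag names are assumptions for illustration.

```typescript
// Reading feature flags from KV with an edge cache hint.
interface Env {
  CONFIG_KV: KVNamespace;
}

type Flags = Record<string, boolean>;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // cacheTtl keeps the value hot at this location; 60 s matches the
    // propagation window, so a shorter TTL wouldn't buy fresher data anyway.
    const flags =
      (await env.CONFIG_KV.get<Flags>("flags:global", { type: "json", cacheTtl: 60 })) ?? {};

    return Response.json({ darkMode: flags["dark-mode"] ?? false });
  },
};
```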

User sessions

Sessions are read-heavy (every authenticated request reads the session) with rare writes on login, logout, and refresh. This suggests KV, and for most applications, KV works well.

The nuance: security implications of eventual consistency. KV's 60-second propagation means a logged-out session remains valid at some edge locations temporarily. For most applications, this brief post-logout window is acceptable. For high-security applications requiring instant global logout, consider Durable Objects for session storage (immediate consistency, more complexity) or short TTLs with frequent refresh (sessions expire quickly regardless of logout propagation).

The right choice depends on threat model. Consumer applications can tolerate 60 seconds of post-logout validity. Banking applications probably can't.
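A sketch of the short-TTL approach, assuming a KV binding named SESSIONS; the key layout and 15-minute TTL are illustrative:

```typescript
// KV-backed sessions where expiry bounds how long a revoked session can linger.
interface Env {
  SESSIONS: KVNamespace;
}

export async function createSession(env: Env, userId: string): Promise<string> {
  const sessionId = crypto.randomUUID();
  await env.SESSIONS.put(
    `session:${sessionId}`,
    JSON.stringify({ userId, createdAt: Date.now() }),
    { expirationTtl: 15 * 60 }, // seconds; refresh on activity to keep it alive
  );
  return sessionId;
}

export async function getSession(env: Env, sessionId: string) {
  // Served from the edge; a just-logged-out session may remain readable for
  // up to ~60 seconds at some locations. The short TTL limits that exposure.
  return env.SESSIONS.get<{ userId: string; createdAt: number }>(
    `session:${sessionId}`,
    "json",
  );
}
```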

User profiles and application data

Profiles need capabilities only D1 provides: queries to find users by email, filters by status, pagination for lists, updates as users change settings, relations connecting users to posts, orders, and subscriptions. D1 exists to serve this relational pattern.

KV can't query across keys. R2 is for files, not records. Durable Objects are overkill for data that doesn't need coordination. D1 is the relational database for the platform; relational data belongs there.

For global applications where read performance matters, D1 with read replicas provides the best balance: strong consistency for writes, low-latency reads from nearby replicas, and the sessions API ensuring users see their own changes immediately.
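The queries themselves are ordinary SQL through D1's prepared statements. The DB binding and users schema below are assumed for illustration.

```typescript
// Relational lookups in D1: indexed single-row reads and paginated lists.
interface Env {
  DB: D1Database;
}

export async function findUserByEmail(env: Env, email: string) {
  // An index on users(email) keeps this a single-row read, which also keeps
  // D1's per-row pricing low.
  return env.DB
    .prepare("SELECT id, name, status FROM users WHERE email = ?")
    .bind(email)
    .first();
}

export async function listActiveUsers(env: Env, limit = 50, offset = 0) {
  const { results } = await env.DB
    .prepare("SELECT id, name FROM users WHERE status = 'active' ORDER BY id LIMIT ? OFFSET ?")
    .bind(limit, offset)
    .all();
  return results;
}
```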

File storage

Files (user uploads, generated exports, media, documents) belong in R2 without exception. The S3-compatible API means existing code often works with minimal changes. Presigned URLs enable direct browser-to-R2 uploads, bypassing Workers entirely for large files and avoiding the 128 MB memory constraint.

R2's zero egress fees transform read-heavy file storage economics. A workload serving 10 TB monthly costs roughly $900 on S3 but only storage costs on R2; a 98% reduction. Chapter 13 quantifies this.

Never store files in D1. SQLite handles blobs poorly, row size limits constrain you, and database backups balloon with binary data. Store file metadata in D1 if you need to query it; store files themselves in R2.
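A sketch of that split, assuming an R2 binding named BUCKET, a D1 binding named DB, and a files table; very large uploads would more likely go directly to R2 via presigned URLs rather than through the Worker.

```typescript
// Metadata in D1, bytes in R2.
interface Env {
  BUCKET: R2Bucket;
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "PUT") {
      return new Response("method not allowed", { status: 405 });
    }

    const key = `uploads/${crypto.randomUUID()}`;

    // Stream the request body straight into R2; the Worker never buffers the file.
    const object = await env.BUCKET.put(key, request.body, {
      httpMetadata: {
        contentType: request.headers.get("content-type") ?? "application/octet-stream",
      },
    });

    // Store only queryable attributes in D1; the bytes stay in R2.
    await env.DB
      .prepare("INSERT INTO files (r2_key, size, uploaded_at) VALUES (?, ?, ?)")
      .bind(key, object.size, Date.now())
      .run();

    return Response.json({ key });
  },
};
```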

Rate limiting and counters

Counters seem like simple key-value data but require coordination. A rate limiter must atomically check the current count, compare against a limit, and increment; no race condition between steps. KV's eventual consistency creates races: two concurrent requests both read "99", both allow their operations, both write "100", and 101 operations slip past a limit of 100.

Durable Objects eliminate this failure mode. Single-threaded execution means increment operations cannot interleave. One request completes entirely before the next begins. The storage is incidental; you choose Durable Objects for coordination, and storage comes with it.

Pattern: one object per entity being limited. One rate limiter per user, one counter per resource. Each object handles all operations on its entity serially.

Real-time features

Chat rooms, collaborative documents, and multiplayer games need to track connections, broadcast messages, and maintain shared state. This means Durable Objects. The hibernatable WebSockets API holds thousands of connections efficiently while single-threaded execution keeps state consistent as messages arrive and users join or leave.

This isn't primarily a storage decision; it's a compute decision. You choose Durable Objects for the execution model and WebSocket support. Storage serves the coordination.
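A minimal sketch of such an object using the hibernatable WebSockets API; the class name and broadcast behaviour are illustrative:

```typescript
// A presence/broadcast object: accepts WebSockets via the hibernation API and
// relays each message to every other connected client.
import { DurableObject } from "cloudflare:workers";

export class PresenceRoom extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const client = pair[0];
    const server = pair[1];

    // Hibernation-aware accept: idle connections don't keep the object awake.
    this.ctx.acceptWebSocket(server);

    return new Response(null, { status: 101, webSocket: client });
  }

  // Called when any connected socket sends a message; broadcast to the rest.
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
    for (const peer of this.ctx.getWebSockets()) {
      if (peer !== ws) peer.send(message);
    }
  }

  async webSocketClose(ws: WebSocket): Promise<void> {
    ws.close(1000, "closed");
  }
}
```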

The "just use Postgres" question

Technical leaders will ask: why not use Aurora or Cloud SQL via Hyperdrive for everything? PostgreSQL is battle-tested, teams know it, and Hyperdrive makes it accessible from Workers.

This is a legitimate architecture. Many production systems run Workers with Hyperdrive connecting to external PostgreSQL or MySQL databases as their permanent data layer. The question isn't whether this works; it does, and well; but what trade-offs you're making.

Latency considerations. Hyperdrive eliminates connection overhead through pooling and reduces round-trips through query caching, but your database still lives in one region. A D1 database with read replicas distributes reads globally; a PostgreSQL database in us-east-1 serves reads from us-east-1 regardless of where your user sits. For read-heavy workloads with global users, this latency difference matters. For workloads concentrated near your database region, or where database queries aren't in the critical path, it may not matter at all.

Cost trade-offs. External databases have their own pricing: instance hours, storage, I/O operations, potentially egress. D1's pricing scales with actual usage without idle instance costs. For variable workloads, D1's model is often cheaper. For steady workloads with reserved capacity, the economics may favour your existing database.

Operational considerations. An external database means managing upgrades, backups, monitoring, and scaling decisions; but your team may already have established practices for this. D1 eliminates this operational burden but introduces a new system to learn. The question is whether operational simplicity or operational continuity matters more for your team.

Capability differences. PostgreSQL offers features D1 lacks: extensions (PostGIS, pg_trgm, pgvector), stored procedures, advanced data types, and decades of ecosystem tooling. If your workload uses these capabilities, Hyperdrive lets you keep them. If you're using PostgreSQL as "SQL that happens to be PostgreSQL," D1 may serve equally well.

Hyperdrive is the right permanent choice when your database requires PostgreSQL-specific features, when your team has established operational practices around PostgreSQL, when your database serves multiple applications (not just Workers), or when migration risk exceeds migration benefit. It's also the right starting point when you're uncertain; you can always migrate to D1 later if the trade-offs favour it.

D1 is the right choice for new projects without PostgreSQL dependencies, for applications benefiting from global read distribution, and for teams wanting fully managed operations within Cloudflare's ecosystem.

Ask your team: What PostgreSQL features do you actually use that SQLite lacks? The answer guides whether Hyperdrive or D1 fits better; not which is "better" in the abstract.

A concrete architecture

Consider a project management SaaS application. How would its data distribute across Cloudflare's storage options?

Tenant configuration lives in KV. Each tenant's settings (feature flags, plan limits, customisation options) are read on every request but change rarely. A key pattern like tenant:{id}:config provides fast global reads. Configuration updates propagate within 60 seconds, acceptable for this use case.

Project and task data lives in per-tenant D1 databases. Each tenant gets their own database, enabling queries within a tenant while maintaining isolation. The 10 GB limit is generous for most tenants; large enterprises might shard by project or time period. Read replicas provide fast global reads for distributed teams.

File attachments live in R2. Task attachments, project documents, and exports are stored as objects with presigned URLs for direct upload and download. D1 stores metadata for querying; R2 stores the bytes.

Real-time presence (who's viewing this project, where their cursor is) lives in Durable Objects. One object per project maintains WebSocket connections, tracks active users, and broadcasts cursor positions. This state is ephemeral; when everyone disconnects, the object sleeps. For collaboration features like simultaneous document editing, the Durable Object coordinates changes before persisting results to D1.

External integrations might use Hyperdrive. If the application integrates with a customer's existing PostgreSQL database for data synchronisation, Hyperdrive accelerates those queries.

This architecture plays to each primitive's strengths. No single primitive handles everything; composition handles everything.

Cost architecture

Cloudflare prices storage based on what costs them resources.

KV charges more for writes than reads (roughly 10:1) because writes propagate globally. A read hits cached data at the edge, cheap to serve. A write flows to the coordination layer and propagates worldwide, expensive to execute. Pricing aligns your incentives with the system's economics.

D1 charges per row read and written, not per query. A query scanning 10,000 rows costs ten times more than one scanning 1,000. Index optimisation becomes a cost concern, not just performance. An unindexed query scanning your entire table is expensive twice over: slow and costly.

R2 charges for storage and operations, not egress. This inverts the S3 model. For read-heavy workloads with significant egress, R2 saves dramatically.

Durable Objects charge for requests, compute duration, and storage. Pricing reflects the coordination guarantees.

Worked example

An application serving 10 million daily requests, each requiring one configuration read and one database query averaging 100 rows.

KV for configuration: 10 million reads at $0.50 per million = $5/day. D1 instead: 10 million queries reading 100 rows each = 1 billion rows at $0.001 per million = $1/day, but with higher latency on every request.

D1's cost appears lower, but KV's sub-10ms reads versus D1's variable latency affects user experience. For configuration data that rarely changes, KV's caching model delivers better performance at acceptable cost.

For database queries: 10 million queries averaging 100 rows = 1 billion rows daily = roughly $30/month on D1. Compared to a provisioned Aurora instance running continuously, D1's per-query pricing often wins for variable workloads.
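The same arithmetic as a small script, using the unit prices assumed above (verify against current pricing before relying on them):

```typescript
// Daily cost comparison for the worked example.
const dailyRequests = 10_000_000;

// KV configuration reads: $0.50 per million reads.
const kvReadPricePerMillion = 0.5;
const kvDailyCost = (dailyRequests / 1_000_000) * kvReadPricePerMillion; // $5/day

// D1 queries: 100 rows read per request at $0.001 per million rows.
const rowsPerQuery = 100;
const d1RowPricePerMillion = 0.001;
const d1DailyCost =
  ((dailyRequests * rowsPerQuery) / 1_000_000) * d1RowPricePerMillion; // $1/day

console.log({ kvDailyCost, d1DailyCost, d1MonthlyCost: d1DailyCost * 30 }); // ~$30/month
```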

The meta-lesson: storage choices affect both performance and cost. Optimise for actual access patterns, not abstract capabilities.

Combining storage options

Real applications use multiple storage types. The architectural question: how do they interact?

Configuration in KV, everything else in D1 is the simplest combination. KV holds feature flags, settings, and cached values that rarely change. D1 holds application data. No coordination needed; they serve different purposes.

Metadata in D1, objects in R2 separates queryable attributes from blob storage. Store file references, permissions, and descriptive fields in D1. Store actual files in R2. Query D1 to find files; retrieve files directly from R2 via presigned URLs.

Coordination in Durable Objects, persistence in D1 handles real-time requirements. Durable Objects manage WebSocket connections, presence detection, and live updates. Periodically flush state to D1 for durability and queryability. The Durable Object is authoritative for live state; D1 for historical data.

Cache in KV, source in D1 implements read-through caching. Check KV first, fall back to D1 on miss, populate KV on miss. Invalidate on write: update D1, delete or update KV. Accept brief staleness.
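A sketch of the read-through pattern, assuming a KV binding CACHE_KV, a D1 binding DB, and a products table; the key scheme and 5-minute TTL are illustrative:

```typescript
// Read-through cache: KV in front, D1 as the authoritative source.
interface Env {
  CACHE_KV: KVNamespace;
  DB: D1Database;
}

type Product = { id: string; name: string; price: number };

export async function getProduct(env: Env, id: string): Promise<Product | null> {
  const cacheKey = `product:${id}`;

  // 1. Check KV first: fast everywhere, possibly briefly stale.
  const cached = await env.CACHE_KV.get<Product>(cacheKey, "json");
  if (cached) return cached;

  // 2. Miss: fall back to the authoritative copy in D1.
  const product = await env.DB
    .prepare("SELECT id, name, price FROM products WHERE id = ?")
    .bind(id)
    .first<Product>();
  if (!product) return null;

  // 3. Populate KV; the TTL bounds staleness even if invalidation is missed.
  await env.CACHE_KV.put(cacheKey, JSON.stringify(product), { expirationTtl: 300 });
  return product;
}

export async function updateProductPrice(env: Env, id: string, price: number): Promise<void> {
  // Write to the source of truth, then invalidate the cached copy.
  await env.DB.prepare("UPDATE products SET price = ? WHERE id = ?").bind(price, id).run();
  await env.CACHE_KV.delete(`product:${id}`);
}
```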

The key principle: clear ownership. One store is authoritative; others are caches or derivatives. Ambiguous ownership causes consistency bugs.

Storage mismatch failures

Choosing the wrong storage produces characteristic failures. Naming them helps diagnose problems.

Consistency collision: using KV for data requiring immediate read-after-write visibility. You update a user's subscription status in KV, redirect them to the premium page, and it reads the old value because propagation hasn't completed. The user sees "not subscribed" seconds after paying. Fix: subscription status needs D1's consistency, not KV's performance.

Coordination gap: using D1 for data requiring serialised access. Two concurrent requests read a counter from D1, both see "99", both allow their operations, both write "100". Your rate limit of 100 was exceeded at 101. Fix: Durable Objects, where sequential execution eliminates the race by design.

Query impedance: storing structured data in KV or R2 then needing to query it. You stored user preferences as KV key-value pairs because reads are fast. Now you need "find all users with dark mode enabled." KV can't query; you're stuck iterating all keys. Fix: D1 for queryable data, with KV as a cache layer if needed.

Blob bloat: storing large objects in D1 or KV instead of R2. A 50 MB file in D1 makes your database unwieldy and hits size limits. Large values in KV add latency and hit the 25 MB limit. Fix: R2 for objects, with references stored in D1 or KV.

Partition avoidance: failing to embrace horizontal scaling. You try to store everything in one D1 database and hit the 10 GB limit. You create one Durable Object for all coordination and hit throughput limits. Fix: accept Cloudflare's horizontal model. Database-per-tenant. Object-per-entity.

Migration paths

Storage choices aren't permanent. As applications evolve, storage needs change.

KV to D1: when you need queries or stronger consistency. If you're encoding structure into keys (user:123:profile, user:123:settings) and parsing them on read, you want a database. Migrate by reading all keys, transforming into rows, and inserting into D1.
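A sketch of such a migration, assuming keys shaped like user:{id}:profile holding JSON, bindings USERS_KV and DB, and a profiles table; a real migration would also need to respect per-invocation subrequest limits.

```typescript
// One-off migration: list KV keys page by page, transform, batch-insert into D1.
interface Env {
  USERS_KV: KVNamespace;
  DB: D1Database;
}

export async function migrateProfiles(env: Env): Promise<void> {
  let cursor: string | undefined;

  do {
    // KV listing is paginated; follow the cursor until the listing completes.
    const page = await env.USERS_KV.list({ prefix: "user:", cursor, limit: 100 });

    const statements: D1PreparedStatement[] = [];
    for (const { name } of page.keys) {
      const profile = await env.USERS_KV.get<{ name: string; email: string }>(name, "json");
      if (!profile) continue;

      const userId = name.split(":")[1]; // "user:123:profile" -> "123"
      statements.push(
        env.DB
          .prepare("INSERT OR REPLACE INTO profiles (id, name, email) VALUES (?, ?, ?)")
          .bind(userId, profile.name, profile.email),
      );
    }
    if (statements.length > 0) await env.DB.batch(statements);

    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
}
```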

D1 to Durable Objects: when you need coordination. If you're building locks around D1 access, implementing optimistic concurrency with version numbers, or handling race conditions with retry loops, you've outgrown D1's model. Migration means restructuring around the actor model: one object per coordinated entity.

Hyperdrive to D1: an optional migration path, not an inevitable one. If you started with Hyperdrive and D1's model now fits better (perhaps your workload evolved toward multi-tenant patterns, or you want global read distribution), migrate incrementally by table or feature. But many systems stay on Hyperdrive permanently because the external database serves them well.

D1 to Hyperdrive: when you've discovered D1's horizontal model doesn't fit your evolved requirements. If you need cross-database joins, PostgreSQL extensions, or your data model has grown beyond what horizontal partitioning serves well, Hyperdrive connects you to PostgreSQL or MySQL without leaving Cloudflare's compute layer.

Staying on Hyperdrive permanently: a valid architectural choice, not a failure to migrate. If your external database works well, your team has established operational practices, and migration risk exceeds any benefit D1 would provide, keep using Hyperdrive. The connection pooling and query caching make it perform well at the edge.

Decision checklist

When uncertain, work through the decision framework's three questions in order: access pattern, then consistency requirements, then coordination needs. Large objects go to R2 regardless; data living in an existing external database stays behind Hyperdrive.

D1 is the default because it handles the widest variety of workloads adequately and is the easiest to migrate from. Data in D1 can move to KV for read performance, Durable Objects for coordination, or stay where it is. Starting with D1 keeps options open.

What comes next

The next three chapters go deep on each storage type:

Chapter 12: D1 covers SQLite at the edge: the 10 GB model, multi-database architectures, read replicas, and when the relational model fits.

Chapter 13: R2 covers object storage: S3 compatibility, egress economics, and operational patterns.

Chapter 14: KV and Hyperdrive covers caching and external database integration: consistency behaviour, cache strategies, and when to bridge versus migrate.

Each chapter assumes you've already decided that storage type fits your needs. If unsure, return to this chapter's framework. The decision matters more than mastering any single option's features.