Best practices for DynamoDB global table design - Amazon DynamoDB

Best practices for DynamoDB global table design

Global tables build on Amazon DynamoDB’s global footprint to provide you with a fully managed, multi-Region, and multi-active database that delivers fast, local, read and write performance for massively scaled, global applications. With global tables your data replicates automatically across your choice of AWS Regions. Because global tables use existing DynamoDB APIs, no changes to your application will be needed. There are no upfront costs or commitments for using global tables, and you pay only for the resources you use.

Prescriptive guidance for DynamoDB global table design

Efficient use of global tables requires careful considerations of factors like your preferred write mode, routing model, and evacuation processes. You must instrument your application across every Region and be ready to adjust your routing or perform an evacuation to maintain global health. The reward is having a globally distributed data set with low-latency reads and writes and a 99.999% service level agreement.

Key facts about DynamoDB global table design

  • There are two versions of global tables: the current version Global Tables version 2019.11.21 (Current) (sometimes called "V2"), and Global tables version 2017.11.29 (Legacy) (sometimes called "V1"). This guide focuses exclusively on the current version, V2.

  • Without the use of global tables, DynamoDB is a Regional service. It is highly available and intrinsically resilient to failures of infrastructure with a Region, including the failure of an entire availability zone (AZ). A single-Region DynamoDB table has a 99.99% availability https://aws.amazon.com/dynamodb/sla/Service Level Agreement (SLA).

  • With the use of global tables, DynamoDB allows a table to replicate its data between two or more Regions. A multi-Region DynamoDB table has a 99.999% availability SLA. With proper planning, global tables can help create an architecture that is resilient and resists Regional failures.

  • Global tables employ an active-active replication model. From the perspective of DynamoDB, the table in each Region has equal standing to accept read and write requests. After receiving a write request, the local replica table will replicate the write to other participating Regions in the background.

  • Items are replicated individually. Items updated within a single transaction may not be replicated together.

  • Each table partition in the source Region replicates its writes in parallel with every other partition. The sequence of writes within the remote Region may not match the sequence of writes that happened within the source Region. For more information about table partitions, see the blog post Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance.

  • A newly written item is usually propagated to all replica tables within a second. Nearby Regions tend to propagate faster.

  • Amazon CloudWatch provides a ReplicationLatency metric for each Region pair. It is calculated based on looking at arriving items and comparing their arrival time with their initial write time and computing an average. Timings are stored within CloudWatch in the source Region. Viewing the average and maximum timings can help determine the average and worst-case replication lag. There is no SLA on this latency.

  • If the same item is updated at about the same time (within this ReplicationLatency window) in two different Regions, and the second write happens before the first write was replicated, there's a potential for write conflicts. Global tables resolves such conflicts with a last writer wins mechanism, based on the timestamp of the writes. The first write "loses" to the second write. These conflicts are not recorded in CloudWatch or AWS CloudTrail.

  • Each item has a last write timestamp held as a private system property. The last writer wins approach is implemented by using a conditional write that requires the incoming item’s timestamp be greater than the existing item’s timestamp.

  • A global table will replicate all items to all participating Regions. If you want to have different replication scopes, you can create different tables and give each of the tables different participating Regions.

  • Writes will be accepted to the local Region even if the replica Region is offline or the ReplicationLatency grows. The local table continues to attempt replicating items to the remote table until each item succeeds.

  • In the unlikely event a Region goes fully offline, when it later comes back online all pending outbound and inbound replications will be retried. No special action is required to bring the tables back in sync. The last writer wins mechanism ensures the data will eventually become consistent.

  • You can add a new Region to a DynamoDB table at any time. DynamoDB will handle the initial sync and ongoing replication. If a Region is removed, even if it's the original Region, that will only delete the table for that Region.

  • DynamoDB does not have a global endpoint. All requests are made to a regional endpoint, which then accesses the global table instance that’s local to that Region.

  • Calls to DynamoDB should not go cross-Region. The best practice is for the compute layer in one Region to directly access only the local DynamoDB endpoint for that Region. If problems are detected within a Region, whether those problems are in the DynamoDB layer or in the surrounding stack, then the end user traffic should be routed to a different compute layer hosted in a different Region. Thanks to global table replication, the different Region will already have a local copy of the same data for it to locally work with. In some circumstances the compute layer in one Region may pass the request onward to another Region’s compute layer for processing, but this should not directly access the remote DynamoDB endpoint. For more information on this particular use case see Compute-layer request routing.

Use cases

Global tables provides these common benefits:

  • Lower-latency reads. You can place a copy of the data closer to the end user to reduce network latency during reads. The cache is kept as fresh as the ReplicationLatency value.

  • Lower-latency writes. You can write to a nearby region to reduce network latency and the time taken to achieve the write. The write traffic must be carefully routed to ensure no conflicts. Techniques for routing are discussed in more detail in Request routing with global tables.

  • Increased resiliency and disaster recovery. You can evacuate a Region (move away some or all requests going to that Region) should the Region have degraded performance or a full outage, with a Recovery Point Objective (RPO) and Recovery Time Objective (RTO) measured in seconds. Using global tables also increases the DynamoDB SLA from 99.99% to 99.999%.

  • Seamless Region migration. You can add a new Region and then delete the old Region to migrate a deployment from one Region to another, all without downtime at the data layer. For example, you can use DynamoDB global tables for an order management system achieve reliably low latency processing at a high scale while also maintaining resilience to AZ and Regional failures.