Throttling issues for DynamoDB tables using provisioned capacity mode

If your application exceeds your provisioned throughput capacity on a table or index, it is subject to request throttling. Throttling prevents your application from consuming too many capacity units. When DynamoDB throttles a read or write operation, it returns a ProvisionedThroughputExceededException to the caller. The application can then take appropriate action, such as waiting for a short interval before retrying the request.

This topic discusses how to troubleshoot common throttling issues and how to use CloudWatch to investigate where the issues might be coming from.

Topics

Troubleshooting throttling issues
Using CloudWatch metrics to investigate throttling issues

Troubleshooting throttling issues

When troubleshooting issues that appear to be related to throttling, an important first step is to confirm whether the throttling is coming from DynamoDB or from the application.

The following are some common scenarios, and possible steps to help resolve them.

The DynamoDB table appears to have enough provisioned capacity, but requests are being throttled

This can occur when throughput stays below the provisioned level when averaged over a minute but exceeds the amount available in a given second. DynamoDB reports only minute-level metrics to CloudWatch, which are calculated as the sum for one minute and then averaged. But DynamoDB itself applies rate limits per second. So if too much of that throughput occurs within a small portion of the minute, such as a few seconds or less, then requests for the rest of that minute can be throttled.

For example, if you have provisioned 60 WCU on a table, then it can perform 3,600 write operations in one minute, or 60 per second. But if all 3,600 write requests arrive in the same second, then requests for the rest of that minute will be throttled.
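
The gap between the per-minute view and the per-second limit can be illustrated with a small sketch. This is a simplified model, not DynamoDB's actual admission logic: it ignores burst capacity, and the function name `summarize_minute` and its inputs are hypothetical.

```python
def summarize_minute(writes_per_second, provisioned_wcu):
    """Compare the one-minute average with the per-second limit.

    Simplified illustration only: real DynamoDB behavior also
    depends on factors such as burst capacity.
    """
    total = sum(writes_per_second)
    average = total / len(writes_per_second)
    # Seconds in which demand exceeds the provisioned per-second rate.
    over_limit = [
        second
        for second, writes in enumerate(writes_per_second)
        if writes > provisioned_wcu
    ]
    return {
        "total": total,
        "average_per_second": average,
        "seconds_over_limit": over_limit,
    }


# 3,600 writes spread evenly over the minute.
even = summarize_minute([60] * 60, provisioned_wcu=60)
# The same 3,600 writes, all arriving in the first second.
burst = summarize_minute([3600] + [0] * 59, provisioned_wcu=60)

print(even)
print(burst)
```

Both traffic shapes produce the same one-minute total and average, so a minute-level graph cannot tell them apart, but only the burst exceeds the per-second rate.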

One way to resolve this scenario is to add jitter and exponential backoff to the API calls. For more information, see this post about backoff and jitter.
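
A minimal sketch of a retry loop with exponential backoff and full jitter is shown here. `ThrottledError` is a hypothetical stand-in for the throttling exception your client raises (for example, ProvisionedThroughputExceededException), and the function name, delays, and attempt count are illustrative assumptions rather than recommended values. Note that AWS SDKs typically include their own retry behavior, which may already cover this.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling exception from the client."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ThrottledError with full jitter.

    The cap for the wait doubles on each attempt (exponential backoff),
    and the actual wait is a random fraction of that cap (jitter).
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters are injectable only so that the behavior can be tested without real waiting. Randomizing the wait spreads retries from many clients over time, so they do not all arrive in the same second again.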

Auto scaling is enabled, but tables are still being throttled

This can occur during sudden spikes in traffic. Auto scaling can be triggered when two data points breach the configured target utilization value within a one-minute span. Therefore, auto scaling can take place when the consumed capacity is above target utilization for two consecutive minutes. But if the spikes are more than one minute apart, auto scaling might not be triggered.

Similarly, a scale-down event can be triggered when 15 consecutive data points are lower than the target utilization. In either case, after auto scaling is triggered, an UpdateTable API operation is invoked. It can then take several minutes to update the provisioned capacity for the table or the index. During this period, any requests that exceed the previous provisioned capacity of the table will be throttled.

In summary, auto scaling requires consecutive data points where the target utilization value is being breached to scale up a DynamoDB table. For this reason, auto scaling is not recommended as a solution for dealing with spiky workloads. For more information, see the auto scaling cost optimization documentation.
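
The consecutive-data-point requirement can be sketched as follows. This is a simplified model of the rule described above, not the actual auto scaling implementation; the function name `would_scale_up` and the sample utilization values are hypothetical.

```python
def would_scale_up(utilization_by_minute, target, required_consecutive=2):
    """Return True if utilization breaches the target for enough
    consecutive one-minute data points to trigger a scale-up."""
    streak = 0
    for value in utilization_by_minute:
        # A data point at or below the target resets the streak.
        streak = streak + 1 if value > target else 0
        if streak >= required_consecutive:
            return True
    return False


# Spikes that are separated by quiet minutes never build a streak.
print(would_scale_up([90, 40, 95, 40, 92], target=70))
# Two breaching minutes in a row do.
print(would_scale_up([40, 90, 95, 40], target=70))
```

In this model, isolated spikes can each exceed the target, and cause throttling, without ever producing the consecutive breaches needed to scale up.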

A hot key may be causing throttling issues

In DynamoDB, a partition key that does not have high cardinality can result in many requests that target just a few partitions. If a resulting hot partition exceeds the partition limits of 3,000 RCU or 1,000 WCU per second, this can result in throttling. The diagnostic tool CloudWatch Contributor Insights (CCI) can help debug this by providing graphs of each table's item access patterns. You can continuously monitor your DynamoDB tables' most frequently accessed keys and other traffic trends. For more information about CloudWatch Contributor Insights, see CloudWatch Contributor Insights for DynamoDB. For more information about partition key design, see Designing partition keys to distribute your workload and Choosing the Right DynamoDB Partition Key.
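
To illustrate the kind of skew that a most-accessed-keys view reveals, the following sketch counts accesses per partition key in a local sample. It does not call Contributor Insights; the function name `top_partition_keys` and the sample keys are hypothetical, and it assumes you already have a list of accessed key values (for example, from application logs).

```python
from collections import Counter


def top_partition_keys(accessed_keys, n=3):
    """Return the n most frequently accessed partition keys with counts."""
    return Counter(accessed_keys).most_common(n)


# Hypothetical sample: one key receives most of the traffic.
sample = ["user#1"] * 8 + ["user#2", "user#3"]

for key, count in top_partition_keys(sample, n=1):
    share = count / len(sample)
    print(f"{key}: {count} accesses ({share:.0%} of traffic)")
```

A key that receives a large share of traffic is a candidate hot key, because all requests for a single partition key value are served by the same partition.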

Your traffic to the table is exceeding the table-level throughput quota

The table-level read throughput and table-level write throughput quotas apply at the account level in any Region. These quotas apply to tables in both provisioned capacity mode and on-demand capacity mode. By default, the throughput quota placed on your table is 40,000 read request units and 40,000 write request units. If the traffic to your table exceeds this quota, then the table might be throttled. For more information on how to prevent this from happening, see Monitoring DynamoDB for operational awareness.

To resolve this issue, use the Service Quotas console to increase the table-level read or write throughput quota for your account.

Using CloudWatch metrics to investigate throttling issues

Below are some DynamoDB metrics to monitor during throttling events. Use these to help locate which operations are creating throttled requests and identify root issues.

ThrottledRequests
- One throttled request can contain multiple throttled events, so throttle events can be more informative to examine than throttled requests. For example, when you update an item in a table with GSIs, there are multiple events: a write operation to the table and a write operation to each index. Even if more than one of these events is throttled, there will be only one ThrottledRequest.
ReadThrottleEvents
- Watch for requests that exceed the provisioned RCU for a table or GSI.
WriteThrottleEvents
- Watch for requests that exceed the provisioned WCU for a table or GSI.
OnlineIndexConsumedWriteCapacity
- Pay attention to the number of WCU consumed when adding a new GSI to a table. Note that ConsumedWriteCapacityUnits for a GSI does not include the WCU consumed during index creation.
- If you've set the WCU for a GSI too low, then incoming write activity during the backfill phase might be throttled.
Provisioned Read/Write
- View how many provisioned read or write capacity units were consumed over the specified time period, for a table or a specified global secondary index.
- Note that the TableName dimension returns ProvisionedReadCapacityUnits for the table only by default. To view the number of provisioned read or write capacity units for a global secondary index, you must specify both TableName and GlobalSecondaryIndexName.
Consumed Read/Write
- View how many read or write capacity units were consumed over the specified time period.
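
As a sketch of how the dimensions described above fit together, the following builds the parameters for a CloudWatch GetMetricStatistics request. The function name `throttle_metric_params`, the table name `Orders`, and the index name `OrdersByDate` are hypothetical; the one-minute period and `Sum` statistic are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def throttle_metric_params(table_name, metric_name, start, end,
                           index_name=None):
    """Build GetMetricStatistics parameters for a DynamoDB metric.

    With only TableName, the metric covers the table itself. To query
    a global secondary index, both TableName and
    GlobalSecondaryIndexName are included.
    """
    dimensions = [{"Name": "TableName", "Value": table_name}]
    if index_name is not None:
        dimensions.append(
            {"Name": "GlobalSecondaryIndexName", "Value": index_name}
        )
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": start,
        "EndTime": end,
        "Period": 60,  # One-minute data points (assumed for this sketch).
        "Statistics": ["Sum"],
    }


end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

table_params = throttle_metric_params(
    "Orders", "WriteThrottleEvents", start, end
)
index_params = throttle_metric_params(
    "Orders", "WriteThrottleEvents", start, end, index_name="OrdersByDate"
)

# With boto3 installed and credentials configured, these could be used as:
#   boto3.client("cloudwatch").get_metric_statistics(**index_params)
```

Building the table-level and index-level queries separately makes it easier to see whether throttle events originate from the base table or from a specific GSI.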

For more information on DynamoDB CloudWatch metrics, see DynamoDB Metrics and dimensions.
