Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Lowering Costs with Spot Instances

When Amazon EC2 has unused capacity, it offers EC2 instances at a reduced cost, called the Spot Price. This price fluctuates based on availability and demand. You can purchase Spot Instances by placing a request that includes the highest bid price you are willing to pay for those instances. When the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.
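The bidding rule above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the guide: prices are sampled once per hour, the Spot request stays open so instances relaunch when the price drops again, and partial-hour billing rules are ignored. The function name `simulate_spot_hours` and the sample prices are hypothetical.

```python
def simulate_spot_hours(bid_price, hourly_spot_prices):
    """Return (hours_run, total_cost) for one instance under a simplified model.

    Assumptions (illustrative only): one price sample per hour, the request
    remains open after a termination, and each running hour is billed at
    that hour's Spot Price.
    """
    running = False
    hours_run = 0
    total_cost = 0.0
    for spot_price in hourly_spot_prices:
        if not running and spot_price < bid_price:
            # Spot Price is below the bid: the instance is launched.
            running = True
        elif running and spot_price > bid_price:
            # Spot Price rose above the bid: the instance is terminated.
            running = False
        if running:
            # Billing uses the Spot Price, not the bid price.
            hours_run += 1
            total_cost += spot_price
    return hours_run, total_cost


# Hypothetical hourly Spot Prices in USD, with a bid of 0.05.
prices = [0.03, 0.04, 0.06, 0.02]
hours, cost = simulate_spot_hours(0.05, prices)
print(hours, round(cost, 2))
```

In this example the instance runs for the first two hours, is terminated in the third hour when the price exceeds the bid, and relaunches in the fourth. The cost is the sum of the Spot Prices for the hours actually run, which is lower than paying the bid price for those hours.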

For more information about Spot Instances, go to Using Spot Instances in the Amazon Elastic Compute Cloud User Guide.


If your workload is flexible in its completion time or required capacity, Spot Instances can significantly reduce the cost of running your job flows. Workloads that are well suited to Spot Instances include application testing, time-insensitive workloads, and long-running job flows with fluctuations in load.

Note

Spot Instances are not recommended for job flows that are time-critical or that need guaranteed capacity. Launch these job flows using on-demand instance groups.

When Should You Use Spot Instances?

There are several scenarios in which Spot Instances are useful for running an Amazon EMR job flow.

Long-Running Job Flows and Data Warehouses

If you are running a persistent Amazon EMR job flow, such as a data warehouse, that has a predictable variation in computational capacity, you can handle peak demand at lower cost with Spot Instances. Launch your master and core instance groups as on-demand to handle the normal capacity and launch the task instance group as Spot Instances to handle your peak load requirements.

Cost-Driven Workloads

If you are running transient job flows for which lower cost is more important than the time to completion, and losing partial work is acceptable, you can run the entire job flow (master, core, and task instance groups) as Spot Instances to benefit from the largest cost savings.

Data-Critical Workloads

If you are running a job flow for which lower cost is more important than time to completion, but losing partial work is not acceptable, launch the master and core instance groups as on-demand and supplement with a task instance group of Spot Instances. Running the master and core instance groups as on-demand ensures that your data is persisted in HDFS and that the job flow is protected from termination due to Spot market fluctuations, while providing cost savings that accrue from running the task instance group as Spot Instances.

Application Testing

When you are testing a new application to prepare it for launch in a production environment, you can run the entire job flow (master, core, and task instance groups) as Spot Instances to reduce your testing costs.
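The four scenarios above differ only in which instance groups run as Spot Instances. The following sketch captures that mapping as data and builds a list of instance group settings from it. The field names (`InstanceRole`, `InstanceType`, `InstanceCount`, `Market`, `BidPrice`) are modeled on the Amazon EMR instance group configuration and should be checked against the API reference for your version. The scenario keys, the helper `build_instance_groups`, and the instance type, counts, and bid price are hypothetical values chosen for illustration. The code only builds dictionaries and does not call any AWS service.

```python
# Which instance group roles run as Spot Instances in each scenario above.
SPOT_ROLES_BY_SCENARIO = {
    "long_running": {"TASK"},
    "cost_driven": {"MASTER", "CORE", "TASK"},
    "data_critical": {"TASK"},
    "application_testing": {"MASTER", "CORE", "TASK"},
}


def build_instance_groups(scenario, instance_type, counts, bid_price):
    """Build instance group settings for a scenario (illustrative helper).

    counts maps each role ("MASTER", "CORE", "TASK") to an instance count.
    bid_price is a string, applied only to groups that run as Spot Instances.
    """
    spot_roles = SPOT_ROLES_BY_SCENARIO[scenario]
    groups = []
    for role in ("MASTER", "CORE", "TASK"):
        group = {
            "InstanceRole": role,
            "InstanceType": instance_type,
            "InstanceCount": counts[role],
            "Market": "SPOT" if role in spot_roles else "ON_DEMAND",
        }
        if group["Market"] == "SPOT":
            # On-demand groups carry no bid price.
            group["BidPrice"] = bid_price
        groups.append(group)
    return groups


# Hypothetical example: a data-critical job flow with Spot task nodes.
groups = build_instance_groups(
    "data_critical",
    "m1.large",
    {"MASTER": 1, "CORE": 4, "TASK": 8},
    "0.10",
)
for g in groups:
    print(g["InstanceRole"], g["Market"], g.get("BidPrice", "-"))
```

Keeping the mapping in one table makes the trade-off visible: the long-running and data-critical scenarios place only the task group on the Spot market, so a price spike removes extra compute capacity but does not terminate the master or core groups.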