Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Choosing What to Launch as Spot Instances

When you launch a job flow in Amazon Elastic MapReduce (Amazon EMR), you can choose to launch any or all of the instance groups (master, core, and task) as Spot Instances. Because each type of instance group plays a different role in the job flow, the implications of launching each instance group as Spot Instances vary.

After you launch an instance group as on-demand or as Spot Instances, you cannot change that choice while the job flow is running. To change an on-demand instance group to Spot Instances or vice versa, you must terminate the job flow and launch a new one.

The following table shows launch configurations for using Spot Instances in various applications.

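Project                  | Master Instance Group | Core Instance Group | Task Instance Group
-------------------------|-----------------------|---------------------|--------------------
Long-running job flows   | on-demand             | on-demand           | spot
Cost-driven workloads    | spot                  | spot                | spot
Data-critical workloads  | on-demand             | on-demand           | spot
Application testing      | spot                  | spot                | spot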
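The table can also be expressed as data. The following Python sketch assumes you use the AWS SDK for Python (boto3), whose EMR client `run_job_flow` call accepts an `Instances={"InstanceGroups": [...]}` list with `InstanceRole`, `Market`, and `BidPrice` fields. The helper name `instance_groups_for`, the `m1.large` instance type, the node counts, and the `"0.10"` bid price are illustrative assumptions, not recommendations.

```python
from typing import Any, Dict, List, Tuple

# The table above as data: (master, core, task) market for each project type.
LAUNCH_CONFIGS: Dict[str, Tuple[str, str, str]] = {
    "Long-running job flows": ("ON_DEMAND", "ON_DEMAND", "SPOT"),
    "Cost-driven workloads": ("SPOT", "SPOT", "SPOT"),
    "Data-critical workloads": ("ON_DEMAND", "ON_DEMAND", "SPOT"),
    "Application testing": ("SPOT", "SPOT", "SPOT"),
}


def instance_groups_for(
    project: str,
    instance_type: str = "m1.large",  # assumption: illustrative instance type
    core_count: int = 2,              # assumption: illustrative core size
    task_count: int = 4,              # assumption: illustrative task size
    bid_price: str = "0.10",          # assumption: illustrative bid, USD per instance-hour
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build InstanceGroups entries for run_job_flow."""
    roles = ("MASTER", "CORE", "TASK")
    counts = (1, core_count, task_count)  # one master node in this sketch
    groups: List[Dict[str, Any]] = []
    for role, market, count in zip(roles, LAUNCH_CONFIGS[project], counts):
        group: Dict[str, Any] = {
            "Name": f"{role.lower()}-{market.lower()}",
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }
        if market == "SPOT":
            group["BidPrice"] = bid_price  # only Spot groups carry a bid
        groups.append(group)
    return groups
```

You would pass the returned list as `Instances={"InstanceGroups": groups, ...}` together with the other parameters `run_job_flow` requires for your setup.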

Master Instance Group as Spot Instances

The master node controls and directs the job flow. When it terminates, the job flow ends, so you should only launch the master node as a Spot Instance if you are running a job flow where sudden termination is acceptable. This might be the case if you are testing a new application, have a job flow that periodically persists data to an external store such as Amazon S3, or are running a job flow where cost is more important than ensuring the job flow’s completion.

When you launch the master instance group as a Spot Instance, the job flow does not start until that Spot Instance request is fulfilled. Take this into account when you choose your bid price: a bid that is too low can delay the start of the job flow.
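To illustrate how the bid price affects when a job flow with a Spot master node can start, here is a simplified Python model. It is an assumption for illustration only: it treats the Spot request as fulfillable at the first time step where the Spot Price is at or below your bid, and ignores capacity constraints. The function name and the price series are hypothetical.

```python
from typing import Optional, Sequence


def master_fulfilled_at(spot_prices: Sequence[float], bid_price: float) -> Optional[int]:
    """Return the first time step at which the Spot Price is at or below the bid.

    Simplified model (assumption): capacity is always available once the
    price condition holds. None means the request is never fulfilled in the
    given window, so the job flow never starts.
    """
    for step, price in enumerate(spot_prices):
        if price <= bid_price:
            return step
    return None
```

With the hypothetical prices `[0.12, 0.11, 0.09, 0.08]`, a bid of `0.10` lets the job flow start at step 2, while a bid of `0.07` is never fulfilled, so the job flow would keep waiting.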

You can add a Spot Instance master node only when you launch the job flow. Master nodes cannot be added to or removed from a running job flow.

Typically, you would only run the master node as a Spot Instance if you are running the entire job flow (all instance groups) as Spot Instances.

Core Instance Group as Spot Instances

Core nodes process data and store information using HDFS. Because termination of core nodes can result in data loss and possible termination of the job flow, you would typically only run core nodes as Spot Instances if you are either not running task nodes or running task nodes as Spot Instances.
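The recommendations above for master and core nodes can be summarized as two checks. This Python sketch is a hypothetical helper, not part of Amazon EMR. It takes each group's market (`"ON_DEMAND"` or `"SPOT"`, with `None` meaning there is no task group) and returns warnings for configurations that go against those recommendations.

```python
from typing import List, Optional


def guideline_warnings(master: str, core: str, task: Optional[str]) -> List[str]:
    """Flag configurations that go against the master/core recommendations.

    Each argument is "ON_DEMAND" or "SPOT"; task is None if there is no task group.
    """
    warnings: List[str] = []
    all_spot = master == "SPOT" and core == "SPOT" and task in (None, "SPOT")
    # A Spot master is typically reserved for job flows that are entirely Spot.
    if master == "SPOT" and not all_spot:
        warnings.append("master is Spot but not every group is Spot")
    # Spot core nodes are typically paired with no task group or Spot task nodes.
    if core == "SPOT" and task == "ON_DEMAND":
        warnings.append("core is Spot while task is on-demand")
    return warnings
```

Every row in the launch configuration table passes both checks without warnings.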

When you launch the core instance group as Spot Instances, Amazon EMR waits until it can provision all of the requested core instances before launching the instance group. For example, if you request a core instance group with six nodes and only five Spot Instances are available at or below your bid price, the instance group does not launch. Amazon EMR continues to wait until all six core nodes are available at or below your bid price, or until you terminate the job flow.
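The all-or-nothing behavior of a Spot core instance group can be modeled in a few lines. In this simplified Python sketch, `available_per_step` is a hypothetical series giving how many Spot Instances are available at or below your bid price at each time step; the function returns the first step at which the whole group could launch, or `None` if it is still waiting.

```python
from typing import Optional, Sequence


def core_group_launch_step(requested: int, available_per_step: Sequence[int]) -> Optional[int]:
    """All-or-nothing: return the first step at which every requested core node
    is available at or below the bid, or None if the group is still waiting."""
    for step, available in enumerate(available_per_step):
        if available >= requested:
            return step
    return None
```

For six requested core nodes and availability `[5, 5, 6]`, the group launches only at step 2; until then, no core nodes run.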

You can add Spot Instance core nodes either when you launch the job flow or later to add capacity to a running job flow. You cannot remove core nodes from a running job flow.

Task Instance Group as Spot Instances

Task nodes process data but do not hold persistent data in HDFS. If they terminate because the Spot Price rises above your bid price, no data is lost and the effect on your job flow is minimal.

When you launch the task instance group as Spot Instances, Amazon EMR will provision as many task nodes as it can at your bid price. This means that if you request a task instance group with six nodes, and only five Spot Instances are available at your bid price, Amazon EMR will launch the instance group with five nodes, adding the sixth later if it can.
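A Spot task instance group, in contrast to the core group, is provisioned on a best-effort basis. This simplified Python model takes a hypothetical series of how many Spot Instances are available at your bid price at each time step and returns how many task nodes would be running at each step, capped at the requested count. It ignores provisioning delays.

```python
from typing import List, Sequence


def task_nodes_over_time(requested: int, available_per_step: Sequence[int]) -> List[int]:
    """Best-effort: at each step, run as many task nodes as are available at
    the bid price, but never more than requested."""
    return [min(requested, available) for available in available_per_step]
```

For six requested nodes and availability `[5, 5, 6]`, this gives `[5, 5, 6]`: five task nodes start working immediately and the sixth is added once it becomes available.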

Launching the task instance group as Spot Instances is a strategic way to expand the capacity of your job flow while minimizing costs. If you launch your master and core instance groups as on-demand instances, their capacity is guaranteed for the run of the job flow and you can add task instances to the instance group as needed to handle peak traffic or to speed up data processing.
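To make the cost argument concrete, the following Python sketch estimates job flow cost as instance-hours multiplied by the price per instance-hour. The prices are hypothetical placeholders, not actual Amazon EC2 rates, and the model ignores the fact that extra task nodes may shorten the run.

```python
from typing import Dict, List, Tuple

# Hypothetical prices per instance-hour; NOT actual Amazon EC2 rates.
PRICES: Dict[str, float] = {"ON_DEMAND": 0.32, "SPOT": 0.08}


def job_flow_cost(
    hours: float,
    groups: List[Tuple[int, str]],
    prices: Dict[str, float] = PRICES,
) -> float:
    """Cost = hours x sum(instance count x price for that group's market)."""
    return hours * sum(count * prices[market] for count, market in groups)


base = [(1, "ON_DEMAND"), (4, "ON_DEMAND")]        # on-demand master and core
with_spot_tasks = base + [(8, "SPOT")]             # add eight Spot task nodes
with_on_demand_tasks = base + [(8, "ON_DEMAND")]   # same task nodes, on demand
```

With these assumed prices, a ten-hour run of the on-demand master and core nodes costs about 16.00; adding eight Spot task nodes brings it to about 22.40, compared with about 41.60 if those task nodes were on demand.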

You can add and remove Spot Instance task nodes from a running job flow.
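If you manage job flows with boto3, adding a Spot task instance group to a running job flow corresponds to the EMR client's `add_instance_groups` call, and resizing an existing group (including shrinking it to remove task nodes) corresponds to `modify_instance_groups`. The Python sketch below only builds the request arguments so it can run offline; the helper names, the `j-EXAMPLE` and `ig-EXAMPLE` identifiers, the instance type, and the bid price are hypothetical.

```python
from typing import Any, Dict


def add_spot_task_group_request(
    job_flow_id: str,
    count: int,
    bid_price: str,
    instance_type: str = "m1.large",  # assumption: illustrative instance type
) -> Dict[str, Any]:
    """Hypothetical helper: arguments for emr.add_instance_groups()."""
    return {
        "JobFlowId": job_flow_id,
        "InstanceGroups": [
            {
                "Name": "spot-task",
                "InstanceRole": "TASK",
                "Market": "SPOT",
                "BidPrice": bid_price,  # maximum price per instance-hour, as a string
                "InstanceType": instance_type,
                "InstanceCount": count,
            }
        ],
    }


def resize_instance_group_request(
    job_flow_id: str, instance_group_id: str, new_count: int
) -> Dict[str, Any]:
    """Hypothetical helper: arguments for emr.modify_instance_groups()."""
    if new_count < 0:
        raise ValueError("instance count cannot be negative")
    return {
        "ClusterId": job_flow_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    }
```

You would then call, for example, `boto3.client("emr").add_instance_groups(**request)` with the returned dictionary; lowering `InstanceCount` through `modify_instance_groups` is how you would remove task nodes.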