Scheduling tasks in the cloud with EC2 APIs

This post is sort of an addendum to our live mapping project, but it should also be of use to anyone looking to run an arbitrary script on a recurring schedule. Originally, we set up a 24/7 instance on Amazon's Elastic Compute Cloud that ran a daily cron job. This works, but its a bit wasteful because we're paying for 24 hours of cloud even though we're only actually using it for maybe 5 minutes a day.

Fortunately, Amazon provides a schmorgesborg of command line interface (CLI) tools that allow us to manage our cloud instances more efficiently. Specifically, we want to schedule an instance to spin up only once a day, execute our script, then shut back down. To accomplish this, we will want three CLI tools:

If you're on a Mac, it's way easier to get these with homebrew than by downloading from Amazon's website:

brew install ec2-ami-tools # For creating an AMI from an existing machine
brew install ec2-api-tools # For registering and launching instances
brew install aws-as # For creating auto scaling groups/defining schedules

As an extra Homebrew bonus, running brew info ec2-ami-tools, brew info ec2-api-tools, and brew info aws-as will now tell us exactly what we need to do to get our authentication and environment variables all set up. First we are told to download the necessary .pem files from the Amazon console and place them into a new hidden directory of our home directory .ec2. Then we tell our command line where everything lives now by inserting the following lines into our .bashrc:

export EC2_PRIVATE_KEY="$(/bin/ls "$HOME"/.ec2/pk-*.pem | /usr/bin/head -1)"
export EC2_CERT="$(/bin/ls "$HOME"/.ec2/cert-*.pem | /usr/bin/head -1)"
export EC2_HOME="/usr/local/Cellar/ec2-api-tools/1.6.12.0/libexec"
export EC2_AMITOOL_HOME="/usr/local/Cellar/ec2-ami-tools/1.4.0.9/libexec"
export EC2_REGION="us-west-2" export EC2_ZONE=${EC2_REGION}a
export EC2_URL=https://$EC2_REGION.ec2.amazonaws.com
export AWS_AUTO_SCALING_URL=https://autoscaling.$EC2_REGION.amazonaws.com

Its pretty simple, but if you have any trouble with this part, refer to the official Amazon documentation for setting up the command line.

Because these environment variables are recognized out of the box by the CLI tools, we won't need to point to our authentication keys or specify a region every time we make an API call and our next commands will be much more succinct. Note that every EC2 instance is physically located at one of several regions; we are using us-west-2 because it happens to be where I spun up the existing instance that currently holds our "update.py" script, but any of them would probably work just fine for the simple job at hand.

Code Region
ap-northeast-1 Asia Pacific (Tokyo) Region
ap-southeast-1 Asia Pacific (Singapore) Region
ap-southeast-2 Asia Pacific (Sydney) Region
eu-west-1 EU (Ireland) Region
sa-east-1 South America (Sao Paulo) Region
us-east-1 US East (Northern Virginia) Region
us-west-1 US West (Northern California) Region
us-west-2 US West (Oregon) Region

So, first things first. We can't just spin up an off-the-rack EC2 instance every day, because we'll run into the same problem that I originally had with my web host: the Python modules that we need won't be installed. We could write a script that would install `pip` plus all of the requisite Python modules and run it first thing after we launch the instance, but there's a better way:

ec2-create-image i-8918e1be -n "Map Update Image"

This command from `ec2-ami-tools` creates an "Amazon Machine Image" of the instance that we previously had running and names it "Map Update Image". A new image ID will now print to your console, `ami-fcdfb9cc` in my case. This is tantamount to cloning the instance, because we can now reference the new image ID when we spin up new instances and all of our modules, scripts, etc. will be there waiting for us. Note that I removed the instance's `cron` job before creating the AMI, because we'll now be handling the task scheduling from outside the instance, via autoscaling.

Next let's write a shell script that will execute our Python map-updating script, shoot us a diagnostic email, then shut down the instance that its running on. The idea here is that once a day we're going to spin up an instance using our shiny new AMI and immediately run this new script (let's call it "update.sh") that will do its business and then promptly commit seppuku and stop charging us money. Eric Hammond has created a great template on his blog which I've modified below. Note the execution of our [familiar]() "update.py" script highlighted on line 4, and the apoptosis command on line 46:

#!/bin/bash -x exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

/usr/bin/python /home/ubuntu/update.py # Run the script

EMAIL=spencer.g.boucher@gmail.com

# Upgrade and install Postfix so we can send a sample email export
DEBIAN_FRONTEND=noninteractive apt-get update && apt-get upgrade -y && apt-get install -y postfix

# Get some information about the running instance
instance_id=$(wget -qO- instance-data/latest/meta-data/instance-id)
public_ip=$(wget -qO- instance-data/latest/meta-data/public-ipv4)
zone=$(wget -qO- instance-data/latest/meta-data/placement/availability-zone)
region=$(expr match $zone '\(.*\).')
uptime=$(uptime)

# Send status email
/usr/sbin/sendmail -oi -t -f $EMAIL <<EOM
From: $EMAIL
To: $EMAIL
Subject: Results of EC2 scheduled script

This email message was generated on the following EC2 instance:

  instance id: $instance_id region: $region public ip: $public_ip uptime: $uptime

If the instance is still running, you can monitor the output of this job using a command like:

  ssh ubuntu@$public_ip tail -1000f /var/log/user-data.log

  ec2-describe-instances --region $region $instance_id

EOM

# Give the script and email some time to do their thing
sleep 600 # 10 minutes

# This will stop the EBS boot instance, stopping the hourly charges.
# Have Auto Scaling terminate it, stopping the storage charges.

shutdown -h now

exit 0

Note that the user data script that we pass to the launch configuration executes with root permissions, not as the user "ubuntu" that you would typically log in as via `ssh`. Its probably best to be as explicit as possible when specifying path names in the cloud, the tilde operator might turn around and bite you.

Now we need to create launch configuration that will basically do all the button-pushing that we would normally be doing at the AWS console GUI.

Here we specify:

  • "Micro" as our instance type.
  • Our shell script "update.sh" from step 2 as the "user-data-file". User data files are passed into the instance and executed immediately when supplied in the launch configuration. They must be less than 16kb as I suppose they are stored on some ancillary server somewhere.
  • The AMI image that we cloned in step 1 from the instance that included our Python modules.
  • The name of the launch config; let's call it "map-update-launch-config".
as-create-launch-config \
    --instance-type t1.micro \
    --user-data-file ~/Desktop/update.sh \
    --image-id ami-fcdfb9cc \
    --launch-config "map-update-launch-config"
as-describe-launch-configs --headers

Note that the second line provides a list of all the launch configurations that have been created.

We must also create an auto scaling group. These are typically used as a sort of container to which we can add/remove instances on a schedule or in response to heavy traffic, but we can also use it to schedule a single instance to flick on and off. We need to tell it:

  • A name to assign the scaling group ("map-update-scale-group").
  • The name of the launch configuration we created in step 3 ("map-update-launch-config").
  • Which availability zone we want to use (basically irrelevant; we set our environment variable EC2_ZONE to "a" earlier). ec2-describe-available-zones provides a list of the available zones
  • A minimum and maximum number of instances in the group. We'll initialize these to zero.
as-create-auto-scaling-group \
    --auto-scaling-group "map-update-scale-group" \
    --launch-configuration "map-update-launch-config" \
    --availability-zones "$EC2_ZONE" \
    --min-size 0 \
    --max-size 0
as-suspend-processes "map-update-scale-group" --processes ReplaceUnhealthy
as-describe-auto-scaling-groups --headers

In the second line, we are using as-suspend-processes to prevent the instance's default behavior which is to attempt to restart after it is shut down. The third line provides a list of all the auto scaling groups that have been created.

Last but not least, we are ready to assign a schedule to our auto scaling group. Here we create two: one to start the instance and one to terminate the instance. Astute readers will recall that "update.sh" already stops the instance so that we aren't paying to have it running, but we also need to completely terminate the instance so that we aren't paying to store information about it. Each schedule requires:

  • A name (~"map-update-start"~ & ~"map-update-stop"~).
  • The name of the auto scaling group we created in step 4 ("map-update-scale-group").
  • How we want to scale. By setting both min-size and max-size to 1, we are effectively turning on one instance. We later effectively turn that instance back off by setting both to 0.
  • A "recurrence," ie when to occur. This flag uses the same syntax that cron does. Here we set the instance to launch at midnight UTC (0 0 * * *), and terminate 15 minutes later (15 0 * * *). Recall that our script already stops the instance 10 minutes after execution, so 15 minutes is playing it safe.
as-put-scheduled-update-group-action \
    --name "map-update-start" \
    --auto-scaling-group "map-update-scale-group" \
    --min-size 1 \
    --max-size 1 \
    --recurrence "0 0 * * *"
as-put-scheduled-update-group-action \
    --name "map-update-stop" \
    --auto-scaling-group "map-update-scale-group" \
    --min-size 0 \
    --max-size 0 \
    --recurrence "15 0 * * *"
as-describe-scheduled-actions --headers

As before, the third line provides a list of the actions that have been scheduled.

And thats it! We are now only paying for 10 or 15 minutes of cloud per day, as opposed to 1,440 of them. To review the timeline we have created in this example: our auto scaling group boots up an instance up at midnight UTC that immediately executes "update.sh". This automatically executes "update.py" and shoots us a diagnostic email. It then waits 10 minutes to make sure everything has time to run, before stopping the instance. 5 minutes after that the auto scaling group then completely terminates the instance.

Other great resources: