Understanding How ECS Autoscaling Works

Jeff Morhous

@JeffMorhous

If you’ve been using Amazon ECS long enough, you’ve probably at least heard of autoscaling. Horizontally scaling services, ECS or otherwise, is a huge part of building reliable web apps. When traffic spikes, you need more containers, and when it quiets down, you certainly don’t want to be paying for more than you need.

In this article, we’ll jump into how native ECS autoscaling works and examine its limitations. I’ll assume you already have an app running on ECS. If not, check out this guide on how to set up your ECS cluster with Terraform. Now let’s get into it!

Let’s look into native ECS autoscaling

The native ECS autoscaling functionality is pretty good. It does what it says it does, increasing and decreasing capacity according to a few compute metrics. It leans on CloudWatch metrics to decide when your services need a boost or a trim.

How ECS uses CloudWatch metrics

ECS automatically publishes a few service utilization metrics to AWS CloudWatch, which is how the autoscaler gets the data it needs to make scaling decisions. Whether that autoscaler is EC2 Auto Scaling (for autoscaling groups) or Application Auto Scaling (for service-level scaling) depends on what you’re scaling, but we’ll get into that later.

ECS publishes metrics to CloudWatch, a separate AWS service, roughly every 60 seconds. The autoscaler (Application Auto Scaling for services, or EC2 Auto Scaling for instance groups) then evaluates those metrics and, if needed, scales up or down accordingly.

How the ECS autoscaler uses CloudWatch

ECS publishes two performance metrics to CloudWatch automatically (you can inspect them yourself, as the sketch after this list shows):

  • Memory (via the ECSServiceAverageMemoryUtilization metric)
  • CPU (via the ECSServiceAverageCPUUtilization metric)
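To see exactly what the autoscaler sees, you can pull these numbers yourself; the predefined metrics above are backed by the CPUUtilization and MemoryUtilization metrics in the AWS/ECS CloudWatch namespace. Here’s a minimal boto3 sketch that fetches a service’s average CPU utilization over the last 30 minutes, with placeholder cluster and service names:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Fetch the average CPU utilization ECS published for one service.
# "my-cluster" and "my-service" are placeholder names.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},
        {"Name": "ServiceName", "Value": "my-service"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(minutes=30),
    EndTime=datetime.now(timezone.utc),
    Period=60,  # ECS publishes roughly once per minute
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"{point['Average']:.1f}%")
```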

Sounds straightforward, right? Well, not quite. There are two problems with how ECS uses CloudWatch that limit the quality of the native ECS autoscaling feature.

First, they’re not the best metrics on which to make scaling decisions. Memory and CPU reflect hardware utilization, but won’t always indicate when a web service is struggling or a background job queue is falling behind. In fact, CPU and memory can be totally normal while a background job queue has jobs waiting for orders of magnitude longer than they should be.

Second, ECS only sends these metrics to CloudWatch once per minute. If your traffic spikes one second after the CloudWatch metrics have been sent, your service could be under load without autoscaling for the next 59 seconds before CloudWatch even has updated memory and CPU metrics. This, of course, can lead to your autoscaling being slow to kick in (which is a problem whether you’re scaling up or down).

The first of these problems is somewhat addressable. You can use custom CloudWatch metrics for your autoscaling, so in principle you can report a better utilization metric to CloudWatch and scale on that. AWS has published an interesting blog post demonstrating this, and you’ll quickly note that it’s not a trivial endeavor.
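If you go that route, the first step is getting your own metric into CloudWatch. Here’s a minimal boto3 sketch that publishes a hypothetical queue-depth metric; the namespace, metric name, and dimension are all made up for illustration, and a scaling policy could then reference the metric through a customized metric specification:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric that a scaling policy could target.
# The namespace, metric name, and dimension are hypothetical.
cloudwatch.put_metric_data(
    Namespace="MyApp/Scaling",
    MetricData=[
        {
            "MetricName": "QueueDepth",
            "Dimensions": [{"Name": "ServiceName", "Value": "my-service"}],
            "Value": 42,  # e.g. jobs currently waiting in a queue
            "Unit": "Count",
        }
    ],
)
```

We’ll get into another option, using a third-party autoscaler, a bit later. Before that, let’s look at what the native ECS autoscaler has to offer.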

Service-level scaling

The heart of ECS autoscaling is service-level scaling. Whether you’re using EC2 or Fargate, service-level scaling automatically manages how many tasks run in an ECS service. It’s a simple way to get more containers running.

An ECS cluster contains one or more ECS services, which contain one or more ECS tasks, which contain one or more containers that make up your application.

A basic AWS ECS diagram
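You can see this hierarchy through the API, too. Here’s a minimal boto3 sketch that walks a cluster’s services and counts each service’s running tasks; "my-cluster" is a placeholder name, and pagination is omitted for brevity:

```python
import boto3

ecs = boto3.client("ecs")

cluster = "my-cluster"  # placeholder cluster name

# A cluster contains services, and each service runs one or more tasks.
for service_arn in ecs.list_services(cluster=cluster)["serviceArns"]:
    service_name = service_arn.split("/")[-1]
    task_arns = ecs.list_tasks(cluster=cluster, serviceName=service_name)["taskArns"]
    print(f"{service_name}: {len(task_arns)} running task(s)")
```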

When CloudWatch reports that CPU or memory is above the threshold you’ve set, the scaling policy adds tasks to the ECS service. The service spreads traffic across the tasks, helping it meet demand.

ECS autoscaling diagram

Setting up service-level autoscaling in the AWS console isn’t too hard. If you already have an ECS service, you can edit it. One of the options is service auto scaling, where you can enter the following (also scriptable, as sketched below):

  • The minimum number of tasks
  • The maximum number of tasks
  • The metric you’ll scale on
  • The value of the metric that triggers a scale
  • Cooldown periods

How to set up ECS service autoscaling
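If you’d rather script this than click through the console, the same settings map onto two Application Auto Scaling calls: one to register the service as a scalable target (the min and max task counts), and one to attach a scaling policy (the metric, threshold, and cooldowns). Here’s a minimal boto3 sketch, assuming a cluster named "my-cluster" and a service named "my-service":

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/my-cluster/my-service"  # placeholder names

# The minimum and maximum number of tasks for the service.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# A target tracking policy: add or remove tasks to keep average
# CPU utilization near 75%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "TargetValue": 75.0,
        "ScaleOutCooldown": 60,   # wait 60s between scale-outs
        "ScaleInCooldown": 300,   # be more conservative scaling in
    },
)
```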

Service-level scaling horizontally scales the number of tasks in an ECS service. If you’re using ECS Fargate, this is your only option: AWS automatically scales the underlying infrastructure, abstracting it away from you. If you’re using EC2, you may also want to scale the number of EC2 instances that host the ECS tasks. Let’s talk about that next.

EC2 autoscaling groups

Okay, so your tasks are scaling up, but wait: do you have enough infrastructure for those tasks to run on? If you’re using EC2 (and not Fargate), you’ll likely want to scale the EC2 instances so that there are enough resources to host a growing number of ECS tasks.

👀 Note

EC2 != Fargate

AWS ECS can be used with EC2 or Fargate, which are similar but not the same. With EC2, you manage the instances (virtual machines) that run your code, even when that code is inside a container. Fargate abstracts the instances away: you only bring containers, and AWS provisions the compute to run them.

Autoscaling groups don’t apply to Fargate because the underlying infrastructure is managed by AWS.

EC2 autoscaling groups are all about managing EC2 instance capacity. If you’ve told ECS to scale up tasks but you’ve run out of EC2 instances, those new tasks will have nowhere to run.

Autoscaling groups watch the capacity of your EC2 instances, making sure there’s enough underlying compute power to accommodate the scaling needs of your ECS tasks.

EC2 autoscaling groups diagram

If you’re not using Fargate, EC2 autoscaling groups are an important part of autoscaling. If you spin up more ECS tasks without the compute to support them, you’ll still have performance problems!

Service-level scaling and autoscaling groups are independent of each other: you can do one without the other. If you only set up service-level autoscaling, you’re only adding tasks to your ECS service. On EC2, you should also scale the EC2 instance count to keep up.

Autoscaling groups vs service level only

Setting up autoscaling groups on AWS is more involved than service-level autoscaling, so you’ll probably be best served by the AWS CLI. Amazon has great documentation on setting this up with the CLI, and I’ll point you there since the instructions are likely to change over time.
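That said, one common pattern is worth sketching: a capacity provider with managed scaling, which lets ECS drive the Auto Scaling group for you so instance capacity follows task demand. Here’s a minimal boto3 sketch, assuming you already have an Auto Scaling group; the ARN and names below are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder: substitute your Auto Scaling group's real ARN.
asg_arn = "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:example"

# A capacity provider with managed scaling lets ECS grow and shrink
# the Auto Scaling group to fit the tasks it needs to place.
ecs.create_capacity_provider(
    name="my-capacity-provider",
    autoScalingGroupProvider={
        "autoScalingGroupArn": asg_arn,
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,  # aim to keep instances fully utilized
        },
    },
)

# Attach it to the cluster as the default strategy, so new tasks
# are placed through it.
ecs.put_cluster_capacity_providers(
    cluster="my-cluster",
    capacityProviders=["my-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "my-capacity-provider", "weight": 1}
    ],
)
```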

What if native ECS autoscaling isn’t working for you?

We’ve already mentioned that the native ECS autoscaler relies on metrics that aren’t ideal. CPU and memory utilization can be difficult to tune because they don’t correlate perfectly with traffic spikes. Compound that with CloudWatch metrics arriving at one-minute intervals, and your autoscaler can be slow to react: by the time ECS sees the increase, users may already be facing performance issues.

Lucky for us, there’s a better way!

Scaling based on request queue time and job queue latency

Using CPU or even memory metrics to decide when an application needs more capacity is not ideal. These metrics describe your hardware, but they aren’t direct measures of how load is affecting users. They’re also slow to arrive, and using queue time (for requests and background jobs!) can help you scale faster.

Request queue time measures the time between a request hitting a load balancer and getting processed by the application. If you want to know more about scaling based on queue time, here’s a deep dive into how it works.
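In practice, the load balancer or router stamps each request with the time it arrived, and the app subtracts that timestamp from the clock when it starts processing. Here’s a minimal sketch of that arithmetic, assuming an X-Request-Start header carrying a Unix timestamp in milliseconds (a common convention, though formats vary; some routers prefix the value with "t="):

```python
import time


def request_queue_time_ms(headers: dict) -> float | None:
    """Milliseconds between the load balancer receiving a request
    and the application starting to process it.

    Assumes an X-Request-Start header holding a Unix timestamp in
    milliseconds -- a common but not universal convention.
    """
    raw = headers.get("X-Request-Start")
    if raw is None:
        return None
    start_ms = float(raw.removeprefix("t="))
    return time.time() * 1000 - start_ms


# Example: a request stamped 250 ms ago shows ~250 ms of queue time.
print(request_queue_time_ms({"X-Request-Start": str(time.time() * 1000 - 250)}))
```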

Using queue time as a metric gives you a direct indication of how loaded your application is, which can allow you to scale much faster and with greater precision. This approach can drastically reduce response times under load since the scaling decisions are directly tied to the actual queue waiting times, rather than just indirect resource consumption metrics.

Job queue latency reflects how long background jobs wait in a queue before being processed. It’s a direct measure of how backlogged a background job worker is, which makes it a great metric to scale background workers.
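Measuring it is straightforward: record when each job is enqueued, then compare against the clock when a worker dequeues it. Here’s a minimal sketch, assuming a hypothetical job payload that carries an enqueued_at Unix timestamp:

```python
import time


def job_queue_latency_seconds(job: dict) -> float:
    """How long a job sat in the queue before a worker picked it up.

    Assumes each job records an 'enqueued_at' Unix timestamp at
    enqueue time -- a hypothetical but typical pattern.
    """
    return time.time() - job["enqueued_at"]


# Example: a job enqueued 12 seconds ago shows ~12 s of queue latency.
print(job_queue_latency_seconds({"enqueued_at": time.time() - 12}))
```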

The native ECS autoscaler won’t scale on request or job queue time, so you’ll need to integrate a third-party autoscaler like Judoscale. Judoscale has quick integrations for ECS, so it’s no harder to set up than the native autoscaler.

Setting up Judoscale

Not only does Judoscale scale on queue time, it also checks that queue time every ten seconds (as opposed to every 60). Checking more often lets you react to spikes sooner, which is where much of autoscaling’s value lies.

Autoscaling faster with more frequent checks

Judoscale can autoscale both web services and background jobs, giving you the chance to scale applications up and down MUCH faster than the native ECS autoscaler, helping your app stay ahead of demand. If the ECS autoscaler is too sluggish to meet your needs, consider whether scaling based on queue time and checking that metric more often would help. If so, try out Judoscale!