Understanding the Web Utilization Metric

Let’s dive into what the utilization metric is, how we track it, and how to configure your app to autoscale using it.

What is Utilization?

Utilization is the percentage of your app’s web processes that are actively handling requests. The closer the number of busy processes gets to the total number of processes available across all your instances/dynos, the higher your utilization will be.

Nearing 100% utilization means almost all processes are actively handling requests, so the app has very little room to absorb traffic spikes without bringing in more instances/processes.

👀 Note

Utilization is a metric available only for web processes, not for workers. For workers, we recommend using queue time (or queue depth as an alternative).

Who is Utilization for?

Utilization optimizes for quality of service, not cost. Some applications see huge traffic spikes at certain times and want to avoid slowdowns or timeouts. Autoscaling based on request queue time might seem too reactive in those circumstances. Autoscaling on utilization keeps enough instances running to stay within a configurable threshold, which keeps the app away from that 100% mark. That leaves extra processes available at all times to handle potential traffic spikes, and lets autoscaling respond more proactively when such spikes happen.

You should never start with utilization as your autoscaling metric. Our default recommendation is request queue time for all apps. Once you have a good understanding of your app’s behavior and autoscaling needs, you may try adding (or switching to) utilization and adjusting the thresholds to see how it works with your app.

👀 Note

Read more about proactive vs reactive autoscaling and how utilization works in our blog.

Utilization by Example

Consider a single instance/dyno with 4 processes: if 2 of them have active requests, its overall utilization is 50%.

Let’s expand this example to an app running 3 instances/dynos, each configured with the same 4 processes. At one point in time, we see the following:

Instance 1: 3 processes handling requests, 75% utilization
Instance 2: 1 process handling requests,   25% utilization
Instance 3: 2 processes handling requests, 50% utilization

What would be the overall utilization in this scenario? The same 50% as before. We have more instances (and thus, more processes: 3 x 4 = 12 total), but on average they’re about the same utilization.

And if we eliminated the last instance, while still keeping the same number of processes handling requests? It’d result in 75% utilization:

Instance 1: 3 processes handling requests, 75% utilization
Instance 2: 3 processes handling requests, 75% utilization
# or
Instance 1: 4 processes handling requests, 100% utilization
Instance 2: 2 processes handling requests, 50% utilization
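The arithmetic in the examples above can be sketched in a few lines of Python. This is only an illustration of the calculation; the function name `overall_utilization` is made up for this example and is not part of any Judoscale package.

```python
def overall_utilization(busy_per_instance, processes_per_instance):
    """Return the percentage of busy processes across all instances.

    busy_per_instance: number of processes handling requests on each instance.
    processes_per_instance: processes configured on every instance
    (assumed identical across instances, as in the examples above).
    """
    total = len(busy_per_instance) * processes_per_instance
    if total == 0:
        return 0.0
    return 100.0 * sum(busy_per_instance) / total


# Three instances with 4 processes each: 6 busy out of 12.
three_instances = overall_utilization([3, 1, 2], 4)

# Two instances with the same 6 busy processes: 6 busy out of 8.
two_even = overall_utilization([3, 3], 4)
two_uneven = overall_utilization([4, 2], 4)

print(three_instances, two_even, two_uneven)  # 50.0 75.0 75.0
```

How the busy processes are spread across instances does not matter for the overall number; only the total busy count and the total process count do.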

Once we know that the application sits at around 50% utilization almost all the time, like in the first examples above, we could configure its autoscaling thresholds to be somewhere between 40-60%, 40-70%, or 45-55%, or increase them to 60-80% like in the last example, depending on how aggressive we want to be with autoscaling and/or how much extra room we want for handling spikes.

Note that this does not factor in threads in threaded web servers like Puma. We’re still exploring whether to factor those in, and how to do it in a meaningful and non-confusing way.

How is Utilization Tracked?

Our request middleware keeps track of active requests for each running web process, to determine which processes are active or idle at any given point in time.

Those metrics are collected & reported to Judoscale at certain intervals (together with request queue time), which allows us to aggregate an average of your overall web process utilization, used for autoscaling.
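To make the idea concrete, here is a minimal sketch of per-process tracking. It is not the actual Judoscale middleware: the class `ActiveRequestTracker` and the helper `utilization_pct` are hypothetical names, and the sketch assumes a process counts as active whenever it has at least one request in flight (threads are not counted separately, matching the note above).

```python
import threading
from contextlib import contextmanager


class ActiveRequestTracker:
    """Hypothetical per-process tracker of in-flight requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0

    @contextmanager
    def track(self):
        # Count the request as active for as long as the block runs.
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1

    def is_busy(self):
        # A process is "active" if it has any request in flight.
        with self._lock:
            return self._active > 0


def utilization_pct(busy_flags):
    """Aggregate busy/idle flags from many processes into a percentage."""
    flags = list(busy_flags)
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


tracker = ActiveRequestTracker()
with tracker.track():
    busy_during_request = tracker.is_busy()
busy_after_request = tracker.is_busy()
```

A middleware would wrap each request in `tracker.track()`, and a reporter would periodically sample `is_busy()` from every process and send the result along with the other metrics.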

How to Set Up Utilization Metrics

To get started, your app needs to collect and report the new metric before you can enable and view it in Judoscale.

How to Start Collecting Utilization Metrics?

Utilization metrics collection is available in all the latest Judoscale packages. Make sure you install the latest version in your app:

judoscale-ruby: version 1.12.0 or greater
judoscale-python: version 1.10.0 or greater
judoscale-node: judoscale-fastify version 2.3.0 or greater, judoscale-express version 2.2.0 or greater
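If you want to verify a minimum version programmatically in a Python app, a check along these lines may help. This is a sketch under assumptions: it handles only plain dotted numeric versions such as 1.10.0, and the distribution name you pass to `installed_version_ok` must match whatever your package manager actually installed, so confirm it before relying on the result. The helper names are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0). Assumes plain numeric parts only."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Compare versions numerically, so 1.9.9 is lower than 1.10.0."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version_ok(dist_name, minimum):
    """Return True if dist_name is installed at or above minimum.

    dist_name is whatever distribution name your environment uses;
    it is an input to this sketch, not something it can confirm for you.
    """
    try:
        return meets_minimum(version(dist_name), minimum)
    except PackageNotFoundError:
        return False
```

Comparing the parts as integers matters here: a plain string comparison would rank "1.9.9" above "1.10.0".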

Once your app is deployed with the latest package, it will start collecting & reporting those metrics, and you should be ready to enable Utilization.

How to Enable Utilization for Autoscaling?

At this point, you are still autoscaling your web process based on Queue Time. To enable autoscaling with Utilization and view its metrics, use the Setup button near the target range settings.

Clicking it launches the Judoscale Setup modal, which shows the initial setup steps for review (you can revisit them any time you need a refresher). Clicking Finished and Deployed takes you to a new screen that explains Utilization and Queue Time.

You can autoscale with Queue Time, with Utilization, or with both. We recommend enabling both initially, so you can better understand how they affect each other and how your own app’s web utilization behaves. That will help you adjust the utilization thresholds for your needs in the long run.

Check both Queue Time and Utilization, and click Confirm Autoscale Metrics, then Done. You’ll notice that we now show both Queue Time and Utilization charts, as well as individual thresholds for configuring each.

You can adjust your target utilization thresholds as needed.

👀 Note

When autoscaling is enabled with multiple metrics, we’ll upscale when either one of them reaches or exceeds the configured threshold, and downscale when both settle below the threshold.
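The rule in this note can be sketched as a small decision function. This is an illustration, not Judoscale’s implementation: the names `TargetRange` and `scaling_decision` are hypothetical, and the sketch assumes each metric has a target range whose upper bound triggers upscaling and whose lower bound is the level all metrics must fall below before downscaling.

```python
from dataclasses import dataclass


@dataclass
class TargetRange:
    low: float   # assumed downscale threshold
    high: float  # assumed upscale threshold


def scaling_decision(readings, ranges):
    """Decide what to do given current readings and ranges per metric.

    readings and ranges are dicts keyed by the same metric names.
    """
    # Upscale as soon as ANY metric reaches or exceeds its upper threshold.
    if any(readings[name] >= ranges[name].high for name in readings):
        return "upscale"
    # Downscale only when ALL metrics have settled below their lower threshold.
    if all(readings[name] < ranges[name].low for name in readings):
        return "downscale"
    return "hold"


ranges = {
    "queue_time_ms": TargetRange(low=50, high=100),
    "utilization_pct": TargetRange(low=40, high=60),
}

# Queue time is calm, but utilization is above its upper threshold.
decision = scaling_decision({"queue_time_ms": 20, "utilization_pct": 65}, ranges)
```

With these example ranges, a calm queue time does not prevent an upscale when utilization is high, and a downscale happens only once both readings are low.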

You’re all set! From now on, just keep an eye on things, adjust the thresholds as needed, and let us know if you run into any issues or have any feedback. Remember that you can always jump back into the Setup to enable or disable the Queue Time or Utilization metrics used for autoscaling.