Maximizing Performance with Judoscale: Setting Sensitivities


Jon Sully

@jon-sully

Alright, let’s dive into this third chapter of Maximizing Performance with Judoscale: Setting Sensitivities. And that title works both ways! We’re setting (the verb) the sensitivities for how Judoscale scales your application, and those sensitivities are themselves settings (the noun) 😉. Let’s make some jumps, talk about some frequencies, and pause for a few delays…

Prepare your snorkels! 🤿


Maximizing Performance with Judoscale is a series that covers all of Judoscale’s features and options. Jump to other posts here:

  1. Target Queue Time Range
  2. Scheduling your Scaling
  3. Setting Sensitivities (this page!)

👀 Note

Just FYI, we’re going to use the term “dyno” or “dynos” in most places here; this is Heroku’s word for an application container instance. Render calls these “Services” while Amazon ECS calls them “Tasks” wrapped up under a “Service”. All of the theory in this article applies equally to all three; “dyno” just remains our default phrasing.

Sensitivity

This article is all about the last group of settings available in Judoscale’s control panel UI: “Upscale Jumps”, “Upscale Frequency”, and “Downscale Delay”, and how this powerful trio works together with your Target Queue Time Range and Scaling Schedule to give you the most flexible and tailored autoscaling possible.

[Image: the “Upscale Jumps”, “Upscale Frequency”, and “Downscale Delay” settings in Judoscale’s control panel]

Upscale Jumps

Simply put, this setting controls how many dynos your application will scale up at-a-time. For example, when set to 4, Judoscale will add 4 dynos to your app’s current scale the moment it detects a queue-time breach, rather than just 1 (the default):

[Image: an upscale event adding 4 dynos at once instead of 1]

For certain types of traffic (and other) spikes, setting this value higher can help immediately quash issues and slowness! There are actually a few key types of situations and applications that would benefit from this type of adjustment.
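If it helps to see the idea in code, here’s a minimal sketch of what the upscale jump setting changes. This is purely illustrative Python, not Judoscale’s actual implementation, and the function and parameter names are made up for the example:

```python
# Toy illustration of the upscale-jump idea (not Judoscale's real code).
# On a queue-time breach, scale up by `upscale_jump` dynos instead of just 1.

def next_scale(current_dynos, queue_time_ms, upscale_threshold_ms,
               upscale_jump=1, max_dynos=50):
    """Return the new dyno count after one upscale decision."""
    if queue_time_ms > upscale_threshold_ms:
        return min(current_dynos + upscale_jump, max_dynos)
    return current_dynos

print(next_scale(6, 250, 100))                  # default jump of 1 -> 7 dynos
print(next_scale(6, 250, 100, upscale_jump=4))  # jump of 4         -> 10 dynos
```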

For web dynos, jumping by more than 1 dyno at a time is extremely helpful if you know ahead of time that your application’s traffic pattern is inherently spiky. That is, if you work in some kind of domain where traffic will, as an expected part of the business, go from low to very high very quickly, and often. This is slightly different from a schedule, as we covered in Maximizing Performance Part 2. We can think of schedules as a pre-set jump to a particular number of dynos at a given, known time. Upscale jumping is better when there isn’t a known time. If you know you have spiky traffic as part of your business but it’s not just once a day at a particular time, increasing your upscale jumps may be the answer.

For worker dynos, turning up your upscale jumps can be super helpful if you do a lot of batch work. That is, any time you generate a large number of jobs that are suddenly waiting to run:

[Image: a large batch of jobs suddenly enqueued and waiting to run]

While a huge batch of background jobs getting created will follow the normal autoscaling process and work its way up to more dynos over time, we can crunch through all those jobs faster if we leverage upscale jumps. The idea is simple: when a bunch of jobs are created and autoscaling kicks in, it’ll spin up several new dynos instead of just one-at-a-time to crush all those jobs ASAP:

[Image: worker dynos jumping up several at a time to crush the job backlog]

👀 Note

When it comes to upscale jumps with background jobs, remember that Judoscale maintains settings for each of your processes separately. So you can easily turn up your upscale jump count for just your batch-worker process, if you have one, while keeping your other workers at more conservative settings! In fact, if you do a good amount of batch work but don’t have a process dedicated to processing batch jobs, it may be worth splitting your batch work out into a dedicated process just so Judoscale can upscale it with big jumps!

Finally, the last condition we want to call out where increasing your upscale jump count can be beneficial is any time you’re running lots of dynos. This signals a setup where your dynos are small relative to your traffic. That could be running Std-1X’s at 3,000 (super-efficient) requests per second or Perf-L’s at 1,000 (not-super-efficient) requests per second. Every app varies. But if you’re running >25 dynos on any given process (web or workers), you may want to consider increasing your upscale jump count. This is less about spikiness and more about the impact that scaling up by a single dyno may have. Simple math here: when your process runs 10 dynos, adding 1 more is a 10% capacity increase. When you run 25 dynos, it’s only a 4% capacity increase. Some of our customers run 50-100 dynos on a single process! This is easily understood visually — look at the relative difference in capacity increase between these two high-count processes!

[Image: the relative capacity increase from adding a single dyno on two high-dyno-count processes]

So if you run >25 dynos, you’ll likely want to play with your upscale jump count. It’ll make your upscaling moments more impactful, relative to your scale level and traffic (but there is no concrete correct answer here!).
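That “simple math” is worth spelling out. Here’s the arithmetic from above as a quick, purely illustrative snippet (nothing Judoscale-specific):

```python
# Relative capacity added by a single upscale at different fleet sizes.
# This is just the arithmetic from the paragraph above.

for current_dynos in (10, 25, 50, 100):
    added = 1 / current_dynos
    print(f"{current_dynos:>3} dynos -> +1 dyno adds {added:.0%} capacity")

# Output:
#  10 dynos -> +1 dyno adds 10% capacity
#  25 dynos -> +1 dyno adds 4% capacity
#  50 dynos -> +1 dyno adds 2% capacity
# 100 dynos -> +1 dyno adds 1% capacity
```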

Upscale Frequency

A close sibling to upscale jump count is upscale frequency! Where the former is “how many to upscale by”, upscale frequency is “how fast should I upscale again?” And the ‘again’ there is key, since upscale frequency doesn’t matter the first time your application needs to upscale! It only matters if that first upscale didn’t resolve the capacity issue and your queue time remains high. An easy way we like to think about this is the width of our dyno ‘boxes’ in our charts — an idea we briefly talked about in our Ultimate Guide to Autoscaling on Heroku:

[Image: a scaling chart where each column of dyno ‘boxes’ has a width set by the upscale frequency]

In this view, the width of each column of boxes (dynos) shows how fast Judoscale can scale up your process. With the wider boxes on the left side of the chart, regardless of what happens with traffic and queue time, Judoscale must wait until the end of the box before scaling again. If we zoom in, it looks like this:

[Image: zoomed-in view of the wider boxes, where Judoscale must wait until the end of each box before scaling again]

But the key feature here is the difference in the width once we bring down our upscale frequency time (so that upscales can happen more frequently):

[Image: narrower boxes after lowering the upscale frequency time]

All of that to say, using a faster upscale frequency can make your application more reactive to capacity needs that aren’t resolved by the first upscale. But, as with so many things in life, there’s a catch!

The primary reason we default upscale frequency to 120 seconds for new applications is that we need to wait to see if the first upscale is already resolving the capacity issue. Remember that it takes a little while for a new Heroku dyno to spin up and start adding functional capacity to your application. Additionally, once that dyno is up and servicing requests/jobs, queue time still might be high but decreasing. That means the additional capacity is solving the problem, even if your queue time is still above your upscale threshold.

[Image: queue time still above the upscale threshold, but trending downward after the first upscale]

At that point, there’s no need to scale up again! That first upscale added enough capacity to fix the issue and queue time will soon be zero (or close to it) again — we just need to wait for the additional capacity to help burn through the backlog that’d built up. That’s the essence of the upscale frequency delay! Waiting to see if the additional capacity that was already added solves the problem and alleviates the queue-time backlog, or if even more capacity needs to be added.

[Image: queue time falling back toward zero as the added capacity burns through the backlog]
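To make the “again” part concrete, here’s a rough sketch of a time-based upscale gate. It’s illustrative Python with made-up names, not Judoscale’s actual algorithm: the first breach upscales immediately, and later breaches are ignored until the frequency window has elapsed, giving the new capacity time to prove itself.

```python
import time

# Illustrative upscale-frequency gate (a sketch, not Judoscale's real logic).
# The first breach upscales right away; subsequent breaches are ignored until
# `upscale_frequency_s` seconds have passed, giving new dynos time to boot
# and start draining the backlog.

class UpscaleGate:
    def __init__(self, upscale_frequency_s=120):
        self.upscale_frequency_s = upscale_frequency_s
        self.last_upscale_at = None  # no upscale yet

    def should_upscale(self, queue_time_ms, upscale_threshold_ms):
        if queue_time_ms <= upscale_threshold_ms:
            return False  # no breach, nothing to do
        now = time.monotonic()
        first_upscale = self.last_upscale_at is None
        window_elapsed = (not first_upscale and
                          now - self.last_upscale_at >= self.upscale_frequency_s)
        if first_upscale or window_elapsed:
            self.last_upscale_at = now  # record this upscale
            return True
        return False  # still inside the frequency window; wait and see
```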

But of course, you don’t want to wait too long! The goal of autoscaling is to quickly alleviate capacity issues that may be slowing down your users’ requests! So we don’t want to set our upscale frequency value too high.

On the other hand, if you set your upscale frequency too low, your process will continue to scale up and up before even knowing if the earlier upscales helped! That means added cost over time and lots of subsequent downscales after the queue-time backlog has cleared. This is a type of ping-pong scaling we want to avoid!

We’ve found that an upscale frequency value between 90 and 120 seconds works well for most apps. As with all Judoscale settings, we encourage all of our customers to experiment and see how their apps behave with various setting values. As you make changes and watch the scaling and queue time charts over time, you’ll find a sweet spot!

👀 Note

We discussed ping-pong autoscaling in Maximizing Performance Part 1, but the keen-eyed among you will note that the discussion in Part 1 focused on ping-pong autoscaling as a function of queue-time parameters. In contrast, here we discuss ping-pong autoscaling as a function of upscale frequency (and, below, downscale delay)! So, can you ping-pong in multiple different ways?

Actually, these three separate controls are all working together! The target queue time range discussed in Part 1 essentially determines if the ping-ponging will happen, whereas the upscale frequency and downscale delay will control how fast the ‘ping’ and ‘pong’ (respectively) occur! Neat, right?

Of course, there’s the loop-back effect of the ‘ping’ and ‘pong’: as you change your capacity, that will likely impact your queue time at that moment. That new queue time may now be above your upscale threshold or below your downscale threshold, which will then cause another ‘ping’ or ‘pong’. Thus the endless ping-ponging loop continues!

Downscale Delay

Finally, we have downscale delay. Similar to the upscale frequency, this is a purely time-based gate… but it too has its own interesting implications. For starters, we can note that Judoscale’s default downscale delay is significantly longer than the upscale side of things! In general, we want to upscale quickly to resolve capacity problems but downscale cautiously so as not to cause any capacity problems. This can be a tricky balancing act!

The core premise of automatic downscaling is to cut capacity costs when we’re over-provisioned at any given time. This is ultimately how we save money and retain application stability in the long-term! But how do we actually know that we’re over-provisioned? How do we know that it’s okay to down-scale at all?!

The answer is simply to give the system time to settle into its current state before changing things. And, since it’s much less risky to stay at a higher capacity for longer than necessary than it is to sit at too low a capacity (that would mean slow or failing requests!), we opted to make the downscale delay the longer of the two. By waiting several minutes (ten, by default) and ensuring that queue time never breaches the downscale threshold (which should be very low), we’re confident that the application can be downscaled without issue. This exact logic is why it’s so important to make sure your downscale threshold is accurately tuned! Any jitter in your queue time that pokes above the downscale threshold, even while queue time remains generally very low, is an indicator that you’re at the right capacity level and shouldn’t change. Having nearly no jitter at a given capacity (for several minutes) is the indicator that we can safely downscale.

[Image: queue time with jitter poking above the downscale threshold]

vs…

[Image: queue time holding steadily below the downscale threshold for several minutes]
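In sketch form, that gate might look something like this. It’s illustrative Python again, not Judoscale’s actual implementation; the ten-minute default is the delay mentioned above, and the threshold is whatever you tuned in Part 1:

```python
import time

# Illustrative downscale-delay gate (a sketch, not Judoscale's real logic).
# Queue time must stay below the downscale threshold for the *entire* delay
# window; any jitter at or above the threshold resets the clock.

class DownscaleGate:
    def __init__(self, downscale_threshold_ms, downscale_delay_s=600):
        self.downscale_threshold_ms = downscale_threshold_ms
        self.downscale_delay_s = downscale_delay_s  # ten minutes by default
        self.last_breach_at = time.monotonic()      # start with the clock reset

    def should_downscale(self, queue_time_ms):
        now = time.monotonic()
        if queue_time_ms >= self.downscale_threshold_ms:
            self.last_breach_at = now  # jitter above the threshold: reset
            return False
        # Only downscale once we've been quiet for the whole delay window.
        return now - self.last_breach_at >= self.downscale_delay_s
```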

But, of course, defaults aren’t perfect for everyone!

So when is the right time to change the downscale delay? There are a few cases we often see that benefit from a shorter, or longer, delay.

The first is applications running a lot of dynos on a given process. Again, generally >=25, but the rule isn’t hard-set; it’s essentially any situation where a single dyno scale-down would only change your capacity by a few percentage points. In those cases, you may want to decrease your downscale delay. Especially if you’ve increased your upscale jump count, this helps your app fluctuate around its large cohort of dynos faster and remain more reactive, both on the up-side and the down-side, to your traffic patterns.

The second occasion you may want to adjust your downscale delay is the opposite — when a single dyno scaling down would have a major impact on your overall capacity. Again, that could be because you’re only running a handful of dynos at a time, or it could also be that you’re running several dynos, but they’re each handling massive amounts of traffic. In both cases, you want to be extra cautious when scaling down to ensure that your current traffic load is safely handleable by your current dyno count minus one. The best way to ensure that safety is to wait longer before downscaling. It’s always safer to temporarily run more dynos than you need! You just want to make sure it is indeed temporary, or you won’t end up saving all that much with autoscaling!

Setting Your Settings!

So there we have it: upscale jump count, upscale frequency, and downscale delay. The three knobs to tune Judoscale’s autoscaling algorithm to your specific app’s traffic and timing needs! Just remember, small tweaks, plus time spent monitoring those tweaks before adjusting further, is the path to happy days!

Most of Judoscale’s default settings will work for almost all apps, but getting things dialed in for your particular app will inevitably yield even better autoscaling performance and happier customers. 😁

Also, keep ping-pong scaling in the back of your mind! The more reactively you set up your upscaling and downscaling, the more likely you are to end up with ping-pong scaling! Just be careful to adjust them both within reason. We know you’ll find a happy medium for your application!

Bonus: Dyno Sniper

We mentioned a few months back in our monthly newsletter, and also in our recent blog post, How to Fix Heroku’s Noisy Neighbors, that Dyno Sniper is the answer to noisy neighbors. We also mentioned that it’s currently in an opt-in status (let us know if you want to give it a try)… but since we’re talking about the bottom of the Judoscale settings UI, I didn’t want to leave it out! We’ll write another post soon about the impacts of noisy neighbors, but suffice it to say that, in our own experience, a day without dyno sniping enabled can average twice as many dynos as a day with dyno sniping enabled. One simple checkbox… but so much power! ⚡️

[Image: the Dyno Sniper checkbox in the Judoscale settings UI]