Autoscaling worker dynos

Updated

Most web apps use one or more “worker” dynos for handling asynchronous job processing.

On Heroku, every app gets a “web” dyno by default. If you have dynos running for other processes listed in your Procfile, those will be considered “worker” dynos. Worker dynos do not (and cannot) handle web requests. They are used for asynchronous jobs—either on a schedule or pulled from a queue.

Judoscale can autoscale worker dynos for many languages and job backends. The current list of supported languages and frameworks is here.

Install an adapter library for your job backend

When choosing a worker dyno from the dyno selector, you will be prompted automatically to install an adapter library.

Dyno selector

Prompt to install worker adapter

The adapter installation modal will guide you through the steps needed for your language and job backend. Once you’ve installed the adapter and deployed your code, you will see job queue metrics on your worker dyno page.

Job queue time

Choose job queues to monitor

Often you don’t want to autoscale based on all of your job queues. You might have some low-priority queues where high queue time is acceptable, or you might have your queues split over multiple worker dyno processes.

This why you need to tell Judoscale which queues to monitor. In your worker dyno settings, you’ll see the “Queues” option.

Select worker queues

With multiple queues selected, Judoscale will monitor the maximum queue time of those queues.

Worker queue time

Again, this is why you might want to omit a low priority queue where high queue time is acceptable—you don’t want that queue to trigger autoscaling.

What is job queue time?

Job queue time is the time elapsed between when the job is enqueued and when it is “picked up” for processing.

Job queue time visualization

Some job backends (such as Sidekiq) call this “latency” instead of queue time, but they mean the same.

Sidekiq example

Judoscale collects queue time for all of your job queues. The queue time you see in Judoscale is the maximum queue time for all selected queues in 10-second buckets.

Queue time vs. queue depth

Queue depth is the number of jobs waiting in a queue, and it has a direct relationship with queue time—as queue depth increases, so does queue time.

Judoscale only uses queue time for autoscaling because it aligns directly with service level expectations. If users are waiting on certain jobs to complete, for example, you can use that knowledge to set a corresonding queue time threshold for autoscaling.

Queue depth doesn’t translate as easily. Is 1,000 jobs too many? 10,000? What if the entire queue can be completed in less than a second? It’s much clearer to say “we expect these jobs to be performed within 5 seconds”, and configure your autoscaling accordingly.

Note that if your jobs take a long time to run, that’s outside the scope of queue time and autoscaling (because autoscaling will not improve job execution time). You will want separate monitoring in place for job performance.