The Ultimate Guide to Scaling Sidekiq


Adam McCrea

@adamlogic

👀 Note

Editor’s note: Adam first drafted and published this article on Sidekiq’s own Wiki after chatting with Mike Perham about the value of adding docs to Sidekiq specifying guidance around actually scaling Sidekiq once it’s running in production. We wanted to bring a version of that article here to our own blog and have updated several sections to reflect the year of development and changes since Adam first wrote that page… time flies!

Sidekiq’s architecture makes it easy to scale up to thousands of jobs per second and millions of jobs per day. Scaling Sidekiq can simply be a matter of “adding more servers”, but how do you optimize each server, how “big” do the servers need to be, and how do you know when to add more? Those are the questions this guide will answer.

Concepts and terms

Before we dive into concrete guidance and advice, let’s start with an overview of Sidekiq’s architecture and the various “levers” we have available to us. We’ll also define some terms we’ll use throughout this guide.

  • Concurrency - The Sidekiq setting that controls the number of threads available to a single Sidekiq process.
  • Swarm - A feature of Sidekiq Enterprise that supports running multiple Sidekiq processes on a single container.
  • Container - A container instance running one or more Sidekiq processes. You might call this a server, service, dyno, pod, etc. We’ll just call them containers.
  • Total concurrency - The total number of Sidekiq threads across all containers and processes.
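Total concurrency is simple multiplication across the other three levers. A quick arithmetic sketch (the numbers here are made up for illustration):

```ruby
# Total concurrency is the product of containers, processes, and threads.
containers = 3
processes_per_container = 2 # e.g. a Sidekiq Enterprise swarm of 2
threads_per_process = 5     # the "concurrency" setting

total_concurrency = containers * processes_per_container * threads_per_process
puts total_concurrency # prints 30
```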

Here’s a diagram that shows the relationship between these concepts:

Relationship between concurrency, containers, and process swarms

Sidekiq is all about queues, of course, so let’s clarify some terms here too.

  • Queues - You put your jobs into queues (which live in Redis), and Sidekiq processes the jobs in the queue, oldest first (FIFO). When starting a Sidekiq process, you tell it which queues to monitor and how to prioritize them.
  • Queue assignment - You can assign queues (or groups of queues) to specific Sidekiq processes, or you can have a single queue assignment used by all Sidekiq processes.
  • Queue priority - When assigning multiple queues to a process, Sidekiq has a couple of fetch algorithms that dictate how it pulls jobs from those queues: strict and weighted. We’ll call this the queue priority.
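To build intuition for the difference, here’s a toy sketch of the two orderings — this is illustrative only, not Sidekiq’s actual fetch implementation:

```ruby
# Illustrative sketch of strict vs. weighted queue ordering.
QUEUES = { "urgent" => 6, "default" => 3, "low" => 1 }

# Strict: queues are always checked in declared order, so "default" is
# only polled when "urgent" is empty.
def strict_order(queues)
  queues.keys
end

# Weighted: each fetch randomizes the check order, biased by weight, so
# lower-priority queues still get serviced even when busier queues are full.
def weighted_order(queues)
  queues.flat_map { |name, weight| [name] * weight }.shuffle.uniq
end

strict_order(QUEUES) # => ["urgent", "default", "low"] every time
```

On the Sidekiq command line, listing queues without weights (`-q urgent -q default`) gives strict ordering, while adding weights (`-q urgent,6 -q default,3`) switches to weighted fetching.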

Relationship between Sidekiq queue assignments and priority

And finally we have our connection pools. Yes, multiple connection pools.

  • Database connection pool - A pool of database connections shared by all Sidekiq threads within a process. Since a Sidekiq process is ultimately spinning up a full Rails stack, the database connection pool for a Sidekiq process is actually just the same connection pool configured for each Rails process — dictated by database.yml.
  • Redis connection pool - A pool of Redis connections shared by all Sidekiq threads and Sidekiq internals within a process. This is managed by redis-client and is configured automatically by Sidekiq based on your concurrency.
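In a typical Rails app, that database pool size comes straight from database.yml, where the generated Rails default ties it to RAILS_MAX_THREADS (shown here for illustration — the pool should be at least as large as your Sidekiq concurrency):

```yaml
# config/database.yml — Rails' generated default
production:
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
```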

Diagram of connection pools used by Sidekiq

This covers our terms and concepts, and while there are quite a few, the good news is that many of them are handled for us! The rest are straightforward to configure ourselves given an ounce of understanding. Let’s dive in!

A Sidekiq starting point

These are some general recommendations that will help things run smoothly early in an app’s life and prepare you to scale later.

The fewer queues the better. Don’t make your life harder than it needs to be. Two or three queues are plenty for a new app. We’ll talk later about when it makes sense to add more queues, but scaling will generally be more challenging the more queues you have.

Name your queues based on priority or urgency. Some teams name their queues using domain-specific terms that are no help at all when it comes to planning queue priority or latency requirements. “Urgent”, “default”, and “low” are much easier to work with. You might take it a step further and embrace our recommendation of latency-based queue names such as “within30seconds”, “within5minutes”, etc. This approach makes it very clear which queues have priority and when queue latency is unacceptable.

Keep your jobs as small as possible! Fan out large jobs into many small jobs. Smaller jobs are much easier to scale, but we’ll talk later about strategies to use when this isn’t possible.

Run a single Sidekiq process per container. You can add Sidekiq Swarm later, but don’t assume you’ll need it. This is one less variable to juggle when scaling. Keep it simple.

Choose a container size based on memory. If you’re working with a lot of large files, such as generating PDFs or importing large CSV files, you’ll need more memory. If you’re not doing that, you can probably get away with 1GB or less.

Start with five threads per process (concurrency). This is just a starting point — you will need to tweak it. Many teams get too ambitious with their concurrency, saturating their CPU and slowing down all jobs. The good news is five is the Sidekiq default, so if you don’t do anything, you’ll have a good starting point.
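If you’d rather make that starting point explicit rather than rely on the default, a minimal config/sidekiq.yml sketch:

```yaml
# config/sidekiq.yml — 5 is also Sidekiq's built-in default
:concurrency: 5
```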

These guidelines will get you started, but what about optimizing your configuration and scaling beyond the basics? That’s what we’ll tackle in the following sections.

Find your concurrency sweet spot

Depending on your container CPU and the type of work your jobs are doing (mainly the percentage of time spent in I/O), you’ll probably need to tweak your concurrency setting. As a very simple rule, you want CPU usage to be high but not 100% when all threads are in use.

If CPU is hitting 100%, you need to reduce your concurrency. If your CPU usage never goes above 50% at max throughput, you probably want to increase your concurrency.

There are several strategies for changing your Sidekiq concurrency value. The simplest is to change your RAILS_MAX_THREADS value, as Sidekiq will ‘listen to’ and respect it. Take caution here, though: if you change the value globally, your Rails web processes will ‘listen to’ the changed value too. Accidentally changing your web thread count when you intended to only change your worker thread count can make for a bad afternoon!

That said, since Rails’ database pool value is, by default, configured to adhere to RAILS_MAX_THREADS, we nonetheless recommend changing your Sidekiq concurrency default by changing RAILS_MAX_THREADS… just ensure that the value is only changed for the Sidekiq process(es), not your Rails web processes!

The easiest way to do this is to override the environment variable for just your Sidekiq process(es).
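For example, a Procfile can set the variable inline for the worker process only, leaving the web process untouched (the value 10 here is just illustrative):

```
web: bundle exec rails s
worker: RAILS_MAX_THREADS=10 bundle exec sidekiq
```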

Autoscale your Sidekiq containers

The simple truth when it comes to “how many containers?” is that you shouldn’t waste your energy calculating how many containers you need to run. Sidekiq loads are highly variable by nature, and you don’t want to pay for a cluster of 10 containers when no jobs are enqueued. Autoscaling solves this problem by automatically scaling your containers up and down according to load… but what metric should you use for autoscaling?

Sidekiq workloads are more often I/O-bound than CPU-bound—in other words, you can easily encounter a queue backlog even when CPU utilization is low. This makes CPU an inappropriate and frustrating metric to use for autoscaling, even though it’s the most common metric used by tools like AWS CloudWatch.

Instead, you should autoscale your Sidekiq containers using queue latency. Your business requirements will have an implicit (or hopefully explicit) expectation of how long each job can reasonably wait before being processed. This expectation makes queue latency the perfect metric for autoscaling. (And if you’re using latency-based queue names, you’ve already identified those latency expectations!)
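Conceptually, queue latency is just the age of the oldest waiting job — Sidekiq exposes this as Sidekiq::Queue#latency. A toy sketch of the computation, using made-up timestamps:

```ruby
# Queue latency = how long the oldest job has been waiting, in seconds.
# A simplified version of what Sidekiq::Queue#latency reports.
def queue_latency(enqueued_at_timestamps, now:)
  oldest = enqueued_at_timestamps.min
  oldest ? now - oldest : 0.0
end

queue_latency([100.0, 105.0, 109.0], now: 110.0) # => 10.0
```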

Queue-time based autoscaling is the reason we built Judoscale, and we believe it’s the best autoscaling system for Sidekiq. Judoscale is a team of two-and-a-half Rails developers (including a Rails core team member!) and we’ve got extensive Sidekiq experience. Judoscale itself runs on Sidekiq! Join over a thousand other teams running Judoscale in production — it’s free for most applications!

Judoscale scaling charts

Assign queues to dedicated processes

Sometimes it makes sense to add a queue for a specific job or a particular “shape” of job. Some examples:

  • If you’re unable to break down large jobs into smaller jobs, you might not want those long-running jobs to become a bottleneck in your queue.
  • If you have some jobs that use lots of memory, you might need a larger container for those jobs.
  • If you have jobs that can’t be processed in parallel, you might need those jobs on a dedicated queue that runs single-threaded.

These aren’t ideal scenarios, but they’re real-world scenarios that many apps will encounter. It’s best to treat these queues as the anomalies they are and dedicate them to their own Sidekiq process. This way your long-running jobs will only block other long-running jobs, and your memory-hungry jobs won’t require all of your jobs to run on larger, higher-priced containers.

This isolation makes scaling easier because you’re scaling your “special” queues separately from your “normal” queues. Here’s what it might look like in a Procfile, using RAILS_MAX_THREADS to force the memory-hungry jobs to be processed single-threaded (reducing memory bloat):

web: bundle exec rails s
worker: bundle exec sidekiq -q within_30_sec -q within_5_min -q within_5_hours
worker_high_mem: RAILS_MAX_THREADS=1 bundle exec sidekiq -q high_mem

Scaling problems & solutions

The best way to make scaling easy is by keeping it simple: a few queues with small jobs. But of course keeping it simple isn’t always easy, especially in a legacy codebase or a large team. Here are some of the problems or anti-patterns you’ll generally want to avoid:

  • Not enough connections in your database pool. If you’re seeing the dreaded ActiveRecord::ConnectionTimeoutError in your Sidekiq jobs, chances are you’ve misconfigured your database connection pool. We wrote a whole article about how to solve this one too! Don’t worry; it’s a short article.
  • ERR max number of clients reached. Unlike the error above, this error is coming from Redis, and it usually means you’re using a Redis service with an extremely limited number of connections available. You can either upgrade your Redis service or reduce your concurrency setting.
  • Slow job performance / saturated CPU. These go hand-in-hand when you’ve set your concurrency too high. Reduce your concurrency or use a more powerful container.
  • Sporadic queue backlogs. Most apps have extremely variable load patterns for background jobs. If you don’t have autoscaling in place, you’ll need to run more containers to avoid these backlogs.
  • Unreliable autoscaling. If you’re not scaling up and down when expected, you’re probably autoscaling based on CPU. Autoscale based on queue latency instead. See Judoscale.
  • Memory bloat. If your worker containers are using way more memory than you expect, you can either fix the memory bloat, or isolate those jobs to their own queue and process, potentially processing them single-threaded. We’ve written a few posts over the last year on memory that may prove helpful here.
  • Upstream (database) slow-down. It’s easy to scale Sidekiq to the point that you’re overloading your database. There’s no Sidekiq fix here—you either need to reduce total concurrency to alleviate DB pressure, upgrade your database, or make your queries more efficient.
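For the connection pool issue above, a quick sanity check is worth sketching out — if the pool is smaller than your concurrency, threads can starve waiting for a connection and raise ActiveRecord::ConnectionTimeoutError (the function and numbers here are illustrative):

```ruby
# Each Sidekiq thread may hold a DB connection, so the pool must be at
# least as large as the concurrency setting.
def pool_ok?(db_pool:, sidekiq_concurrency:)
  db_pool >= sidekiq_concurrency
end

pool_ok?(db_pool: 5, sidekiq_concurrency: 10) # => false — misconfigured
```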

Scaling Redis

The short answer here is that Redis is almost never the problem when scaling Sidekiq. But for very high-scale apps, you might hit the limits of what’s possible with a single Redis server. The sharding wiki article walks you through some options here, and now Dragonfly might be an even better option.

Just remember that most apps don’t need this! Make sure you’ve worked through the earlier suggestions and confirmed that Redis is your bottleneck before proceeding down these paths.

Further reading

Nate Berkopec dives deep into many of the ideas discussed above in his excellent book Sidekiq in Practice. He also has an in-depth article that explains the relationship between processes, threads, and the GVL. For more on latency-based queue names, check out Scaling Sidekiq at Gusto.