<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Judoscale Dev Blog</title>
    <description>The Judoscale Dev Blog</description>
    <link>https://judoscale.com/</link>
    <language>en-us</language>
    <item>
      <title>Heroku: What’s Next</title>
      <description>Heroku is shifting to a sustaining engineering model. Here’s what that means, whether you should migrate, and how the top alternatives compare.</description>
      <pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/heroku-whats-next</link>
      <guid>https://judoscale.com/blog/heroku-whats-next</guid>
      <author>Jon Sully</author>
      <content:encoded>
<![CDATA[<p>In a move that surprised many of us — and one whose business sense I still can’t work out — Salesforce <a href="https://www.heroku.com/blog/an-update-on-heroku/" target="_blank" rel="noopener">officially announced</a> last week that Heroku will be moving into a “<em>sustaining engineering model</em>”. That’s essentially giant-software-corporation-speak for “we’re putting this into maintenance mode”. The platform that taught a generation of developers to “push to deploy” has reached its investment limit from its owners 😕.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch style scene, with a single server rack in a large space, and a sign hanging on that server rack which reads “Heroku Servers”, while several wrenches and tools are on the ground next to the rack, likely to be left there and not used again" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Our work here is done"
          </figcaption>

</figure>
</p>

<p>Now, before you jump straight to “abandon ship!!”, there are real questions we should think about when looking ahead. Heroku is still an excellent platform, runs very stably, and, to this day, has the smoothest DX for getting an application into production. For those of us with production apps currently running on Heroku, we need to be pragmatic about what this announcement means for our present, our future, and our time! </p>

<p>Salesforce’s announcement should ultimately drive a calm, collected conversation around both timing and execution. Heroku isn’t a sinking ship; it’s just done shipping new features.</p>

<h2 id="let-s-be-honest-about-urgency">Let’s Be Honest About Urgency</h2>

<p>Urgency itself is a function of two inputs: having a thing to do and believing that you must do that thing <em>soon</em>. The sooner you believe you must do it, the more urgent it will feel. So allow me to reiterate the point I made above and mix in some urgency:</p>

<p><strong>Heroku is not dying today, tomorrow, next month, or next year.</strong></p>

<p><strong>It is <em>not</em> urgent that you migrate away from Heroku</strong>.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch drawing depicting a person taking a deep breath, with arrows that indicate ‘inhale’ and ‘exhale’, while they have a smile on their face as air leaves them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Breathe"
          </figcaption>

</figure>
</p>

<p>The Salesforce announcement might serve to give you the first component of urgency: we’ll all have a ‘thing to do’ at some point: migrate to another platform. But it certainly does <em>not</em> give the second component (‘do that thing soon’). Heroku isn’t going anywhere. And, if you recall the late two-thousand-teens, this isn’t even the first time Heroku has spent some years running without major feature improvements! We sincerely believe it’ll be a few <em>years</em> before there’s any real pressing need to migrate off Heroku if you’re already successfully running your production app there.</p>

<p>I don’t want to come off like a Heroku shill here, so let me clarify why I’m pushing back against the hype and panic. It has nothing to do with Heroku’s bottom line or expensive servers. It has to do with your team’s time spent shipping useful features that will grow the value of your app and/or business.</p>

<p>Even in the best of circumstances and setups, migrating platforms takes time. It requires testing, planning, mapping, and careful execution to ensure that you’re not dropping traffic or upsetting customers along the way. It’s <em>work</em>. All of this work has opportunity cost: you <em>won’t</em> be building and shipping the features and enhancements that your customers want. You <em>won’t</em> be improving your application or business. At the end of the day, your customers don’t care how or where you host your app. They just want it to work and provide them value!</p>
<blockquote><p>Okay fine but give me an actual recommendation here?</p>
</blockquote>
<p>Sure. Deep breath. Let the panic subside: most applications currently running on Heroku <em>shouldn’t worry about migrating until next year</em> (2027) at the earliest. If you have an enterprise contract, you should renew it in 2026.</p>

<p><figure>
  <img alt="An AI-generated image depicting a simple block-lettered message as a pencil sketch on paper, reading: “don’t worry about migrating yet.”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You already chose Heroku, you’re already set up on Heroku, and your app is already running <em>fine</em> on Heroku. You should capitalize on <em>those</em> gains as long as possible (especially if you have enterprise/discount pricing!). “Heroku isn’t going to get any new major features” doesn’t actually prevent you from realizing the value of your initial investment in “I want managed hosting I don’t have to worry about”. Moving to another PaaS would still satisfy “I want managed hosting&hellip;”, but the migration itself is an additional investment and cost that you simply don’t need to make yet. Take a deep breath and go build your app / business! That <em>does</em> reap value <em>today</em>.</p>

<p>😮‍💨</p>

<h2 id="looking-at-the-alternatives">Looking at the Alternatives</h2>

<p>Nonetheless, I know many readers are still going to queue up migrations in the coming months. Maybe that’s discomfort, simply having time available to migrate, or a bad taste left in the mouth. I get it! Even as I wrote the paragraphs above, I felt some of those same tensions. Honoring those thoughts (and knowing that the future will come eventually), it feels worthwhile to talk through some of the migration paths an existing Heroku app has ahead.</p>

<p>We’re going to evaluate each option in three primary lenses:</p>

<ul>
<li><strong>Migration effort</strong>: how painful it would be to migrate a full production Heroku app to this new setup</li>
<li><strong>Ongoing operational load</strong>: how it <em>feels</em> (subjectively) to use over time — things like CLI, “hop into prod console”, control and tweakability, etc.</li>
<li><strong>Cost structure</strong>: how expensive is this new setup compared to Heroku, and how is it billed differently?</li>
</ul>

<p>Then we’ll give our general take on each path outside of those three parameters. Today’s challengers:</p>

<ul>
<li>Render</li>
<li>Fly.io</li>
<li>Railway</li>
<li>Run-it-Yourself Systems</li>
</ul>

<p>But today’s look isn’t our one-time “here’s the truth” post; it’s just a preview. We’ll give you our opinions here today based on our work integrating with most of these platforms and running various apps on them over the last three years, but we’re planning on going deeper in the coming months: Judoscale is going <a href="#judoscale-on-tour">on tour</a>. More on that below, but we’ll be moving our 3,000+ RPS production app to each of these platforms to <em>really</em> feel out what it looks like for a production app that can’t go down!</p>

<h2 id="render-the-obvious-choice">Render: The Obvious Choice</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Render logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you asked me for a simple, single-sentence recommendation for most teams, it would be Render. <em>Many</em> folks have described Render as, essentially, “the natural progression of Heroku” — perhaps what Heroku could’ve become had it never been acquired by Salesforce. I think this is mostly due to Render sharing many of the same philosophies as Heroku (fully managed PaaS, auto build detection, etc.) but just having been built fresh many years <em>after</em> Heroku: the Render devs had the chance to reimagine the Heroku UX from the ground up with plenty of Heroku experience to draw from.</p>

<p><strong>Migration effort</strong>. Any migration is going to take effort, but things are pretty smooth here. Heroku to Render is a <em>well</em>-trod path at this point and Render’s own team offers <a href="https://render.com/docs/migrate-from-heroku" target="_blank" rel="noopener">migration assistance</a> for those coming from Heroku. The mental model is broadly the same and you’ll feel at home within a few minutes of logging into the Render dashboard. The only gotcha to keep in mind is around buildpacks and system dependencies. Render does supply some base-level buildpacks that should cover most apps, but if your app requires specific system dependencies beyond their <a href="https://render.com/docs/native-runtimes#tools-and-utilities" target="_blank" rel="noopener">included set</a>, you may need to build out a Dockerfile. Where on Heroku buildpacks themselves can be composable, Render’s approach is simply, “stay on the rails or bring your own <code>Dockerfile</code>” (more <a href="https://render.com/docs/docker#docker-or-native-runtime" target="_blank" rel="noopener">here</a>). </p>

<p><strong>Ongoing operational load</strong>. Again here, this one’s going to feel just like Heroku. They handle the infrastructure, you just merge to <code>main</code>. Metrics and web dashboard UI are all friendly and available, logs can be pushed wherever you need, manual rollbacks are simple and accessible, there’s a broad CLI for control if you prefer that style, you can take your favorite <a href="/render">autoscaler</a> with you… the list goes on. Essentially everything you love about Heroku exists in Render in parallel or enhanced form.</p>

<p><strong>Cost structure</strong>. Of all the platforms and paths we’ll look at today, Render’s cost structure and setup match Heroku’s the most. Like Heroku, their pricing revolves around pre-set, <a href="https://render.com/pricing#services" target="_blank" rel="noopener">per-month pricing</a> depending on which instance types (think “dyno types”) you need. <em>Unlike</em> Heroku, they’re actually clear about how many vCPU cores you’re paying to hold (🎉). In terms of real cost, our rough estimate is that, depending on the composition of your app and the resources you need, you’ll likely save 20-30% off your current Heroku bill for similar resources on Render.</p>

<p>Our general takeaway on Render is that it’s the right choice for the grand majority of currently-on-Heroku apps. It’s a near-seamless transition, the billing operates the same, the operational overhead for engineers learning the new platform is very low, and most apps will be able to get up-and-running within a day.</p>

<h2 id="fly-io-a-little-more-complicated-a-little-more-interesting">Fly.io: A Little More Complicated, A Little More Interesting</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Fly.io logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Still mostly on the high-level-PaaS layer, Fly.io was built to accomplish a different goal. Fly’s whole <em>thing</em> is distributing your app geographically so that your users always hit an application server close by, and doing so with “Fly machines” — micro-VMs with much smaller footprints than full-on Docker containers. Fly is also heavily optimized for its powerful CLI and config tooling. Fly is <em>tremendously</em> flexible and configurable, but that comes at the cost of complexity: a steep learning curve!</p>

<p><strong>Migration effort</strong>. Like Render, Fly has written <a href="https://fly.io/docs/getting-started/migrate-from-heroku/" target="_blank" rel="noopener">guides</a> specifically for those migrating from Heroku, including framework-specific guides in many cases (<a href="https://fly.io/docs/rails/getting-started/existing/" target="_blank" rel="noopener">Rails</a>, <a href="https://fly.io/docs/django/getting-started/existing/" target="_blank" rel="noopener">Django</a>, <a href="https://fly.io/docs/python/frameworks/fastapi/" target="_blank" rel="noopener">FastAPI</a>, <a href="https://fly.io/docs/python/frameworks/flask/" target="_blank" rel="noopener">Flask</a>, etc.) to help explain nuances. And these guides are certainly helpful, but there’s no getting around the paradigm shift: Fly is a fundamentally different platform from Heroku and doesn’t operate quite the same. There <em>is</em> going to be a learning lift as you get familiar with its UI tooling and <code>flyctl</code> CLI tool — the latter of which you <em>absolutely will</em> want to become highly familiar with.</p>

<p><strong>Ongoing operational load</strong>. Like other PaaS’s, Fly can absolutely be configured to do the simple deploy-on-<code>main</code> thing and includes built-in metrics dashboards, logging basics, and standard machine health checks, but you’ll find a lot of utility in <code>flyctl</code>. Restarting instances, changing environment variables, spinning up secondary production instances&hellip; all simple <code>flyctl</code> commands once you learn them! If you’re not already a heavy terminal user, dive on in. Fly exposes more primitives and control around lower-level constructs than most PaaS’s (think: direct VM controls, volumes, storage, regions, etc.) and most of that is controlled via <code>flyctl</code>. So there’s more flexibility, but again, a steeper learning curve. Oh, also, you can still take your favorite <a href="/fly">autoscaler</a> with you!</p>

<p><strong>Cost structure</strong>. Fly walks a sort of middle ground between resource tier-based pricing and metered usage, which makes it easy to jump around to different scale sizes, tweak your RAM levels, and scale vertically as needed. Prices are <a href="https://fly.io/docs/about/pricing/#started-fly-machines" target="_blank" rel="noopener">per second</a> of machine runtime, extra RAM can be added wherever you want (very cool), and Fly offers everyone a (massive) <a href="https://fly.io/docs/about/pricing/#machine-reservation-blocks" target="_blank" rel="noopener">40% discount</a> when you opt to pre-reserve compute time — no enterprise contract required. If that sounds like a lot of levers to pull and tweak, that’s because it is. Again, Fly’s schtick here is configurability.</p>

<p>My take: if you’re the kind of person that was driving an automatic Honda Civic and already felt for years like you just wanted more of a car-person’s kind of car, then it’s probably true that Heroku’s recent announcement didn’t change anything for you — your Civic is still a Civic. But it’s understandable that Salesforce has, in some way or another, shaken you into realizing your dream. If you’re after that ‘69 Big Block Mustang with a four-barrel carb that you can tune <em>juuuuust</em> right&hellip; then Fly might be for you. This metaphor may have gone too far. The point is: Fly is complex. That complexity brings neat value-adds, but it has a price — there’s more to learn, more to understand, and more to manage.</p>

<h2 id="railway-not-exactly-our-way">Railway: Not Exactly Our Way</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Railway logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’re not out to bash any hosting providers, especially ones that <a href="/railway">we support</a> autoscaling on, but we also need to be honest: our experience with Railway has been pretty lackluster. All other bells and whistles aside, we had the worst actual system performance on Railway. Not because of dependent services or database latencies or anything like that; we just found our real, pure compute performance to be worse on Railway than on any other platform. <strong>It was just plain slower</strong>.</p>

<p>We can’t tell you why that’s the case, and at the same time, we love that Railway’s schtick is running their own metal in datacenters rather than reselling metal they rent from the big three. That’s awesome! But we suspect that economies of scale are a relevant factor here.</p>

<p><strong>Overall</strong>, we would not recommend Railway at this time. We love the mission and the goal, but we had a less-than-great time. For the sake of being positive-outlook community members, we’ll simply leave it at that!</p>

<p>Oh, and we <em>do</em> still plan on taking another full crack at Railway when we go <a href="#judoscale-on-tour">on tour</a> — see more below.</p>

<h2 id="the-more-hiy-stuff">The More HIY Stuff!</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward a small rack of servers with the simple label “Your Servers” above them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We live in a wonderful time of <em>options</em>! There are so many great options in the <strong>H</strong>ost-<strong>i</strong>t-<strong>Y</strong>ourself world, in many flavors and at many levels of control. Bring-your-own-VPS tools like <a href="https://dokku.com" target="_blank" rel="noopener">Dokku</a>, <a href="https://hatchbox.io" target="_blank" rel="noopener">HatchBox</a>, <a href="https://coolify.io/" target="_blank" rel="noopener">Coolify</a>, and <a href="https://caprover.com" target="_blank" rel="noopener">CapRover</a> offer lightweight PaaS-like experiences with great flexibility, each with their own distinct tradeoffs and workflows. Going a step more complex, container orchestrators (they coordinate Kubernetes for you) like <a href="https://northflank.com" target="_blank" rel="noopener">Northflank</a>, <a href="https://www.porter.run" target="_blank" rel="noopener">Porter</a>, and <a href="https://www.qovery.com" target="_blank" rel="noopener">Qovery</a> let you “bring your own cloud” (be it your own metal, rented Hetzner boxes, or AWS API keys) while still handling most of the complexities of Kubernetes cluster orchestration for you. And, of course, there’s the big world of AWS itself — “Hop onto ECS Fargate!” or “Elastic Beanstalk, baby!” among other choices. There have truly never been so many ways to run the “Heroku experience” yourself!</p>

<p>Honestly, there are a <em>dizzying</em> number of ways to make the technologies at this level of hosting control work. For the sake of this article not turning into a book, we’re going to mostly leave them unmentioned here. The reality is that <strong>if you’ve been a happy Heroku customer, you shouldn’t go looking down this path</strong>. I know that’s a strong statement that might make a few of the “come to the DIY-side!” folks upset, but it’s a pragmatic truth. These are two wholly different worlds with different levels of time and skill involved. Going ‘down’ a single layer in the hosting stack (as we perceive it) and getting into <em>Fly’s</em> ecosystem is already going to add overhead to your workflow as you need to learn to understand and handle their config complexity. Going all the way down to the HIY tooling is only going to add more ops time (or people!) to your app’s needs. If you’re happy with your PaaS-level at Heroku, stay up there!</p>

<h2 id="the-real-answer">The Real Answer</h2>

<p>Let’s zoom out and take a deep breath. I still <em>fully</em> stand by my original sentiment above: Heroku isn’t going anywhere and will remain stable for years to come. There’s no urgency to move, and doing so will only detract from the hours you could be spending on your product itself at this point. Moving takes work. We can’t ignore that reality amidst the hype here.</p>

<p>Then, of course, conceding to those who are <em>for sure</em> going to move soon out of principle, spite, or otherwise disdain for Salesforce (which… I get), we covered some options. Render is the clearest, cleanest, easiest choice. Fly is more complicated but more interesting. Railway isn’t recommended at the moment. Host-it-yourself and bring-your-own-cloud solutions are far more effort than a team happy on Heroku should take on.</p>

<p>So&hellip;. move to Render and call it a day? <strong>Not exactly</strong>.</p>

<p>As your resident auto-scaling experts for the last decade, who have integrated deeply with and provide autoscaling services for nearly all of the platforms previously mentioned, we have some opinions.</p>

<p><figure>
  <img alt="" src="https://media2.giphy.com/media/v1.Y2lkPTc5MGI3NjExYno0cHd1cGt4b2VuYWZjZmZ1NmxiamQ3MDVydnc5YmV4YmQwb2MwZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/LpkBAUDg53FI8xLmg1/giphy.gif">
  
</figure>
</p>

<p>But our opinions are from last year (or prior). And they’re based on integration work with Judoscale. And, who knows, they might just be wrong. So we’re going to do something that we haven’t seen done before: <strong>we’re going on tour</strong>.</p>

<h2 id="judoscale-on-tour">Judoscale On Tour</h2>

<p>As much as I wish that meant a music tour around the US with <a href="https://www.linkedin.com/posts/adamlogic_railsconf2025-activity-7349094411043037185-vWMQ" target="_blank" rel="noopener">our kazoos</a>, we actually hatched a better idea. Judoscale is a 24/7 real-time reactive production application. We receive well over 3,000 RPS every moment of every day. Our downtime is <em>exceedingly</em> rare (generally only when Cloudflare or Heroku themselves have issues), but then, it darn well should be! We’re an auto-scaler! We <em>need</em> to be online, regardless of traffic load, so that we can reactively scale our clients’ applications correctly and appropriately any time of day.</p>

<p>Sounds like the perfect app to move to each of these platforms / services to test some things out.</p>

<p>To be clear: our “going on tour” means that we’re going to migrate the Judoscale production application, including all traffic, DNS, configs, background workers, etc., to each of Heroku’s competitors, one at a time, and document every step along the way for you all.</p>

<p><figure>
  <img alt="" src="https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExbW9udHZvNjJuNTc5cHY3c2g0NW5iajQzbWZvM3F4aGxjZXpjZjEzNiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/J0BRQ3cXBycPm/giphy.gif">
  
</figure>
</p>

<p>So, again, our real recommendation here is simply to hang tight on Heroku. We’re going to take the plunge for you (many times over) and move our real-time, high traffic application ourselves. We’re going to find the rough edges. We’re going to feel the performance bottlenecks. We’re going to foot the literal bill and feel the DX each of these new platforms provides compared to ol’ purple.</p>

<p>If that sounds exciting to you, make sure you subscribe to our newsletter below. We’ll start with a full breakdown of all the things we love and use on Heroku, which will set forth our rubric for how to evaluate other platforms.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Latency-based Celery Queues in Python</title>
      <description>If you plan your Celery task queues around latency, you'll have more predictable (and scalable) results. Learn how to plan your Python queues around latency!</description>
      <pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/latency-based-celery-queues-in-python</link>
      <guid>https://judoscale.com/blog/latency-based-celery-queues-in-python</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>If you’ve worked with Celery in production with real traffic, you’ve probably hit one of its many sharp edges. Maybe you’ve watched a simple background job silently pile up in an unmonitored queue.</p>

<p>Or maybe you’ve built out a tidy set of queues only to find your high-priority jobs are getting stuck behind slow (and unimportant) ones. Celery gives you powerful tools, but few guardrails.</p>

<p>These pain points usually stem from <strong>queue planning problems</strong>. Most teams slap labels like <code>high_priority</code> or <code>emails</code> on queues without defining what those mean.</p>

<p>If you plan your <a href="/blog/choose-python-task-queue">Python task queues</a> around latency, you&rsquo;ll have more predictable (and scalable) results. Ready to get started?</p>

<h2 id="the-basics-of-celery-queues">The basics of Celery Queues</h2>

<p>Before we get into queue planning, let’s clarify some Celery terminology. If you already have a great understanding of how Celery works, feel free to skip to the next section.</p>

<p><figure>
  <img alt="Celery queue diagram, showing a Celery queue, full of tasks, with worker processes" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h3 id="celery-tasks">Celery tasks</h3>

<p>In Celery, a <strong>task</strong> is a single unit of work. For example, <code>send_email_task</code> might send a welcome email.</p>
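
<p>Here’s a minimal sketch of defining and enqueueing such a task (the app module name, broker URL, and task body are illustrative, not from a real project):</p>

<pre><code>from celery import Celery

# A Celery app pointed at a hypothetical local Redis broker.
app = Celery("myapp", broker="redis://localhost:6379/0")

@app.task
def send_email_task(user_id):
    # Pretend this looks up the user and sends a welcome email.
    print(f"Sending welcome email to user {user_id}")

# .delay() enqueues the task for a worker instead of running it inline.
send_email_task.delay(42)
</code></pre>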

<h3 id="celery-queues">Celery queues</h3>

<p>A <strong>queue</strong> in Celery refers to a named channel on the broker (like a Redis list or RabbitMQ queue) where tasks wait to be processed. By default, Celery uses a queue named <code>&quot;celery&quot;</code> (if you don’t specify one).</p>
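
<p>You can also route a single invocation to a specific queue at call time. A quick sketch (the <code>emails</code> queue name is illustrative):</p>

<pre><code># Goes to the default "celery" queue:
send_email_task.delay(42)

# Goes to a named "emails" queue instead:
send_email_task.apply_async(args=[42], queue="emails")
</code></pre>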

<h3 id="celery-workers">Celery workers</h3>

<p>A <strong>worker</strong> is a Celery process that runs tasks. A worker can run multiple tasks concurrently, depending on its concurrency setting.</p>

<h3 id="celery-concurrency">Celery concurrency</h3>

<p><strong>Concurrency</strong> refers to the number of tasks a worker can process at the same time. In prefork mode, this is the number of child processes (often defaults to the number of OS-reported CPUs).</p>
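
<p>Concretely, you set concurrency when starting the worker. For example (assuming your Celery app lives in a <code>myapp</code> module):</p>

<pre><code># Run a worker with four prefork child processes:
celery -A myapp worker --concurrency=4
</code></pre>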

<h3 id="decisions-you-have-to-make-when-using-celery">Decisions you have to make when using Celery</h3>

<p>In a typical deployment, you must decide <strong>how many queues</strong> to use and what they are called, <strong>which tasks go to each queue</strong>, and <strong>how many worker processes</strong> will consume each queue.</p>

<p>You also choose how many threads/processes each worker has (concurrency) and how many total containers to run (horizontal scaling). That’s a lot of decisions!</p>

<p>So let&rsquo;s dig into how you can make these decisions with scaling in mind.</p>

<h2 id="why-celery-queues-run-into-problems-at-scale">Why Celery queues run into problems at scale</h2>

<p>Out of the box, Celery will use a single queue (usually named <code>&quot;celery&quot;</code> by default). If a task doesn’t specify a queue, it goes to the default queue. If you start a worker without specifying <code>-Q</code>, it will consume the default queue. </p>
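
<p>A quick sketch of that default behavior (module and queue names assumed):</p>

<pre><code># No -Q flag: this worker consumes the default "celery" queue.
celery -A myapp worker

# With -Q, it consumes only the queues you name.
celery -A myapp worker -Q celery,emails
</code></pre>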

<p>Could you build an app with just one queue? <strong>Sure.</strong>  But please don&rsquo;t.</p>

<h3 id="not-every-task-is-created-equal">Not every task is created equal</h3>

<p>For a brand-new project, one queue might work fine for a short while. But very soon, you’ll encounter scenarios that push you to create additional queues:</p>

<ul>
<li>You have a task that needs to run <strong>quickly</strong> (a high-priority job), so you want it processed before other tasks.</li>
<li>You have a task that takes a long time to run (perhaps several seconds or minutes), and you want it to have <strong>lower priority</strong> or even separate handling so it doesn’t block faster tasks.</li>
</ul>

<p>In response, teams might eventually create ad-hoc queues like <code>&quot;urgent&quot;</code> for high priority and <code>&quot;low&quot;</code> for slow tasks.</p>

<h3 id="ambiguous-queue-names">Ambiguous queue names</h3>

<p>However, there’s a big problem. <strong>Those queue names are ambiguous</strong>.</p>

<p>How urgent is “urgent”? What does “low” mean, exactly? As your application grows, you’ll find there are varying degrees of priority. One developer might add <code>very_urgent</code> or <code>critical</code> queues; another might introduce a queue for a specific feature like <code>reports</code> or <code>emails</code>.</p>

<p>Before you know it, you have a <strong>sprawl of Celery queues</strong> without a clear hierarchy or expectations.</p>

<h2 id="latency-based-queues">Latency-based queues</h2>

<p>Take a step back and consider what metrics define the “health” of a task queue. Three key metrics are commonly used:</p>

<ul>
<li>Worker CPU: How taxed the CPU is for your worker processes.</li>
<li>Queue depth: How many tasks are waiting in the queue (queue length).</li>
<li>Queue latency: How long a task waits in the queue before a worker starts processing it (sometimes called queue time).</li>
</ul>

<p>CPU can be used, but it doesn’t actually tell you everything about <em>the queue</em>. It simply gives an indication (and often a trailing one) of how hard a worker process is working during an individual task. And task queues often back up without spiking CPU at all, giving a false sense of worker health.</p>

<p>Queue depth is easy to visualize (a simple count of jobs), so many people focus on it. But queue depth can be very misleading: the number of tasks doesn’t tell you how <em>long</em> they’ll take to clear.</p>

<p>For example, imagine two queues, each handled by one worker process:</p>

<ul>
<li>Queue A has 10 jobs enqueued, and each job takes ~1 second to run.</li>
<li>Queue B has 10,000 jobs enqueued, but each job takes ~1 <em>millisecond</em> to run.</li>
</ul>

<p>Queue B might look “backed up” at a glance, but in reality, both queues will finish their work in about 10 seconds. <strong>The <em>latency</em> (wait time) for jobs in both queues is the same ~10 seconds</strong>, which is the metric that truly matters.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p><strong>Queue latency</strong> tells the real story about how well a queue is doing.</p>

  </div>
</div>

<p>So, is a 10-second wait time good or bad? <strong>It depends.</strong></p>

<p><figure>
  <img alt="It depends meme, showing Celery queue latency health is a complicated decision" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The acceptable latency for a queue is a business decision. It depends on what the tasks are doing and how quickly that work needs to begin. This brings us back to the notion of “urgency”, but now we can quantify it. Instead of calling a queue &ldquo;urgent&rdquo; in a vague sense, we decide what latency is acceptable for that queue’s tasks.</p>

<h2 id="latency-sla-queue-names">Latency SLA queue names</h2>

<p>If you&rsquo;re convinced <strong>queue latency</strong> is the right metric to measure performance, you should fix the ambiguity in your queue names. Naming your queues after their latency targets (SLAs) is a great way to set yourself up for success.</p>

<p>For example:</p>

<ul>
<li>“urgent” becomes <code>within_5_seconds</code> (tasks should start within 5 seconds)</li>
<li>“default” becomes <code>within_5_minutes</code> (tasks should start within 5 minutes)</li>
<li>“low” becomes <code>within_5_hours</code> (tasks should start within 5 hours)</li>
</ul>

<p>If I push a task to the <code>within_5_seconds</code> queue, I’m explicitly saying I expect that job to begin processing within five seconds. The name of the queue communicates the expectation.</p>

<p>You can choose whatever latency thresholds make sense for your app; the specifics aren’t as important as the explicitness of the naming.</p>
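
<p>In Celery, wiring up latency-named queues is straightforward with <code>task_routes</code> and a default queue. A sketch, assuming <code>app</code> is your Celery app (the task paths here are hypothetical):</p>

<pre><code># Route tasks to queues named for their latency SLAs:
app.conf.task_routes = {
    "myapp.tasks.send_email_task": {"queue": "within_5_seconds"},
    "myapp.tasks.generate_report": {"queue": "within_5_hours"},
}

# Anything without an explicit route falls back to the middle tier:
app.conf.task_default_queue = "within_5_minutes"
</code></pre>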

<p>By communicating latency expectations in the queue names, we get a few important things.</p>

<p>First, <strong>you&rsquo;ll end up with fewer queues</strong>. You’re far less likely to create a new queue per feature or whim. Almost every new task will fit into an existing latency category. This should remove the temptation of one-off queues that don&rsquo;t serve a strategic purpose.</p>

<p>Second, each queue now has a <strong>performance target</strong> (its name). This gives clarity for monitoring. If the <code>within_5_minutes</code> queue starts seeing 10-minute latencies, you have an unambiguous problem.</p>

<p>Of course, naming queues “within_X” doesn’t magically make tasks start within X time – <strong>you have to ensure enough worker capacity to meet those targets</strong>. That’s where scaling comes in.</p>

<p>Fortunately, this strategy makes it crazy easy to decide when to spin up more (or fewer) workers to scale, but we&rsquo;ll talk more about that later.</p>

<p><figure>
  <img alt="Diagram showing latency-based celery queues with different tasks in each queue" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="simple-ways-to-scale-celery-queues">Simple ways to scale Celery queues</h2>

<p>Typically, you scale a Celery worker pool with the goal of avoiding a queue backlog.</p>

<p>Now that our queue names encode latency expectations, we can define a clear scaling goal for each queue:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
<p>Each queue’s latency should stay within its target (as named), without overprovisioning resources.</p>

  </div>
</div>

<p>For most people, traffic and job volumes fluctuate too much to maintain this manually. You’ll want to <strong>autoscale</strong> your workers based on queue latency. With autoscaling in place, meeting those latency targets becomes trivial.</p>

<p>When jobs start waiting too long, spin up more workers; when the queues are empty, spin them down.</p>

<p>For example, if the <code>within_5_seconds</code> queue’s jobs are waiting &gt;5 seconds, your autoscaler should add another worker process (or increase concurrency) for that queue. If the queue’s latency stays under 5 seconds, you can maybe scale down. We’ll talk about how to assign workers to queues next, which affects how you set up autoscaling triggers.</p>
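
<p>Celery doesn’t report queue latency out of the box, but you can approximate it with signals. Here’s a rough, not-production-hardened sketch that stamps each task at publish time and measures the wait when a worker picks it up (exactly where custom headers surface on the request can vary by Celery version and message protocol, so treat this as a starting point):</p>

<pre><code>import time
from celery import signals

@signals.before_task_publish.connect
def stamp_enqueue_time(headers=None, **kwargs):
    # Record when the task was published to the broker.
    if headers is not None:
        headers["enqueued_at"] = time.time()

@signals.task_prerun.connect
def report_queue_latency(task=None, **kwargs):
    # Compare publish time to start time once a worker picks the task up.
    enqueued_at = getattr(task.request, "enqueued_at", None)
    if enqueued_at is not None:
        # Ship this to your metrics system; an autoscaler can act on it.
        print(f"queue latency: {time.time() - enqueued_at:.2f}s")
</code></pre>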
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p>Built-in autoscalers default to CPU usage for scaling. <a href="https://judoscale.com/python" target="_blank" rel="noopener">Judoscale</a> is a great autoscaler add-on that can scale your queues based on queue latency!</p>

  </div>
</div>

<p>Speaking of queue assignment, how should we split up queues across Celery workers? I have a few opinions!</p>

<h2 id="your-options-for-matching-workers-to-queues">Your options for matching workers to queues</h2>

<p>When it comes to queue-to-worker assignment, you have a couple of options. On one hand, you have <em>one set of workers pulling from all queues</em>. On the other hand, you have <em>dedicated workers for each queue</em>.</p>

<p>In between these two extremes, you might run some workers that each handle a subset of queues.</p>

<h3 id="running-a-single-worker-pool-for-all-queues">Running a single worker pool for all queues</h3>

<p>Running a single worker pool for all queues is the simplest setup. It’s resource-efficient since any free worker can work on any task, and you don’t need to worry about balancing workers between queues.</p>

<p><figure>
  <img alt="Diagram showing a single Celery worker pool consuming from multiple queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>However, the downsides are significant. You risk <strong>long-running tasks blocking high-priority tasks</strong>, plus it’s harder to autoscale effectively for all latency goals at once.</p>

<p>For example, suppose one Celery worker (with concurrency 4) is consuming <code>within_5_seconds</code>, <code>within_5_minutes</code>, and <code>within_5_hours</code> queues. If it picks up several very slow <code>within_5_hours</code> tasks (say tasks that each take minutes to execute) on all its worker processes, and then a bunch of new <code>within_5_seconds</code> tasks arrive, those fast tasks <strong>can’t start until a process is free</strong>.</p>

<p>All processes are busy churning on slow jobs, so even though the <code>within_5_seconds</code> queue is the highest priority, it’s effectively blocked. This defeats the purpose of having a fast queue!</p>

<h3 id="dedicated-workers-per-queue">Dedicated workers per queue</h3>

<p>In this setup, each queue gets its own Celery worker process (or pool).</p>

<p>For example, you might start one set of workers with <code>-Q within_5_seconds</code>, another with <code>-Q within_5_minutes</code>, and so on. This <em>completely isolates</em> each latency tier.</p>
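
<p>In practice, that’s one worker command per queue. Sketched here with illustrative concurrency values and worker names:</p>

<pre><code>celery -A myapp worker -Q within_5_seconds --concurrency=8 -n fast@%h
celery -A myapp worker -Q within_5_minutes --concurrency=4 -n mid@%h
celery -A myapp worker -Q within_5_hours --concurrency=2 -n slow@%h
</code></pre>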

<p>The slow jobs in the 5-hour queue can never block the 5-second jobs, because they’re handled by different workers on possibly different machines.</p>

<p>Autoscaling becomes much cleaner because you can <strong>scale each worker deployment based on <em>that queue’s</em> latency threshold.</strong> The <code>within_5_minutes</code> workers only care about keeping that queue under 5 minutes latency, and if they’re idle, you can scale them down without affecting the queue time of unrelated queues.</p>

<p>The mental model is simpler, and each queue’s performance can be managed separately. The primary downside is the <strong>cost</strong> of running more separate processes.</p>

<p>The cost difference between one big worker vs. multiple smaller dedicated workers is often minor, and it’s far outweighed by the performance improvements. With dedicated per-queue workers, you also avoid starving out fast tasks with long-running ones.</p>

<h3 id="a-bit-of-both">A bit of both</h3>

<p>One strategy is to try to group certain queues together on workers and isolate others. For example, maybe combine the <code>within_5_seconds</code> and <code>within_5_minutes</code> queues on one worker type, but keep the <code>within_5_hours</code> queue separate.</p>

<p>While this can work, any time you put multiple latency tiers on one worker, you reintroduce the possibility of interference. It also complicates autoscaling (which latency do you scale on for that combined worker?).</p>

<h3 id="my-recommendation">My recommendation</h3>

<p>In summary, I <strong>recommend dedicated Celery workers per latency-based queue</strong>. It makes it straightforward to maintain each queue’s SLA.</p>

<p><figure>
  <img alt="Diagram showing Celery workers dedicated to their own queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you’re on an autoscaling platform, set each worker deployment to scale up whenever its queue latency exceeds the target. To mitigate the <em>potentially</em> higher resource usage of this setup, I also recommend autoscaling your lower-priority workers (5 minutes, 5 hours, etc.) down to zero when the queues are idle. (Of course Judoscale makes this super easy 😁.)</p>

<p>If you’re doing this manually, you still benefit from clarity: you can monitor each queue’s wait time and add resources accordingly without guessing which queue is starved.</p>

<p>You should also look into other ways to <a href="/blog/scaling-python-task-queues">effectively scale Python task queues</a>, like fanning out large jobs.</p>

<h2 id="one-thing-to-keep-in-mind-for-celery-queues">One thing to keep in mind for Celery queues</h2>

<p>One Celery-specific consideration that doesn&rsquo;t apply to every queuing system is task acknowledgment timing. By default, Celery acknowledges a task as &ldquo;received&rdquo; when a worker picks it up. If the worker crashes mid-task, that task is dropped.</p>

<p>Setting <code>acks_late=True</code> (either globally or per-task) delays acknowledgment until the task <em>completes</em>. This means crashed tasks get redelivered, but it also means <strong>your tasks need to be idempotent</strong>, since they might run more than once.</p>
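
<p>Per task, that’s a one-line change. A sketch (the task body is hypothetical):</p>

<pre><code>@app.task(acks_late=True)
def charge_customer(payment_id):
    # With late acks, this may run more than once after a crash,
    # so the work must be idempotent (e.g. guarded by a unique key).
    ...

# Or enable late acknowledgment globally for every task:
app.conf.task_acks_late = True
</code></pre>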

<p>If you&rsquo;re using <code>acks_late</code> with Redis as your broker, pay attention to the <code>visibility_timeout</code> setting. This controls how long Redis waits before assuming a task was lost and redelivering it. The default is one hour. If you have tasks that need to run longer than your visibility timeout, they&rsquo;ll get redelivered while still running.</p>
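
<p>With the Redis broker, that timeout lives in the transport options. A sketch (the 12-hour value is arbitrary; size it to your longest tasks):</p>

<pre><code>app.conf.broker_transport_options = {
    # Wait 12 hours before assuming a task was lost and redelivering it:
    "visibility_timeout": 43200,
}
</code></pre>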

<p>For latency-based queue planning, the practical advice is that tasks in your fast queues (like <code>within_5_seconds</code>, <code>within_5_minutes</code>) should be short enough that the visibility timeout is irrelevant. For your slow queue, make sure your longest-running tasks finish well under the visibility timeout, or increase the timeout accordingly.</p>

<h2 id="shipping-performant-celery-queues">Shipping performant Celery queues</h2>

<p>This opinionated guide for setting up your Celery queues is very much inspired by the <a href="/blog/planning-sidekiq-queues">strategies we know work well in the Sidekiq world</a>. I hope this gives you some fresh ideas and a solid game plan for taming your Celery queues.</p>

<p>Remember, planning your queues boils down to:</p>

<ul>
<li>Name queues by expected latency.</li>
<li>Isolate latency tiers on separate workers to avoid cross-interference.</li>
<li>Monitor and autoscale by latency.</li>
</ul>

<p>Follow these steps, and you’ll avoid most of the common background job headaches that plague teams as they scale up.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Node.js Hosting Options</title>
<description>Choosing where to host a Node.js app is a high-stakes decision. This guide will show you how to pick the best hosting option for your app AND your team.</description>
      <pubDate>Wed, 4 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-js-hosting-options</link>
      <guid>https://judoscale.com/blog/node-js-hosting-options</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
<![CDATA[<p>Choosing the right hosting environment for a Node.js application shapes both your development workflow and your application’s performance. The hosting option you choose directly affects the developer experience (how easy deployments and updates are), the cost model of running your app, its scalability under load, and how much control (and responsibility) you have over your infrastructure.</p>

<p>For example, a fully managed platform can eliminate server maintenance at the cost of less flexibility and more money, whereas running your own server gives maximum control but demands more operational work.</p>

<p>Your goal in deciding where to host a Node app is to align your hosting choice with your app’s <strong>technical requirements</strong> and your <strong>team’s capacity</strong> to manage the underlying infrastructure.</p>

<h2 id="different-types-of-node-apps-have-different-needs">Different types of Node apps have different needs</h2>

<p>APIs built with Node are typically stateless request/response services and a good fit for most hosting models. A Node.js API can run on anything from a cheap VPS to serverless functions, since each request is independent and usually short-lived.</p>

<p>Real-time apps (like those with WebSockets), on the other hand, need persistent connections. Things like chat apps or live dashboards require hosting that supports long-lived network sockets. Traditional servers or container-based platforms are often necessary here as pure serverless platforms often don’t allow WebSockets or constant connections. For example, Vercel’s serverless functions cannot hold always-on WebSocket connections, but they do support WebSockets through their Edge Runtime.</p>

<p>Server-rendered apps (think Next.js) are certainly a special case. Frameworks like Next.js generate (most) pages server-side and often do well with serverless deployment. <strong>Next.js is tightly integrated with Vercel</strong>, which offers zero-configuration deployment, serverless functions for API routes, and edge caching for static assets. Many teams choose serverless platforms for these SSR apps to leverage features like automatic CDN distribution and on-demand scaling without managing servers. However, this serverless approach comes with tradeoffs in execution time limits and statefulness, which we’ll discuss later.</p>

<p>First, let&rsquo;s talk about the option that demands the most of you.</p>

<h2 id="hosting-node-apps-on-a-vps-or-similar-cloud-service">Hosting Node apps on a VPS (or similar cloud service)</h2>

<p>Running a Node.js app on a VPS (Virtual Private Server), Amazon EC2, or cloud virtual machine gives you <strong>maximum control</strong> over the environment. But with that comes maximum responsibility.</p>

<p>On a VPS, you get root access to install any OS packages, configure the stack exactly as you want, and run any background processes you need. This flexibility is powerful for custom setups, but the maintenance burden on you or your team is high. You are in charge of everything under the hood.</p>

<p>Applying OS security patches, monitoring disk and CPU usage, setting up firewalls, managing backups, and handling scaling manually are all things you should be prepared to manage if you go this route.</p>

<p>Using infrastructure-as-code and containers can ease some pain, but won’t eliminate ops work. Tools like <a href="/blog/kamal-vs-paas">Kamal can simplify deploying a containerized app</a> to a VPS. However, <strong>Kamal doesn’t handle the surrounding infrastructure needs</strong>. You still need to set up things like load balancers, databases with backups, log aggregation, and system monitoring yourself.</p>

<p>Containers help by packaging your Node.js app with its dependencies, making it portable and consistent across environments. But the VPS still needs to have everything the container needs. You’ll still be responsible for orchestrating containers, scaling them, and managing the host VM’s health.</p>

<p><figure>
  <img alt="Hosting a Node.js app on a VPS" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Hosting on a VPS or cloud VM is fine if you need fine-grained control or have specialized requirements that platforms don’t support. But it&rsquo;s not an option I can recommend unless you have a dedicated ops team (or you just really love that sort of thing). I&rsquo;ve hosted small projects on a VPS, and the headache has always outweighed the cost savings.</p>

<h2 id="hosting-your-node-app-on-a-paas">Hosting your node app on a PaaS</h2>

<p>Platform-as-a-Service (PaaS) offerings strike a middle ground by handling most infrastructure concerns while still letting you run a “server-like” app. Platforms like Heroku, Render, Amazon ECS Fargate, and Fly.io are PaaS leaders.</p>

<p>They allow you to push your Node.js code (via Git or container image) and then they build, run, and serve your application in a managed environment. Platforms abstract away the server (or VPS) management.</p>

<p><figure>
  <img alt="Hosting Node.js apps on a platform" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Most platforms let you choose whether or not to use containers, so the above image could be even simpler, with you only managing the app itself.</p>

<p>With platforms, there&rsquo;s very little manual configuration and management. You get a deployment platform that automates scaling, security updates, and (some) monitoring, usually through a web dashboard or CLI. Developers can focus on code and let the platform handle the “ops” heavy lifting.</p>

<p>Using a PaaS still provides you with the flexibility to run long-running processes and <a href="/blog/node-task-queues">async job queues like BullMQ or Bee-Queue</a>, which are things that pure serverless platforms don’t support.</p>

<p>The general-purpose nature of PaaS means it doesn’t matter whether you’re deploying a frontend, a Node API, or a background worker. This makes platforms the best option for <em>most</em> Node apps.</p>

<p>You get persistent Node.js processes that can maintain state in memory, hold database connection pools, handle WebSocket connections, and even schedule cron jobs without worrying about hitting an execution timeout or some vendor constraint. Essentially, it offers the convenience of managed hosting <em>without the severe limitations on process lifespan</em> that come with serverless function environments. </p>

<p>You get a managed environment that dramatically reduces your operations overhead, but you <strong>keep quite a bit of control.</strong></p>

<p>But serverless <em>is</em> right for some apps! Let&rsquo;s look into that next.</p>

<h2 id="hosting-serverless-node-apps-on-vercel-or-netlify">Hosting serverless Node apps on Vercel or Netlify</h2>

<p>Serverless platforms like <strong>Vercel and Netlify</strong> have gained popularity, especially for frontend-oriented and Jamstack applications. Vercel hired several members of the React core team away from Meta and stewards the development of Next.js, which positions them well to support Next apps in particular.</p>

<p>In a serverless model, you don’t maintain a running server process. Instead, your Node.js code is deployed as functions that execute on demand in response to requests (or events) and then terminate. This model brings <strong>automatic scaling per request</strong> – every incoming request can spin up a new isolated function instance if needed, so capacity can increase seemingly without bound, and you never pay for idle time.</p>
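<p>For a feel of the model, here&rsquo;s roughly what a function endpoint looks like on Vercel (a minimal sketch; the file path and message are illustrative):</p>

<pre><code>// api/hello.js -- deployed as a serverless function on Vercel.
// Each invocation may run in a fresh, short-lived instance.
export default function handler(req, res) {
  // Don't count on in-memory state surviving between calls.
  res.status(200).json({ message: 'Hello from a function instance' });
}
</code></pre>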

<p>Vercel and Netlify both provide an experience where you connect a Git repo, and they build and deploy your site with serverless functions backing any dynamic endpoints or API routes. This gives a fantastic developer experience for certain use cases. Frontend-heavy apps get static hosting plus dynamic capabilities without ever thinking about servers, and things like CI/CD, CDN distribution, and SSL are handled for you out of the box.</p>

<p>I host my personal site and a few simple projects on Vercel and am quite happy with how hands-off it&rsquo;s made hosting. For my simple Next.js app, Vercel is a very good fit and also free.</p>

<p><figure>
  <img alt="Hosting a Node app on vercel" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That being said, if I want to expand this application to include more functionality, I&rsquo;d probably run into some limitations.</p>

<p>The first major limitation is that <strong>serverless functions on these platforms have hard time limits.</strong> This means you cannot do long processing jobs directly. If your Node app needs to generate a large report or process a big file, you’ll likely exceed these limits and the platform will kill the function.</p>

<p>Long-running tasks have to be offloaded to external services or broken into much smaller jobs, because Vercel and Netlify do <strong>not allow running arbitrary background worker processes</strong>. You can’t have a worker listening to a queue or a scheduler that runs continuously in the background. “Background Functions” on Netlify simply allow a single function invocation to run longer (<a href="https://docs.netlify.com/build/functions/overview/#default-deployment-options" target="_blank" rel="noopener">up to 15 minutes</a>) asynchronously, but they are not equivalent to an always-on worker process.</p>

<p>Vercel recently introduced scheduled functions, which are cron-like triggers, but these are just periodic invocations of serverless functions, not persistent jobs. Any asynchronous or delayed work in a serverless architecture has to be handed off to another system (using an external job queue service, or triggering an AWS Lambda via event).</p>
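<p>As a sketch, a Vercel scheduled function is just a regular function endpoint plus a cron entry in <code>vercel.json</code> (the <code>generateDailyReport</code> helper here is hypothetical):</p>

<pre><code>// api/daily-report.js -- invoked on a schedule, then terminated.
// The schedule itself lives in vercel.json, roughly:
//   { "crons": [{ "path": "/api/daily-report", "schedule": "0 8 * * *" }] }
export default async function handler(req, res) {
  // Must finish before the platform's execution time limit;
  // there is no persistent loop or always-on worker here.
  await generateDailyReport(); // hypothetical app function
  res.status(200).json({ ok: true });
}
</code></pre>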

<p>This is a fundamental design difference. Traditional platforms (like Heroku, Render, etc.) let you run a worker indefinitely, whereas on Netlify/Vercel, you might schedule a function to run every few minutes, but it will start fresh and then terminate each time.</p>

<p>Both Vercel and Netlify abstract away containers and don’t let you deploy a custom Docker image to their platform. You are limited to the runtimes and languages they support and the build process they provide. While the support is often sufficient, the platform’s provided environment is the only environment. Vercel and Netlify focus on source-based deployment and static assets, not running arbitrary containers.</p>

<p>They are great at what they do (fast frontend deployments), but aren’t general-purpose hosting for any kind of app.</p>

<h2 id="autoscaling-a-node-app">Autoscaling a Node app</h2>

<p>Scalability is a big question for web developers, and different platforms scale Node apps in different ways. Understanding your autoscaling options and their implications for performance and cost matters when choosing a host.</p>

<p>On traditional setups like VPS or self-managed servers, scaling is usually manual unless you build your own scripts or use cloud vendor tools to spin up new VMs. By contrast, PaaS platforms typically offer some form of horizontal autoscaling for Node apps, but the responsiveness to load can vary.</p>

<p>Heroku, for example, has a built-in autoscaler (available on certain tiers) that can add or remove dynos based on response-time thresholds. The caveat with that metric is that response-time-based autoscalers can react sluggishly or scale at the wrong times.</p>

<p>This is why third-party solutions like <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale</a> have emerged. Judoscale focuses on <strong>request queue time</strong> as the metric to decide scaling, which directly measures whether requests are backing up due to a lack of capacity.</p>
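<p>The idea is simple to sketch. Routers like Heroku&rsquo;s stamp each request with an <code>X-Request-Start</code> header on the way in, so middleware can measure how long a request sat waiting before your app picked it up. Here&rsquo;s a simplified, Express-style illustration (not Judoscale&rsquo;s actual adapter code):</p>

<pre><code>// Sketch: measure request queue time from the router's timestamp.
function queueTimeMiddleware(req, res, next) {
  const start = Number(req.headers['x-request-start']); // ms timestamp
  if (!Number.isNaN(start)) {
    const queueTimeMs = Date.now() - start;
    // Report queueTimeMs to your metrics store / autoscaler here.
  }
  next();
}
</code></pre>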

<p><figure>
  <img alt="Scaling Node.js apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Judoscale will add more web processes as capacity demands it, and we also watch your job queues to autoscale worker processes. If you want reliable autoscaling on a PaaS, you want Judoscale.</p>

<h3 id="scaling-on-serverless-is-weird">Scaling on serverless is weird</h3>

<p>Serverless platforms scale very differently.</p>

<p>Essentially, they scale <em>per request by default</em>. There’s no “instance” for you to add.</p>

<p>Every incoming event finds capacity because the provider launches more copies of your function as necessary. This leads to effectively unlimited concurrency out of the box, which is great for absorbing traffic spikes without any configuration. The flip side is limited control over this scaling.</p>

<p>Normally, every request that comes in will start a new Node.js runtime if the existing ones are all busy. This is an awesome way to ensure reliability when your traffic increases quickly.</p>

<p>However, there are two big tradeoffs: cold starts and cost unpredictability.</p>

<p>When serverless scales, many of those new function invocations might incur a cold start delay (a few hundred milliseconds or more to initialize a Node environment). In a high-traffic scenario, you could have lots of functions cold-starting, which might cause latency for some requests. More importantly, from a cost perspective, serverless billing is usually metered by time and memory per execution, plus any external service calls (like database or bandwidth).</p>

<p>If you get 1000 concurrent requests frequently, you pay for 1000 function runs in parallel, which can add up quickly. I see <a href="https://x.com/mattwelter/status/1949850488654143932" target="_blank" rel="noopener">developers on X</a> and Reddit all the time complaining that their Vercel bills ballooned under heavy load.</p>

<p><figure>
  <img alt="A post on X complaining about a big increase in their Vercel bill" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This isn’t to say serverless can’t be cost-effective. For super volatile but low-average traffic, it can be the cheapest option.</p>

<p>If you require tight control and predictability, a PaaS with the right autoscaling tool might be preferable. If you need to handle unpredictable surges and are okay with the stateless function model, serverless will do it out of the box. Just keep an eye on those usage metrics!</p>

<h2 id="picking-your-hosting-option-based-on-developer-experience">Picking your hosting option based on developer experience</h2>

<p>I&rsquo;ve thrown a bunch of information at you, but I don&rsquo;t want my opinion to be unclear.</p>

<p>I think you should prioritize developer experience. Whether you&rsquo;re trying to decide where to host a solo project or influence a decision for an enterprise, put real weight behind the developer cost that comes with the &ldquo;cheaper&rdquo; options.</p>

<p>Beyond that, the decision comes down to your application’s type and its traffic profile.</p>

<p>Ask yourself a few questions about your Node.js app:</p>

<p>Does your app require persistent connections or background processes? If it does, then a serverless platform (Vercel/Netlify) likely <em>won’t</em> serve you well. You’d <strong>lean towards a PaaS</strong> or even your own VPS if you&rsquo;re okay being pretty hands-on.</p>

<p>How much ops work are you (or your team) willing to take on? If you have a strong DevOps skillset or an ops team, hosting on VPS or some pure cloud solution might be a good fit. You’ll get full flexibility to tailor the environment and potentially save on high-volume costs by squeezing more out of each server. But if you’d rather <em>not</em> deal with server management, then PaaS or serverless is attractive.</p>

<p>What are your scaling and traffic patterns? For relatively steady, predictable traffic, it can be more cost-effective and simpler to run a fixed number of servers (or dynos) on a PaaS or VPS. You won’t get surprises in the bill, and you can ensure they’re always warm and performant. For spiky or highly variable traffic, serverless is an option.</p>

<p><strong>Choose the platform that fits the shape of your app and your team.</strong> For a typical web API or monolithic Node app that has a mix of web requests and background jobs, a PaaS will provide the least friction. If you’re building a highly interactive frontend-heavy app (especially with Next.js), deploying the frontend on Vercel or Netlify can be great for the static+serverless benefits, possibly complemented by a separate backend for any heavy lifting. </p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Choosing the Right Node.js Job Queue</title>
      <description>So you've got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queuing system to use? Learn how to choose the right one for your needs..</description>
      <pubDate>Mon, 5 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-task-queues</link>
      <guid>https://judoscale.com/blog/node-task-queues</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Modern Node.js apps often need to perform background jobs. Offloading to a job queue is a great way to preserve web performance when faced with sections of code that are too slow or resource-intensive to handle during an HTTP request. If your app needs to send emails, generate PDFs, process images, or aggregate data, you probably need background jobs.</p>

<p>Offloading these jobs (sometimes called <em>tasks</em>) to a <strong>job queue</strong> ensures your web process remains responsive and keeps latency down. A typical setup is to have your web processes enqueue jobs to an external system, and one or more <strong>worker</strong> processes consume and execute those jobs asynchronously.</p>

<p><figure>
  <img alt="Node job queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This works well for keeping your web processes free and performant.</p>

<p>So you&rsquo;ve got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queueing system to use?</p>

<p>If you&rsquo;re looking for a quick answer, I won&rsquo;t make you wait. BullMQ is right most of the time. But let&rsquo;s take a look at our options!</p>

<h2 id="bull-and-bullmq-for-job-queues">Bull and BullMQ for job queues</h2>

<p><a href="https://bullmq.io/" target="_blank" rel="noopener">BullMQ</a> is definitely the <strong>most popular Node.js job queue</strong> (especially if you also consider Bull).</p>

<p>It is a powerful queue library backed by <em>Redis</em>, known for its high performance and rich feature set. Bull can process a large volume of jobs quickly by leveraging Redis and an efficient implementation under the hood.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p><strong>Understanding Bull vs BullMQ:</strong> One really important thing to note is that <strong>Bull’s original library is now in maintenance mode</strong>. The authors have moved efforts to <strong>BullMQ</strong>, a modern TypeScript rewrite that will receive new features going forward.</p>

  </div>
</div>

<p>Jobs are persisted in Redis, so they won’t be lost if a worker crashes. Bull provides job persistence, automatic retries, error handling, and priority queues. Together, these features set a very high bar for reliability.</p>

<p>BullMQ also supports multiple workers consuming the same queue, and you can configure concurrency (the number of jobs a single worker can process in parallel). This horizontal scaling ability means BullMQ can handle a lot of load and is also perfect for autoscaling, which we&rsquo;ll get into later.</p>
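<p>Here&rsquo;s a minimal sketch of that setup (the queue and job names are illustrative, and <code>sendWelcomeEmail</code> is assumed to exist in your app):</p>

<pre><code>import { Queue, Worker } from 'bullmq';

const connection = { host: '127.0.0.1', port: 6379 }; // your Redis

// Producer side: enqueue a job from a web process.
const emailQueue = new Queue('email', { connection });
await emailQueue.add('welcome', { to: 'user@example.com' });

// Worker side: typically a separate process consuming the same queue.
// `concurrency` lets one worker run several jobs in parallel.
const worker = new Worker(
  'email',
  async (job) => sendWelcomeEmail(job.data), // assumed app function
  { connection, concurrency: 5 }
);
</code></pre>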

<p><figure>
  <img alt="Scaling BullMQ" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>BullMQ is essentially a new (major) version of <a href="https://github.com/OptimalBits/bull" target="_blank" rel="noopener">Bull</a>, with mostly the same API and using Redis, but with improved internals. If you&rsquo;re already using Bull, that&rsquo;s fine. But if you&rsquo;re starting fresh, consider BullMQ so you get long-term support and benefit from the improvements.</p>

<p>Since they&rsquo;re Redis-based, Bull and BullMQ are naturally suited for modern web apps that may run across multiple processes. It&rsquo;s no surprise <a href="https://judoscale.com/blog/ultimate-guide-scaling-sidekiq" target="_blank" rel="noopener">Ruby&rsquo;s Sidekiq uses Redis too</a>.
All workers connect to the same Redis instance, so adding more worker processes (whether permanently or by autoscaling) increases the throughput of job processing. Jobs will be pulled by any available worker.</p>

<p>BullMQ includes mechanisms to detect stalled jobs and requeue them. For most web applications, a single Redis-backed queue can coordinate dozens of workers reliably. If your app already uses Redis, BullMQ fits in nicely. If not, you&rsquo;ll need to introduce Redis just for the queue, which is probably a worthwhile tradeoff for the reliability it provides in most cases.</p>

<h2 id="bee-queue-for-job-queues">Bee-Queue for job queues</h2>

<p><a href="https://github.com/bee-queue/bee-queue" target="_blank" rel="noopener">Bee-Queue</a> is another popular Redis-backed job queue for Node. It&rsquo;s designed with a focus on simplicity and speed, inspired by the shortcomings of older libraries. Like BullMQ, Bee-Queue requires a Redis instance to operate, a common theme we&rsquo;ll continue to see.</p>

<p>Bee-Queue intentionally has a smaller feature set than BullMQ, trading breadth of features for low complexity and high performance. It gives us all of the core job queueing capabilities, but leaves out some of the advanced features of BullMQ.</p>

<p>This tradeoff is right for some people, as it&rsquo;s notably easier to get started.</p>

<p>The library’s API is relatively straightforward. You create a queue, define a job processor function, and enqueue jobs. My time reading Bee-Queue’s examples and documentation has been stress-free as they&rsquo;re very easy to understand. This can translate to faster initial setup and less overhead in learning the tool, something that&rsquo;s really underrated in medium-sized software projects.</p>
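<p>A minimal sketch looks something like this (the queue name and <code>resizeImage</code> helper are illustrative):</p>

<pre><code>const Queue = require('bee-queue');

const resizeQueue = new Queue('image-resize'); // defaults to local Redis

// Enqueue a job, e.g. from a web request handler.
const job = resizeQueue.createJob({ imageId: 42 });
job.retries(2).save();

// Process jobs -- typically in a dedicated worker process.
resizeQueue.process(async (job) => {
  return resizeImage(job.data.imageId); // assumed app function
});
</code></pre>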

<p>Despite being lightweight, Bee-Queue does include essentials for production. You get persistence in Redis, job completion callbacks, and even rate limiting and retry logic. It supports job timeouts, retry attempts, and will handle <em>“stalled job”</em> detection.</p>

<p>It lacks some features of Bull and BullMQ, though, like built-in priority levels and repeatable (scheduled) jobs.</p>

<p>Multiple Bee-Queue worker processes can consume from the same queue even if they&rsquo;re on different machines, making scaling as simple as running more workers. This makes it a great fit for autoscaling scenarios.</p>

<p>In practice, you’d run one or more worker processes with Bee-Queue. If you need more throughput, just increase the number of workers, and jobs will be distributed across them. If you’re okay with using Redis (and most Node apps can add Redis via a managed service fairly easily), Bee-Queue provides a nice balance of <strong>simplicity and performance</strong>.</p>

<p>Still, it&rsquo;s been 2 years since the last release of Bee-Queue, and the lack of recent maintenance/development may put off a lot of developers.</p>

<h2 id="agenda-for-job-queues">Agenda for job queues</h2>

<p><a href="https://github.com/agenda/agenda" target="_blank" rel="noopener">Agenda</a> is a different breed of job queue for Node when compared to BullMQ and BeeQueue. It is primarily a job scheduler built on <a href="https://www.mongodb.com/" target="_blank" rel="noopener">MongoDB</a>, <em>not Redis!</em> It focuses on scheduling jobs (think cron jobs and delayed jobs), but it also supports immediate job queuing with concurrency control.</p>

<p>Agenda is a popular choice, especially for teams already using MongoDB, since it uses your MongoDB database to store job information. If I were in a project not already using MongoDB, this wouldn&rsquo;t be my first choice.</p>

<p>Agenda’s features overlap with BullMQ and Bee-Queue in some areas, but it has its own philosophy. Agenda stores jobs in a MongoDB collection, so if your application already uses MongoDB, you don’t need an extra infrastructure component for the queue. Jobs are persisted to the database, which ensures durability.</p>

<p>Agenda can also work with other databases (it supports a few Mongo-like interfaces), giving <em>some</em> flexibility in persistence. Still, it shines in scheduling future or recurring jobs. It offers a human-readable syntax (but still supports cron syntax) and the ability to schedule jobs at specific dates or intervals.</p>

<p>For example, you can schedule a job to run every day at 8 am, or run once a week, all using cron patterns or (close to) plain English. This makes Agenda ideal for background jobs that need to run on a schedule.</p>
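<p>A quick sketch of that kind of scheduling (the connection string and <code>sendDailyReport</code> helper are illustrative):</p>

<pre><code>const { Agenda } = require('agenda');

// Agenda persists job state in a MongoDB collection.
const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/myapp' } });

agenda.define('send daily report', async () => {
  await sendDailyReport(); // assumed app function
});

(async () => {
  await agenda.start();
  // Cron syntax or human-readable intervals both work here.
  await agenda.every('0 8 * * *', 'send daily report');
})();
</code></pre>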

<p>Agenda runs as a single-process scheduler. It pulls jobs from Mongo and processes them in the same process. It does support concurrency (multiple jobs at once in one process) and can be scaled to multiple processes using MongoDB’s locking mechanism (to ensure two processes don’t run the same job).</p>

<p>However, scaling horizontally with Agenda is not as straightforward as with Redis queues. Agenda is generally single-master, meaning one instance should own the scheduling to avoid duplicate runs of recurring jobs, though multiple workers can cooperate on different jobs.</p>

<p>Agenda is probably best suited for applications that need cron-like scheduling and already use MongoDB. If you have a Node app in production that&rsquo;s already using Mongo, you can use Agenda to schedule jobs without introducing Redis. It’s great for things like daily reports, periodic cleanup jobs, or any job that must run X times a day/week without needing to support another infrastructure piece.</p>

<h2 id="using-a-message-broker-like-rabbitmq">Using a message broker like RabbitMQ</h2>

<p>Instead of using a Node-specific library, you can opt for a <strong>message broker service</strong> such as <a href="https://www.rabbitmq.com/" target="_blank" rel="noopener">RabbitMQ</a>, <a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener">Amazon SQS</a>, or <a href="https://docs.cloud.google.com/tasks/docs" target="_blank" rel="noopener">Google Cloud Tasks</a>. These are not Node.js libraries. They&rsquo;re external systems that Node can interface with through their APIs or client libraries.</p>

<p>For example, RabbitMQ is a robust open-source message queue that many large systems use. In a Node app, you might use a client library like amqplib to publish and consume messages from RabbitMQ.</p>
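<p>A bare-bones sketch using amqplib (the queue name and payload are illustrative):</p>

<pre><code>const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('tasks', { durable: true });

  // Publish a message, e.g. from your web process.
  channel.sendToQueue('tasks', Buffer.from(JSON.stringify({ id: 1 })), {
    persistent: true,
  });

  // Consume with explicit acknowledgment: the broker redelivers any
  // message that is never acked (say, if the worker crashes mid-job).
  channel.consume('tasks', (msg) => {
    if (msg) {
      console.log('processing', msg.content.toString());
      channel.ack(msg);
    }
  });
}

main().catch(console.error);
</code></pre>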

<p>The advantage of brokers like RabbitMQ is primarily reliability and advanced messaging patterns like acknowledgments and dead-letter queues.</p>

<p>Similarly, cloud services like AWS SQS or even Google Cloud Tasks are fully managed queues. They remove the need to run Redis or RabbitMQ yourself, which is attractive to a lot of people. These can scale virtually indefinitely and handle autoscaling scenarios by design.</p>

<p>The trade-off with using external cloud queues is that you’ll have to implement some features in your application code, like deciding how to schedule jobs or doing retries. Also, there’s a bit more latency as calls go over the network. Developer experience might not be as seamless as using a Node library, but if you prefer not to manage any infrastructure, they are a very reasonable option.</p>

<h2 id="autoscaling-your-workers">Autoscaling your workers</h2>

<p>Scaling Node job queues is a necessary part of running them in production. Offloading intensive jobs to a queue protects your web processes, but it does nothing for the throughput of the queue itself: if your workers can&rsquo;t keep up, jobs pile up.</p>

<p>There are two big levers you can pull to scale your Node job queues. <strong>Vertical scaling</strong> means using more powerful workers with more threads/processes. Meanwhile, <strong>horizontal scaling</strong> increases the number of worker processes or machines. Comprehensive solutions require attention to both.</p>

<p>As we talked about above, the major Node job queues support horizontal scaling without too much hassle, so it&rsquo;s worth putting some effort into. You can do this manually, but it&rsquo;s best practice to set up an autoscaler.</p>

<p>This lets you keep your hands off: worker processes are added when your existing processes can&rsquo;t keep up with demand and removed when demand allows, which saves you money. Still, most autoscalers leave much to be desired. Heroku&rsquo;s autoscaler doesn&rsquo;t work for workers at all, and other major platforms that do support worker autoscaling use CPU as the metric, which is a poor way to measure demand on asynchronous worker processes.</p>

<p>Judoscale is a powerful autoscaler that you can add to most any hosting setup. The autoscaling algorithm <strong>scales based on queue latency</strong>, which is a much better indicator of queue well-being than CPU usage. If you&rsquo;re running a Node app in production, try <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale&rsquo;s free plan</a> to see if it&rsquo;s right for you.</p>

<h2 id="comparing-node-job-queue-options-and-making-a-decision">Comparing Node job queue options and making a decision</h2>

<p>My opinion here is somewhat controversial in that I think you should value developer experience <em>a lot</em> in your decision-making. That means <strong>using BullMQ</strong> unless you <em>really need</em> a ton of extra features, in which case use a message broker like RabbitMQ.</p>

<p>If your app environment already includes a certain datastore, leaning into that can simplify setup. For instance, if you use Redis, Bull or BullMQ will be straightforward to add. If you use MongoDB, Agenda might integrate more naturally. A solution that fits your existing stack usually means less friction for you, which I think you should place a premium on.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Black Box Hosting vs. Glass Box Hosting: An Interview With Judoscale's Adam</title>
      <description>Founder interview comparing Heroku vs Fly/Render/Railway for bootstrapped SaaS: cost, control, portability, third-party services, simple rules.</description>
      <pubDate>Fri, 2 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</link>
      <guid>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Greetings, Judoscale readers! While we usually write our posts as a team, I (Jon) wanted to take a novel approach this time around. I wanted to interview Adam, Judoscale’s founder and still the head of our tiny team, to get his outlook on the marketplace of hosting as we begin 2026.</p>

<p>The goal here wasn’t to host a cage match between the various PaaS vendors currently on the market. It was to set up a scenario:</p>
<blockquote><p>Let’s frame this conversation as a thought experiment: if you were starting a new startup today — something like Judoscale, but fresh — would you still choose Heroku? We’ll look at that decision through the lens of a founder building a real business, not a hobby app — meaning time to profitability, team velocity, cost structure, and technical tradeoffs all matter.</p>

<p>This isn’t a bashing session; I want to explore how the landscape has evolved and changed over the years, and what you might do today.</p>
</blockquote>
<p>Then simply chat through it. I think we ended up with some interesting and valuable insights at both the technical layer as well as the business-leadership layer (e.g. solo dev trying to start a profitable app).</p>

<p>That said, I didn’t want to post a typical back-and-forth style Q&amp;A article. Instead you’ll find concepts grouped together below, each with a little context beforehand. Enjoy!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One last note before we dive in — one of my (Jon) express goals in this interview was to be deliberately antagonistic. In reality, Adam and I believe mostly the same things (sorry Adam, I’m still not sold on <a href="https://www.phlex.fun" target="_blank" rel="noopener">Phlex</a>&hellip;), but the goal was to tease out some reasoning by prodding and gentle pushing.</p>

  </div>
</div>

<p>Okay, let’s dive into this thing!</p>

<h2 id="the-black-box-dividend">The Black-Box Dividend</h2>

<p>Possibly the most important thing when spinning up a new bootstrapped business is actually <em>making money</em>. That is, getting your product running and live — providing value for people that are willing to pay for it — as soon as possible. When it comes to your application architecture and hosting, then, paved roads will get you to your destination faster than carving out your own from scratch.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So&hellip; you’ve mentioned before that Heroku can be thought of as a “black box”, where I think you’re describing the lack of fine-grained control that Heroku gives, right? When you started Judoscale back in 2016, what did the black box buy you — and would it still buy the same thing today?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Heroku’s value was super simple: <code>git push</code> and you get a URL. No server naming exercises, no AMIs to patch, no cluster ceremony. Buildpacks detected my Rails app and just… did the right thing.</p>

<p>I was building a product nights and weekends; I didn’t want to think about deployment or scaling. The black box let me ignore everything that wasn’t shipping. If I were starting that same kind of small, bootstrapped SaaS today, the black box still buys the same thing: focus.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, but more specifically, what did it actually remove from your plate?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Whole categories of work. TLS is handled. Rollbacks are boring and reliable. Runtime upgrades don’t feel like heart surgery. Logs show up where I expect them. Scaling from one dyno to a handful doesn’t require a new playbook. You <em>do</em> pay a tax for that, but you’re buying back <strong>time</strong>. For a solo dev or tiny team, that trade is almost always worth it early on. I just didn’t have that much time to spend.</p>

  </div>
</div>

<h2 id="the-glass-box-leverage">The Glass-Box Leverage</h2>

<p>Of course, here in 2026 the landscape isn’t simply Heroku vs. run-your-own-hardware-at-home. <a href="https://fly.io" target="_blank" rel="noopener">Fly</a>, <a href="https://render.com" target="_blank" rel="noopener">Render</a>, <a href="https://railway.com" target="_blank" rel="noopener">Railway</a>, and <em>several</em> other platform-based hosting services exist now. There’s competition! And there’s nuance. Many of these platforms are more open to complexity: bringing your own Docker images, choosing far more granular server resource tiers, and selecting geographical constraints, among <strong>so many</strong> other choices. That transparency (and complexity) can be good or bad.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Let’s contrast the “black box” with the “glass box” — platforms that give you far more control and allow you to get inside the box and tweak things. Do you think these ‘glass box’ platforms can actually beat the black box?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think the glass box is going to win if you really need portability and/or really specific resource granularity. Most of the glass-box options right now are built around, or at least support, Docker containers. Docker containers are sort of the common denominator between all of them. But that can be helpful because it means it’s easy to switch from one platform to another — you own the build script and take it with you. That leads to the second point. When you can switch providers fairly seamlessly, you can take advantage of whoever has the best price and/or resource tiers that your specific application needs. Just switch to another platform with your same Docker container and you’ll likely save some money.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Buuuuuut what’s the price of that flexibility?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Well, it’s more surface area. Images, volumes, networking, health checks — you own more of it. Day-2 operations take more intention. You can absolutely beat Heroku on cost and control, but you’ll pay for it in time. And everything I just described is probably all more time than I’d want to spend on production infrastructure when bootstrapping a new app. I have features I need to build for my customers! But it’s nice that these platforms and strategies are all available right now in case I did want, or need, to go that route.</p>

  </div>
</div>

<h2 id="unbundle-the-risk">Unbundle the Risk</h2>

<p>One thing I know Adam’s been a pretty big advocate of over the last few years is using disparate third-party service providers <em>detached</em> from your hosting solution. So I wanted to dive into that here with a historical view: what he did previously vs. today.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s pivot to auxiliary hosting tooling. If you were starting fresh again today, would you still use add-ons from a PaaS marketplace, or would you buy direct from vendors?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That one’s changed a bit over time. I now avoid marketplace add-ons whenever I can. Judoscale is a direct customer for almost all of our services: Sentry for exceptions, Scout for monitoring, BetterStack for logs and uptime, etc. Two reasons for that, really. First, it’s usually cheaper. Second, it’s portable. When our third party services are separated from our compute, we don’t have to worry about moving them when we move our compute.</p>

<p>Same with databases: I want a third-party provider, be it CrunchyData, PlanetScale, Tiger Data, etc. The teams behind those database services only care about their database services. It’s not a side-product for them. The UIs, metrics, and controls are <em>way</em> better than the bolted-on database services offered by most hosting providers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>But doesn’t adding a bunch of third-party providers and connections inevitably add a lot of complexity to your mental understanding of your app?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think it tends to add an account and a connection string. But at the same time, it removes a migration nightmare if you should ever want to move your compute. If compute and data are decoupled, you can move one without detonating the other. I think that’s worth it.</p>

  </div>
</div>

<h2 id="on-leaving-the-black-box">On Leaving the Black Box</h2>

<p>We’ve covered some of the nuances of the “black box” and “glass box”, but I’m still curious what might drive people to actually migrate across the chasm, auxiliary services aside&hellip;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Adam, what actually pushes people <em>off</em> Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Granularity. The jump from a $50 dyno to a $250-ish dyno is harsh, and it’s often just to buy memory headroom. Fly/Render give you more intermediate steps. If you’re scaling on thin revenue—which is normal early on—it’s hard to justify that cliff. That’s the moment teams start looking over the fence.</p>

  </div>
</div>

<p>Interjecting here for a moment — Adam’s referencing the lack of options <em>between</em> Heroku’s <strong>std-2x</strong> dyno type and their <strong>perf-m</strong>. For many users, <strong>std-2x</strong> dynos lead to headaches when trying to process large files and/or data, while jumping to <strong>perf-m</strong> feels like overkill both in terms of capacity and cost.</p>

<p>If that’s something that resonates with you, we actually just published a strategy for getting the best of <em>both</em> worlds: <a href="/blog/priced-out-of-heroku">“Dealing With Heroku Memory Limits and Background Jobs”</a>.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So does that make Heroku the wrong choice?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>No, it makes Heroku a great early choice and a question <strong>later</strong>. If you’re pre-revenue and traffic is modest, the black-box dividend (focus) is worth the tax. If you’re high-traffic/low-ARPU (Average Revenue Per User), the math flips fast. That’s when a glass-box platform’s pricing steps feel sane.</p>

  </div>
</div>

<h2 id="compute-is-commodity-dx-is-not">Compute Is Commodity; DX Is Not?</h2>

<p>One thing that all PaaS’s obviously have in common, regardless of what we call them or how we pay for them, is raw compute power. But how efficiently we developers can <em>leverage</em> that compute, and how fast we can do so, might be a different question altogether.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Do you care what the container is called (“dyno”, “machine”, “pod”, whatever) and/or how it’s built?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Call it whatever&hellip; It’s all compute. What I care about is: how much work do I have to do to set up and maintain it?</p>

<p>Heroku’s buildpack approach is still a great default for Rails. Docker is great for portability — especially on platforms that want you to bring an image. All that to say, I don’t obsess over containers or their construction; I optimize for how much developer energy managing them consumes.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Sure but it’s 2026 — many years after you started Judoscale. If you were starting again today, like we said, would you go Docker/Dockerfile from day one?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Honestly I’m not sure. I really like the “cloud native buildpacks” that seem to be cropping up, and having moved Judoscale across Heroku, Render, Fly, Railway, and ECS, I’ll be the first to tell you that having a Dockerfile ready to go is <em>extremely</em> handy.</p>

<p>I’d probably recommend just keeping a Dockerfile ready even if you don’t use it. It feels like a good spare tire.</p>

  </div>
</div>

<h2 id="support-sales-and-the-human-stuff">Support, Sales, and the Human Stuff</h2>

<p>We’d be remiss to ignore the soft edges (support and sales) because they become hard edges during incidents and procurement.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Any lingering frustrations with Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Two. First is compute granularity, which we already covered. Second: <strong>support and enterprise sales</strong> have a reputation for being slow and not particularly helpful. We run a small team and prefer transparent, self-serve pricing; I don’t want to talk to sales to get a number&hellip; I don’t want an enterprise contract to just <em>use</em> the service. Anecdotally, other teams have had rough experiences there. It’s not a deal-breaker for a small shop on self-serve, but it’s part of the picture.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Have you found better elsewhere?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I don’t have enough firsthand experience with Fly/Render support to compare. What I do know is that the <em>product</em> choices—granular compute, Docker-first—have reduced the number of times I’d need support in the first place.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Fair point!</p>

  </div>
</div>

<h2 id="simple-rules-we-actually-use">Simple Rules We Actually Use</h2>

<p>Let’s start wrapping this whole thing up! I wanted to ask Adam to summarize some of the topics above into a straightforward path&hellip; <em>specifically</em>, how he’d actually go about it if he were starting Judoscale today:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s say that you were launching Judoscale again today: trying to bootstrap a real, profitable business from scratch, just you. What’s the plan?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>My default choice is going to be to start on Heroku and optimize for time-to-first-dollar. I want to get the app built and delivering value as soon as possible, and I don’t want to waste time on infrastructure details. The only caveat there is if I <em>know</em> I’m going to have high traffic and thin margins from the start. In that case, I might choose Fly. Either way, the goal is to get to first-dollar <strong>fast</strong>.</p>

<p>Otherwise I’d unbundle my services: direct, third-party accounts for the DB, logs, error tracking, etc.</p>

<p>Finally, I’d take a strong stance of <a href="https://martinfowler.com/bliki/Yagni.html" target="_blank" rel="noopener">YAGNI</a> around most scaling and infra concerns. I wouldn’t build for scaling issues I don’t have yet — I’d flip on a simple autoscaler (like <a href="/">Judoscale</a>!) and move on to my next feature.</p>

<p>Oh, also, no Kubernetes. Hard line here. It’s way too much surface area and a waste of time for small teams just getting their footing.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That last one is going to ruffle some feathers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That’s fine. Complexity makes us feel important as developers. It also makes us slow. Keep it simple until reality—paying customers, not theoretical scale—forces your hand.</p>

  </div>
</div>

<h2 id="wrap-up">Wrap Up</h2>

<p>I started this interview assuming we’d land on a winner. I thought for sure, after all these years, Adam would still land on Heroku! But Adam nudged me to a better question: How much of the machine do you need to control <em>right now</em>? Early on, “black box” hosting buys momentum you can’t afford to lose. As traffic grows and dollars stay stubborn, “glass box” hosting might make the math worth looking at again&hellip; especially if you’re already unbundling other services and can spin up a Docker image quickly.</p>

<p>Anyway, thanks for joining us for this candid conversation with Adam, and we hope it lends some clarity as you navigate your own hosting choices and business journeys! As always, keep building and keep questioning, because sometimes the best answers come from challenging the assumptions we hold most dear.</p>

<p><em>Totally disagree with us? Think Adam’s way off base about something? Let us know over on Reddit, <a href="/">here</a>.</em></p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Process Utilization: How We Actually Track That</title>
      <description>Deep dive into Judoscale’s utilization autoscaling: sampling pitfalls, edge-based tracking, thread safety, and accurate low-overhead metrics.</description>
      <pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</link>
      <guid>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Over the last few months we’ve published a couple of articles talking about our new “Utilization”-based autoscaling option. The first talked through the use-cases for this new option — when it’s useful and who it’s for (“<a href="/blog/introducing-proactive-autoscaling">Autoscaling: Proactive vs. Reactive</a>”). The second was a bit more nitty-gritty, explaining the high-level concept for how we’re tracking this ‘utilization’ metric (“<a href="/blog/how-utilization-works">How Judoscale&rsquo;s Utilization-Based Autoscaling Works</a>”)&hellip;</p>

<p>This post is the nerdy sequel to the latter: the actual boots-on-the-ground / nuts-and-bolts of how we attempted to track process utilization, how that proved to be a bad setup, and the clever idea that led us to a <em>way better</em> v2. This is the story of low-level measurement with sampling, thread safety, and lackluster results leading to new ideas 😅. </p>

<h2 id="the-job-to-be-done">The job to be done</h2>

<p>As per our second post in this saga, our definition of ‘utilization’ is based around an idle-state. Paraphrased, it’s essentially:</p>
<blockquote><p>Measure the fraction of time a web-server process is handling at least one request, then aggregate that across all processes over time.</p>
</blockquote>
<p>Two constraints forced us to think carefully:</p>

<ol>
<li><strong>Extremely low overhead</strong>. Judoscale is a performance tool; it’s an autoscaler that’s intended to help your application soar. It is <em>not</em> something whose client code should impact your application! The Judoscale package should have a perceivably <em>invisible</em> performance impact on the app running it. Full stop. No compromises.</li>
<li><strong>Correct values in a multi-threaded world</strong>. While Ruby, Python, and Node can operate in an asynchronous fashion, and that asynchronicity <em>can</em> be valuable for serving many web requests at once, we need to be <em>very</em> careful in collecting values. It’s easy to accidentally collect <em>thread</em>-level metrics which then overlap and become <em>very</em> confusing. We need to be careful to stay up at the <em>process</em> level.</li>
</ol>

<p>So&hellip; now we need to actually write some code: how do you actually <em>capture</em> the idyllic “idle time” of a process in a real application receiving real traffic?</p>

<h2 id="attempt-1-background-sampling">Attempt 1: Background Sampling</h2>

<p>Our first proof-of-concept was built around running a mostly dormant background thread. It would essentially wake up every few hundred milliseconds, ask “is this process handling any requests right now?”, record that yes-or-no, then go back to sleep. Voilà: utilization!</p>
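<p>In rough form, the approach looked like this (a Node-flavored sketch purely for illustration — the real adapters are per-language, and this is not our actual adapter code):</p>

<pre><code>// Track in-flight requests at the process level...
let inFlightRequests = 0;
let busySamples = 0;
let totalSamples = 0;

// ...with middleware bumping the counter for each active request.
function trackRequests(req, res, next) {
  inFlightRequests += 1;
  res.on('finish', () => { inFlightRequests -= 1; });
  next();
}

// The mostly dormant background sampler: wake, peek, sleep.
setInterval(() => {
  totalSamples += 1;
  if (inFlightRequests > 0) busySamples += 1;
}, 250);

// Utilization for the window is busySamples / totalSamples.
</code></pre>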

<p>It was easy to ship, but it had issues. Notably&hellip;</p>

<p><strong>Aliasing difficulties.</strong> Bursty traffic and short requests can fall between samples. Imagine a process that handles a flurry of 30–50 ms requests. With a 250 ms sampling interval, many bursts are invisible; you under‑count busyness simply because you looked away at the wrong moments. Whoops!</p>

<p><strong>Jitter vs. overhead trade‑off.</strong> If we increase the sampling rate to reduce aliasing, we <em>immediately</em> hike CPU wakeups, heap churn, and lock contention (on every process, 24/7!), even when your app is idle. Oof ☹️</p>

<p><strong>Low signal‑to‑noise.</strong> Inherently, sampling produces a staircase approximation of a curve. Real utilization is a smooth “busy/idle timeline.” Our samples were a blurry thumbnail of a scene that actually mattered.</p>

<p>I personally tend to visualize this, oddly enough, as a mathematical curve on a chart (oh how my high-school math teacher would be proud). Imagine we have some <em>real</em> curve of data, perhaps like this:</p>

<p><figure>
  <img alt="Example chart with a curve going up and down in various sections, labeled “ACTUAL Data Over time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Okay, great. Now let’s pretend we don’t actually know what that curve looks like and we’re taking a sampling-based approach to figuring it out. What we end up with is a bunch of samples. That might look like this:</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a handful of sample-points (dots) that are spread out a bit; you can no longer see the nuance or details of the curvature as the samples are too far apart to have captured that curve in high detail" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Which might be fine for some cases, but we’ve clearly lost several details from the original curve — the fast spikes and drops, in particular. Thus the issue of sampling rates is seen: sample too slowly relative to how fast your data <em>actually changes</em> and you won’t capture a high-detail image. Sample too quickly&hellip;</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a ton of sample-points (dots) that are tightly packed and follow every detail of the original curve" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You end up with a great representation of the curve, but you took up <em>way</em> too much horsepower constantly waking up and reading those samples. It’s hard for an app to actually serve its requests when the thread scheduler is <em>constantly</em> switching back to a background thread asking “HEY ARE YOU SERVING A REQUEST?!” (“I’M FREAKING TRYING TO, THANK YOU VERY MUCH!!!”).</p>

<p>When we’re talking about requests that might take 5ms, 50ms, or 150ms to fully handle and deliver, a sampling interval of 250+ms just doesn’t capture the details. And sampling any faster feels heavy-handed. This wasn’t going to work&hellip;</p>

<h2 id="attempt-2-event-edges-a-tiny-counter">Attempt #2: Event edges + a tiny counter</h2>

<p>Okay, to be fair, the curve I gave above was a little misleading about the actual type of data we’re trying to track. Utilization, as we’ve defined it, isn’t a curve with smooth radii and roller-coaster-esque waves. As we’ve defined it, instantaneous utilization is either a zero or a one. A process is either busy, or it is not. If we were to plot that on a chart, it would actually look more like this:</p>

<p><figure>
  <img alt="Example chart where the line observed is not a curve but a straight line which jumps between 0 and 1 on the Y axis with straight, vertical jumps; more like a state-representation over time line" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That is, a square wave representing a binary signal. Unfortunately, a square wave signal can actually make sampling results even <em>worse</em>. Check out how wrong an ill-timed sampling pattern can get:</p>

<p><figure>
  <img alt="Example chart similar to the above, a square wave line, but with sampling dots only landing on where the signal is in the ‘1’ / ‘on’ position, leaving the impression that the line is always 1" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            I left the green line slightly opaque for reference
          </figcaption>

</figure>
</p>

<p>If you believed your sample data in that case, you’d think the signal is almost always “on”, but that’s not true.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Fun math fact: the fewer <em>possible</em> points on a Y-axis there are, the worse the infrequent-sampling-effect (observing statistically incorrect data because you’re sampling too infrequently) can become. When your Y-axis range is just <code>0-1</code> you actually <em>need</em> to sample far more frequently to capture the binary signal with any real integrity. It’s much harder than a flowing curve!</p>

<p>If you’re curious for more of the math here, read up on Bernoulli distributions and binomial variance 🤓</p>

  </div>
</div>
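<p>To see that effect for yourself, here’s a tiny, self-contained simulation (ours, just for this post) of sampling a square wave at an unlucky interval:</p>
<div class="highlight"><pre class="highlight ruby"><code># A process that's "busy" for the first 100ms of every 250ms cycle...
busy_at = lambda { |t| (t % 0.250) &lt; 0.100 }

true_utilization = 0.100 / 0.250 # 40% busy, by construction

# ...sampled every 250ms: every sample lands on the same "busy" phase!
samples = (0...100).map { |i| busy_at.call(i * 0.250) ? 1 : 0 }
sampled_utilization = samples.sum.to_f / samples.size

puts true_utilization    # => 0.4
puts sampled_utilization # => 1.0 (wildly wrong)
</code></pre></div>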

<p>Anyway, the novel idea ended up being beautifully boring: don’t poll at all, just record state transitions cleverly. If we simply track the timestamps of when a process leaves and returns to idle, we can compute the real, true value of “how much time was it non-idle?” That looks like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with arrows pointing to where the wave goes high or low, indicating “leaving idle” and “returning to idle”, respectively, and blue shading underneath the “busy” portions of the line: the boxes created when the line shifts up to ‘high’ state then back down to ‘low’ state" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>And once we have the blue blocks, we can simply add them all together for a given timespan, then say <code>utilization = blue_block_total / total_time</code>. Sum the rectangles! Boom!</p>
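<p>In code terms, “sum the rectangles” really is that simple. A quick sketch with made-up edge timestamps:</p>
<div class="highlight"><pre class="highlight ruby"><code># Each "blue block" is a [left_idle_at, returned_to_idle_at] pair, in seconds.
busy_spans = [[0.0, 1.2], [3.0, 3.05], [7.5, 9.0]] # made-up timestamps

total_time = 10.0 # the observation window, in seconds
busy_time  = busy_spans.sum { |started, ended| ended - started }

utilization = busy_time / total_time
puts utilization # => 0.275
</code></pre></div>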

<h3 id="the-benefits-of-edge-tracking">The Benefits of Edge-Tracking</h3>

<p>Tracking the state-changes (we’ll call them “edges” for math’s sake) has some really fantastic benefits over polling.</p>

<ul>
<li><strong>Computational cost</strong>: instead of constantly waking up a thread to check in on current requests (which requires stack shifting, single-threaded locking switches, etc.), we instead can simply read and/or write against a process-global timestamp register when any request starts or ends.</li>
<li><strong>Correctness</strong>: instead of hoping a reasonable sample rate provides a decent guess at the actual curve being modeled, we instead know the <em>exact</em> amount of time that a given process is non-idle! There’s no guess. </li>
<li><strong>Reliable for all traffic shapes</strong>: Sudden request waves, thin bursts, long I/O waits — they all work. If a worker is non‑idle, it gets counted correctly and appropriately.</li>
</ul>

<p>Once we landed on this approach, we quickly understood that it was all upside. There’s no catch here! A purely better approach born of a realization that we’re tracking binary signals, not actual curves.</p>

<h2 id="let-s-see-some-code">Let’s See Some Code</h2>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Just a note before we dive into the code: we developed our utilization-based tracking and scaling in Ruby first, so these examples are going to be in Ruby. But since this new approach is agnostic to any language specifics, we have the same implementations for Node and Python 🎉. It’s all the same when you’re just tracking edges!</p>

  </div>
</div>

<p>The great news with this new approach is that it’s so simple I can share the real code that implements it here in a blog post. This code is taken straight from the <a href="https://github.com/judoscale/judoscale-ruby" target="_blank" rel="noopener"><code>judoscale-ruby</code></a> GitHub repository, which houses all of the Ruby packages Judoscale publishes.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One caveat in this code: while my diagram and example above focused on showing that we track “busy time”, our actual implementation is inverted: we track “idle time” rather than “busy time”.</p>

<p>Tracking “busy time” is slightly easier to grok (and build diagrams for!), but in reality our code does this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with no shading “under the boxes” as above, but instead with arrows pointing to the segments of the line that are in the ‘low’ state, highlighted as “Idle Time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>It’s the inverse, so the math still all checks out, and both the “busy time” and “idle time” framings are useful for us! We just went with idle-side tracking for our code because it ended up slightly simpler. Check it out!</p>

  </div>
</div>

<p>First, we have a <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/utilization_tracker.rb#L6" target="_blank" rel="noopener"><code>Judoscale::UtilizationTracker</code></a> class. It has a few methods and helpers in it, but the important parts start with the <code>incr</code> method (short for “increment”):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">incr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="vi">@idle_started_at</span>
          <span class="c1"># We were idle and now we're not - add to total idle time</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">get_current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="kp">nil</span>
        <span class="k">end</span>

        <span class="vi">@active_request_counter</span> <span class="o">+=</span> <span class="mi">1</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>First, keep in mind that this method is going to run every time a request <em>comes in</em> (starts). So, since we’re going to be incrementing a request counter and idle-time timer across multiple threads, we <em>do</em> need to use a simple Mutex (<code>@mutex</code> is simply a <code>Mutex.new</code> from the Ruby standard library). Once we’re certain that we can safely update our process-level variables, we need to do two things: mark that our “idle time” has ended, and increment our active-requests counter.</p>

<p>Pretty straightforward, there! Since this block may run as a multi-threaded application server picks up a request on thread #2 or #3, we’re careful to only end our “idle” timer if there aren’t <em>already</em> any requests being processed (<code>if @active_request_counter == 0</code>). </p>

<p>On the flip side, we have a <code>decr</code> method that runs every time a request <em>finishes</em> (ends):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">decr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="vi">@active_request_counter</span> <span class="o">-=</span> <span class="mi">1</span>

        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span>
          <span class="c1"># We're now idle - start tracking idle time</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">get_current_time</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>This one’s even simpler: decrement the count of active requests by one and, if that was the last request in flight, mark that our “idle time” has begun — the process is now idle!</p>

<p>The end result of these two functions working together is an accurate value stored into <code>@total_idle_time</code> which, in real time, tells us exactly how long the process has been idle.</p>

<p>The last piece of the puzzle, then, is to report that ratio and reset that variable/register! We do that in one last method on <code>Judoscale::UtilizationTracker</code>:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">get_idle_ratio</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="n">total_report_cycle_time</span> <span class="o">=</span> <span class="n">current_time</span> <span class="o">-</span> <span class="vi">@report_cycle_started_at</span>

        <span class="c1"># Capture remaining idle time</span>
        <span class="k">if</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">current_time</span>
        <span class="k">end</span>

        <span class="n">idle_ratio</span> <span class="o">=</span> <span class="vi">@total_idle_time</span> <span class="o">/</span> <span class="n">total_report_cycle_time</span>
        <span class="vi">@total_idle_time</span> <span class="o">=</span> <span class="mf">0.0</span>
        <span class="n">idle_ratio</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Some background here: Judoscale packages report back to Judoscale servers every 10 seconds (using a zero-performance-impact background POST) with a handful of capacity metrics about the application. In this case, <code>@report_cycle_started_at</code> represents the timestamp at the <em>start</em> of that 10-second bucket. Since we’re trying to figure out the idle <em>ratio</em>, we need to divide the idle time over the total time. “The beginning of the bucket until now” is that “total time”.</p>

<p>Once we have that, we have a special case for when this code runs while the process is <em>actively</em> idle, so as to prevent over-counting or under-counting idle time. Since our “report cycle” observation window might start/end <em>during</em> an idle period, we need to handle that carefully. Visually, that’d look like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line but now with two large rectangles over the whole line; both rectangles sharing an edge, showing that the first 10-second bucket “observation window” and the second, which share the same border in time, can leave an edge during an idle phase." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Finally, we compute the idle ratio (a decimal, like <code>0.88</code> or <code>0.37</code>), reset the <code>@total_idle_time</code> back to <code>0.0</code>, and return that idle ratio as the result. ✨</p>
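<p>From there, turning that idle ratio into the utilization number we chart is just a flip. A quick usage sketch (the surrounding reporter code is elided here):</p>
<div class="highlight"><pre class="highlight ruby"><code># Roughly what each 10-second report boils down to:
tracker = Judoscale::UtilizationTracker.instance

idle_ratio  = tracker.get_idle_ratio # e.g. 0.57
utilization = 1.0 - idle_ratio       # e.g. 0.43 -- the "busy" fraction
</code></pre></div>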

<p>The last piece of code I’ll highlight is a layer up — the request middleware itself. This class, <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/request_middleware.rb#L20" target="_blank" rel="noopener"><code>Judoscale::RequestMiddleware</code></a>, is essentially what wraps <em>every</em> Rack request before and after it’s handed down to the Rack application itself. I’m chopping out a lot here, but the bits pertinent to our discussion remain:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">RequestMiddleware</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
      <span class="c1"># ...</span>
      <span class="n">tracker</span> <span class="o">=</span> <span class="no">UtilizationTracker</span><span class="p">.</span><span class="nf">instance</span> <span class="c1"># Singleton</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">incr</span>

      <span class="c1"># ... lots of other code</span>

    <span class="k">ensure</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">decr</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Essentially we’ve created a two-part contract:</p>

<ol>
<li><em>Every</em> time a request starts, we guarantee we’re going to call <code>#incr</code> on the Process-level singleton instance of <code>UtilizationTracker</code></li>
<li><em>Every</em> time a request ends, regardless of how or why it ends, we guarantee we’re going to call <code>#decr</code> on that same singleton instance (thanks, <code>ensure</code>!)</li>
</ol>

<p>This is the glue that ensures our data inside of <code>UtilizationTracker</code> is consistent and accurate over the lifespan of the process. Isn’t it great?!</p>

<h2 id="aggregate-it-together">Aggregate It Together</h2>

<p>Zooming out a little bit, we’ll conclude the deep-dive with a sense of how the aggregation works beyond a single process. Let’s say that you’ve got 2 production web services/dynos/containers/etc. running, and each runs 4 web processes. Since each <em>process</em> POSTs back its own metrics every 10 seconds, that means our back-end is going to get 8 data-points about your application’s overall web-process idleness/busyness. Maybe for a given 10-second bucket Process #1 on server #1 showed an idle ratio of <code>0.66</code> (that is, it was idle for two-thirds of that 10-second window), while process #4 on server #2 read a ratio of <code>0.22</code> (meaning it was handling at least one request for most of the bucket).</p>

<p>Once we have all of the data points, the aggregate is actually simple: we average them together. For example, then, if we received these data points:</p>

<table><thead>
<tr>
<th>Server</th>
<th>Process</th>
<th>Idle Ratio</th>
</tr>
</thead><tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0.56</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0.77</td>
</tr>
<tr>
<td>1</td>
<td>3</td>
<td>0.48</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>0.39</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>0.81</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>0.44</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0.52</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>0.62</td>
</tr>
</tbody></table>

<p>For that bucket, our average idle ratio would be:</p>
<div class="highlight"><pre class="highlight plaintext"><code>(0.56 + 0.77 + 0.48 + 0.39 + 0.81 + 0.44 + 0.52 + 0.62)/8
</code></pre></div>
<p>Which is <code>0.57</code>. So then, that application was idle 57% of the time (for that bucket) and, inversely, busy 43% of the time. Thus, that’d be a 43% utilization metric for that bucket, as we’ve defined it. Gathered, collected, and aggregated simply.</p>
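<p>And in code, that aggregation is exactly as boring as it sounds (our illustration of the math, not the back-end’s literal implementation):</p>
<div class="highlight"><pre class="highlight ruby"><code>idle_ratios = [0.56, 0.77, 0.48, 0.39, 0.81, 0.44, 0.52, 0.62]

average_idle = idle_ratios.sum / idle_ratios.size # => 0.57375
utilization  = 1.0 - average_idle                 # => ~0.43, i.e. 43% utilized
</code></pre></div>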

<h2 id="wrapping-it-up">Wrapping It Up</h2>

<p>If there’s a theme to this little blog-post saga, it’s that the simplest model that matches reality tends to win. We started by trying to <em>guess</em> at busyness with background sampling, only to discover all the usual traps: aliasing, jitter, and overhead. Then we reframed the problem to match the truth on the ground: a process is either idle or it isn’t. Record the edges. Sum the rectangles. Report the ratio. Done.</p>

<p>That shift gave us three things you actually feel in production: lower overhead, correctness across weird traffic shapes (long I/O, tiny bursts, mixed workloads), and numbers you can trust enough to automate against. When an autoscaler acts on a metric, the worst feeling in the world is, “ehh, it’s <em>probably</em> fine.” Edge-tracking turns “probably” into confidence.</p>

<p>And the aggregation story is intentionally boring, too. Each process tells us how idle it was in the last 10 seconds; we average those into an application-level picture. No fancy weighting, no black-box magic. If your fleet spends 57% of a bucket idle, that’s 43% utilized. That’s a number you can reason about, chart, alert on, and scale from.</p>

<p>So if you’ve been skeptical of utilization-based autoscaling because it felt hand-wavey or weird, we hope this demystifies it. The implementation is small on purpose, tested in the sharp edges of real apps (including our own!), and designed to vanish into the background until you need it. Watch your utilization settle into patterns you recognize, set the thresholds that reflect your own tolerance for headroom vs. cost, then enable utilization autoscaling.</p>

<p>In other words: measure what matters, measure it honestly, and keep the math simple enough that you’ll actually use it.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Scaling Sideways: Why You Might Want To Run Two Production Apps</title>
      <description>Learn how running a second Rails app by subdomain can cut p95 latency, stabilize SEO-facing pages, and offload slow endpoints safely.</description>
      <pubDate>Wed, 5 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</link>
      <guid>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<blockquote><p>We’re really trying to optimize for our public website’s performance for SEO reasons&hellip;</p>
</blockquote>
<p>&hellip;was the core theme of our meetings with one of our customers a few weeks ago. They run a Rails application with several different ‘sectors’ — a public website, two different user portals, and an admin ‘backend’ with several internal tools. It’s not an extremely <em>complex</em> application, but it is diverse in its traffic. After chatting with them for a few hours, we had a great solution ready for them — one we use ourselves but feel isn’t talked about enough! Running a second prod app.</p>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Did you know that we love meeting and chatting performance, strategies, and scaling? Whether you’re a Judoscale customer or not, we’d love to hop on a call, screen-share, or whatever, and chat it out — just <a href="/chat">set up a call</a> with us! Totally free.</p>

  </div>
</div>

<p>We’re going to dive into that story and our clever suggestions for scaling sideways, but before we do, let’s clarify some terms so this doesn’t all become terribly confusing! We’ll use “<strong>main app</strong>” to describe the existing, single production application instance. We’ll then use “<strong>second app</strong>” to describe the new, separate clone of the main app — an instance still running all of the production app code (with all the same environment configs, etc.) but which is separate (more on that in a moment). Alright, let’s dive in!</p>

<h2 id="what-we-re-solving-for-here">What We’re Solving For Here</h2>

<p>This particular customer has a very SEO-driven business. That means that their public website, which is served by their core Rails application, needs to be excellent: fast, steady, predictable, burst-ready. But the app houses several other sectors which are older, slower, and less performance-friendly — we all have ’em!</p>

<p><figure>
  <img alt="A diagram of the customer app showing each ‘sector’ as its own box with emojis representing the sort of desired speed of each sector; freight truck for “internal tools”, typical consumer cars for “user portal”, and a race car for “public website”." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            We see you, Google!
          </figcaption>

</figure>
</p>

<p>Unfortunately, in a multi-threaded world (hello, Puma), those slower endpoints don’t just take longer for the people who hit them; they raise the waterline for everyone by occupying threads that subsequent would-be-faster requests must wait on. The result is a p50 (median) request time that looks pretty reasonable&hellip; but a p95 that’s much worse and erratic. Oh, and a support channel that pings for performance issues when there seemingly aren’t any.</p>

<p>From a telemetry and metrics standpoint, we’ve seen this issue plenty of times: CPU saturation is nonexistent and database resources look boring, but request queue time (<a href="/blog/request-queue-time">the metric that matters</a>) spikes randomly and p95s are all over the place. In the case of our customer, it’s not that their public website got slower, per se; it’s that the requests for those public site pages had to <em>wait</em>. Thus we’ve met an old truth: multi-threading increases throughput but amplifies latency (something we dissected in <a href="/blog/puma-default-threads-changed">“Why Did Rails&rsquo; Puma Config Change?!”</a>). Boil it way down and it’s hosting costs vs. p95s.</p>

<p><figure>
  <img alt="A screenshot of a chart showing spiky, erratic p95 response times while the average is much lower" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Spiky p95&rsquo;s and a WAY lower p50/average
          </figcaption>

</figure>
</p>

<p>But the reality for this customer is that they needed to tame and stabilize their p95 response times for their public website. Appeasing the finicky beast that is Google Search Ranking is a broadly unknown game, but stable performance does seem to be a factor.</p>

<p>The good news here is that we’ve got a creative solution. We call it “scaling sideways” — slightly different than ‘horizontal scaling’, yet still horizontal in concept: running a second, but subdomain-separated, instance of your production application.</p>

<h2 id="scaling-sideways">Scaling Sideways</h2>

<p>Let’s expand on the specifics of this strategy, since “scaling” can be a bit of an overloaded term. What we’re describing here isn’t “scaling” in the sense we’re likely all used to these days: changing the number of webserver or worker instances your production application is running at any given time (the core premise of <a href="/">Judoscale</a> itself). Instead we’re talking about “scaling” in a much slower and more methodical approach: running a second production application, which is essentially a clone of the main app, on a separate subdomain with separate infrastructure. It’s still the same code-base, same deployment branch, and really should have all of the same environment and configuration variables&hellip; just a different place to request the same data and/or pages.</p>

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The key to this strategy is offloading traffic for slower and less consistent endpoints to the second app (via its subdomain) so that your main app can handle its own traffic more consistently and quickly. The main app becomes the home for predictable, latency-sensitive endpoints; the second app absorbs the messier stuff without letting it bleed into the public experience.</p>

<p>Luckily, we don’t need a microservices migration plan to do this. We’re not decomposing the domain model; we’re just decomposing our runtime. One deliberate split is enough: the fast path (main) and the heavy/volatile path (second). The payoff is that your main app’s thread pool stops babysitting slow requests and blocking higher-priority endpoints. Queue time stabilizes. Tails compress. (&hellip;Engineers stop arguing about whether going single-threaded everywhere is “worth it.”)</p>

<h2 id="when-is-it-the-right-move">When Is It The Right Move?</h2>

<p>We should recognize first that this strategy isn’t perfect for every application. It shines when at least one of a few conditions is true:</p>

<p><figure>
  <img alt="A visual depiction of the four cases given below" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Really channeling my inner XKCD here&hellip;
          </figcaption>

</figure>
</p>

<p><strong>Your traffic has distinct “shapes.”</strong> If one slice of your app is bursty, slow, or just unpredictable (admin pages, CSV exports, report builders, portals, ‘real time’ (polling) dashboards), while another slice must feel instant and boring (marketing site, signup flow, product pages), you’re a great candidate. Sideways scaling lets you build a fast-lane for the steady stuff and a truck/carpool-lane (or two) for everything else.</p>

<p><strong>You have different SLAs for different routes.</strong> Some requests just matter more. If a public route missing its p95 target is business-critical (SEO, ad landing pages, checkout, conversions), prioritize it on the main app and give it a calmer thread pool. If an authenticated portal can tolerate higher p95s without harming KPIs or other business targets, move it to the second app.</p>

<p><strong>You can influence where traffic goes.</strong> This sounds obvious, but you need a lever. Many teams already have it: front-end <code>fetch()</code> calls, Turbo Frames/Streams, HTMX targets, or API clients you control. If you can change hostnames in those calls, you can steer traffic to the second app with minimal risk and no user-visible disruption (see the sketch just after these four cases). Especially if these calls are transparent to a browser’s address-bar.</p>

<p><strong>SEO is part of the story.</strong> If Google’s crawlers matter a great deal to your business, you might consider splitting your public site from your other application chunks. Instead of the classic “let’s just rewrite the marketing site to static”, you get a lot of the benefits of a dedicated marketing site system (the main app) while retaining all of the comforts of a unified code base and singular mental/domain model.</p>
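<p>On that third condition, here’s a concrete sketch of the “lever” in a Rails app (hypothetical helper and host names; the idea ports to any framework): a tiny helper that steers front-end calls toward the second app.</p>
<div class="highlight"><pre class="highlight ruby"><code># A hypothetical Rails helper for pointing fetch() calls, Turbo Frames,
# and HTMX targets at the second app's host.
module SecondAppHelper
  SECOND_APP_HOST = ENV.fetch("SECOND_APP_HOST", "2.example.com")

  # second_app_url("/reports/42") # => "https://2.example.com/reports/42"
  def second_app_url(path)
    "https://#{SECOND_APP_HOST}#{path}"
  end
end
</code></pre></div>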

<h2 id="judoscale-does-it-too">Judoscale Does It, Too!</h2>

<p>As it turns out, <a href="/">Judoscale</a> itself satisfies three of those bolded conditions above. The Judoscale architecture is built around customers installing the Judoscale <a href="https://github.com/judoscale" target="_blank" rel="noopener">package</a>, which is essentially just a light-weight monitor for request and job queues within the app. Those metrics ultimately get POSTed back to Judoscale servers for processing and aggregation. Nice! But those POSTs happen every ten seconds for every <em>process</em> over thousands of applications. We have a <em>ton</em> of API traffic. As in, 3000-3500 requests <em>per second</em> 24/7.</p>

<p>Then, of course, there’s the Judoscale dashboard and user UI where you can see your metric charts, tune your scaling configuration, and do standard SaaS things. While those charts <em>do</em> have automatic 10-second update polling built-in, the traffic for that entire sector of the app trends much closer to about 50 RPS.</p>

<p>So&hellip; we (1) definitely have different ‘shapes’ — our API traffic is tiny payload and ultra-fast response whereas our dashboard traffic is small-to-medium payload and variable response. Additionally, we (2) definitely have different SLAs for these two shapes. Our API needs to be available, but response times can fluctuate (there&rsquo;s no human waiting)&hellip; whereas our dashboard needs to be as fast as possible since it&rsquo;s customer-facing. Finally, we (3) <em>can</em> control where the majority of our traffic goes by tweaking the client packages to POST somewhere else (and/or some smart routing with Cloudflare).</p>

<p><figure>
  <img alt="A diagram showing a high level split of Judoscale’s two applications; the second app handling the massive volume of API traffic" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’ll get to the implementation specifics below, but hopefully this gives you an idea of the versatility of scaling sideways: applications completely non-SEO focused can still benefit <em>greatly</em> from segmenting traffic in this style. </p>

<h2 id="how-you-actually-do-it">How You Actually Do It</h2>

<p>Spin up a clone of your main prod app. Same repo, same deploy pipeline, same environment variables (with a couple exceptions we’ll note). Point it at a sibling subdomain — <code>ww2.example.com</code>, <code>api2.example.com</code>, or simply <code>2.example.com</code> all work. The goal is sameness: both apps should boot the same code and talk to the same primary dependencies (database, cache, queue, file storage [S3 et al.]). Differences should be intentional and minimal: web process counts, thread counts, and possibly instance sizes.</p>

<p>From there:</p>

<ol>
<li><strong>DNS &amp; routing.</strong> Create the new subdomain and point it to the second app’s router/load balancer/DNS target.</li>
<li><strong>Environment parity.</strong> Duplicate secrets and env vars (including <code>SECRET_KEY_BASE</code>/equivalents so session cookies work across hosts if necessary — more on this below). Consider different Puma thread counts between apps (more on this below too!).</li>
<li><strong>Traffic split.</strong> Start by moving non-navigational traffic: API calls from your front-end, background polling, Turbo Frames/Streams targets. These won’t change the URL in the address bar, so the move is low-risk.</li>
<li><strong>Progressively offload.</strong> Next, migrate heavier, authenticated pages and long-running endpoints to the second app. Be deliberate around what addresses users might see in their browser’s address bar!</li>
<li><strong>SEO guardrails.</strong> Add canonicals on anything public your second app might serve, ensure robots blocking is in place for that host (see the sketch just after this list), and keep sitemaps + social meta rooted on the main app.</li>
<li><strong>Observability.</strong> Watch queue time and p95s on both apps. You should see the main app flatten out quickly.</li>
</ol>
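<p>For the robots-blocking piece of those SEO guardrails, here’s a small sketch of what a host-aware <code>robots.txt</code> might look like in Rails (hypothetical controller and host name):</p>
<div class="highlight"><pre class="highlight ruby"><code># Hypothetical: serve a host-aware robots.txt so crawlers skip the second
# app entirely while the main app stays fully crawlable.
# In routes.rb: get "/robots.txt", to: "robots#show"
class RobotsController &lt; ApplicationController
  def show
    if request.host == "2.example.com" # the second app's host (assumed)
      render plain: "User-agent: *\nDisallow: /"
    else
      render plain: "User-agent: *\nAllow: /"
    end
  end
end
</code></pre></div>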

<p>Most importantly, treat this like a runtime composition change, not an architecture rewrite. You can ship it safely in small patches and keep rolling forward.</p>

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="what-actually-moves">What Actually Moves</h2>

<p>A practical rule of thumb:</p>

<ul>
<li><strong>Stays on the main app:</strong> canonical public pages, sitemaps/robots, OpenGraph/Twitter cards, landing pages, docs/blog, marketing flows, and any route that shapes your public narrative or crawlability.</li>
<li><strong>Moves to the second app:</strong> authenticated portals, JSON APIs, front-end-driven fragments (Turbo/HTMX/Stimulus/etc.), polling endpoints, file uploads/exports, batchy or I/O-heavy controllers, and admin tooling.</li>
</ul>

<p>For navigations, you have options but need to be intentional. Keep in mind that browser address bars remain highly useful for users copying or pasting URLs in/out and potentially sharing those URLs with others. For intra-portal / authenticated endpoints it may not matter that a user sends a colleague <code>https://2.example.com/portal/book/5</code> (especially if the colleague would’ve ended up forced over to the second app to log in to the portal anyway!).</p>

<p>But for resources and endpoints where the goal is speed and public accessibility, we’ll want to keep those endpoints pointing against the main app.</p>

<p>The good news is that we can be clever. For instance, if an endpoint is slow and synchronous (not recommended but we get it, it happens) <em>yet must result in a public URL</em>, we can still POST to the second app and do the work synchronously in that controller. We just need to make sure the response from the second app redirects back to the first. And since they share the same database, you can fluidly (for example) do an expensive <code>create</code> operation in the second app then immediately redirect to the now-existing record on the main app with confidence. There’s no delay in data propagation between the two applications! </p>
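<p>Sketched out (with a hypothetical controller, hosts, and paths), that pattern looks like this on the second app:</p>
<div class="highlight"><pre class="highlight ruby"><code># Hypothetical controller running on the SECOND app: do the slow,
# synchronous work here, then bounce the user back to the main app.
class ReportsController &lt; ApplicationController
  def create
    # report_params: the usual strong-params helper, elided here
    report = Report.create!(report_params) # the expensive operation

    # Same database underneath, so the record is instantly visible to the
    # main app -- redirect there with confidence.
    redirect_to "https://www.example.com/reports/#{report.id}",
      allow_other_host: true # Rails 7+ opt-in for cross-host redirects
  end
end
</code></pre></div>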

<p>In the case of our customer, this meant offloading most of their user portals and internal admin tools to the second app. Their public marketing site stayed put and immediately got calmer metrics. Problem solved!</p>

<p><figure>
  <img alt="A digram showing our customer’s ultimate break-out of their traffic across two apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="judoscale-s-setup">Judoscale’s Setup</h2>

<p>We mentioned earlier that <a href="/">Judoscale</a> also runs a dual-prod-app setup, but we arrived at our split for different reasons — and with a different emphasis. We’re sharing that to underscore there isn’t one “right” pattern. For us, it was more about cost and UX than isolating slow paths&hellip; most of our endpoints are already fast!</p>

<p>Rather than sending volatile endpoints to a second app, we split by human interface. Our main app (<code>app.judoscale.com</code>) is the customer dashboard, so we tune it for UX: snappy, steady, predictable. Our API app (<code>api.judoscale.com</code>) serves the bulk of our traffic, but it’s non-human-facing and can tolerate small, occasional latency blips. The machines don’t mind! But people do. It’s not the fast-vs-volatile split we describe above (which is still the right path for this customer), but it delivers similar benefits: each runtime is optimized for what matters most to it.</p>

<p>Practically, this lets us fine-tune the API runtime for throughput and cost (concurrency, process counts, aggressive autoscaling) while keeping the main app conservative for a consistently great feel. The net effect: a calmer UX and lower hosting spend (more on cost below&hellip;). For many, the canonical split paradigm might be “fast vs. volatile” but for us it was “UX vs. cost”. It’s a different motive but the same playbook: split out a second prod app.</p>

<h2 id="a-caveat-on-cookies-auth-and-subdomains">A Caveat on Cookies, Auth, and Subdomains</h2>

<p>If you’re going to use a second app for a fully separate API or fully segmented authentication mechanism (like Judoscale did), feel free to skip this section. If instead you’ll be cleverly (and carefully) shuttling users between the two apps, we need to discuss shared authentication across subdomains.</p>

<p>The simplest way to accomplish this is to set up both applications with the <em>exact</em> same secret key base (or equivalent) so that cookie and session cryptographic signing validates to/from <em>both</em>. That is, if you log in on the main app, a subsequent request to the second app will see that you’re logged in. This strategy upholds the “keep both apps the <em>exact</em> same” principle by keeping sessions transparent between them. Both apps will read and write to the same session/cookie.</p>

<p>Once both applications are running the same <em>keys</em>, you’ll need to ensure that the actual cookie policies are set up correctly for both apps. Essentially, we need to make sure both apps emit cookies with the same sharing configuration so that browsers will send the same cookie to both apps. In Rails that might look something like this (for session storage via cookie):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">config</span><span class="p">.</span><span class="nf">session_store</span><span class="p">(</span>
  <span class="ss">:cookie_store</span><span class="p">,</span>
  <span class="ss">key: </span><span class="s2">"_my_app_shared_session_key"</span><span class="p">,</span>
  <span class="ss">domain: </span><span class="s2">".example.com"</span><span class="p">,</span>      <span class="c1"># explicit eTLD+1; covers example.com + subdomains</span>
  <span class="ss">expire_after: </span><span class="mi">1</span><span class="p">.</span><span class="nf">year</span><span class="p">,</span>
  <span class="ss">secure: </span><span class="kp">true</span><span class="p">,</span>                <span class="c1"># if this fails in specs/tests, switch to `!Rails.env.test?`</span>
  <span class="ss">same_site: :lax</span><span class="p">,</span>             <span class="c1"># mitigates CSRF while allowing subdomains</span>
  <span class="ss">httponly: </span><span class="kp">true</span>
<span class="p">)</span>
</code></pre></div>
<p>But, as with all things security-related, make sure you understand <em>every</em> config component here and are confident in your security strategy amidst sharing cookies between the two apps. YMMV.</p>

<h2 id="magic-p95-s-and-threads">Magic, P95’s, and Threads</h2>

<p>It’s worth taking a little detour here to assess the magic of what we’re presenting: there isn’t any. There’s no real magic at play here — this is just simple queueing theory with friendlier furniture. We’ve talked about queueing theory broadly in <a href="/blog/request-queue-time">“Queue Time: The Metric that Matters”</a> but the mechanism at play in scaling sideways isn’t radical. When slow requests leave the main thread pool, fast requests stop waiting behind them. That means lower overall variance in request speeds (e.g. lower p95’s) and an app that users will probably describe as feeling “snappier”.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Of course the slowness has to go <em>somewhere</em>&hellip; but we can be much more relaxed around the variance and volatility of our second app. When the slowness is going somewhere <em>made</em> to be slow, it feels much better.</p>

  </div>
</div>

<p>In fact, we can use our “keep the fast app fast” and “keep the slow app slow” mindset in tweaking our thread counts in each app. For a main app we recommend three Puma threads. That’s Rails’ <a href="/blog/puma-default-threads-changed">new standard</a> and proves to be an excellent tradeoff: increased throughput with a reasonable, low tail-latency increase (especially after you move all of the slow requests to the second app!). That said, we recommend deliberately choosing a higher thread count on the second app. Maybe five, maybe six. Your mileage will vary on specifics, but when we design and spin up an application <em>specifically</em> to handle our slower (likely I/O-bound) requests, <em>especially</em> when we aren’t as worried about response times, we can really leverage the power of a large thread pool. This should allow us to keep our instance-count low — a single server instance running five or six threads should be able to handle quite a bit of stuff! </p>
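<p>In Puma terms, that’s a one-line difference per app. A sketch of each app’s <code>config/puma.rb</code>, using the counts discussed above:</p>
<div class="highlight"><pre class="highlight ruby"><code># config/puma.rb on the MAIN app: Rails' current default of 3 threads
threads 3, 3

# config/puma.rb on the SECOND app: a deliberately larger pool for the
# slower, I/O-bound workload (your mileage will vary)
threads 6, 6
</code></pre></div>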

<h2 id="autoscaling-two-applications">Autoscaling Two Applications</h2>

<p>Finally, the last major topic to cover for scaling sideways is indeed autoscaling. First, you should use <a href="/">Judoscale (👋)</a>. Okay, obvious plug aside, there’s a little nuance here: you’re going to want both apps to autoscale. But they’re going to do so with different parameters and goals.</p>

<p><strong>Main app:</strong> now that variance is down and your endpoints are consistently performant, we’ll want to clamp our queue-time thresholds a bit tighter. The target is a flat, boring queue-time line very near zero. In Judoscale, you should see low enough metrics that an upscale threshold between 5 and 10ms feels very stable and scales nicely with your actual traffic curves (not erratically)!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>If your app has burstable traffic loads at known times, you should still define <a href="/docs/leveraging-schedules">a schedule</a> for your autoscaling. If it has burstable traffic loads at <em>unknown</em> times, consider autoscaling to guarantee a <a href="/blog/introducing-proactive-autoscaling">certain level of headroom</a>.</p>

  </div>
</div>

<p><strong>Second app:</strong> still scale on queue time but <em>expect</em> volatility and small spikes that self-resolve. We’d recommend a moderately high upscale threshold like 80ms as well as reducing upscale sensitivity to 20 seconds so brief jitters don’t cause thrashing (AKA ping-pong scaling, which we discussed <a href="/blog/autoscale-tuning-part-3-settings">here</a>). We want to upscale when necessary, but wait a moment to be sure that upscaling is, in fact, necessary.</p>

<p>So, all of that to say, queue time is still absolutely the metric to watch for scaling on both applications. And Judoscale is still absolutely the tool to use. But refining our scaling parameters for each app in their own context is the real path to success here! We want tight bounds and strict expectations on the main app with looser, workload-aware settings on the second.</p>

<h2 id="a-note-on-cost">A Note on Cost</h2>

<p>To address the potential elephant in the room: scaling sideways this way <em>may</em> cost a little more in your overall hosting bill. That’s true. But keep in mind that our first goal here was to optimize and speed up a sector of an application without refactoring the whole application. This <em>is</em> a “Can we throw money at the problem?” solution.</p>

<p>But there’s actually better news: <u>it’s likely that this strategy won’t actually cost much more than your base hosting level now</u>. Remember that the main app is likely going to run fewer instances the more surface area you move away from it. That’s savings. And the second app should make broader use of multi-threading, so it too may need fewer instances than you expect. That’s cheap!</p>

<p>At the end of the day, snappier user experiences tend to convert better, and more conversions mean more sales, which means you probably have a little more room in your hosting budget. We’re not advocating for going wild here — you should still autoscale both applications to keep things efficient — but this strategy is a reasonable cost-path forward for powerful performance gains.</p>

<h2 id="scale-sideways">Scale Sideways</h2>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We started with a simple ask: “optimize the public site for SEO”, and a familiar constraint: one app serving very different kinds of traffic. That’s why we reached for the often-overlooked move of running a second production app. It squarely addressed this customer’s need: keep the public face fast and predictable while letting portals and internal tools be as spiky and complex as they need to be. We should know, we do the same thing (though not for SEO purposes)!</p>

<p>The path there doesn’t require a big‑bang migration. Stand up the second app, put guardrails in place, and move traffic in slices. Begin with front-end calls, shift over some API actions, then gradually migrate entire user-portals when you’re confident in your URL sharing&hellip; all while feature-flagging shifts to build confidence.</p>

<p>What you get for that incremental effort is real performance gain with little added domain complexity or cost. The main app’s thread pool narrows to the fast paths, queue time flattens, and p95s stop lurching. The second app absorbs the messy variance without leaking it into the public experience. Same codebase, two runtimes, each excellent at a different job. If your intro sounds like our customer’s (“we’re optimizing public performance for SEO”), or ours (“we really need to optimize our API for throughput and reliability”), this is the strategy that keeps the promise without rewriting the product or doubling your spend.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Dealing With Heroku Memory Limits and Background Jobs</title>
      <description>Learn how to isolate heavy jobs on Heroku with a dedicated, autoscaled worker to avoid costly Performance dynos for all workers.</description>
      <pubDate>Thu, 30 Oct 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/priced-out-of-heroku</link>
      <guid>https://judoscale.com/blog/priced-out-of-heroku</guid>
      <author>Adam McCrea</author>
      <content:encoded>
        <![CDATA[<blockquote><p>I added one background job and now I&rsquo;m priced out of Heroku.</p>
</blockquote>
<p>I&rsquo;ve heard some variation of this too many times to count. Your app hums along fine on Standard dynos…until you add video encoding, giant imports, or some other memory‑hungry job. Suddenly your worker needs a bigger box, and upgrading every worker to Performance dynos feels like buying a school bus because you might carpool once.</p>

<p>There&rsquo;s a simple pattern that keeps your bill sane and your architecture <a href="/blog/boring-software">boring</a> (the good kind): Put the heavy job on its own queue, give it a dedicated worker process, and autoscale that process to zero when it&rsquo;s idle. The rest of your app stays on Standard dynos.</p>

<p>This post focuses on a real example from Justin Searls on his podcast, <a href="https://justin.searls.co/casts/breaking-change/" target="_blank" rel="noopener">Breaking Change</a>—and exactly how I&rsquo;d set this up on Heroku.</p>

<h2 id="justins-story-4k-video-meets-1-gb-dynos">Justin&rsquo;s story: 4K video meets 1 GB dynos</h2>

<p>Justin&rsquo;s adding support for Instagram Stories (and soon Facebook) to <a href="https://posseparty.com" target="_blank" rel="noopener">POSSE Party</a>, his tool for syndicating your own content to social media. The shape of the problem:</p>

<ul>
<li>A minute of 4K HDR video is 700–800 MB.</li>
<li>Instagram only accepts 1080p within strict codec limits.</li>
<li>Custom server-side re‑encoding produces a compliant, compressed file.</li>
</ul>

<p>Everything worked fine, until a real 4K file hit the server. On Heroku Standard dynos (1 GB RAM), FFmpeg spikes over 1.2 GB during 4K→1080p. Heroku starts swinging the OOM hammer. Sometimes the encode finishes just in time before Heroku shuts it down. Sometimes not. Either way, it&rsquo;s &ldquo;no way to live.&rdquo; (Justin&rsquo;s words)</p>
<div class="highlight"><pre class="highlight plaintext"><code>2025-10-29:29:13.606617+00:00 heroku[worker.1]: Process running mem=844M(165.0%)
2025-10-29:29:13.608658+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2025-10-29:29:38.911996+00:00 heroku[worker.1]: Process running mem=810M(158.3%)
2025-10-29:29:38.913195+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2025-10-29:30:03.936646+00:00 heroku[worker.1]: Process running mem=1197M(233.8%)
2025-10-29:30:03.942490+00:00 heroku[worker.1]: Error R15 (Memory quota vastly exceeded)
2025-10-29:30:03.944352+00:00 heroku[worker.1]: Stopping process with SIGKILL
2025-10-29:30:04.133323+00:00 heroku[worker.1]: Process exited with status 137
2025-10-29:30:04.179846+00:00 heroku[worker.1]: State changed from up to crashed
</code></pre></div>
<p>The apparent choices:</p>

<ol>
<li>Upgrade to Performance dynos (ouch, $$$, especially if you upgrade all workers), or</li>
<li>Break out encoding to a separate service you run elsewhere (more moving parts, Active Storage integration gets annoying), or</li>
<li>Do it client‑side with WebCodecs (promising, but HDR tone‑mapping and codec constraints are tricky).</li>
</ol>

<p>There&rsquo;s a fourth option that&rsquo;s dead simple and keeps everything on Heroku:</p>

<p><strong>Isolate the heavy job on its own queue, back it with a dedicated worker that uses a Performance dyno, and autoscale that worker from 0→1 only when needed.</strong></p>

<p>The perf dyno runs for a few minutes a month, all automated, costing almost nothing.</p>

<h2 id="the-dedicated-worker-pattern-at-a-glance">The &ldquo;dedicated worker&rdquo; pattern (at a glance)</h2>

<ol>
<li>Create a dedicated queue for heavy jobs, e.g. <code>memory_hog</code>.</li>
<li>Run a dedicated worker process that only monitors <code>memory_hog</code> with concurrency = 1.</li>
<li>Set that process&rsquo;s dyno type to a Performance size. Leave the quantity at 0.</li>
<li>Autoscale that worker based on queue latency (queue time).</li>
<li>Enqueue jobs; let autoscaling do the rest.</li>
</ol>

<p>If you&rsquo;ve read our post on <a href="/blog/planning-sidekiq-queues">planning your Sidekiq queues</a>, you know I&rsquo;m a huge advocate for latency‑based queue names. This is the exception. When memory is the constraint, name the queue accordingly so its purpose is obvious and you can add similar jobs later.</p>

<h2 id="step-by-step-setup">Step by step setup</h2>

<p>Let&rsquo;s walk through the actual implementation of this pattern. I&rsquo;m focusing on Sidekiq and Heroku here, but you can apply the same concepts to any job/task queue and cloud hosting platform.</p>

<p>1. Point the heavy job at a dedicated queue</p>
<div class="highlight"><pre class="highlight plaintext"><code># app/jobs/encode_video_job.rb

class EncodeVideoJob
  include Sidekiq::Job
  sidekiq_options queue: :memory_hog

  ...
end
</code></pre></div>
<p>2. Add a dedicated worker process</p>
<div class="highlight"><pre class="highlight plaintext"><code># Procfile

worker: bundle exec sidekiq -c 5 -q within_5_seconds -q within_5_minutes
memory_hog_worker: bundle exec sidekiq -c 1 -q memory_hog
</code></pre></div>
<p>Keep it single-threaded (<code>-c 1</code>) to avoid multiplying memory usage. If you truly need parallel encodes later, raise carefully.</p>

<p><figure>
  <img alt="A diagram showing a normal worker (standard dyno) and a separate memory_hog worker (perf dyno)" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>3. Set the dyno type, not the dyno count</p>

<p>In the Heroku dashboard, open the <code>memory_hog_worker</code> process and choose a Performance dyno size (whatever meets your memory needs). Leave the quantity at 0. This only tells Heroku what kind of dyno to use when you scale up later.</p>

<p><figure>
  <img alt="Heroku process settings for memory_hog" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
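<p>If you prefer the CLI, the equivalent looks something like this (the dyno size and app name here are just examples; pick whatever fits your memory needs):</p>
<div class="highlight"><pre class="highlight plaintext"><code>heroku ps:type memory_hog_worker=performance-l -a your-app
heroku ps:scale memory_hog_worker=0 -a your-app
</code></pre></div>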

<p>4. Wire up autoscaling in Judoscale</p>

<p>You can use any autoscaler that can scale your workers based on job queues. This is the Judoscale blog, so we&rsquo;re using Judoscale.</p>

<ul>
<li>Process: <code>memory_hog_worker</code></li>
<li>Scale range 0–1 dynos (maybe more?)</li>
<li>Scale up when queue latency &gt;= 1 second (anything in the queue)</li>
<li>Scale down when queue latency drops below 1 second (essentially idle)</li>
</ul>

<p><figure>
  <img alt="Judoscale rule screenshot — “Scale 0–1 when Queue Time hits 1 second.”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Now when you enqueue a video, queue latency rises, Judoscale starts one Performance dyno for <code>memory_hog_worker</code>, FFmpeg runs, and the worker scales back to 0 when the queue drains.</p>

<p>5. Test it end‑to‑end</p>

<ol>
<li>Enqueue a big video.</li>
<li>Watch queue latency bump.</li>
<li>See Judoscale scale <code>memory_hog_worker</code> 0→1.</li>
<li>Encode finishes; dyno scales back to 0.</li>
<li>Celebrate not paying for a big box 24/7.</li>
</ol>

<h2 id="gracefully-handling-long-running-jobs">Gracefully handling long-running jobs</h2>

<p>Jobs that consume a lot of memory also tend to be long-running jobs. We consider a &ldquo;long-running job&rdquo; to be any job that takes longer than the shutdown timeout for the given job processor, which is usually 25 seconds.</p>

<p>Out-of-the-box, Judoscale will downscale a worker service as soon as the queue is empty. As long as your jobs complete within the shutdown timeout, this is fine—the worker will receive the shutdown signal, finish the job, then shut down. If the job takes longer, it&rsquo;ll be killed before it can finish 👎.</p>

<p>Judoscale handles this scenario with an opt-in configuration to <a href="/docs/long-running-jobs">prevent downscaling when jobs are busy</a>.</p>

<p><figure>
  <img alt="Screenshot: Option to prevent downscaling when jobs are busy" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>To see this option in the UI, you must enable &ldquo;busy job&rdquo; tracking in your code. Check out <a href="/docs/long-running-jobs">the docs</a> for details.</p>
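<p>Here&rsquo;s a rough sketch of what that looks like for Sidekiq with the judoscale-ruby gem. Treat the exact option name as an assumption and defer to the docs linked above:</p>
<div class="highlight"><pre class="highlight plaintext"><code># config/initializers/judoscale.rb

Judoscale.configure do |config|
  # Report busy jobs so the "prevent downscaling when jobs
  # are busy" option appears in the Judoscale UI.
  config.sidekiq.track_busy_jobs = true
end
</code></pre></div>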

<h2 id="alternative-approaches">Alternative approaches</h2>

<p>Hopefully it&rsquo;s obvious why this beats the &ldquo;just upgrade everything&rdquo; approach: upgrading every worker is completely unnecessary! If you only need a perf dyno for one occasional job, there&rsquo;s no reason to pay for it 24/7.</p>

<p>The other alternative I often hear (and Justin mentioned in the podcast) is extracting this work into a separate service outside of Heroku. It&rsquo;s cliche at this point to talk about how cheap hardware is if you get closer to the metal. Yes, you pay a substantial tax to use a PaaS like Heroku, but that tax buys your time. There&rsquo;s simply no reason to waste time spinning up new infra when simple (and cheap!) solutions exist on your current platform.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>For more general advice on reducing memory usage in Rails, check out our other posts: How to Use Less Memory, <a href="/blog/rails-on-heroku-use-less-memory-pt-1">Part 1</a> and <a href="/blog/rails-on-heroku-use-less-memory-pt-2">Part 2</a>.</p>

  </div>
</div>

<h2 id="take-action">Take action</h2>

<p>Judoscale was built for exactly this: autoscaling workers by queue latency, including scaling to zero. Turn it on, ship your feature, and leave the school bus at the dealership.</p>

<p>If you want help wiring it up, <a href="mailto:hello@judoscale.com">email us</a> or <a href="/chat">call us</a>.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>How to Choose a Node.js Framework</title>
      <description>The popularity of Node means we have lots of framework options. Dig in and learn how to choose between the most common Node.js frameworks.</description>
      <pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/which-node-framework</link>
      <guid>https://judoscale.com/blog/which-node-framework</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Building a Node.js server without a full framework is technically possible, but leaning on a framework will certainly improve your development speed and likely your app&rsquo;s quality. Frameworks give us reusable tools that help us not have to reinvent the wheel for every little thing. Choosing a Node framework (and picking the right one) will ensure you spend less time on basic functionality and more time on your actual application logic.</p>

<p><figure>
  <img alt="Comparing Node.js frameworks" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The popularity of JavaScript and Node.js means we get a handful of really incredible frameworks to choose from and plenty of <a href="/blog/heroku-alternatives">great hosting options</a>. In this article, I&rsquo;ll explain what makes each framework unique and how you can choose between them. Ready to jump in with me?</p>

<h2 id="using-express-js">Using Express.js</h2>

<p>Express is the common standard for Node.js web frameworks. It&rsquo;s the Node framework that nearly every Node developer has used or at least heard of. It’s a small framework that doesn’t impose a strict structure. This makes it pretty <strong>flexible</strong>. You can use Express to build everything from simple JSON APIs to full-blown web applications. Even if you don&rsquo;t choose to build on Express, the framework you do choose may include it under the hood.</p>

<p><figure>
  <img alt="Expressjs is a common framework meme" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Getting started is very easy, as Express intentionally has a very shallow learning curve. If you’re new to Node, Express is often the first recommendation from the JS community. I put a HUGE emphasis on developer experience. It&rsquo;s why I love Rails. The Express developer experience is straightforward, but it&rsquo;s not an all-inclusive framework like Rails or even Laravel.</p>

<p>One of Express’s biggest strengths is its huge ecosystem of middleware and plugins. Because it’s been the most popular Node framework for years, almost any feature you need has an Express middleware available. This cuts both ways, though: Express doesn’t do all that much without middleware, so you&rsquo;ll be forced to make lots of choices for your app!</p>

<p>One important thing to keep in mind is that <strong>Express is not the fastest framework out there</strong>. It’s known to have a bit of overhead in its request handling. Other frameworks like Fastify or Koa can beat Express on throughput. Express is ideal when you want maximum flexibility and a quick start to a project.</p>

<p><strong>Express has stood the test of time. It is mature, stable, and proven in production.</strong> Many high-traffic websites have been built with Express, and an Express app can certainly scale to handle very large loads. Since Express leaves structure to you, it can scale <em>organizationally</em> if you apply good conventions. But you won’t get as much built-in support for scaling as you would from a more structured framework like Nest.</p>

<h2 id="using-fastify">Using Fastify</h2>

<p>Fastify is a newer player than Express, but it has gained a lot of traction due to its performance. As the name implies, Fastify was <strong>designed for fast Node apps</strong>. This raw speed comes from an event-driven, highly optimized internal architecture with minimal overhead per request.</p>

<p><figure>
  <img alt="Fastify performance benchmarks" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Fastify’s API and concepts will feel familiar to you if you already know Express. You still register routes and handlers, and you can also use middleware (though in Fastify they’re called <em>hooks</em> or plugins). The learning curve is moderate, certainly more than Express’s. This is because it’s a bit more structured than Express, pushing you to define JSON schemas for your requests/replies that can be used for validation and serialization. This schema-based approach is optional but strongly recommended by the framework, as it helps catch errors and improve performance.</p>

<p>Fastify does have an official plugin system with many plugins available to add features. The documentation is solid (I read tons of documentation), and the community is growing fast.</p>

<p>Performance is Fastify’s headline claim to fame. It consistently ranks among the top Node frameworks in terms of throughput and low latency. In a real-world context, this means a Fastify server can handle more concurrent requests on the same hardware than an Express server, which gives you room to grow.</p>

<p>If <a href="/node">you&rsquo;re autoscaling your Node app</a> (you should be), Fastify’s low overhead is even more beneficial. Each instance can do more work, potentially reducing the number of instances you need and certainly making each scale event more granular. There’s no special magic beyond this. Because each process is more efficient, your scaling is more cost-effective.</p>

<p>A great use case is <strong>building APIs</strong> where performance is the most important requirement. Fastify stands out in use cases where every millisecond counts and in resource-constrained environments. The trade-offs are relatively minimal. Fastify isn’t as universally known as Express, and its community, while enthusiastic, is smaller than Express’s. That said, it’s not <em>obscure</em> – plenty of big companies use Fastify.</p>

<h2 id="using-koa">Using Koa</h2>

<p>Koa is often described as the successor to Express. It was actually created by the same team behind Express, with the goal of making a smaller, more modern framework. Koa is even more minimalist than Express, shipping without any middleware (not even a router!). This sounds daunting if you&rsquo;re used to something batteries-included like Ruby on Rails, but it’s by design.</p>

<p>Koa aims to strip the core down to just the essentials, leaving it up to you to add only what you decide you need. Koa’s core library is tiny, only spanning around 600 lines of code. It leverages modern JavaScript features like async/await, which allows Koa to avoid the legacy of callback-based middleware that Express 4 uses. </p>

<p>If you already know Express, picking up Koa is straightforward (conceptually, at least). You still create a server, define routes, and use your chosen middleware. Many developers find Koa’s usage of <code>async</code> functions makes middleware composition and error handling simpler than Express. The learning curve is moderate, more of a challenge than the other Node frameworks we&rsquo;ve discussed so far: you will need to assemble your own stack of middleware, which means reading the documentation and wiring up each piece properly. Koa intentionally does <em>not</em> maintain compatibility with Express-style middleware, so you can’t just drop in an Express middleware and expect it to work.</p>

<p>Koa’s lean design does have performance benefits. With less overhead, Koa can handle more requests per second than Express in comparable scenarios. In practice, Koa’s performance is <em>pretty good</em> for a Node framework, but not as good as Fastify’s.</p>

<p>Koa is a great choice if you want a <strong>modern, minimal framework</strong> and are comfortable assembling <em>all</em> the pieces you need. It’s well-suited for building APIs or web services where you might otherwise use Express, but you’d prefer a cleaner async/await-based flow. It’s like Express but faster and a bit more refined, at the cost of doing a little more setup on your own.</p>

<h2 id="using-nest-js">Using Nest.js</h2>

<p>NestJS is a different paradigm from Express, Fastify, or even Koa. <strong>Nest is a full-featured, batteries-included framework</strong>. The idea behind Nest is to give Node applications a structured architecture out of the box. It adopts a modular approach, with support for TypeScript and features that will remind you of fuller frameworks like Rails or Laravel.</p>

<p><figure>
  <img alt="Nest.js features" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Working with Nest can be a mix of excitement at what&rsquo;s included and dread at all there is to learn. The developer experience is driven by the fact that NestJS <em>forces</em> a structure. Personally, I love this, but I&rsquo;m already a huge fan of convention over configuration. It also makes Nest great for large teams since everyone knows where to put their code and how things should be structured.</p>

<p>The framework comes with a CLI that can generate boilerplate, just like Rails scaffolds. However, the <strong>learning curve is steep</strong>. You need to get comfortable with lots of concepts before becoming productive, since they affect the overall flow of Nest apps. </p>

<p>A NestJS application uses the Express framework under the hood by default. This means that Nest’s performance will be somewhere in the ballpark of Express, maybe even a bit slower due to the added abstraction. Still, you can swap out the underlying HTTP adapter to use Fastify instead of Express. With a one-line change, your Nest app can run on Fastify and instantly get a big boost in throughput.</p>

<p>That said, pure performance is usually not the deciding factor in choosing to use Nest. Most people who choose Nest do so for its organization and convention. It shines in large, long-term projects where maintainability is key.</p>

<p>NestJS <strong>does not include a frontend framework</strong>. It is purely a backend framework, but it can serve a frontend if you want it to. Since it’s just a Node app (built on Express or Fastify), you can add a view engine like Handlebars to render server-side HTML templates.</p>

<p>NestJS does not ship with a built-in ORM, which makes me sad! It does, however, cleanly integrate with whatever data access layer you prefer. The community has produced official and semi-official integrations, including TypeORM and Prisma.</p>

<p>If your project will be worked on by many developers, or you foresee it growing significantly in scope, Nest provides the architecture to manage that complexity. It’s also great if you prefer strongly typed code and are a fan of Angular or Java-style frameworks. Many companies choose Nest for mission-critical apps because it imposes a consistent structure and has many reliability features built in.</p>

<h2 id="using-next-js">Using Next.js</h2>

<p>Node.js is also used in full-stack frameworks that integrate the frontend and backend together. Next.js is a framework for building web applications with React (frontend) <em>and</em> Node.js (backend) <em>together</em>, but it&rsquo;s best described as a frontend-first option. With Next, you can render React pages on the server (SSR), generate static sites, and define serverless API routes all in the same project. </p>

<p><figure>
  <img alt="Next.js is a Node.js framework" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>I have a love/hate relationship with Next. It has a generally good developer experience, especially for frontend engineers who want to expand into backend without leaving the React ecosystem. Next handles all the build tooling, code splitting, etc. The local development experience is genuinely great, boasting hot reloading, actually useful errors, and so on. Still, the learning curve for Next is moderate. You need to know React, and then learn Next’s conventions. That sounds simple, except those <strong>conventions change in breaking ways relatively often</strong>. It&rsquo;s not uncommon for me to visit an old Next project and find that it needs lots of work to bring it up to date.</p>

<p>Next.js is best for scenarios where you need to build a frontend interface along with your backend logic. It allows for server-side rendering (SSR), which means pages are pre-rendered on the server for fast initial load and SEO, as well as static site generation (SSG) where pages can be precomputed at build time and served as static files.</p>

<p>Since Next is a frontend-first framework, it does not provide an ORM either. If you need an ORM, you add it yourself just like you would in a plain Node.js app.</p>

<p>It blurs the lines between a frontend framework and a backend framework, which may be a selling point or a problem depending on your perspective. That said, for many teams, the productivity gain outweighs these concerns. Next.js is often recommended for full-stack apps because it offers great performance to the end user and fairly easy scalability (especially if you deploy to a platform like Vercel, where it scales automatically).</p>

<h2 id="how-to-make-the-right-decision">How to make the right decision</h2>

<p>Choosing the best Node framework for your project is hard. Of course, it depends on what you’re building and what your priorities are. It’s not one-size-fits-all. Still, I&rsquo;ll do my best to give you a straightforward decision-making framework.</p>

<p><figure>
  <img alt="Node frameworks compared" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>For quick prototypes of simple APIs, just use <strong>Express</strong>. Its simplicity is unbeatable for getting something off the ground fast. You won’t waste time fighting the framework or learning new conventions.    </p>

<p>For high-performance APIs, use <strong>Fastify</strong>. If you expect heavy load or just want to maximize efficiency, Fastify will give you more headroom. It’s a great choice for building JSON APIs or microservices where throughput is critical. The trade-off is a slightly smaller community and middleware options, but the core features are pretty good by themselves.</p>

<p><strong>Koa</strong> is an option if you like to keep things lightweight and are interested in using something super modern. It’s a nice upgrade from Express when you want to embrace async/await and have more control. Choose Koa only when you are comfortable assembling your own toolkit and you want a framework that stays out of your way.</p>

<p>For large-scale, enterprise APIs, <strong>NestJS</strong> is often the best choice. If you’re building something that might grow to dozens of modules, with several teams working on it, Nest provides the architecture to keep it maintainable. It’s the go-to if you prefer strong conventions, TypeScript, and a full-featured backend framework that comes with everything included. It scales just like any other backend option and has a great structure that&rsquo;s easy to lean on.</p>

<p>For applications that need a strong frontend component or you simply like having both parts of a project in one framework, <strong>Next.js</strong> is hard to beat. Naturally, the full-stack nature of the platform gives you plenty more options for your work than Nest or one of the backend options. If your project is essentially a React app that needs server-side rendering or will benefit from static site generation, Next.js will make your life much easier.</p>

<p>I couldn&rsquo;t help but add Rails and Laravel to this chart to make the comparison even more clear. I&rsquo;m quite attached to those frameworks&rsquo; batteries-included approach to building web apps. With either framework, you can build fully-featured CRUD apps that support user authentication and a dozen other common features, all without adding a single library. Next is the only Node framework that comes close to this amount of support.</p>

<p>That being said, all of these frameworks are perfectly capable of building useful, scalable applications. Just keep in mind that they each offer a different balance of convenience, performance, and structure. Good luck choosing!</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Autoscaling Insights: What Nearly A Decade Of Autoscaling Your Apps Has Revealed To Us</title>
      <description>Learn autoscaling best practices: handle noisy neighbors, set headroom, tune queue times, scale workers to zero, and avoid thrashing.</description>
      <pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/autoscaling-insights-what-nearly-a-decade-of-autoscaling-your-apps-has-revealed-to-us</link>
      <guid>https://judoscale.com/blog/autoscaling-insights-what-nearly-a-decade-of-autoscaling-your-apps-has-revealed-to-us</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>We’ve been autoscaling apps for a long time — almost a decade! That’s long enough to see patterns repeat across Rails, Node, and Python; Redis and Postgres; Heroku, AWS, Render, and more. From these experiences and insights we wanted to put together a compendium of mistakes, misconceptions, less-intuitive ideas, and “oh wow that <em>does</em> matter” moments as best we could.</p>

<p>We’ll organize this like a listicle, but we’ll connect the dots as we go so it reads like one story: what queue time really tells you, why shared hardware is noisy, how scheduling fits in, and how to keep your scaler from thrashing.</p>

<h2 id="1-shared-hardware-is-noisy">1) Shared hardware is noisy</h2>

<p>Multi-tenant machines are cheaper because they’re shared, but providers <em>overprovision</em> their shared machines. It’s not “well we have 10 cores so we can host 10 apps, each getting 1 core of CPU”. Depending on how much the hosting provider wants to <del>profit</del> cram applications into a single host, it could be a lot more like, “well we have 10 cores and apps tend to underutilize what’s available so let’s stuff 30 apps onto those 10 cores” 😬</p>

<p><figure>
  <img alt="Illustration graphic showing two example servers with eight cores of compute; the first having only eight tenants across those cores, the second having tens of tenants, with the banner text on the top of the image reading ‘Shared Hardware is Overprovisioned’" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>When your app shares CPU with other tenants, one neighbor’s burst is another neighbor’s tail‑latency blip. We’ve talked about this <a href="/blog/shared-hardware-how-bad-can-it-get">several times</a>, but the “noisy neighbor” effect is not groundbreaking or new. It’s also not a single-provider issue: every hosting platform that operates on shared hardware is subject to some degree of neighbor noise!</p>

<p><strong>The takeaway:</strong> get familiar with what noisy neighbors look like in your app. Learn to read your charts so you can tell app‑wide issues from single‑dyno outliers (the noisy‑neighbor tell).</p>

<p><strong>The action item:</strong> enable Judoscale’s <a href="/blog/noisy-neighbors-fixed">Dyno Sniper</a> (by-request only as of September 2025) feature to automatically detect and restart services that fall prey to a noisy neighbor’s delay. It’s free. It’s magic. It works everywhere. There’s really no downside.</p>

<h2 id="2-autoscaling-for-headroom-is-hard-most-teams-miss-it">2) Autoscaling “for headroom” is hard (most teams miss it)</h2>

<p><em>Many</em> teams use autoscaling specifically to keep a certain level of headroom available for unknown bursts of traffic ahead. It’s not the most efficient way to run an app, but the premise can prevent downtime for highly burst-prone traffic loads. That said, actually configuring your autoscaler to do that correctly is <em>extremely</em> difficult. Most teams end up either constantly over-provisioned (e.g. wasting money! 💰) or without the headroom they wanted, complete with alerts when bursts arrive 🚨.</p>

<p>Most autoscalers operate on what we’ve come to call “<a href="/blog/introducing-proactive-autoscaling">reactive metrics</a>”. These reactive metrics are excellent. They’ve always been excellent. When you’re using an autoscaler that’s watching them and reacts quickly (like Judoscale!), reactive metrics are absolutely the right answer for 90+% of applications. That said, if you’re in the other 10% (specifically looking for autoscaling-<em>with-headroom</em>), reactive metrics aren’t the right tool. If you need to maintain a proportional overhead of capacity relative to your scale as it changes, you’ll need Judoscale’s custom Utilization-based autoscaling. It allows you to say “Keep me at about 70% capacity utilization so that I always have 30% overhead”.</p>
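<p>The arithmetic behind that idea is simple enough to sketch. This is illustrative only, not Judoscale&rsquo;s actual algorithm:</p>
<div class="highlight"><pre class="highlight plaintext"><code># Illustrative only: pick an instance count that keeps utilization
# near the target, leaving the remainder as headroom for bursts.
def desired_instances(current_instances, current_utilization, target = 0.70)
  busy_capacity = current_instances * current_utilization
  (busy_capacity / target).ceil
end

desired_instances(10, 0.90) # returns 13 (about 69% utilization, 31% headroom)
</code></pre></div>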

<p><strong>The takeaway</strong>: to our knowledge, Judoscale is the only autoscaler out there that offers true <a href="/blog/how-utilization-works">proactive, headroom-prioritized autoscaling</a> via custom “Utilization” measurement 24/7. If you need that kind of behavior, get Judoscale installed and activated. You’ll be surprised at how useful extra headroom can be in high-burst loads!</p>

<p><figure>
  <img alt="An illustrated diagram showing a statically-scaled app failing to have capacity to handle requests as traffic load rises" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="3-queue-time-ranges-for-healthy-apps-are-lower-than-we-expected">3) Queue time ranges for healthy apps are lower than we expected</h2>

<p>After several years of watching both customer queue time data as well as our own, we decided to lower the default queue time threshold for new apps on Judoscale. While that change could be worth an article of its own, suffice it to say that our realization was based around the stability of shared and dedicated hardware. Queue time thresholds for dedicated hardware (think Heroku’s <code>Perf-</code> series and/or Fir platform) can be <em>very</em> low. As in, “scale up if queue time hits 5ms”. And that’s ultimately a reflection of a very stable and operational stack — a request hitting Heroku, getting routed, and hitting your dyno consistently and predictably. The moment that dyno begins queuing requests for more than just a few milliseconds, we can be sure there’s a capacity problem.</p>

<p>Our realization came regarding <em>shared</em> hardware (Heroku’s <code>Std-</code> series). We’d long considered a higher default queue time for those tiers since shared hardware <em>can</em> experience blips of queueing even though you don’t have a capacity issue (yet) — those are the moments your neighbor’s code is running instead of yours.</p>

<p>What we found is that shared hardware <em>can</em> actually operate on the same degree of stability and low queue time threshold as long as you’re carefully and deliberately staying within the bounds of your own “Dyno Load” or overall “shared slice” lane. We’ll write about this more in the coming months, but here’s <strong>the takeaway</strong>: keep an eye on, and be careful about, how much of <em>your</em> slice of the shared hardware you’re really using. On Heroku, this means being careful about that “Dyno Load” metric. If you consistently push for more Dyno Load than you’re actually allocated, you’re going to have a bad time.</p>

<h2 id="4-scaling-to-zero-is-a-super-power-for-workers">4) Scaling to zero is a super‑power (for workers)</h2>

<p>Event‑driven workers don’t need to idle. If there’s no work to do, you shouldn’t be paying for workers. Sure, keep at least one of your low-latency queue workers around all the time — we do too. But if you’re following our “<a href="/blog/planning-sidekiq-queues">Opinionated Guide to Planning Your Sidekiq Queues</a>”, you should allow both your <code>less_than_five_minutes</code> and especially your <code>less_than_five_hours</code> queues to scale to zero. Cold-start times once jobs actually hit those queues are around 30-45 seconds (YMMV) so you’ll be well within your queue time SLA&hellip; while saving free money 💰.</p>
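<p>For that to work, each queue tier needs its own process so each can scale independently. Here&rsquo;s a minimal Procfile sketch following the guide&rsquo;s queue-naming convention (the process names, fast-queue name, and concurrency values are just examples; adapt them to your app):</p>
<div class="highlight"><pre class="highlight plaintext"><code># Procfile

worker_fast: bundle exec sidekiq -c 5 -q less_than_five_seconds
worker_medium: bundle exec sidekiq -c 10 -q less_than_five_minutes
worker_slow: bundle exec sidekiq -c 10 -q less_than_five_hours
</code></pre></div>
<p>In Judoscale, give <code>worker_fast</code> a scale range with a minimum of one, and let <code>worker_medium</code> and <code>worker_slow</code> range all the way down to zero.</p>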

<p><strong>The takeaway</strong>: scaling background job workers to zero when there’s no work in the queue is free cash in your pocket. Set up your <a href="/">Judoscale</a> schedule and scaling range to allow for zero-scale and enjoy the free money you save 😎. Then, while you kick back, give our <a href="/blog/planning-sidekiq-queues">Sidekiq Queues Guide</a> a read!</p>

<p><figure>
  <img alt="Background job dynos scaling up as more background jobs are kicked off" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="5-y-all-have-too-many-job-queues">5) Y’all have too many job queues</h2>

<p>We’ve seen it all: the “one queue per job class” approach, the “every feature gets its own queue” approach, and, of course, the “we’re just using this queue one time and we’ll clean it up right after” approach. All roads lead to more queues. The thing is, lots of queues create real issues:</p>

<ul>
<li>More queues means more queues to watch, and that means lots of polling and overhead from your job system. Sidekiq’s own <a href="https://github.com/sidekiq/sidekiq/wiki/Advanced-Options" target="_blank" rel="noopener">docs</a> stress that they “don&rsquo;t recommend having more than a handful of queues per Sidekiq process” and</li>
</ul>
<blockquote><p>Lots of queues makes for a more complex system and Sidekiq Pro cannot reliably handle multiple queues without polling&hellip; slamming Redis.</p>
</blockquote>
<ul>
<li><p>Long term maintenance of jobs, orchestration, and priorities is <em>terrible</em> when you have that many queues to think about. It’s more than you can comfortably hold in your head.</p></li>
<li><p>Setting up autoscaling policies across 10, 15, or even 20+ queues is a <em>massive</em> headache. Trying to keep them all reasonably in-sync is worse. It’s not worth it.</p></li>
</ul>

<p>You end up here:</p>

<p><figure>
  <img alt="A visual depicting lots of queues leading to worker processes that are on fire" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Trust us, you don’t want to end up there.</p>

<p>Simply put, if you’ve got more than about five queues, you’re probably headed in the wrong direction. We strongly recommend having only three! And we recommend naming them based on an expected queue-time SLA, then setting up your autoscaling for each queue to reflect that. It’s a beautiful, job-agnostic way of handling queues!</p>
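<p>In practice, that means each job declares the SLA it needs rather than living in a queue named after itself. A quick sketch (the job classes here are hypothetical):</p>
<div class="highlight"><pre class="highlight plaintext"><code># Hypothetical jobs mapped onto SLA-named queues

class SendReceiptJob
  include Sidekiq::Job
  sidekiq_options queue: :less_than_five_minutes
end

class NightlyExportJob
  include Sidekiq::Job
  sidekiq_options queue: :less_than_five_hours
end
</code></pre></div>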

<p><strong>The takeaway</strong>: read <a href="/blog/planning-sidekiq-queues">this guide</a> and audit your background job queues. If you’ve got more than five, determine why and what should be merged. KISS!</p>

<h2 id="6-on-the-web-side-downscale-by-one-almost-always">6) On the web side, downscale by one (almost always)</h2>

<p>Several of our larger-app customers have run into interesting headaches and unforeseen issues by downscaling by more than one at-a-time. Judoscale’s highly configurable nature <em>does</em> allow you to do this, but we’ve learned over time that you <em>probably shouldn’t</em>.</p>

<p>The reasoning is simple: cost vs. benefit. The benefit, in short, is that you’ll shave a little off your bill. If you’re going to shed that dyno load anyway, doing it in bigger steps means slightly larger savings accumulated by the end of the month! ‘Slightly’ is the key word there. The cost, however, can be less pleasant. You can end up downscaling too far! At that point your users could experience slowness, you could spark alerts, and you’ll inevitably upscale again soon. There’s just no need for all the thrashing.</p>

<p><strong>The takeaway</strong>: downscaling by more than one at-a-time, for web services, isn’t worth the marginal gains. The risk of downscaling too far is real! Stick to downscaling by just one-at-a-time.</p>

<p><strong>Our next step</strong>: we actually feel pretty compelled to remove this option altogether in the future. It’s exceedingly rare that downscaling by more than one service at a time is the <em>right</em> move. TBD, but this lever may disappear!</p>

<h2 id="7-understand-intra-dyno-concurrency">7) Understand intra-dyno concurrency</h2>

<p>The short version of this story is based on understanding the interplay between request <em>routing</em> and how requests are handled within a single service. Many PaaS’s use simple random-based request routing: a new request can go to any active service/instance in the cluster. It’s not intelligent or load-based. So a single service could receive multiple requests in a row, even while still processing earlier ones, while another service may get none!</p>

<p>It’s important, then, that each single service is able to handle multiple requests concurrently. Otherwise those randomly-routed new requests will be queued and you’ll have a consistently higher, but sporadic, average queue time for your app. 😬</p>

<p>In each of the runtime languages we support autoscaling for (Ruby, Python, and Node.js), a single web process <em>cannot</em> handle requests in true parallel — it can only interleave their execution asynchronously. That’s a mouthful, but we recently wrote a post that walks through the idea with great diagrams; give it a read <a href="/blog/puma-default-threads-changed">here</a>! The key is that a single <em>process</em> isn’t capable of true parallelism, but running multiple processes within your service <em>does</em> yield intra-service parallelism (the ability for a single service to handle more than one request at the same instant).</p>

<p><strong>The takeaway</strong>: wherever possible, run more than one process per web service instance (“dyno”, “service”, etc.). This is a particular challenge on <code>Std-1x</code>-style, single-CPU-core, service tiers. But, if all other variables are held constant, it’s better to run a single service with two processes than two services that each have a single process!</p>
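<p>On a multi-core service, a couple of lines in <code>config/puma.rb</code> buy you that intra-service concurrency. A sketch; tune the counts to your cores and memory budget:</p>
<div class="highlight"><pre class="highlight plaintext"><code># config/puma.rb
# Two processes per service for true parallelism, with a few
# threads in each to interleave I/O-bound work.

workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads 3, 3
preload_app!
</code></pre></div>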

<h2 id="8-scheduling-boring-obvious-and-wildly-effective">8) Scheduling: boring, obvious, and wildly effective</h2>

<p>Scheduled scaling is the easiest money‑saver with the least risk. If your traffic has a weekly rhythm, tell Judoscale <a href="/blog/autoscale-tuning-part-2-scheduling">about it</a>. Keep your minimum scale higher on weekday business hours, then drop it during nights and weekends (for example). You can even leverage a tighter schedule to pre-scale up before large releases, big events, and other known spikes!</p>

<p>You can even get the best of both worlds by running a schedule <em>and</em> autoscaling together. Instead of scheduling hard-locked service counts, you can schedule the <em>range</em> of scale that you want to tweak at a given time. That gives you the flexibility to respond to unknown load changes while still controlling baselines in accordance with known load changes! 🎉</p>

<p>Of course, if you choose to <em>not</em> build a schedule, autoscaling itself will ensure that your capacity grows and shrinks to meet need. But depending on how sharp your traffic changes can be at known times, you might either end up with: 1) a few minutes of slow service as autoscaling ramps up your capacity amidst a large traffic burst, or 2) wasted dollars as your baseline (minimum) service/dyno count stays higher than it needs to be on off hours!</p>

<p><strong>The takeaway</strong>: take a couple of minutes to think about your app’s weekly traffic patterns by day (even by hours!). Also consider any weekly rhythms your app has in terms of events, releases, or other in-domain things which drive influxes of users! Then bake all of those into a dynamic autoscaling schedule with Judoscale ✨</p>

<p><figure>
  <img alt="A screenshot of the Judoscale configuration UI showing an example schedule operating on a live application, dynamically shifting the scale range depending on the time of day." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>If there’s a single throughline in all of this, it’s that autoscaling is a feedback loop living in a noisy world. The mechanics aren’t mystical: measure what users feel (queue time), add capacity when you’re near the edge, and give your system enough time to absorb changes before you decide again. Do that consistently and the chaos of shared hardware, random routing, and spiky traffic stops feeling like chaos.</p>

<p>Choose boring on purpose. Keep your queues few and meaningful. Let workers scale to zero when there’s nothing to do. Downscale web by one so you don’t saw off the branch you’re sitting on. Run real in‑dyno concurrency so random routing doesn’t turn little bursts into instant queue time. And if a single dyno is having a uniquely bad day, assume a noisy neighbor before you assume a rewrite (and let <a href="/blog/noisy-neighbors-fixed">Dyno Sniper</a> handle the whack‑a‑mole so your team doesn’t have to).</p>

<p>That’s it. No silver bullets—just a handful of defaults that make your platform feel calmer and your costs feel saner. After nearly a decade of watching this stuff in the wild, the “secret” is that stability isn’t a hero move; it’s a series of small, boring, repeatable decisions. Make those decisions once, encode them in Judoscale, and let the loop hum 🔁</p>
]]>
      </content:encoded>
    </item>
  </channel>
</rss>
