<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Judoscale Dev Blog</title>
    <description>The Judoscale Dev Blog</description>
    <link>https://judoscale.com/</link>
    <language>en-us</language>
    <item>
      <title>Heroku: What’s Next</title>
      <description>Heroku is shifting to a sustaining engineering model. Here’s what that means, whether you should migrate, and how the top alternatives compare.</description>
      <pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/heroku-whats-next</link>
      <guid>https://judoscale.com/blog/heroku-whats-next</guid>
      <author>Jon Sully</author>
      <content:encoded>
<![CDATA[<p>In a move that surprised many of us — and one whose business sense I still can’t work out — Salesforce <a href="https://www.heroku.com/blog/an-update-on-heroku/" target="_blank" rel="noopener">officially announced</a> last week that Heroku will be moving into a “<em>sustaining engineering model</em>”. That’s essentially giant-software-corporation-speak for “we’re putting this into maintenance mode”. The platform that taught a generation of developers to “push to deploy” has reached its investment limit from its owners 😕.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch style scene, with a single server rack in a large space, and a sign hanging on that server rack which reads “Heroku Servers”, while several wrenches and tools are on the ground next to the rack, likely to be left there and not used again" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Our work here is done"
          </figcaption>

</figure>
</p>

<p>Now, before you jump straight to “abandon ship!!”, there are real questions we should think about when looking ahead. Heroku is still an excellent platform, runs very stably, and, to this day, has the smoothest DX for getting an application into production. For those of us with production apps currently running on Heroku, we need to be pragmatic about what this announcement means for our present, our future, and our time! </p>

<p>Salesforce’s announcement should ultimately drive a calm, collected conversation around both timing and execution. Heroku isn’t a sinking ship; it’s just done shipping new features.</p>

<h2 id="let-s-be-honest-about-urgency">Let’s Be Honest About Urgency</h2>

<p>Urgency itself is a function of two inputs: having a thing to do and believing that you must do that thing <em>soon</em>. The sooner you believe you must do it, the more urgent it will feel. So allow me to reiterate the point I made above and mix in some urgency:</p>

<p><strong>Heroku is not dying today, tomorrow, next month, or next year.</strong></p>

<p><strong>It is <em>not</em> urgent that you migrate away from Heroku</strong>.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch drawing depicting a person taking a deep breath, with arrows that indicate ‘inhale’ and ‘exhale’, while they have a smile on their face as air leaves them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Breathe"
          </figcaption>

</figure>
</p>

<p>The Salesforce announcement might serve to give you the first component of urgency: we’ll all have a ‘thing to do’ at some point: migrate to another platform. But it certainly does <em>not</em> give the second component (‘do that thing soon’). Heroku isn’t going anywhere. And, if you recall the late two-thousand-teens, this isn’t even the first time Heroku has spent some years running without major feature improvements! We sincerely believe it’ll be a few <em>years</em> before there’s any real pressing need to migrate off Heroku if you’re already successfully running your production app there.</p>

<p>I don’t want to come off like a Heroku shill here, so let me clarify why I’m pushing back against the hype and panic. It has nothing to do with Heroku’s bottom line or expensive servers. It has to do with your team’s time spent shipping useful features that will grow the value of your app and/or business.</p>

<p>Even in the best of circumstances and setups, migrating platforms takes time. It requires testing, planning, mapping, and careful execution to ensure that you’re not dropping traffic or upsetting customers along the way. It’s <em>work</em>. All of this work has opportunity cost: you <em>won’t</em> be building and shipping the features and enhancements that your customers want. You <em>won’t</em> be improving your application or business. At the end of the day, your customers don’t care how or where you host your app. They just want it to work and provide them value!</p>
<blockquote><p>Okay fine but give me an actual recommendation here?</p>
</blockquote>
<p>Sure. Deep breath. Let the panic subside: most applications currently running on Heroku <em>shouldn’t worry about migrating until next year</em> (2027) at the earliest. If you have an enterprise contract, you should renew it in 2026.</p>

<p><figure>
  <img alt="An AI-generated image depicting a simple block-lettered message as a pencil sketch on paper, reading: “don’t worry about migrating yet.”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You already chose Heroku, you’re already set up on Heroku, and your app is already running <em>fine</em> on Heroku. You should capitalize on <em>those</em> gains as long as possible (especially if you have enterprise/discount pricing!). “Heroku isn’t going to get any new major features” doesn’t actually prevent you from realizing the value of your initial investment in “I want managed hosting I don’t have to worry about”. Moving to another PaaS would still satisfy “I want managed hosting&hellip;”, but the migration itself is an additional investment and cost that you simply don’t need to make yet. Take a deep breath and go build your app / business! That <em>does</em> reap value <em>today</em>.</p>

<p>😮‍💨</p>

<h2 id="looking-at-the-alternatives">Looking at the Alternatives</h2>

<p>Nonetheless, I know many readers are still going to queue up migrations in the coming months. Maybe that’s discomfort, simply having time available to migrate, or a bad taste left in the mouth. I get it! Even as I wrote the paragraphs above, I felt some of those same tensions. Honoring those thoughts (and knowing that the future will come eventually), it feels worthwhile to talk through some of the migration paths an existing Heroku app has ahead.</p>

<p>We’re going to evaluate each option in three primary lenses:</p>

<ul>
<li><strong>Migration effort</strong>: how painful it would be to migrate a full production Heroku app to this new setup</li>
<li><strong>Ongoing operational load</strong>: how it <em>feels</em> (subjectively) to use over time — things like CLI, “hop into prod console”, control and tweakability, etc.</li>
<li><strong>Cost structure</strong>: how expensive is this new setup compared to Heroku, and how is it billed differently?</li>
</ul>

<p>Then we’ll give our general take on each path outside of those three parameters. Today’s challengers:</p>

<ul>
<li>Render</li>
<li>Fly.io</li>
<li>Railway</li>
<li>Run-it-Yourself Systems</li>
</ul>

<p>But today’s look isn’t our one-time “here’s the truth” post; it’s just a preview. We’ll give you our opinions here today based on our work integrating with most of these platforms and running various apps on them over the last three years, but we’re planning on going deeper in the coming months: Judoscale is going <a href="#judoscale-on-tour">on tour</a>. More on that below, but we’ll be moving our 3,000+ RPS production app to each of these platforms to <em>really</em> feel out what it looks like for a production app that can’t go down!</p>

<h2 id="render-the-obvious-choice">Render: The Obvious Choice</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Render logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you asked me for a simple, single-sentence recommendation for most teams, it would be Render. <em>Many</em> folks have described Render as, essentially, “the natural progression of Heroku” — perhaps what Heroku could’ve become had it never been acquired by Salesforce. I think this is mostly due to Render sharing many of the same philosophies as Heroku (fully managed PaaS, auto build detection, etc.) but just having been built fresh many years <em>after</em> Heroku: the Render devs had the chance to reimagine the Heroku UX from the ground up with plenty of Heroku experience to draw from.</p>

<p><strong>Migration effort</strong>. Any migration is going to take effort, but things are pretty smooth here. Heroku to Render is a <em>well</em>-trod path at this point and Render’s own team offers <a href="https://render.com/docs/migrate-from-heroku" target="_blank" rel="noopener">migration assistance</a> for those coming from Heroku. The mental model is broadly the same and you’ll feel at home within a few minutes of logging into the Render dashboard. The only gotcha to keep in mind is around buildpacks and system dependencies. Render does supply some base-level buildpacks that should cover most apps, but if your app requires specific system dependencies beyond their <a href="https://render.com/docs/native-runtimes#tools-and-utilities" target="_blank" rel="noopener">included set</a>, you may need to build out a Dockerfile. Where on Heroku buildpacks themselves can be composable, Render’s approach is simply, “stay on the rails or bring your own <code>Dockerfile</code>” (more <a href="https://render.com/docs/docker#docker-or-native-runtime" target="_blank" rel="noopener">here</a>). </p>

<p><strong>Ongoing operational load</strong>. Again here, this one’s going to feel just like Heroku. They handle the infrastructure, you just merge to <code>main</code>. Metrics and web dashboard UI are all friendly and available, logs can be pushed wherever you need, manual rollbacks are simple and accessible, there’s a broad CLI for control if you prefer that style, you can take your favorite <a href="/render">autoscaler</a> with you… the list goes on. Essentially everything you love about Heroku exists in Render in parallel or enhanced form.</p>

<p><strong>Cost structure</strong>. Of all the platforms and paths we’ll look at today, Render’s cost structure and setup match Heroku’s the most. Like Heroku, their pricing revolves around pre-set, <a href="https://render.com/pricing#services" target="_blank" rel="noopener">per-month pricing</a> depending on which instance types (think “dyno types”) you need. <em>Unlike</em> Heroku, they’re actually clear about how many vCPU cores you’re paying to hold (🎉). In terms of real cost, our rough estimate is that, depending on the composition of your app and the resources you need, you’ll likely save 20-30% off your current Heroku bill for similar resources on Render.</p>

<p>Our general takeaway on Render is that it’s the right choice for the grand majority of currently-on-Heroku apps. It’s a near-seamless transition, the billing operates the same, the operational overhead for engineers learning the new platform is very low, and most apps will be able to get up-and-running within a day.</p>

<h2 id="fly-io-a-little-more-complicated-a-little-more-interesting">Fly.io: A Little More Complicated, A Little More Interesting</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Fly.io logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Still mostly on the high-level-PaaS layer, Fly.io was built to accomplish a different goal. Fly’s whole <em>thing</em> is distributing your app geographically so that your users always hit an application server close by, and doing so with “Fly machines” — micro-VMs with much smaller footprints than full-on Docker containers. Fly is also heavily optimized for its powerful CLI and config tooling. Fly is <em>tremendously</em> flexible and configurable, but that comes at the cost of complexity: a steep learning curve!</p>

<p><strong>Migration effort</strong>. Like Render, Fly has written <a href="https://fly.io/docs/getting-started/migrate-from-heroku/" target="_blank" rel="noopener">guides</a> specifically for those migrating from Heroku, including framework-specific guides in many cases (<a href="https://fly.io/docs/rails/getting-started/existing/" target="_blank" rel="noopener">Rails</a>, <a href="https://fly.io/docs/django/getting-started/existing/" target="_blank" rel="noopener">Django</a>, <a href="https://fly.io/docs/python/frameworks/fastapi/" target="_blank" rel="noopener">FastAPI</a>, <a href="https://fly.io/docs/python/frameworks/flask/" target="_blank" rel="noopener">Flask</a>, etc.) to help explain nuances. And these guides are certainly helpful, but there’s no getting around the paradigm shift: Fly is a fundamentally different platform from Heroku and doesn’t operate quite the same. There <em>is</em> going to be a learning lift as you get familiar with its UI tooling and <code>flyctl</code> CLI tool — the latter of which you <em>absolutely will</em> want to become highly familiar with.</p>

<p><strong>Ongoing operational load</strong>. Like other PaaS’s, Fly can absolutely be configured to do the simple deploy-on-<code>main</code> thing and includes built-in metrics dashboards, logging basics, and standard machine health checks, but you’ll find a lot of utility in <code>flyctl</code>. Restarting instances, changing environment variables, spinning up secondary production instances&hellip; all simple <code>flyctl</code> commands once you learn them! If you’re not already a heavy terminal user, dive on in. Fly exposes more primitives and control around lower-level constructs than most PaaS’s (think: direct VM controls, volumes, storage, regions, etc.) and most of that is controlled via <code>flyctl</code>. So there’s more flexibility, but again, a steeper learning curve. Oh, also, you can still take your favorite <a href="/fly">autoscaler</a> with you!</p>

<p><strong>Cost structure</strong>. Fly walks a sort of middle ground between resource tier-based pricing and metered usage, which makes it easy to jump around to different scale sizes, tweak your RAM levels, and scale vertically as needed. Prices are <a href="https://fly.io/docs/about/pricing/#started-fly-machines" target="_blank" rel="noopener">per second</a> of machine runtime, extra RAM can be added wherever you want (very cool), and Fly offers everyone a (massive) <a href="https://fly.io/docs/about/pricing/#machine-reservation-blocks" target="_blank" rel="noopener">40% discount</a> when you opt to pre-reserve compute time — no enterprise contract required. If that sounds like a lot of levers to pull and tweak, that’s because it is. Again, Fly’s schtick here is configurability.</p>

<p>My take: if you’re the kind of person that was driving an automatic Honda Civic and already felt for years like you just wanted more of a car-person’s kind of car, then it’s probably true that Heroku’s recent announcement didn’t change anything for you — your Civic is still a Civic. But it’s understandable that Salesforce has, in some way or another, shaken you into realizing your dream. If you’re after that ‘69 Big Block Mustang with a four-barrel carb that you can tune <em>juuuuust</em> right&hellip; then Fly might be for you. This metaphor may have gone too far. The point is: Fly is complex. That complexity brings neat value-adds, but it has a price — there’s more to learn, more to understand, and more to manage.</p>

<h2 id="railway-not-exactly-our-way">Railway: Not Exactly Our Way</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Railway logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’re not out to bash any hosting providers, especially ones that <a href="/railway">we support</a> autoscaling on, but we also need to be honest: our experience with Railway has been pretty lackluster. All other bells and whistles aside, we had the worst actual system performance on Railway. Not because of dependent services or database latencies or anything like that; we just found our real, pure compute performance to be worse on Railway than on any other platform. <strong>It was just plain slower</strong>.</p>

<p>We can’t tell you why that’s the case, and at the same time, we love that Railway’s schtick is running their own metal in datacenters rather than reselling metal they rent from the big three. That’s awesome! But we suspect that economies of scale are a relevant factor here.</p>

<p><strong>Overall</strong>, we would not recommend Railway at this time. We love the mission and the goal, but we had a less-than-great time. For the sake of being positive-outlook community members, we’ll simply leave it at that!</p>

<p>Oh, and we <em>do</em> still plan on taking another full crack at Railway when we go <a href="#judoscale-on-tour">on tour</a> — see more below.</p>

<h2 id="the-more-hiy-stuff">The More HIY Stuff!</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward a small rack of servers with the simple label “Your Servers” above them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We live in a wonderful time of <em>options</em>! There are so many great options in the <strong>H</strong>ost-<strong>i</strong>t-<strong>Y</strong>ourself world, in many flavors and at many levels of control. Bring-your-own-VPS tools like <a href="https://dokku.com" target="_blank" rel="noopener">Dokku</a>, <a href="https://hatchbox.io" target="_blank" rel="noopener">HatchBox</a>, <a href="https://coolify.io/" target="_blank" rel="noopener">Coolify</a>, and <a href="https://caprover.com" target="_blank" rel="noopener">CapRover</a> offer lightweight PaaS-like experiences with great flexibility, each with their own distinct tradeoffs and workflows. Going a step more complex, container orchestrators (they coordinate Kubernetes for you) like <a href="https://northflank.com" target="_blank" rel="noopener">Northflank</a>, <a href="https://www.porter.run" target="_blank" rel="noopener">Porter</a>, and <a href="https://www.qovery.com" target="_blank" rel="noopener">Qovery</a> let you “bring your own cloud” (be it your own metal, rented Hetzner boxes, or AWS API keys) while still handling most of the complexities of Kubernetes cluster orchestration for you. And, of course, there’s the big world of AWS itself — “Hop onto ECS Fargate!” or “Elastic Beanstalk, baby!” among other choices. There have truly never been so many ways to run the “Heroku experience” yourself!</p>

<p>Honestly, there are a <em>dizzying</em> number of ways to make the technologies at this level of hosting control work. For the sake of this article not turning into a book, we’re going to mostly leave them unmentioned here. The reality is that <strong>if you’ve been a happy Heroku customer, you shouldn’t go looking down this path</strong>. I know that’s a strong statement that might make a few of the “come to the DIY-side!” folks upset, but it’s a pragmatic truth. These are two wholly different worlds with different levels of time and skill involved. Going ‘down’ a single layer in the hosting stack (as we perceive it) and getting into <em>Fly’s</em> ecosystem is already going to add overhead to your workflow as you need to learn to understand and handle their config complexity. Going all the way down to the HIY tooling is only going to add more ops time (or people!) to your app’s needs. If you’re happy with your PaaS-level at Heroku, stay up there!</p>

<h2 id="the-real-answer">The Real Answer</h2>

<p>Let’s zoom out and take a deep breath. I still <em>fully</em> stand by my original sentiment above: Heroku isn’t going anywhere and will remain stable for years to come. There’s no urgency to move, and doing so will only detract from the hours you could be spending on your product itself at this point. Moving takes work. We can’t ignore that reality amidst the hype here.</p>

<p>Then, of course, conceding to those who are <em>for sure</em> going to move soon out of principle, spite, or otherwise disdain for Salesforce (which… I get), we covered some options. Render is the clearest, cleanest, easiest choice. Fly is more complicated but more interesting. Railway isn’t recommended at the moment. Host-it-yourself and bring-your-own-cloud solutions are far more effort than a team happy on Heroku should take on.</p>

<p>So&hellip;. move to Render and call it a day? <strong>Not exactly</strong>.</p>

<p>As your resident auto-scaling experts for the last decade, who have integrated deeply with and provide autoscaling services for nearly all of the platforms previously mentioned, we have some opinions.</p>

<p><figure>
  <img alt="" src="https://media2.giphy.com/media/v1.Y2lkPTc5MGI3NjExYno0cHd1cGt4b2VuYWZjZmZ1NmxiamQ3MDVydnc5YmV4YmQwb2MwZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/LpkBAUDg53FI8xLmg1/giphy.gif">
  
</figure>
</p>

<p>But our opinions are from last year (or prior). And they’re based on integration work with Judoscale. And, who knows, they might just be wrong. So we’re going to do something that we haven’t seen done before: <strong>we’re going on tour</strong>.</p>

<h2 id="judoscale-on-tour">Judoscale On Tour</h2>

<p>As much as I wish that meant a music tour around the US with <a href="https://www.linkedin.com/posts/adamlogic_railsconf2025-activity-7349094411043037185-vWMQ" target="_blank" rel="noopener">our kazoos</a>, we actually hatched a better idea. Judoscale is a 24/7 real-time reactive production application. We receive well over 3,000 RPS every moment of every day. Our downtime is <em>exceedingly</em> rare (generally only when Cloudflare or Heroku themselves have issues), but then, it darn well should be! We’re an auto-scaler! We <em>need</em> to be online, regardless of traffic load, so that we can reactively scale our clients’ applications correctly and appropriately any time of day.</p>

<p>Sounds like the perfect app to move to each of these platforms / services to test some things out.</p>

<p>To be clear: our “going on tour” means that we’re going to migrate the Judoscale production application, including all traffic, DNS, configs, background workers, etc., to each of Heroku’s competitors, one at a time, and document every step along the way for you all.</p>

<p><figure>
  <img alt="" src="https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExbW9udHZvNjJuNTc5cHY3c2g0NW5iajQzbWZvM3F4aGxjZXpjZjEzNiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/J0BRQ3cXBycPm/giphy.gif">
  
</figure>
</p>

<p>So, again, our real recommendation here is simply to hang tight on Heroku. We’re going to take the plunge for you (many times over) and move our real-time, high traffic application ourselves. We’re going to find the rough edges. We’re going to feel the performance bottlenecks. We’re going to foot the literal bill and feel the DX each of these new platforms provides compared to ol’ purple.</p>

<p>If that sounds exciting to you, make sure you subscribe to our newsletter below. We’ll start with a full breakdown of all the things we love and use on Heroku, which will set forth our rubric for how to evaluate other platforms.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Latency-based Celery Queues in Python</title>
      <description>If you plan your Celery task queues around latency, you'll have more predictable (and scalable) results. Learn how to plan your Python queues around latency!</description>
      <pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/latency-based-celery-queues-in-python</link>
      <guid>https://judoscale.com/blog/latency-based-celery-queues-in-python</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>If you’ve worked with Celery in production with real traffic, you’ve probably hit one of its many sharp edges. Maybe you’ve watched a simple background job silently pile up in an unmonitored queue.</p>

<p>Or maybe you’ve built out a tidy set of queues only to find your high-priority jobs are getting stuck behind slow (and unimportant) ones. Celery gives you powerful tools, but few guardrails.</p>

<p>These pain points usually stem from <strong>queue planning problems</strong>. Most teams slap labels like <code>high_priority</code> or <code>emails</code> on queues without defining what those mean.</p>

<p>If you plan your <a href="/blog/choose-python-task-queue">Python task queues</a> around latency, you&rsquo;ll have more predictable (and scalable) results. Ready to get started?</p>

<h2 id="the-basics-of-celery-queues">The basics of Celery Queues</h2>

<p>Before we get into queue planning, let’s clarify some Celery terminology. If you already have a great understanding of how Celery works, feel free to skip to the next section.</p>

<p><figure>
  <img alt="Celery queue diagram, showing a Celery queue, full of tasks, with worker processes" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h3 id="celery-tasks">Celery tasks</h3>

<p>In Celery, a <strong>task</strong> is a single unit of work. For example, <code>send_email_task</code> might send a welcome email.</p>
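
<p>Here’s a minimal sketch of defining and enqueueing such a task (the app module name, broker URL, and task body are illustrative, not from a real project):</p>

<pre><code>from celery import Celery

# A Celery app pointed at a hypothetical local Redis broker.
app = Celery("myapp", broker="redis://localhost:6379/0")

@app.task
def send_email_task(user_id):
    # Pretend this looks up the user and sends a welcome email.
    print(f"Sending welcome email to user {user_id}")

# .delay() enqueues the task for a worker instead of running it inline.
send_email_task.delay(42)
</code></pre>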

<h3 id="celery-queues">Celery queues</h3>

<p>A <strong>queue</strong> in Celery refers to a named channel on the broker (like a Redis list or RabbitMQ queue) where tasks wait to be processed. By default, Celery uses a queue named <code>&quot;celery&quot;</code> (if you don’t specify one).</p>
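
<p>You can also route a single invocation to a specific queue at call time. A quick sketch (the <code>emails</code> queue name is illustrative):</p>

<pre><code># Goes to the default "celery" queue:
send_email_task.delay(42)

# Goes to a named "emails" queue instead:
send_email_task.apply_async(args=[42], queue="emails")
</code></pre>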

<h3 id="celery-workers">Celery workers</h3>

<p>A <strong>worker</strong> is a Celery process that runs tasks. A worker can run multiple tasks concurrently, depending on its concurrency setting.</p>

<h3 id="celery-concurrency">Celery concurrency</h3>

<p><strong>Concurrency</strong> refers to the number of tasks a worker can process at the same time. In prefork mode, this is the number of child processes (often defaults to the number of OS-reported CPUs).</p>
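
<p>Concretely, you set concurrency when starting the worker. For example (assuming your Celery app lives in a <code>myapp</code> module):</p>

<pre><code># Run a worker with four prefork child processes:
celery -A myapp worker --concurrency=4
</code></pre>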

<h3 id="decisions-you-have-to-make-when-using-celery">Decisions you have to make when using Celery</h3>

<p>In a typical deployment, you must decide <strong>how many queues</strong> to use and what they are called, <strong>which tasks go to each queue</strong>, and <strong>how many worker processes</strong> will consume each queue.</p>

<p>You also choose how many threads/processes each worker has (concurrency) and how many total containers to run (horizontal scaling). That’s a lot of decisions!</p>

<p>So let&rsquo;s dig into how you can make these decisions with scaling in mind.</p>

<h2 id="why-celery-queues-run-into-problems-at-scale">Why Celery queues run into problems at scale</h2>

<p>Out of the box, Celery will use a single queue (usually named <code>&quot;celery&quot;</code> by default). If a task doesn’t specify a queue, it goes to the default queue. If you start a worker without specifying <code>-Q</code>, it will consume the default queue. </p>
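
<p>A quick sketch of that default behavior (module and queue names assumed):</p>

<pre><code># No -Q flag: this worker consumes the default "celery" queue.
celery -A myapp worker

# With -Q, it consumes only the queues you name.
celery -A myapp worker -Q celery,emails
</code></pre>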

<p>Could you build an app with just one queue? <strong>Sure.</strong>  But please don&rsquo;t.</p>

<h3 id="not-every-task-is-created-equal">Not every task is created equal</h3>

<p>For a brand-new project, one queue might work fine for a short while. But very soon, you’ll encounter scenarios that push you to create additional queues:</p>

<ul>
<li>You have a task that needs to run <strong>quickly</strong> (a high-priority job), so you want it processed before other tasks.</li>
<li>You have a task that takes a long time to run (perhaps several seconds or minutes), and you want it to have <strong>lower priority</strong> or even separate handling so it doesn’t block faster tasks.</li>
</ul>

<p>In response, teams might eventually create ad-hoc queues like <code>&quot;urgent&quot;</code> for high priority and <code>&quot;low&quot;</code> for slow tasks.</p>

<h3 id="ambiguous-queue-names">Ambiguous queue names</h3>

<p>However, there’s a big problem. <strong>Those queue names are ambiguous</strong>.</p>

<p>How urgent is “urgent”? What does “low” mean, exactly? As your application grows, you’ll find there are varying degrees of priority. One developer might add <code>very_urgent</code> or <code>critical</code> queues; another might introduce a queue for a specific feature like <code>reports</code> or <code>emails</code>.</p>

<p>Before you know it, you have a <strong>sprawl of Celery queues</strong> without a clear hierarchy or expectations.</p>

<h2 id="latency-based-queues">Latency-based queues</h2>

<p>Take a step back and consider what metrics define the “health” of a task queue. Three key metrics are commonly used:</p>

<ul>
<li>Worker CPU: How taxed the CPU is for your worker processes.</li>
<li>Queue depth: How many tasks are waiting in the queue (queue length).</li>
<li>Queue latency: How long a task waits in the queue before a worker starts processing it (sometimes called queue time).</li>
</ul>

<p>CPU can be used, but it doesn’t actually tell you everything about <em>the queue</em>. It simply gives an indication (and often a trailing one) of how hard a worker process is working during an individual task. And task queues often back up without spiking CPU at all, giving a false sense of worker health.</p>

<p>Queue depth is easy to visualize (a simple count of jobs), so many people focus on it. But queue depth can be very misleading: the number of tasks doesn’t tell you how <em>long</em> they’ll take to clear.</p>

<p>For example, imagine two queues, each handled by one worker process:</p>

<ul>
<li>Queue A has 10 jobs enqueued, and each job takes ~1 second to run.</li>
<li>Queue B has 10,000 jobs enqueued, but each job takes ~1 <em>millisecond</em> to run.</li>
</ul>

<p>Queue B might look “backed up” at a glance, but in reality, both queues will finish their work in about 10 seconds. <strong>The <em>latency</em> (wait time) for jobs in both queues is the same ~10 seconds</strong>, which is the metric that truly matters.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p><strong>Queue latency</strong> tells the real story about how well a queue is doing.</p>

  </div>
</div>

<p>So, is a 10-second wait time good or bad? <strong>It depends.</strong></p>

<p><figure>
  <img alt="It depends meme, showing Celery queue latency health is a complicated decision" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The acceptable latency for a queue is a business decision. It depends on what the tasks are doing and how quickly that work needs to begin. This brings us back to the notion of “urgency”, but now we can quantify it. Instead of calling a queue &ldquo;urgent&rdquo; in a vague sense, we decide what latency is acceptable for that queue’s tasks.</p>

<h2 id="latency-sla-queue-names">Latency SLA queue names</h2>

<p>If you&rsquo;re convinced <strong>queue latency</strong> is the right metric to measure performance, you should fix the ambiguity in your queue names. Naming your queues after their latency targets (SLAs) is a great way to set yourself up for success.</p>

<p>For example:</p>

<ul>
<li>“urgent” becomes <code>within_5_seconds</code> (tasks should start within 5 seconds)</li>
<li>“default” becomes <code>within_5_minutes</code> (tasks should start within 5 minutes)</li>
<li>“low” becomes <code>within_5_hours</code> (tasks should start within 5 hours)</li>
</ul>

<p>If I push a task to the <code>within_5_seconds</code> queue, I’m explicitly saying I expect that job to begin processing within five seconds. The name of the queue communicates the expectation.</p>

<p>You can choose whatever latency thresholds make sense for your app; the specifics aren’t as important as the explicitness of the naming.</p>
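
<p>In Celery, wiring up latency-named queues is straightforward with <code>task_routes</code> and a default queue. A sketch, assuming <code>app</code> is your Celery app (the task paths here are hypothetical):</p>

<pre><code># Route tasks to queues named for their latency SLAs:
app.conf.task_routes = {
    "myapp.tasks.send_email_task": {"queue": "within_5_seconds"},
    "myapp.tasks.generate_report": {"queue": "within_5_hours"},
}

# Anything without an explicit route falls back to the middle tier:
app.conf.task_default_queue = "within_5_minutes"
</code></pre>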

<p>By communicating latency expectations in the queue names, we get a few important things.</p>

<p>First, <strong>you&rsquo;ll end up with fewer queues</strong>. You’re far less likely to create a new queue per feature or whim. Almost every new task will fit into an existing latency category. This should remove the temptation of one-off queues that don&rsquo;t serve a strategic purpose.</p>

<p>Second, each queue now has a <strong>performance target</strong> (its name). This gives clarity for monitoring. If the <code>within_5_minutes</code> queue starts seeing 10-minute latencies, you have an unambiguous problem.</p>

<p>Of course, naming queues “within_X” doesn’t magically make tasks start within X time – <strong>you have to ensure enough worker capacity to meet those targets</strong>. That’s where scaling comes in.</p>

<p>Fortunately, this strategy makes it crazy easy to decide when to spin up more (or fewer) workers to scale, but we&rsquo;ll talk more about that later.</p>

<p><figure>
  <img alt="Diagram showing latency-based celery queues with different tasks in each queue" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="simple-ways-to-scale-celery-queues">Simple ways to scale Celery queues</h2>

<p>Typically, you scale a Celery worker pool with the goal of avoiding a queue backlog.</p>

<p>Now that our queue names encode latency expectations, we can define a clear scaling goal for each queue:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
<p>Each queue’s latency should stay within its target (as named), without overprovisioning resources.</p>

  </div>
</div>

<p>For most people, traffic and job volumes fluctuate too much to maintain this manually. You’ll want to <strong>autoscale</strong> your workers based on queue latency. With autoscaling in place, meeting those latency targets becomes trivial.</p>

<p>When jobs start waiting too long, spin up more workers; when the queues are empty, spin them down.</p>

<p>For example, if the <code>within_5_seconds</code> queue’s jobs are waiting &gt;5 seconds, your autoscaler should add another worker process (or increase concurrency) for that queue. If the queue’s latency stays under 5 seconds, you can maybe scale down. We’ll talk about how to assign workers to queues next, which affects how you set up autoscaling triggers.</p>
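
<p>Celery doesn’t report queue latency out of the box, but you can approximate it with signals. Here’s a rough, not-production-hardened sketch that stamps each task at publish time and measures the wait when a worker picks it up (exactly where custom headers surface on the request can vary by Celery version and message protocol, so treat this as a starting point):</p>

<pre><code>import time
from celery import signals

@signals.before_task_publish.connect
def stamp_enqueue_time(headers=None, **kwargs):
    # Record when the task was published to the broker.
    if headers is not None:
        headers["enqueued_at"] = time.time()

@signals.task_prerun.connect
def report_queue_latency(task=None, **kwargs):
    # Compare publish time to start time once a worker picks the task up.
    enqueued_at = getattr(task.request, "enqueued_at", None)
    if enqueued_at is not None:
        # Ship this to your metrics system; an autoscaler can act on it.
        print(f"queue latency: {time.time() - enqueued_at:.2f}s")
</code></pre>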
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p>Built-in autoscalers default to CPU usage for scaling. <a href="https://judoscale.com/python" target="_blank" rel="noopener">Judoscale</a> is a great autoscaler add-on that can scale your queues based on queue latency!</p>

  </div>
</div>

<p>Speaking of queue assignment, how should we split up queues across Celery workers? I have a few opinions!</p>

<h2 id="your-options-for-matching-workers-to-queues">Your options for matching workers to queues</h2>

<p>When it comes to queue-to-worker assignment, you have a couple of options. On one hand, you have <em>one set of workers pulling from all queues</em>. On the other hand, you have <em>dedicated workers for each queue</em>.</p>

<p>In between these two extremes, you might run some workers that each handle a subset of queues.</p>

<h3 id="running-a-single-worker-pool-for-all-queues">Running a single worker pool for all queues</h3>

<p>Running a single worker pool for all queues is the simplest setup. It’s resource-efficient since any free worker can work on any task, and you don’t need to worry about balancing workers between queues.</p>

<p><figure>
  <img alt="Diagram showing a single Celery worker pool consuming from multiple queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>However, the downsides are significant. You risk <strong>long-running tasks blocking high-priority tasks</strong>, plus it’s harder to autoscale effectively for all latency goals at once.</p>

<p>For example, suppose one Celery worker (with concurrency 4) is consuming <code>within_5_seconds</code>, <code>within_5_minutes</code>, and <code>within_5_hours</code> queues. If it picks up several very slow <code>within_5_hours</code> tasks (say tasks that each take minutes to execute) on all its worker processes, and then a bunch of new <code>within_5_seconds</code> tasks arrive, those fast tasks <strong>can’t start until a process is free</strong>.</p>

<p>All processes are busy churning on slow jobs, so even though the <code>within_5_seconds</code> queue is the highest priority, it’s effectively blocked. This defeats the purpose of having a fast queue!</p>

<h3 id="dedicated-workers-per-queue">Dedicated workers per queue</h3>

<p>In this setup, each queue gets its own Celery worker process (or pool).</p>

<p>For example, you might start one set of workers with <code>-Q within_5_seconds</code>, another with <code>-Q within_5_minutes</code>, and so on. This <em>completely isolates</em> each latency tier.</p>
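
<p>In practice, that’s one worker command per queue. Sketched here with illustrative concurrency values and worker names:</p>

<pre><code>celery -A myapp worker -Q within_5_seconds --concurrency=8 -n fast@%h
celery -A myapp worker -Q within_5_minutes --concurrency=4 -n mid@%h
celery -A myapp worker -Q within_5_hours --concurrency=2 -n slow@%h
</code></pre>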

<p>The slow jobs in the 5-hour queue can never block the 5-second jobs, because they’re handled by different workers on possibly different machines.</p>

<p>Autoscaling becomes much cleaner because you can <strong>scale each worker deployment based on <em>that queue’s</em> latency threshold.</strong> The <code>within_5_minutes</code> workers only care about keeping that queue under 5 minutes latency, and if they’re idle, you can scale them down without affecting the queue time of unrelated queues.</p>

<p>The mental model is simpler, and each queue’s performance can be managed separately. The primary downside is the <strong>cost</strong> of running more separate processes.</p>

<p>The cost difference between one big worker vs. multiple smaller dedicated workers is often minor, and it’s far outweighed by the performance improvements. With dedicated per-queue workers, you also avoid starving out fast tasks with long-running ones.</p>

<h3 id="a-bit-of-both">A bit of both</h3>

<p>One strategy is to try to group certain queues together on workers and isolate others. For example, maybe combine the <code>within_5_seconds</code> and <code>within_5_minutes</code> queues on one worker type, but keep the <code>within_5_hours</code> queue separate.</p>

<p>While this can work, any time you put multiple latency tiers on one worker, you reintroduce the possibility of interference. It also complicates autoscaling (which latency do you scale on for that combined worker?).</p>

<h3 id="my-recommendation">My recommendation</h3>

<p>In summary, I <strong>recommend dedicated Celery workers per latency-based queue</strong>. It makes it straightforward to maintain each queue’s SLA.</p>

<p><figure>
  <img alt="Diagram showing Celery workers dedicated to their own queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you’re on an autoscaling platform, set each worker deployment to scale up whenever its queue latency exceeds the target. To mitigate the <em>potentially</em> higher resource usage of this setup, I also recommend autoscaling your lower-priority workers (5 minutes, 5 hours, etc.) down to zero when the queues are idle. (Of course Judoscale makes this super easy 😁.)</p>

<p>If you’re doing this manually, you still benefit from clarity: you can monitor each queue’s wait time and add resources accordingly without guessing which queue is starved.</p>

<p>You should also look into other ways to <a href="/blog/scaling-python-task-queues">effectively scale Python task queues</a>, like fanning out large jobs.</p>

<h2 id="one-thing-to-keep-in-mind-for-celery-queues">One thing to keep in mind for Celery queues</h2>

<p>One Celery-specific consideration that doesn&rsquo;t apply to every queuing system is task acknowledgment timing. By default, Celery acknowledges a task as &ldquo;received&rdquo; when a worker picks it up. If the worker crashes mid-task, that task is dropped.</p>

<p>Setting <code>acks_late=True</code> (either globally or per-task) delays acknowledgment until the task <em>completes</em>. This means crashed tasks get redelivered, but it also means <strong>your tasks need to be idempotent</strong>, since they might run more than once.</p>
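
<p>Per task, that’s a one-line change. A sketch (the task body is hypothetical):</p>

<pre><code>@app.task(acks_late=True)
def charge_customer(payment_id):
    # With late acks, this may run more than once after a crash,
    # so the work must be idempotent (e.g. guarded by a unique key).
    ...

# Or enable late acknowledgment globally for every task:
app.conf.task_acks_late = True
</code></pre>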

<p>If you&rsquo;re using <code>acks_late</code> with Redis as your broker, pay attention to the <code>visibility_timeout</code> setting. This controls how long Redis waits before assuming a task was lost and redelivering it. The default is one hour. If you have tasks that need to run longer than your visibility timeout, they&rsquo;ll get redelivered while still running.</p>
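
<p>With the Redis broker, that timeout lives in the transport options. A sketch (the 12-hour value is arbitrary; size it to your longest tasks):</p>

<pre><code>app.conf.broker_transport_options = {
    # Wait 12 hours before assuming a task was lost and redelivering it:
    "visibility_timeout": 43200,
}
</code></pre>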

<p>For latency-based queue planning, the practical advice is that tasks in your fast queues (like <code>within_5_seconds</code>, <code>within_5_minutes</code>) should be short enough that the visibility timeout is irrelevant. For your slow queue, make sure your longest-running tasks finish well under the visibility timeout, or increase the timeout accordingly.</p>

<h2 id="shipping-performant-celery-queues">Shipping performant Celery queues</h2>

<p>This opinionated guide for setting up your Celery queues is very much inspired by the <a href="/blog/planning-sidekiq-queues">strategies we know work well in the Sidekiq world</a>. I hope this gives you some fresh ideas and a solid game plan for taming your Celery queues.</p>

<p>Remember, planning your queues boils down to:</p>

<ul>
<li>Name queues by expected latency.</li>
<li>Isolate latency tiers on separate workers to avoid cross-interference.</li>
<li>Monitor and autoscale by latency.</li>
</ul>

<p>Follow these steps, and you’ll avoid most of the common background job headaches that plague teams as they scale up.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Node.js Hosting Options</title>
<description>Choosing where to host a Node.js app is a high-stakes decision. This guide will show you how to pick the best hosting option for your app AND your team.</description>
      <pubDate>Wed, 4 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-js-hosting-options</link>
      <guid>https://judoscale.com/blog/node-js-hosting-options</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
<![CDATA[<p>Choosing the right hosting environment for a Node.js application shapes both your development workflow and your application’s performance. The hosting option you choose directly affects the developer experience (how easy deployments and updates are), the cost model of running your app, its scalability under load, and how much control (and responsibility) you have over your infrastructure.</p>

<p>For example, a fully managed platform can eliminate server maintenance at the cost of less flexibility and more money, whereas running your own server gives maximum control but demands more operational work.</p>

<p>Your goal in deciding where to host a Node app is to align your hosting choice with your app’s <strong>technical requirements</strong> and your <strong>team’s capacity</strong> to manage the underlying infrastructure.</p>

<h2 id="different-types-of-node-apps-have-different-needs">Different types of Node apps have different needs</h2>

<p>APIs built with Node are typically stateless request/response services and a good fit for most hosting models. A Node.js API can run on anything from a cheap VPS to serverless functions, since each request is independent and usually short-lived.</p>

<p>Real-time apps (like those with WebSockets), on the other hand, need persistent connections. Things like chat apps or live dashboards require hosting that supports long-lived network sockets. Traditional servers or container-based platforms are often necessary here as pure serverless platforms often don’t allow WebSockets or constant connections. For example, Vercel’s serverless functions cannot hold always-on WebSocket connections, but they do support WebSockets through their Edge Runtime.</p>

<p>Server-rendered apps (think Next.js) are certainly a special case. Frameworks like Next.js generate (most) pages server-side and often do well with serverless deployment. <strong>Next.js is tightly integrated with Vercel</strong>, which offers zero-configuration deployment, serverless functions for API routes, and edge caching for static assets. Many teams choose serverless platforms for these SSR apps to leverage features like automatic CDN distribution and on-demand scaling without managing servers. However, this serverless approach comes with tradeoffs in execution time limits and statefulness, which we’ll discuss later.</p>

<p>First, let&rsquo;s talk about the option that demands the most of you.</p>

<h2 id="hosting-node-apps-on-a-vps-or-similar-cloud-service">Hosting Node apps on a VPS (or similar cloud service)</h2>

<p>Running a Node.js app on a VPS (Virtual Private Server), Amazon EC2, or cloud virtual machine gives you <strong>maximum control</strong> over the environment. But with that comes maximum responsibility.</p>

<p>On a VPS, you get root access to install any OS packages, configure the stack exactly as you want, and run any background processes you need. This flexibility is powerful for custom setups, but the maintenance burden on you or your team is high. You are in charge of everything under the hood.</p>

<p>Applying OS security patches, monitoring disk and CPU usage, setting up firewalls, managing backups, and handling scaling manually are all things you should be prepared to manage if you go this route.</p>

<p>Using infrastructure-as-code and containers can ease some pain, but won’t eliminate ops work. Tools like <a href="/blog/kamal-vs-paas">Kamal can simplify deploying a containerized app</a> to a VPS. However, <strong>Kamal doesn’t handle the surrounding infrastructure needs</strong>. You still need to set up things like load balancers, databases with backups, log aggregation, and system monitoring yourself.</p>

<p>Containers help by packaging your Node.js app with its dependencies, making it portable and consistent across environments. But the VPS still needs to have everything the container needs. You’ll still be responsible for orchestrating containers, scaling them, and managing the host VM’s health.</p>

<p><figure>
  <img alt="Hosting a Node.js app on a VPS" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Hosting on a VPS or cloud VM is fine if you need fine-grained control or have specialized requirements that platforms don’t support. But it&rsquo;s not an option I can recommend unless you have a dedicated ops team (or you just really love that sort of thing). I&rsquo;ve hosted small projects on a VPS, and the headache has always outweighed the cost savings.</p>

<h2 id="hosting-your-node-app-on-a-paas">Hosting your node app on a PaaS</h2>

<p>Platform-as-a-Service (PaaS) offerings strike a middle ground by handling most infrastructure concerns while still letting you run a “server-like” app. Platforms like Heroku, Render, Amazon ECS Fargate, and Fly.io are PaaS leaders.</p>

<p>They allow you to push your Node.js code (via Git or container image) and then they build, run, and serve your application in a managed environment. Platforms abstract away the server (or VPS) management.</p>

<p><figure>
  <img alt="Hosting Node.js apps on a platform" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Most platforms let you choose whether or not to use containers, so the above image could be even simpler, with you only managing the app itself.</p>

<p>With platforms, there&rsquo;s very little manual configuration and management. You get a deployment platform that automates scaling, security updates, and (some) monitoring, usually through a web dashboard or CLI. Developers can focus on code and let the platform handle the “ops” heavy lifting.</p>

<p>Using a PaaS still provides you with the flexibility to run long-running processes and <a href="/blog/node-task-queues">async job queues like BullMQ or Bee-Queue</a>, which are things that pure serverless platforms don’t support.</p>

<p>The general-purpose nature of PaaS means it doesn’t matter whether you’re deploying a frontend, a Node API, or a background worker. This makes platforms the best option for <em>most</em> Node apps.</p>

<p>You get persistent Node.js processes that can maintain state in memory, hold database connection pools, handle WebSocket connections, and even schedule cron jobs without worrying about hitting an execution timeout or some vendor constraint. Essentially, it offers the convenience of managed hosting <em>without the severe limitations on process lifespan</em> that come with serverless function environments. </p>

<p>You get a managed environment that dramatically reduces your operations overhead, but you <strong>keep quite a bit of control.</strong></p>

<p>But serverless <em>is</em> right for some apps! Let&rsquo;s look into that next.</p>

<h2 id="hosting-serverless-node-apps-on-vercel-or-netlify">Hosting serverless Node apps on Vercel or Netlify</h2>

<p>Serverless platforms like <strong>Vercel and Netlify</strong> have gained popularity, especially for frontend-oriented and Jamstack applications. Vercel hired several members of the React core team away from Meta and stewards the development of Next.js, which positions them well to support Next apps in particular.</p>

<p>In a serverless model, you don’t maintain a running server process. Instead, your Node.js code is deployed as functions that execute on demand in response to requests (or events) and then terminate. This model brings <strong>automatic scaling per request</strong> – every incoming request can spin up a new isolated function instance if needed, so capacity can increase seemingly without bound, and you never pay for idle time.</p>
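<p>For a feel of the model, here&rsquo;s roughly what a function endpoint looks like on Vercel (a minimal sketch; the file path and message are illustrative):</p>

<pre><code>// api/hello.js -- deployed as a serverless function on Vercel.
// Each invocation may run in a fresh, short-lived instance.
export default function handler(req, res) {
  // Don't count on in-memory state surviving between calls.
  res.status(200).json({ message: 'Hello from a function instance' });
}
</code></pre>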

<p>Vercel and Netlify both provide an experience where you connect a Git repo, and they build and deploy your site with serverless functions backing any dynamic endpoints or API routes. This gives a fantastic developer experience for certain use cases. Frontend-heavy apps get static hosting plus dynamic capabilities without ever thinking about servers, and things like CI/CD, CDN distribution, and SSL are handled for you out of the box.</p>

<p>I host my personal site and a few simple projects on Vercel and am quite happy with how hands-off it&rsquo;s made hosting. For my simple Next.js app, Vercel is a very good fit and also free.</p>

<p><figure>
  <img alt="Hosting a Node app on vercel" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That being said, if I want to expand this application to include more functionality, I&rsquo;d probably run into some limitations.</p>

<p>The first major limitation is that <strong>serverless functions on these platforms have hard time limits.</strong> This means you cannot do long processing jobs directly. If your Node app needs to generate a large report or process a big file, you’ll likely exceed these limits and the platform will kill the function.</p>

<p>Long-running tasks have to be offloaded to external services or broken into much smaller jobs, because Vercel and Netlify do <strong>not allow running arbitrary background worker processes</strong>. You can’t have a worker listening to a queue or a scheduler that runs continuously in the background. “Background Functions” on Netlify simply allow a single function invocation to run longer (<a href="https://docs.netlify.com/build/functions/overview/#default-deployment-options" target="_blank" rel="noopener">up to 15 minutes</a>) asynchronously, but they are not equivalent to an always-on worker process.</p>

<p>Vercel recently introduced scheduled functions, which are cron-like triggers, but these are just periodic invocations of serverless functions, not persistent jobs. Any asynchronous or delayed work in a serverless architecture has to be handed off to another system (using an external job queue service, or triggering an AWS Lambda via event).</p>
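<p>As a sketch, a Vercel scheduled function is just a regular function endpoint plus a cron entry in <code>vercel.json</code> (the <code>generateDailyReport</code> helper here is hypothetical):</p>

<pre><code>// api/daily-report.js -- invoked on a schedule, then terminated.
// The schedule itself lives in vercel.json, roughly:
//   { "crons": [{ "path": "/api/daily-report", "schedule": "0 8 * * *" }] }
export default async function handler(req, res) {
  // Must finish before the platform's execution time limit;
  // there is no persistent loop or always-on worker here.
  await generateDailyReport(); // hypothetical app function
  res.status(200).json({ ok: true });
}
</code></pre>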

<p>This is a fundamental design difference. Traditional platforms (like Heroku, Render, etc.) let you run a worker indefinitely, whereas on Netlify/Vercel, you might schedule a function to run every few minutes, but it will start fresh and then terminate each time.</p>

<p>Both Vercel and Netlify abstract away containers and don’t let you deploy a custom Docker image to their platform. You are limited to the runtimes and languages they support and the build process they provide. While the support is often sufficient, the platform’s provided environment is the only environment. Vercel and Netlify focus on source-based deployment and static assets, not running arbitrary containers.</p>

<p>They are great at what they do (fast frontend deployments), but aren’t general-purpose hosting for any kind of app.</p>

<h2 id="autoscaling-a-node-app">Autoscaling a Node app</h2>

<p>Scalability is a big question for web developers, and different platforms scale Node apps in different ways. Understanding your autoscaling options and their implications for performance and cost matters when choosing a host.</p>

<p>On traditional setups like VPS or self-managed servers, scaling is usually manual unless you build your own scripts or use cloud vendor tools to spin up new VMs. By contrast, PaaS platforms typically offer some form of horizontal autoscaling for Node apps, but the responsiveness to load can vary.</p>

<p>Heroku, for example, has a built-in autoscaler (available on certain tiers) that can add or remove dynos based on response-time thresholds. The caveat with that metric is that response-time-based autoscalers can react sluggishly or scale at the wrong times.</p>

<p>This is why third-party solutions like <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale</a> have emerged. Judoscale focuses on <strong>request queue time</strong> as the metric to decide scaling, which directly measures whether requests are backing up due to a lack of capacity.</p>
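<p>The idea is simple to sketch. Routers like Heroku&rsquo;s stamp each request with an <code>X-Request-Start</code> header on the way in, so middleware can measure how long a request sat waiting before your app picked it up. Here&rsquo;s a simplified, Express-style illustration (not Judoscale&rsquo;s actual adapter code):</p>

<pre><code>// Sketch: measure request queue time from the router's timestamp.
function queueTimeMiddleware(req, res, next) {
  const start = Number(req.headers['x-request-start']); // ms timestamp
  if (!Number.isNaN(start)) {
    const queueTimeMs = Date.now() - start;
    // Report queueTimeMs to your metrics store / autoscaler here.
  }
  next();
}
</code></pre>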

<p><figure>
  <img alt="Scaling Node.js apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Judoscale will add more web processes as capacity demands it, and we also watch your job queues to autoscale worker processes. If you want reliable autoscaling on a PaaS, you want Judoscale.</p>

<h3 id="scaling-on-serverless-is-weird">Scaling on serverless is weird</h3>

<p>Serverless platforms scale very differently.</p>

<p>Essentially, they scale <em>per request by default</em>. There’s no “instance” for you to add.</p>

<p>Every incoming event finds capacity because the provider launches more copies of your function as necessary. This leads to effectively unlimited concurrency out of the box, which is great for absorbing traffic spikes without any configuration. The flip side is limited control over this scaling.</p>

<p>Normally, every request that comes in will start a new Node.js runtime if the existing ones are all busy. This is an awesome way to ensure reliability when your traffic increases quickly.</p>

<p>However, there are two big tradeoffs: cold starts and cost unpredictability.</p>

<p>When serverless scales, many of those new function invocations might incur a cold start delay (a few hundred milliseconds or more to initialize a Node environment). In a high-traffic scenario, you could have lots of functions cold-starting, which might cause latency for some requests. More importantly, from a cost perspective, serverless billing is usually metered by time and memory per execution, plus any external service calls (like database or bandwidth).</p>

<p>If you get 1000 concurrent requests frequently, you pay for 1000 function runs in parallel, which can add up quickly. I see <a href="https://x.com/mattwelter/status/1949850488654143932" target="_blank" rel="noopener">developers on X</a> and Reddit all the time complaining that their Vercel bills ballooned under heavy load.</p>

<p><figure>
  <img alt="A post on X complaining about a big increase in their Vercel bill" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This isn’t to say serverless can’t be cost-effective. For super volatile but low-average traffic, it can be the cheapest option.</p>

<p>If you require tight control and predictability, a PaaS with the right autoscaling tool might be preferable. If you need to handle unpredictable surges and are okay with the stateless function model, serverless will do it out of the box. Just keep an eye on those usage metrics!</p>

<h2 id="picking-your-hosting-option-based-on-developer-experience">Picking your hosting option based on developer experience</h2>

<p>I&rsquo;ve thrown a bunch of information at you, but I don&rsquo;t want my opinion to be unclear.</p>

<p>I think you should prioritize developer experience. Whether you&rsquo;re trying to decide where to host a solo project or influence a decision for an enterprise, put real weight behind the developer cost that comes with the &ldquo;cheaper&rdquo; options.</p>

<p>Beyond that, the decision comes down to your application’s type and its traffic profile.</p>

<p>Ask yourself a few questions about your Node.js app:</p>

<p>Does your app require persistent connections or background processes? If it does, then a serverless platform (Vercel/Netlify) likely <em>won’t</em> serve you well. You’d <strong>lean towards a PaaS</strong> or even your own VPS if you&rsquo;re okay being pretty hands-on.</p>

<p>How much ops work are you (or your team) willing to take on? If you have a strong DevOps skillset or an ops team, hosting on VPS or some pure cloud solution might be a good fit. You’ll get full flexibility to tailor the environment and potentially save on high-volume costs by squeezing more out of each server. But if you’d rather <em>not</em> deal with server management, then PaaS or serverless is attractive.</p>

<p>What are your scaling and traffic patterns? For relatively steady, predictable traffic, it can be more cost-effective and simpler to run a fixed number of servers (or dynos) on a PaaS or VPS. You won’t get surprises in the bill, and you can ensure they’re always warm and performant. For spiky or highly variable traffic, serverless is an option.</p>

<p><strong>Choose the platform that fits the shape of your app and your team.</strong> For a typical web API or monolithic Node app that has a mix of web requests and background jobs, a PaaS will provide the least friction. If you’re building a highly interactive frontend-heavy app (especially with Next.js), deploying the frontend on Vercel or Netlify can be great for the static+serverless benefits, possibly complemented by a separate backend for any heavy lifting. </p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Choosing the Right Node.js Job Queue</title>
      <description>So you've got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queuing system to use? Learn how to choose the right one for your needs..</description>
      <pubDate>Mon, 5 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-task-queues</link>
      <guid>https://judoscale.com/blog/node-task-queues</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Modern Node.js apps often need to perform background jobs. Offloading to a job queue is a great way to preserve web performance when faced with sections of code that are too slow or resource-intensive to handle during an HTTP request. If your app needs to send emails, generate PDFs, process images, or aggregate data, you probably need background jobs.</p>

<p>Offloading these jobs (sometimes called <em>tasks</em>) to a <strong>job queue</strong> ensures your web process remains responsive and keeps latency down. A typical setup is to have your web processes enqueue jobs to an external system, and one or more <strong>worker</strong> processes consume and execute those jobs asynchronously.</p>

<p><figure>
  <img alt="Node job queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This works well for keeping your web processes free and performant.</p>

<p>So you&rsquo;ve got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queueing system to use?</p>

<p>If you&rsquo;re looking for a quick answer, I won&rsquo;t make you wait. BullMQ is right most of the time. But let&rsquo;s take a look at our options!</p>

<h2 id="bull-and-bullmq-for-job-queues">Bull and BullMQ for job queues</h2>

<p><a href="https://bullmq.io/" target="_blank" rel="noopener">BullMQ</a> is definitely the <strong>most popular Node.js job queue</strong> (especially if you also consider Bull).</p>

<p>It is a powerful queue library backed by <em>Redis</em>, known for its high performance and rich feature set. Bull can process a large volume of jobs quickly by leveraging Redis and an efficient implementation under the hood.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p><strong>Understanding Bull vs BullMQ:</strong> One really important thing to note is that <strong>Bull’s original library is now in maintenance mode</strong>. The authors have moved efforts to <strong>BullMQ</strong>, a modern TypeScript rewrite that will receive new features going forward.</p>

  </div>
</div>

<p>Jobs are persisted in Redis, so they won’t be lost if a worker crashes. Bull provides job persistence, automatic retries, error handling, and priority queues. Together, these features set a very high bar for reliability.</p>

<p>BullMQ also supports multiple workers consuming the same queue, and you can configure concurrency (the number of jobs a single worker can process in parallel). This horizontal scaling ability means BullMQ can handle a lot of load and is also perfect for autoscaling, which we&rsquo;ll get into later.</p>
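<p>Here&rsquo;s a minimal sketch of that setup (the queue and job names are illustrative, and <code>sendWelcomeEmail</code> is assumed to exist in your app):</p>

<pre><code>import { Queue, Worker } from 'bullmq';

const connection = { host: '127.0.0.1', port: 6379 }; // your Redis

// Producer side: enqueue a job from a web process.
const emailQueue = new Queue('email', { connection });
await emailQueue.add('welcome', { to: 'user@example.com' });

// Worker side: typically a separate process consuming the same queue.
// `concurrency` lets one worker run several jobs in parallel.
const worker = new Worker(
  'email',
  async (job) => sendWelcomeEmail(job.data), // assumed app function
  { connection, concurrency: 5 }
);
</code></pre>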

<p><figure>
  <img alt="Scaling BullMQ" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>BullMQ is essentially a new (major) version of <a href="https://github.com/OptimalBits/bull" target="_blank" rel="noopener">Bull</a>, with mostly the same API and using Redis, but with improved internals. If you&rsquo;re already using Bull, that&rsquo;s fine. But if you&rsquo;re starting fresh, consider BullMQ so you get long-term support and benefit from the improvements.</p>

<p>Since they&rsquo;re Redis-based, Bull and BullMQ are naturally suited for modern web apps that may run across multiple processes. It&rsquo;s no surprise <a href="https://judoscale.com/blog/ultimate-guide-scaling-sidekiq" target="_blank" rel="noopener">Ruby&rsquo;s Sidekiq uses Redis too</a>.
All workers connect to the same Redis instance, so adding more worker processes (whether permanently or by autoscaling) increases the throughput of job processing. Jobs will be pulled by any available worker.</p>

<p>BullMQ includes mechanisms to detect stalled jobs and requeue them. For most web applications, a single Redis-backed queue can coordinate dozens of workers reliably. If your app already uses Redis, BullMQ fits in nicely. If not, you&rsquo;ll need to introduce Redis just for the queue, which is probably a worthwhile tradeoff for the reliability it provides in most cases.</p>

<h2 id="bee-queue-for-job-queues">Bee-Queue for job queues</h2>

<p><a href="https://github.com/bee-queue/bee-queue" target="_blank" rel="noopener">Bee-Queue</a> is another popular Redis-backed job queue for Node. It&rsquo;s designed with a focus on simplicity and speed, inspired by the shortcomings of older libraries. Like BullMQ, Bee-Queue requires a Redis instance to operate, a common theme we&rsquo;ll continue to see.</p>

<p>Bee-Queue intentionally has a smaller feature set than BullMQ, trading breadth of features for low complexity and high performance. It gives us all of the core job queueing capabilities, but leaves out some of the advanced features of BullMQ.</p>

<p>This tradeoff is right for some people, as it&rsquo;s notably easier to get started.</p>

<p>The library’s API is relatively straightforward. You create a queue, define a job processor function, and enqueue jobs. My time reading Bee-Queue’s examples and documentation has been stress-free as they&rsquo;re very easy to understand. This can translate to faster initial setup and less overhead in learning the tool, something that&rsquo;s really underrated in medium-sized software projects.</p>
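<p>A minimal sketch looks something like this (the queue name and <code>resizeImage</code> helper are illustrative):</p>

<pre><code>const Queue = require('bee-queue');

const resizeQueue = new Queue('image-resize'); // defaults to local Redis

// Enqueue a job, e.g. from a web request handler.
const job = resizeQueue.createJob({ imageId: 42 });
job.retries(2).save();

// Process jobs -- typically in a dedicated worker process.
resizeQueue.process(async (job) => {
  return resizeImage(job.data.imageId); // assumed app function
});
</code></pre>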

<p>Despite being lightweight, Bee-Queue does include essentials for production. You get persistence in Redis, job completion callbacks, and even rate limiting and retry logic. It supports job timeouts, retry attempts, and will handle <em>“stalled job”</em> detection.</p>

<p>It lacks some features of Bull and BullMQ, though, like built-in priority levels and repeatable (scheduled) jobs.</p>

<p>Multiple Bee-Queue worker processes can consume from the same queue even if they&rsquo;re on different machines, making scaling as simple as running more workers. This makes it a great fit for autoscaling scenarios.</p>

<p>In practice, you’d run one or more worker processes with Bee-Queue. If you need more throughput, just increase the number of workers, and jobs will be distributed across them. If you’re okay with using Redis (and most Node apps can add Redis via a managed service fairly easily), Bee-Queue provides a nice balance of <strong>simplicity and performance</strong>.</p>

<p>Still, it&rsquo;s been 2 years since the last release of Bee-Queue, and the lack of recent maintenance/development may put off a lot of developers.</p>

<h2 id="agenda-for-job-queues">Agenda for job queues</h2>

<p><a href="https://github.com/agenda/agenda" target="_blank" rel="noopener">Agenda</a> is a different breed of job queue for Node when compared to BullMQ and BeeQueue. It is primarily a job scheduler built on <a href="https://www.mongodb.com/" target="_blank" rel="noopener">MongoDB</a>, <em>not Redis!</em> It focuses on scheduling jobs (think cron jobs and delayed jobs), but it also supports immediate job queuing with concurrency control.</p>

<p>Agenda is a popular choice, especially for teams already using MongoDB, since it uses your MongoDB database to store job information. If I were in a project not already using MongoDB, this wouldn&rsquo;t be my first choice.</p>

<p>Agenda’s features overlap with BullMQ and Bee-Queue in some areas, but it has its own philosophy. Agenda stores jobs in a MongoDB collection, so if your application already uses MongoDB, you don’t need an extra infrastructure component for the queue. Jobs are persisted to the database, which ensures durability.</p>

<p>Agenda can also work with other databases (it supports a few Mongo-like interfaces), giving <em>some</em> flexibility in persistence. Still, it shines in scheduling future or recurring jobs. It offers a human-readable syntax (but still supports cron syntax) and the ability to schedule jobs at specific dates or intervals.</p>

<p>For example, you can schedule a job to run every day at 8 am, or run once a week, all using cron patterns or (close to) plain English. This makes Agenda ideal for background jobs that need to run on a schedule.</p>
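<p>A quick sketch of that kind of scheduling (the connection string and <code>sendDailyReport</code> helper are illustrative):</p>

<pre><code>const { Agenda } = require('agenda');

// Agenda persists job state in a MongoDB collection.
const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/myapp' } });

agenda.define('send daily report', async () => {
  await sendDailyReport(); // assumed app function
});

(async () => {
  await agenda.start();
  // Cron syntax or human-readable intervals both work here.
  await agenda.every('0 8 * * *', 'send daily report');
})();
</code></pre>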

<p>Agenda runs as a single-process scheduler. It pulls jobs from Mongo and processes them in the same process. It does support concurrency (multiple jobs at once in one process) and can be scaled to multiple processes using MongoDB’s locking mechanism (to ensure two processes don’t run the same job).</p>

<p>However, scaling horizontally with Agenda is not as straightforward as with Redis queues. Agenda is generally single-master, meaning one instance should own the scheduling to avoid duplicate runs of recurring jobs, though multiple workers can cooperate on different jobs.</p>

<p>Agenda is probably best suited for applications that need cron-like scheduling and already use MongoDB. If you have a Node app in production that&rsquo;s already using Mongo, you can use Agenda to schedule jobs without introducing Redis. It’s great for things like daily reports, periodic cleanup jobs, or any job that must run X times a day/week without needing to support another infrastructure piece.</p>

<h2 id="using-a-message-broker-like-rabbitmq">Using a message broker like RabbitMQ</h2>

<p>Instead of using a Node-specific library, you can opt for a <strong>message broker service</strong> such as <a href="https://www.rabbitmq.com/" target="_blank" rel="noopener">RabbitMQ</a>, <a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener">Amazon SQS</a>, or <a href="https://docs.cloud.google.com/tasks/docs" target="_blank" rel="noopener">Google Cloud Tasks</a>. These are not Node.js libraries. They&rsquo;re external systems that Node can interface with through their APIs or client libraries.</p>

<p>For example, RabbitMQ is a robust open-source message queue that many large systems use. In a Node app, you might use a client library like amqplib to publish and consume messages from RabbitMQ.</p>
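<p>A bare-bones sketch using amqplib (the queue name and payload are illustrative):</p>

<pre><code>const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('tasks', { durable: true });

  // Publish a message, e.g. from your web process.
  channel.sendToQueue('tasks', Buffer.from(JSON.stringify({ id: 1 })), {
    persistent: true,
  });

  // Consume with explicit acknowledgment: the broker redelivers any
  // message that is never acked (say, if the worker crashes mid-job).
  channel.consume('tasks', (msg) => {
    if (msg) {
      console.log('processing', msg.content.toString());
      channel.ack(msg);
    }
  });
}

main().catch(console.error);
</code></pre>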

<p>The advantage of brokers like RabbitMQ is primarily reliability and advanced messaging patterns like acknowledgments and dead-letter queues.</p>

<p>Similarly, cloud services like AWS SQS or even Google Cloud Tasks are fully managed queues. They remove the need to run Redis or RabbitMQ yourself, which is attractive to a lot of people. These can scale virtually indefinitely and handle autoscaling scenarios by design.</p>

<p>The trade-off with using external cloud queues is that you’ll have to implement some features in your application code, like deciding how to schedule jobs or doing retries. Also, there’s a bit more latency as calls go over the network. Developer experience might not be as seamless as using a Node library, but if you prefer not to manage any infrastructure, they are a very reasonable option.</p>

<h2 id="autoscaling-your-workers">Autoscaling your workers</h2>

<p>Scaling Node job queues is a necessary part of running them in production. Offloading intensive jobs to a queue protects your web processes, but it does nothing for the throughput of the queue itself: if your workers can&rsquo;t keep up, jobs pile up.</p>

<p>There are two big levers you can pull to scale your Node job queues. <strong>Vertical scaling</strong> means using more powerful workers with more threads/processes. Meanwhile, <strong>horizontal scaling</strong> increases the number of worker processes or machines. Comprehensive solutions require attention to both.</p>

<p>As we talked about above, the major Node job queues support horizontal scaling without too much hassle, so it&rsquo;s worth putting some effort into. You can do this manually, but it&rsquo;s best practice to set up an autoscaler.</p>

<p>This lets you keep your hands off: worker processes are added when your existing processes can&rsquo;t keep up with demand and removed when demand allows, which saves you money. Still, most autoscalers leave much to be desired. Heroku&rsquo;s autoscaler doesn&rsquo;t work for workers at all, and other major platforms that do support worker autoscaling use CPU as the metric, which is a poor way to measure demand on asynchronous worker processes.</p>

<p>Judoscale is a powerful autoscaler that you can add to most any hosting setup. The autoscaling algorithm <strong>scales based on queue latency</strong>, which is a much better indicator of queue well-being than CPU usage. If you&rsquo;re running a Node app in production, try <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale&rsquo;s free plan</a> to see if it&rsquo;s right for you.</p>

<h2 id="comparing-node-job-queue-options-and-making-a-decision">Comparing Node job queue options and making a decision</h2>

<p>My opinion here is somewhat controversial in that I think you should value developer experience <em>a lot</em> in your decision-making. That means <strong>using BullMQ</strong> unless you <em>really need</em> a ton of extra features, in which case use a message broker like RabbitMQ.</p>

<p>If your app environment already includes a certain datastore, leaning into that can simplify setup. For instance, if you use Redis, Bull or BullMQ will be straightforward to add. If you use MongoDB, Agenda might integrate more naturally. A solution that fits your existing stack usually means less friction for you, which I think you should place a premium on.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Black Box Hosting vs. Glass Box Hosting: An Interview With Judoscale's Adam</title>
      <description>Founder interview comparing Heroku vs Fly/Render/Railway for bootstrapped SaaS: cost, control, portability, third-party services, simple rules.</description>
      <pubDate>Fri, 2 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</link>
      <guid>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Greetings, Judoscale readers! While we usually write our posts as a team, I (Jon) wanted to take a novel approach this time around. I wanted to interview Adam, Judoscale’s founder and still the head of our tiny team, to get his outlook on the marketplace of hosting as we begin 2026.</p>

<p>The goal here wasn’t to host a cage match between the various PaaS vendors currently on the market. It was to set up a scenario:</p>
<blockquote><p>Let’s frame this conversation as a thought experiment: if you were starting a new startup today — something like Judoscale, but fresh — would you still choose Heroku? We’ll look at that decision through the lens of a founder building a real business, not a hobby app — meaning time to profitability, team velocity, cost structure, and technical tradeoffs all matter.</p>

<p>This isn’t a bashing session; I want to explore how the landscape has evolved and changed over the years, and what you might do today.</p>
</blockquote>
<p>Then simply chat through it. I think we ended up with some interesting and valuable insights at both the technical layer as well as the business-leadership layer (e.g. solo dev trying to start a profitable app).</p>

<p>That said, I didn’t want to post a typical back-and-forth style Q&amp;A article. Instead you’ll find concepts grouped together below, each with a little context beforehand. Enjoy!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One last note before we dive in — one of my (Jon) express goals in this interview was to be deliberately antagonistic. In reality, Adam and I believe mostly the same things (sorry Adam, I’m still not sold on <a href="https://www.phlex.fun" target="_blank" rel="noopener">Phlex</a>&hellip;), but the goal was to tease out some reasoning by prodding and gentle pushing.</p>

  </div>
</div>

<p>Okay, let’s dive into this thing!</p>

<h2 id="the-black-box-dividend">The Black-Box Dividend</h2>

<p>Possibly the most important thing when spinning up a new bootstrapped business is actually <em>making money</em>. That is, getting your product running and live — providing value for people that are willing to pay for it — as soon as possible. When it comes to your application architecture and hosting, then, paved roads will get you to your destination faster than carving out your own from scratch.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So&hellip; you’ve mentioned before that Heroku can be thought of as a “black box”, where I think you’re describing the lack of fine-grained control that Heroku gives, right? When you started Judoscale back in 2016, what did the black box buy you — and would it still buy the same thing today?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Heroku’s value was super simple: <code>git push</code> and you get a URL. No server naming exercises, no AMIs to patch, no cluster ceremony. Buildpacks detected my Rails app and just… did the right thing.</p>

<p>I was building a product nights and weekends; I didn’t want to think about deployment or scaling. The black box let me ignore everything that wasn’t shipping. If I were starting that same kind of small, bootstrapped SaaS today, the black box still buys the same thing: focus.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, but more specifically, what did it actually remove from your plate?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Whole categories of work. TLS is handled. Rollbacks are boring and reliable. Runtime upgrades don’t feel like heart surgery. Logs show up where I expect them. Scaling from one dyno to a handful doesn’t require a new playbook. You <em>do</em> pay a tax for that, but you’re buying back <strong>time</strong>. For a solo dev or tiny team, that trade is almost always worth it early on. I just didn’t have that much time to spend.</p>

  </div>
</div>

<h2 id="the-glass-box-leverage">The Glass-Box Leverage</h2>

<p>Of course, here in 2026 the landscape isn’t simply Heroku vs. run-your-own-hardware-at-home. <a href="https://fly.io" target="_blank" rel="noopener">Fly</a>, <a href="https://render.com" target="_blank" rel="noopener">Render</a>, <a href="https://railway.com" target="_blank" rel="noopener">Railway</a>, and <em>several</em> other platform-based hosting services exist now. There’s competition! And there’s nuance. Many of these platforms are more open to complexity: bringing your own Docker images, choosing far more granular server resource tiers, and selecting geographical constraints, among <strong>so many</strong> other choices. That transparency (and complexity) can be good or bad.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Let’s contrast the “black box” with the “glass box” — platforms that give you far more control and allow you to get inside the box and tweak things. Do you think these ‘glass box’ platforms can actually beat the black box?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think the glass box is going to win if you really need portability and/or really specific resource granularity. Most of the glass-box options right now are built around, or at least support, Docker containers. Docker containers are sort of the common denominator between all of them. But that can be helpful because it means it’s easy to switch from one platform to another — you own the build script and take it with you. That leads to the second point. When you can switch providers fairly seamlessly, you can take advantage of whoever has the best price and/or resource tiers that your specific application needs. Just switch to another platform with your same Docker container and you’ll likely save some money.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Buuuuuut what’s the price of that flexibility?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Well, it’s more surface area. Images, volumes, networking, health checks — you own more of it. Day-2 operations take more intention. You can absolutely beat Heroku on cost and control, but you’ll pay for it in time. And everything I just described is probably all more time than I’d want to spend on production infrastructure when bootstrapping a new app. I have features I need to build for my customers! But it’s nice that these platforms and strategies are all available right now in case I did want, or need, to go that route.</p>

  </div>
</div>

<h2 id="unbundle-the-risk">Unbundle the Risk</h2>

<p>One thing I know Adam’s been a pretty big advocate of over the last few years is using disparate third-party service providers <em>detached</em> from your hosting solution. So I wanted to dive into that here with a historical view: what he did previously vs. today.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s pivot to auxiliary hosting tooling. If you were starting fresh again today, would you still use add-ons from a PaaS marketplace, or would you buy direct from vendors?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That one’s changed a bit over time. I now avoid marketplace add-ons whenever I can. Judoscale is a direct customer for almost all of our services: Sentry for exceptions, Scout for monitoring, BetterStack for logs and uptime, etc. Two reasons for that, really. First, it’s usually cheaper. Second, it’s portable. When our third party services are separated from our compute, we don’t have to worry about moving them when we move our compute.</p>

<p>Same with databases: I want a third-party provider, be it CrunchyData, PlanetScale, Tiger Data, etc. The teams behind those database services only care about their database services. It’s not a side-product for them. The UIs, metrics, and controls are <em>way</em> better than the bolted-on database services offered by most hosting providers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>But doesn’t adding a bunch of third-party providers and connections inevitably add a lot of complexity to your mental understanding of your app?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think it tends to add an account and a connection string. But at the same time, it removes a migration nightmare if you should ever want to move your compute. If compute and data are decoupled, you can move one without detonating the other. I think that’s worth it.</p>

  </div>
</div>

<h2 id="on-leaving-the-black-box">On Leaving the Black Box</h2>

<p>We’ve covered some of the nuances of the “black box” and “glass box”, but I’m still curious what might drive people to actually migrate across the chasm, auxiliary services aside&hellip;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Adam, what actually pushes people <em>off</em> Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Granularity. The jump from a $50 dyno to a $250-ish dyno is harsh, and it’s often just to buy memory headroom. Fly/Render give you more intermediate steps. If you’re scaling on thin revenue—which is normal early on—it’s hard to justify that cliff. That’s the moment teams start looking over the fence.</p>

  </div>
</div>

<p>Interjecting here for a moment — Adam’s referencing the lack of options <em>between</em> Heroku’s <strong>std-2x</strong> dyno type and their <strong>perf-m</strong>. For many users, <strong>std-2x</strong> dynos lead to headaches when trying to process large files and/or data, while jumping to <strong>perf-m</strong> feels like overkill both in terms of capacity and cost.</p>

<p>If that’s something that resonates with you, we actually just published a strategy for getting the best of <em>both</em> worlds: <a href="/blog/priced-out-of-heroku">“Dealing With Heroku Memory Limits and Background Jobs”</a>.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So does that make Heroku the wrong choice?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>No, it makes Heroku a great early choice and a question <strong>later</strong>. If you’re pre-revenue and traffic is modest, the black-box dividend (focus) is worth the tax. If you’re high-traffic/low-ARPU (Average Revenue Per User), the math flips fast. That’s when a glass-box platform’s pricing steps feel sane.</p>

  </div>
</div>

<h2 id="compute-is-commodity-dx-is-not">Compute Is Commodity; DX Is Not?</h2>

<p>One thing that all PaaS’s obviously have in common, regardless of what we call them or how we pay for them, is raw compute power. But how efficiently we developers can <em>leverage</em> that compute, and how fast we can do so, might be a different question altogether.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Do you care what the container is called (“dyno”, “machine”, “pod”, whatever) and/or how it’s built?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Call it whatever&hellip; It’s all compute. What I care about is: how much work do I have to do to set up and maintain it?</p>

<p>Heroku’s buildpack approach is still a great default for Rails. Docker is great for portability — especially on platforms that want you to bring an image. All that to say, I don’t obsess over containers or their construction; I optimize for how much developer energy managing them consumes.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Sure but it’s 2026 — many years after you started Judoscale. If you were starting again today, like we said, would you go Docker/Dockerfile from day one?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Honestly I’m not sure. I really like the “cloud native buildpacks” that seem to be cropping up, and having moved Judoscale across Heroku, Render, Fly, Railway, and ECS, I’ll be the first to tell you that having a Dockerfile ready to go is <em>extremely</em> handy.</p>

<p>I’d probably recommend just keeping a Dockerfile ready even if you don’t use it. It feels like a good spare tire.</p>

  </div>
</div>

<h2 id="support-sales-and-the-human-stuff">Support, Sales, and the Human Stuff</h2>

<p>We’d be remiss to ignore the soft edges (support and sales) because they become hard edges during incidents and procurement.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Any lingering frustrations with Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Two. First is compute granularity, which we already covered. Second: <strong>support and enterprise sales</strong> have a reputation for being slow and not particularly helpful. We run a small team and prefer transparent, self-serve pricing; I don’t want to talk to sales to get a number&hellip; I don’t want an enterprise contract to just <em>use</em> the service. Anecdotally, other teams have had rough experiences there. It’s not a deal-breaker for a small shop on self-serve, but it’s part of the picture.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Have you found better elsewhere?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I don’t have enough firsthand experience with Fly/Render support to compare. What I do know is that the <em>product</em> choices—granular compute, Docker-first—have reduced the number of times I’d need support in the first place.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Fair point!</p>

  </div>
</div>

<h2 id="simple-rules-we-actually-use">Simple Rules We Actually Use</h2>

<p>Let’s start wrapping this whole thing up! I wanted to ask Adam to summarize some of the topics above into a straightforward path&hellip; <em>specifically</em>, how he’d actually go about it if he were starting Judoscale today:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s say that you were launching Judoscale again today: trying to bootstrap a real, profitable business from scratch, just you. What’s the plan?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>My default choice is going to be to start on Heroku and optimize for time-to-first-dollar. I want to get the app built and delivering value as soon as possible, and I don’t want to waste time on infrastructure details. The only caveat there is if I <em>know</em> I’m going to have high traffic and thin margins from the start. In that case, I might choose Fly. Either way, the goal is to get to first-dollar <strong>fast</strong>.</p>

<p>Otherwise I’d unbundle my services: direct, third-party accounts for the DB, logs, error tracking, etc.</p>

<p>Finally, I’d take a strong stance of <a href="https://martinfowler.com/bliki/Yagni.html" target="_blank" rel="noopener">YAGNI</a> around most scaling and infra concerns. I wouldn’t build for scaling issues I don’t have yet — I’d flip on a simple autoscaler (like <a href="/">Judoscale</a>!) and move on to my next feature.</p>

<p>Oh, also, no Kubernetes. Hard line here. It’s way too much surface area and a waste of time for small teams just getting their footing.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That last one is going to ruffle some feathers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That’s fine. Complexity makes us feel important as developers. It also makes us slow. Keep it simple until reality—paying customers, not theoretical scale—forces your hand.</p>

  </div>
</div>

<h2 id="wrap-up">Wrap Up</h2>

<p>I started this interview assuming we’d land on a winner. I thought for sure, after all these years, Adam would still land on Heroku! But Adam nudged me to a better question: How much of the machine do you need to control <em>right now</em>? Early on, “black box” hosting buys momentum you can’t afford to lose. As traffic grows and dollars stay stubborn, “glass box” hosting might make the math worth looking at again&hellip; especially if you’re already unbundling other services and can spin up a Docker image quickly.</p>

<p>Anyway, thanks for joining us for this candid conversation with Adam, and we hope it lends some clarity as you navigate your own hosting choices and business journeys! As always, keep building and keep questioning, because sometimes the best answers come from challenging the assumptions we hold most dear.</p>

<p><em>Totally disagree with us? Think Adam’s way off base about something? Let us know over on Reddit, <a href="/">here</a>.</em></p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Process Utilization: How We Actually Track That</title>
      <description>Deep dive into Judoscale’s utilization autoscaling: sampling pitfalls, edge-based tracking, thread safety, and accurate low-overhead metrics.</description>
      <pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</link>
      <guid>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Over the last few months we’ve published a couple of articles talking about our new “Utilization”-based autoscaling option. The first talked through the use-cases for this new option — when it’s useful and who it’s for (“<a href="/blog/introducing-proactive-autoscaling">Autoscaling: Proactive vs. Reactive</a>”). The second was a bit more nitty-gritty, explaining the high-level concept for how we’re tracking this ‘utilization’ metric (“<a href="/blog/how-utilization-works">How Judoscale&rsquo;s Utilization-Based Autoscaling Works</a>”)&hellip;</p>

<p>This post is the nerdy sequel to the latter: the actual boots-on-the-ground / nuts-and-bolts of how we attempted to track process utilization, how that proved to be a bad setup, and the clever idea that led us to a <em>way better</em> v2. This is the story of low-level measurement with sampling, thread safety, and lackluster results leading to new ideas 😅. </p>

<h2 id="the-job-to-be-done">The job to be done</h2>

<p>As per our second post in this saga, our definition of ‘utilization’ is based around an idle-state. Paraphrased, it’s essentially:</p>
<blockquote><p>Measure the fraction of time a web-server process is handling at least one request, then aggregate that across all processes over time.</p>
</blockquote>
<p>Two constraints forced us to think carefully:</p>

<ol>
<li><strong>Extremely low overhead</strong>. Judoscale is a performance tool; it’s an autoscaler that’s intended to help your application soar. It is <em>not</em> something whose client code should impact your application! The Judoscale package should have a perceivably <em>invisible</em> performance impact on the app running it. Full stop. No compromises.</li>
<li><strong>Correct values in a multi-threaded world</strong>. While Ruby, Python, and Node can operate in an asynchronous fashion, and that asynchronicity <em>can</em> be valuable for serving many web requests at once, we need to be <em>very</em> careful in collecting values. It’s easy to accidentally collect <em>thread</em>-level metrics which then overlap and become <em>very</em> confusing. We need to be careful to stay up at the <em>process</em> level.</li>
</ol>

<p>So&hellip; now we need to actually write some code: how do you actually <em>capture</em> the idyllic “idle time” of a process in a real application receiving real traffic?</p>

<h2 id="attempt-1-background-sampling">Attempt 1: Background Sampling</h2>

<p>Our first proof-of-concept was built around running a mostly dormant background thread. It would essentially wake up every few hundred milliseconds, ask “is this process handling any requests right now?”, record that yes-or-no, then go back to sleep. Voilà: utilization!</p>
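<p>In rough form, the approach looked like this (a Node-flavored sketch purely for illustration — the real adapters are per-language, and this is not our actual adapter code):</p>

<pre><code>// Track in-flight requests at the process level...
let inFlightRequests = 0;
let busySamples = 0;
let totalSamples = 0;

// ...with middleware bumping the counter for each active request.
function trackRequests(req, res, next) {
  inFlightRequests += 1;
  res.on('finish', () => { inFlightRequests -= 1; });
  next();
}

// The mostly dormant background sampler: wake, peek, sleep.
setInterval(() => {
  totalSamples += 1;
  if (inFlightRequests > 0) busySamples += 1;
}, 250);

// Utilization for the window is busySamples / totalSamples.
</code></pre>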

<p>It was easy to ship, but it had issues. Notably&hellip;</p>

<p><strong>Aliasing difficulties.</strong> Bursty traffic and short requests can fall between samples. Imagine a process that handles a flurry of 30–50 ms requests. With a 250 ms sampling interval, many bursts are invisible; you under‑count busyness simply because you looked away at the wrong moments. Whoops!</p>

<p><strong>Jitter vs. overhead trade‑off.</strong> If we increase the sampling rate to reduce aliasing, we <em>immediately</em> hike CPU wakeups, heap churn, and lock contention (on every process, 24/7!), even when your app is idle. Oof ☹️</p>

<p><strong>Low signal‑to‑noise.</strong> Inherently, sampling produces a staircase approximation of a curve. Real utilization is a smooth “busy/idle timeline.” Our samples were a blurry thumbnail of a scene that actually mattered.</p>

<p>I personally tend to visualize this, oddly enough, as a mathematical curve on a chart (oh how my high-school math teacher would be proud). Imagine we have some <em>real</em> curve of data, perhaps like this:</p>

<p><figure>
  <img alt="Example chart with a curve going up and down in various sections, labeled “ACTUAL Data Over time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Okay, great. Now let’s pretend we don’t actually know what that curve looks like and we’re taking a sampling-based approach to figuring it out. What we end up with is a bunch of samples. That might look like this:</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a handful of sample-points (dots) that are spread out a bit; you can no longer see the nuance or details of the curvature as the samples are too far apart to have captured that curve in high detail" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Which might be fine for some cases, but we’ve clearly lost several details from the original curve — the fast spikes and drops, in particular. Thus the issue of sampling rates is seen: sample too slowly relative to how fast your data <em>actually changes</em> and you won’t capture a high-detail image. Sample too quickly&hellip;</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a ton of sample-points (dots) that are tightly packed and follow every detail of the original curve" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You end up with a great representation of the curve, but you took up <em>way</em> too much horsepower constantly waking up and reading those samples. It’s hard for an app to actually serve its requests when the thread scheduler is <em>constantly</em> switching back to a background thread asking “HEY ARE YOU SERVING A REQUEST?!” (“I’M FREAKING TRYING TO, THANK YOU VERY MUCH!!!”).</p>

<p>When we’re talking about requests that might take 5ms, 50ms, or 150ms to fully handle and deliver, a sampling interval of 250+ms just doesn’t capture the details. And sampling any faster feels heavy-handed. This wasn’t going to work&hellip;</p>

<h2 id="attempt-2-event-edges-a-tiny-counter">Attempt #2: Event edges + a tiny counter</h2>

<p>Okay, to be fair, the curve I gave above was a little misleading about the actual type of data we’re trying to track. Utilization, as we’ve defined it, isn’t a curve with smooth radii and roller-coaster-esque waves. As we’ve defined it, instantaneous utilization is either a zero or a one. A process is either busy, or it is not. If we were to plot that on a chart, it would actually look more like this:</p>

<p><figure>
  <img alt="Example chart where the line observed is not a curve but a straight line which jumps between 0 and 1 on the Y axis with straight, vertical jumps; more like a state-representation over time line" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That is, a square wave representing a binary signal. Unfortunately, a square wave signal can actually make sampling results even <em>worse</em>. Check out how wrong an ill-timed sampling pattern can get:</p>

<p><figure>
  <img alt="Example chart similar to the above, a square wave line, but with sampling dots only landing on where the signal is in the ‘1’ / ‘on’ position, leaving the impression that the line is always 1" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            I left the green line slightly opaque for reference
          </figcaption>

</figure>
</p>

<p>If you believed your sample data in that case, you’d think the signal is almost always “on”, but that’s not true.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Fun math fact: the fewer <em>possible</em> points on a Y-axis there are, the worse the infrequent-sampling-effect (observing statistically incorrect data because you’re sampling too infrequently) can become. When your Y-axis range is just <code>0-1</code> you actually <em>need</em> to sample far more frequently to capture the binary signal with any real integrity. It’s much harder than a flowing curve!</p>

<p>If you’re curious for more of the math here, read up on Bernoulli distributions and binomial variance 🤓</p>

  </div>
</div>
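<p>To see that effect for yourself, here’s a tiny, self-contained simulation (ours, just for this post) of sampling a square wave at an unlucky interval:</p>
<div class="highlight"><pre class="highlight ruby"><code># A process that's "busy" for the first 100ms of every 250ms cycle...
busy_at = lambda { |t| (t % 0.250) &lt; 0.100 }

true_utilization = 0.100 / 0.250 # 40% busy, by construction

# ...sampled every 250ms: every sample lands on the same "busy" phase!
samples = (0...100).map { |i| busy_at.call(i * 0.250) ? 1 : 0 }
sampled_utilization = samples.sum.to_f / samples.size

puts true_utilization    # => 0.4
puts sampled_utilization # => 1.0 (wildly wrong)
</code></pre></div>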

<p>Anyway, the novel idea ended up being beautifully boring: don’t poll at all, just record state transitions cleverly. If we simply track the timestamps of when a process leaves and returns to idle, we can compute the real, true value of “how much time was it non-idle?” That looks like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with arrows pointing to where the wave goes high or low, indicating “leaving idle” and “returning to idle”, respectively, and blue shading underneath the “busy” portions of the line: the boxes created when the line shifts up to ‘high’ state then back down to ‘low’ state" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>And once we have the blue blocks, we can simply add them all together for a given timespan, then say <code>utilization = blue_block_total / total_time</code>. Sum the rectangles! Boom!</p>
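<p>In code terms, “sum the rectangles” really is that simple. A quick sketch with made-up edge timestamps:</p>
<div class="highlight"><pre class="highlight ruby"><code># Each "blue block" is a [left_idle_at, returned_to_idle_at] pair, in seconds.
busy_spans = [[0.0, 1.2], [3.0, 3.05], [7.5, 9.0]] # made-up timestamps

total_time = 10.0 # the observation window, in seconds
busy_time  = busy_spans.sum { |started, ended| ended - started }

utilization = busy_time / total_time
puts utilization # => 0.275
</code></pre></div>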

<h3 id="the-benefits-of-edge-tracking">The Benefits of Edge-Tracking</h3>

<p>Tracking the state-changes (we’ll call them “edges” for math’s sake) has some really fantastic benefits over polling.</p>

<ul>
<li><strong>Computational cost</strong>: instead of constantly waking up a thread to check in on current requests (which requires stack shifting, single-threaded locking switches, etc.), we instead can simply read and/or write against a process-global timestamp register when any request starts or ends.</li>
<li><strong>Correctness</strong>: instead of hoping a reasonable sample rate provides a decent guess at the actual curve being modeled, we instead know the <em>exact</em> amount of time that a given process is non-idle! There’s no guess. </li>
<li><strong>Reliable for all traffic shapes</strong>: Sudden request waves, thin bursts, long I/O waits — they all work. If a worker is non‑idle, it gets counted correctly and appropriately.</li>
</ul>

<p>Once we landed on this approach, we quickly understood that it was all upside. There’s no catch here! A purely better approach born of a realization that we’re tracking binary signals, not actual curves.</p>

<h2 id="let-s-see-some-code">Let’s See Some Code</h2>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Just a note before we dive into the code: we developed our utilization-based tracking and scaling in Ruby first, so these examples are going to be in Ruby. But since this new approach is agnostic to any language specifics, we have the same implementations for Node and Python 🎉. It’s all the same when you’re just tracking edges!</p>

  </div>
</div>

<p>The great news with this new approach is that it’s so simple I can share the real code that implements it here in a blog post. This code is taken straight from the <a href="https://github.com/judoscale/judoscale-ruby" target="_blank" rel="noopener"><code>judoscale-ruby</code></a> GitHub repository, which houses all of the Ruby packages Judoscale publishes.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One caveat in this code: while my diagram and example above focused on showing that we track “busy time”, our actual implementation is inverted: we track “idle time” rather than “busy time”.</p>

<p>Tracking “busy time” is slightly easier to grok (and build diagrams for!), but in reality our code does this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with no shading “under the boxes” as above, but instead with arrows pointing to the segments of the line that are in the ‘low’ state, highlighted as “Idle Time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>It’s the inverse, so the math still all checks out, and both the “busy time” and “idle time” framings are useful for us! We just went with idle-side tracking for our code because it ended up slightly simpler. Check it out!</p>

  </div>
</div>

<p>First, we have a <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/utilization_tracker.rb#L6" target="_blank" rel="noopener"><code>Judoscale::UtilizationTracker</code></a> class. It has a few methods and helpers in it, but the important parts start with the <code>incr</code> method (short for “increment”):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">incr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="vi">@idle_started_at</span>
          <span class="c1"># We were idle and now we're not - add to total idle time</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">get_current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="kp">nil</span>
        <span class="k">end</span>

        <span class="vi">@active_request_counter</span> <span class="o">+=</span> <span class="mi">1</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>First, keep in mind that this method is going to run every time a request <em>comes in</em> (starts). So, since we’re going to be incrementing a request counter and idle-time timer across multiple threads, we <em>do</em> need to use a simple Mutex (<code>@mutex</code> is simply a <code>Mutex.new</code> from the Ruby standard library). Once we’re certain that we can safely update our process-level variables, we need to do two things: mark that our “idle time” has ended, and increment our active-requests counter.</p>

<p>Pretty straightforward, there! Since this block may run as a multi-threaded application server picks up a request on thread #2 or #3, we’re careful to only end our “idle” timer if there aren’t <em>already</em> any requests being processed (<code>if @active_request_counter == 0</code>). </p>

<p>On the flip side, we have a <code>decr</code> method that runs every time a request <em>finishes</em> (ends):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">decr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="vi">@active_request_counter</span> <span class="o">-=</span> <span class="mi">1</span>

        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span>
          <span class="c1"># We're now idle - start tracking idle time</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">get_current_time</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>This one’s even simpler: decrement the count of active requests by one and, if that was the last request in flight, mark that our “idle time” has begun — the process is now idle!</p>

<p>The end result of these two functions working together is an accurate value stored into <code>@total_idle_time</code> which, in real time, tells us exactly how long the process has been idle.</p>

<p>The last piece of the puzzle, then, is to report that ratio and reset that variable/register! We do that in one last method on <code>Judoscale::UtilizationTracker</code>:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">get_idle_ratio</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="n">total_report_cycle_time</span> <span class="o">=</span> <span class="n">current_time</span> <span class="o">-</span> <span class="vi">@report_cycle_started_at</span>

        <span class="c1"># Capture remaining idle time</span>
        <span class="k">if</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">current_time</span>
        <span class="k">end</span>

        <span class="n">idle_ratio</span> <span class="o">=</span> <span class="vi">@total_idle_time</span> <span class="o">/</span> <span class="n">total_report_cycle_time</span>
        <span class="vi">@total_idle_time</span> <span class="o">=</span> <span class="mf">0.0</span>
        <span class="n">idle_ratio</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Some background here: Judoscale packages report back to Judoscale servers every 10 seconds (using a zero-performance-impact background POST) with a handful of capacity metrics about the application. In this case, <code>@report_cycle_started_at</code> represents the timestamp at the <em>start</em> of that 10-second bucket. Since we’re trying to figure out the idle <em>ratio</em>, we need to divide the idle time over the total time. “The beginning of the bucket until now” is that “total time”.</p>

<p>Once we have that, we have a special case for when this code runs while the process is <em>actively</em> idle, so as to prevent over-counting or under-counting idle time. Since our “report cycle” observation window might start/end <em>during</em> an idle period, we need to handle that carefully. Visually, that’d look like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line but now with two large rectangles over the whole line; both rectangles sharing an edge, showing that the first 10-second bucket “observation window” and the second, which share the same border in time, can leave an edge during an idle phase." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Finally, we compute the idle ratio (a decimal, like <code>0.88</code> or <code>0.37</code>), reset the <code>@total_idle_time</code> back to <code>0.0</code>, and return that idle ratio as the result. ✨</p>
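<p>From there, turning that idle ratio into the utilization number we chart is just a flip. A quick usage sketch (the surrounding reporter code is elided here):</p>
<div class="highlight"><pre class="highlight ruby"><code># Roughly what each 10-second report boils down to:
tracker = Judoscale::UtilizationTracker.instance

idle_ratio  = tracker.get_idle_ratio # e.g. 0.57
utilization = 1.0 - idle_ratio       # e.g. 0.43 -- the "busy" fraction
</code></pre></div>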

<p>The last piece of code I’ll highlight is a layer up — the request middleware itself. This class, <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/request_middleware.rb#L20" target="_blank" rel="noopener"><code>Judoscale::RequestMiddleware</code></a>, is essentially what wraps <em>every</em> Rack request before and after it’s handed down to the Rack application itself. I’m chopping out a lot here, but the bits pertinent to our discussion remain:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">RequestMiddleware</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
      <span class="c1"># ...</span>
      <span class="n">tracker</span> <span class="o">=</span> <span class="no">UtilizationTracker</span><span class="p">.</span><span class="nf">instance</span> <span class="c1"># Singleton</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">incr</span>

      <span class="c1"># ... lots of other code</span>

    <span class="k">ensure</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">decr</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Essentially we’ve created a two-part contract:</p>

<ol>
<li><em>Every</em> time a request starts, we guarantee we’re going to call <code>#incr</code> on the Process-level singleton instance of <code>UtilizationTracker</code></li>
<li><em>Every</em> time a request ends, regardless of how or why it ends, we guarantee we’re going to call <code>#decr</code> on that same singleton instance (thanks, <code>ensure</code>!)</li>
</ol>

<p>This is the glue that ensures our data inside of <code>UtilizationTracker</code> is consistent and accurate over the lifespan of the process. Isn’t it great?!</p>

<h2 id="aggregate-it-together">Aggregate It Together</h2>

<p>Zooming out a little bit, we’ll conclude the deep-dive with a sense of how the aggregation works beyond a single process. Let’s say that you’ve got 2 production web services/dynos/containers/etc. running, and each runs 4 web processes. Since each <em>process</em> POSTs back its own metrics every 10 seconds, that means our back-end is going to get 8 data-points about your application’s overall web-process idleness/busyness. Maybe for a given 10-second bucket Process #1 on server #1 showed an idle ratio of <code>0.66</code> (that is, it was idle for two-thirds of that 10-second window), while process #4 on server #2 read a ratio of <code>0.22</code> (meaning it was handling at least one request for most of the bucket).</p>

<p>Once we have all of the data points, the aggregate is actually simple: we average them together. For example, then, if we received these data points:</p>

<table><thead>
<tr>
<th>Server</th>
<th>Process</th>
<th>Idle Ratio</th>
</tr>
</thead><tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0.56</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0.77</td>
</tr>
<tr>
<td>1</td>
<td>3</td>
<td>0.48</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>0.39</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>0.81</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>0.44</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0.52</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>0.62</td>
</tr>
</tbody></table>

<p>For that bucket, our average idle ratio would be:</p>
<div class="highlight"><pre class="highlight plaintext"><code>(0.56 + 0.77 + 0.48 + 0.39 + 0.81 + 0.44 + 0.52 + 0.62)/8
</code></pre></div>
<p>Which is <code>0.57</code>. So then, that application was idle 57% of the time (for that bucket) and, inversely, busy 43% of the time. Thus, that’d be a 43% utilization metric for that bucket, as we’ve defined it. Gathered, collected, and aggregated simply.</p>
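<p>And in code, that aggregation is exactly as boring as it sounds (our illustration of the math, not the back-end’s literal implementation):</p>
<div class="highlight"><pre class="highlight ruby"><code>idle_ratios = [0.56, 0.77, 0.48, 0.39, 0.81, 0.44, 0.52, 0.62]

average_idle = idle_ratios.sum / idle_ratios.size # => 0.57375
utilization  = 1.0 - average_idle                 # => ~0.43, i.e. 43% utilized
</code></pre></div>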

<h2 id="wrapping-it-up">Wrapping It Up</h2>

<p>If there’s a theme to this little blog-post saga, it’s that the simplest model that matches reality tends to win. We started by trying to <em>guess</em> at busyness with background sampling, only to discover all the usual traps: aliasing, jitter, and overhead. Then we reframed the problem to match the truth on the ground: a process is either idle or it isn’t. Record the edges. Sum the rectangles. Report the ratio. Done.</p>

<p>That shift gave us three things you actually feel in production: lower overhead, correctness across weird traffic shapes (long I/O, tiny bursts, mixed workloads), and numbers you can trust enough to automate against. When an autoscaler acts on a metric, the worst feeling in the world is, “ehh, it’s <em>probably</em> fine.” Edge-tracking turns “probably” into confidence.</p>

<p>And the aggregation story is intentionally boring, too. Each process tells us how idle it was in the last 10 seconds; we average those into an application-level picture. No fancy weighting, no black-box magic. If your fleet spends 57% of a bucket idle, that’s 43% utilized. That’s a number you can reason about, chart, alert on, and scale from.</p>

<p>So if you’ve been skeptical of utilization-based autoscaling because it felt hand-wavey or weird, we hope this demystifies it. The implementation is small on purpose, tested in the sharp edges of real apps (including our own!), and designed to vanish into the background until you need it. Watch your utilization settle into patterns you recognize, set the thresholds that reflect your own tolerance for headroom vs. cost, then enable utilization autoscaling.</p>

<p>In other words: measure what matters, measure it honestly, and keep the math simple enough that you’ll actually use it.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Scaling Sideways: Why You Might Want To Run Two Production Apps</title>
      <description>Learn how running a second Rails app by subdomain can cut p95 latency, stabilize SEO-facing pages, and offload slow endpoints safely.</description>
      <pubDate>Wed, 5 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</link>
      <guid>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<blockquote><p>We’re really trying to optimize for our public website’s performance for SEO reasons&hellip;</p>
</blockquote>
<p>&hellip;was the core theme of our meetings with one of our customers a few weeks ago. They run a Rails application with several different ‘sectors’ — a public website, two different user portals, and an admin ‘backend’ with several internal tools. It’s not an extremely <em>complex</em> application, but it is diverse in its traffic. After chatting with them for a few hours, we had a great solution ready for them — one we use ourselves but feel isn’t talked about enough! Running a second prod app.</p>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Did you know that we love meeting and chatting performance, strategies, and scaling? Whether you’re a Judoscale customer or not, we’d love to hop on a call, screen-share, or whatever, and chat it out — just <a href="/chat">set up a call</a> with us! Totally free.</p>

  </div>
</div>

<p>We’re going to dive into that story and our clever suggestions for scaling sideways, but before we do, let’s clarify some terms so this doesn’t all become terribly confusing! We’ll use “<strong>main app</strong>” to describe the existing, single production application instance. We’ll then use “<strong>second app</strong>” to describe the new, separate clone of the main app — an instance still running all of the production app code (with all the same environment configs, etc.) but which is separate (more on that in a moment). Alright, let’s dive in!</p>

<h2 id="what-we-re-solving-for-here">What We’re Solving For Here</h2>

<p>This particular customer has a very SEO-driven business. That means that their public website, which is served by their core Rails application, needs to be excellent: fast, steady, predictable, burst-ready. But the app houses several other sectors which are older, slower, and less performance-friendly — we all have ’em!</p>

<p><figure>
  <img alt="A diagram of the customer app showing each ‘sector’ as its own box with emojis representing the sort of desired speed of each sector; freight truck for “internal tools”, typical consumer cars for “user portal”, and a race car for “public website”." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            We see you, Google!
          </figcaption>

</figure>
</p>

<p>Unfortunately, in a multi-threaded world (hello, Puma), those slower endpoints don’t just take longer for the people who hit them; they raise the waterline for everyone by occupying threads that subsequent would-be-faster requests must wait on. The result is a p50 (median) request time that looks pretty reasonable&hellip; but a p95 that’s much worse and erratic. Oh, and a support channel that pings for performance issues when there seemingly aren’t any.</p>

<p>From a telemetry and metrics standpoint, we’ve seen this issue plenty of times: CPU saturation is nonexistent and database resources look boring, but request queue time (<a href="/blog/request-queue-time">the metric that matters</a>) spikes randomly and p95s are all over the place. In the case of our customer, it’s not that their public website got slower, per se; it’s that the requests for those public site pages had to <em>wait</em>. Thus we’ve met an old truth: multi-threading increases throughput but amplifies latency (something we dissected in <a href="/blog/puma-default-threads-changed">“Why Did Rails&rsquo; Puma Config Change?!”</a>). Boil it way down and it’s hosting costs vs. p95s.</p>

<p><figure>
  <img alt="A screenshot of a chart showing spiky, erratic p95 response times while the average is much lower" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Spiky p95&rsquo;s and a WAY lower p50/average
          </figcaption>

</figure>
</p>

<p>But the reality for this customer is that they needed to tame and stabilize their p95 response times for their public website. Appeasing the finicky beast that is Google Search Ranking is a broadly unknown game, but stable performance does seem to be a factor.</p>

<p>The good news here is that we’ve got a creative solution. We call it “scaling sideways” — slightly different than ‘horizontal scaling’, yet still horizontal in concept: running a second, but subdomain-separated, instance of your production application.</p>

<h2 id="scaling-sideways">Scaling Sideways</h2>

<p>Let’s expand on the specifics of this strategy, since “scaling” can be a bit of an overloaded term. What we’re describing here isn’t “scaling” in the sense we’re likely all used to these days: changing the number of webserver or worker instances your production application is running at any given time (the core premise of <a href="/">Judoscale</a> itself). Instead we’re talking about “scaling” in a much slower and more methodical approach: running a second production application, which is essentially a clone of the main app, on a separate subdomain with separate infrastructure. It’s still the same code-base, same deployment branch, and really should have all of the same environment and configuration variables&hellip; just a different place to request the same data and/or pages.</p>

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The key to this strategy is offloading traffic for slower and less consistent endpoints to the second app (via its subdomain) so that your main app can handle its own traffic more consistently and quickly. The main app becomes the home for predictable, latency-sensitive endpoints; the second app absorbs the messier stuff without letting it bleed into the public experience.</p>

<p>Luckily, we don’t need a microservices migration plan to do this. We’re not decomposing the domain model; we’re just decomposing our runtime. One deliberate split is enough: the fast path (main) and the heavy/volatile path (second). The payoff is that your main app’s thread pool stops babysitting slow requests and blocking higher-priority endpoints. Queue time stabilizes. Tails compress. (&hellip;Engineers stop arguing about whether going single-threaded everywhere is “worth it.”)</p>

<h2 id="when-is-it-the-right-move">When Is It The Right Move?</h2>

<p>We should recognize first that this strategy isn’t perfect for every application. It shines when at least one of a few conditions is true:</p>

<p><figure>
  <img alt="A visual depiction of the four cases given below" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Really channeling my inner XKCD here&hellip;
          </figcaption>

</figure>
</p>

<p><strong>Your traffic has distinct “shapes.”</strong> If one slice of your app is bursty, slow, or just unpredictable (admin pages, CSV exports, report builders, portals, ‘real time’ (polling) dashboards), while another slice must feel instant and boring (marketing site, signup flow, product pages), you’re a great candidate. Sideways scaling lets you build a fast-lane for the steady stuff and a truck/carpool-lane (or two) for everything else.</p>

<p><strong>You have different SLAs for different routes.</strong> Some requests just matter more. If a public route missing its p95 target is business-critical (SEO, ad landing pages, checkout, conversions), prioritize it on the main app and give it a calmer thread pool. If an authenticated portal can tolerate higher p95s without harming KPIs or other business targets, move it to the second app.</p>

<p><strong>You can influence where traffic goes.</strong> This sounds obvious, but you need a lever. Many teams already have it: front-end <code>fetch()</code> calls, Turbo Frames/Streams, HTMX targets, or API clients you control. If you can change hostnames in those calls, you can steer traffic to the second app with minimal risk and no user-visible disruption (see the sketch just after these four cases). Especially if these calls are transparent to a browser’s address-bar.</p>

<p><strong>SEO is part of the story.</strong> If Google’s crawlers matter a great deal to your business, you might consider splitting your public site from your other application chunks. Instead of the classic “let’s just rewrite the marketing site to static”, you get a lot of the benefits of a dedicated marketing site system (the main app) while retaining all of the comforts of a unified code base and singular mental/domain model.</p>
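<p>On that third condition, here’s a concrete sketch of the “lever” in a Rails app (hypothetical helper and host names; the idea ports to any framework): a tiny helper that steers front-end calls toward the second app.</p>
<div class="highlight"><pre class="highlight ruby"><code># A hypothetical Rails helper for pointing fetch() calls, Turbo Frames,
# and HTMX targets at the second app's host.
module SecondAppHelper
  SECOND_APP_HOST = ENV.fetch("SECOND_APP_HOST", "2.example.com")

  # second_app_url("/reports/42") # => "https://2.example.com/reports/42"
  def second_app_url(path)
    "https://#{SECOND_APP_HOST}#{path}"
  end
end
</code></pre></div>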

<h2 id="judoscale-does-it-too">Judoscale Does It, Too!</h2>

<p>As it turns out, <a href="/">Judoscale</a> itself satisfies three of those bolded conditions above. The Judoscale architecture is built around customers installing the Judoscale <a href="https://github.com/judoscale" target="_blank" rel="noopener">package</a>, which is essentially just a light-weight monitor for request and job queues within the app. Those metrics ultimately get POSTed back to Judoscale servers for processing and aggregation. Nice! But those POSTs happen every ten seconds for every <em>process</em> over thousands of applications. We have a <em>ton</em> of API traffic. As in, 3000-3500 requests <em>per second</em> 24/7.</p>

<p>Then, of course, there’s the Judoscale dashboard and user UI where you can see your metric charts, tune your scaling configuration, and do standard SaaS things. While those charts <em>do</em> have automatic 10-second update polling built-in, the traffic for that entire sector of the app trends much closer to about 50 RPS.</p>

<p>So&hellip; we (1) definitely have different ‘shapes’ — our API traffic is tiny payload and ultra-fast response whereas our dashboard traffic is small-to-medium payload and variable response. Additionally, we (2) definitely have different SLAs for these two shapes. Our API needs to be available, but response times can fluctuate (there&rsquo;s no human waiting)&hellip; whereas our dashboard needs to be as fast as possible since it&rsquo;s customer-facing. Finally, we (3) <em>can</em> control where the majority of our traffic goes by tweaking the client packages to POST somewhere else (and/or some smart routing with Cloudflare).</p>

<p><figure>
  <img alt="A diagram showing a high level split of Judoscale’s two applications; the second app handling the massive volume of API traffic" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’ll get to the implementation specifics below, but hopefully this gives you an idea of the versatility of scaling sideways: applications completely non-SEO focused can still benefit <em>greatly</em> from segmenting traffic in this style. </p>

<h2 id="how-you-actually-do-it">How You Actually Do It</h2>

<p>Spin up a clone of your main prod app. Same repo, same deploy pipeline, same environment variables (with a couple exceptions we’ll note). Point it at a sibling subdomain — <code>ww2.example.com</code>, <code>api2.example.com</code>, or simply <code>2.example.com</code> all work. The goal is sameness: both apps should boot the same code and talk to the same primary dependencies (database, cache, queue, file storage [S3 et al.]). Differences should be intentional and minimal: web process counts, thread counts, and possibly instance sizes.</p>

<p>From there:</p>

<ol>
<li><strong>DNS &amp; routing.</strong> Create the new subdomain and point it to the second app’s router/load balancer/DNS target.</li>
<li><strong>Environment parity.</strong> Duplicate secrets and env vars (including <code>SECRET_KEY_BASE</code>/equivalents so session cookies work across hosts if necessary — more on this below). Consider different Puma thread counts between apps (more on this below too!).</li>
<li><strong>Traffic split.</strong> Start by moving non-navigational traffic: API calls from your front-end, background polling, Turbo Frames/Streams targets. These won’t change the URL in the address bar, so the move is low-risk.</li>
<li><strong>Progressively offload.</strong> Next, migrate heavier, authenticated pages and long-running endpoints to the second app. Be deliberate around what addresses users might see in their browser’s address bar!</li>
<li><strong>SEO guardrails.</strong> Add canonicals on anything public your second app might serve, ensure robots blocking is in place for that host (see the sketch just after this list), and keep sitemaps + social meta rooted on the main app.</li>
<li><strong>Observability.</strong> Watch queue time and p95s on both apps. You should see the main app flatten out quickly.</li>
</ol>
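<p>For the robots-blocking piece of those SEO guardrails, here’s a small sketch of what a host-aware <code>robots.txt</code> might look like in Rails (hypothetical controller and host name):</p>
<div class="highlight"><pre class="highlight ruby"><code># Hypothetical: serve a host-aware robots.txt so crawlers skip the second
# app entirely while the main app stays fully crawlable.
# In routes.rb: get "/robots.txt", to: "robots#show"
class RobotsController &lt; ApplicationController
  def show
    if request.host == "2.example.com" # the second app's host (assumed)
      render plain: "User-agent: *\nDisallow: /"
    else
      render plain: "User-agent: *\nAllow: /"
    end
  end
end
</code></pre></div>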

<p>Most importantly, treat this like a runtime composition change, not an architecture rewrite. You can ship it safely in small patches and keep rolling forward.</p>

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="what-actually-moves">What Actually Moves</h2>

<p>A practical rule of thumb:</p>

<ul>
<li><strong>Stays on the main app:</strong> canonical public pages, sitemaps/robots, OpenGraph/Twitter cards, landing pages, docs/blog, marketing flows, and any route that shapes your public narrative or crawlability.</li>
<li><strong>Moves to the second app:</strong> authenticated portals, JSON APIs, front-end-driven fragments (Turbo/HTMX/Stimulus/etc.), polling endpoints, file uploads/exports, batchy or I/O-heavy controllers, and admin tooling.</li>
</ul>

<p>For navigations, you have options but need to be intentional. Keep in mind that browser address bars remain highly useful for users copying or pasting URLs in/out and potentially sharing those URLs with others. For intra-portal / authenticated endpoints it may not matter that a user sends a colleague <code>https://2.example.com/portal/book/5</code> (especially if the colleague would’ve ended up forced over to the second app to log in to the portal anyway!).</p>

<p>But for resources and endpoints where the goal is speed and public accessibility, we’ll want to keep those endpoints pointing against the main app.</p>

<p>The good news is that we can be clever. For instance, if an endpoint is slow and synchronous (not recommended but we get it, it happens) <em>yet must result in a public URL</em>, we can still POST to the second app and do the work synchronously in that controller. We just need to make sure the response from the second app redirects back to the first. And since they share the same database, you can fluidly (for example) do an expensive <code>create</code> operation in the second app then immediately redirect to the now-existing record on the main app with confidence. There’s no delay in data propagation between the two applications! </p>
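<p>Sketched out (with a hypothetical controller, hosts, and paths), that pattern looks like this on the second app:</p>
<div class="highlight"><pre class="highlight ruby"><code># Hypothetical controller running on the SECOND app: do the slow,
# synchronous work here, then bounce the user back to the main app.
class ReportsController &lt; ApplicationController
  def create
    # report_params: the usual strong-params helper, elided here
    report = Report.create!(report_params) # the expensive operation

    # Same database underneath, so the record is instantly visible to the
    # main app -- redirect there with confidence.
    redirect_to "https://www.example.com/reports/#{report.id}",
      allow_other_host: true # Rails 7+ opt-in for cross-host redirects
  end
end
</code></pre></div>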

<p>In the case of our customer, this meant offloading most of their user portals and internal admin tools to the second app. Their public marketing site stayed put and immediately got calmer metrics. Problem solved!</p>

<p><figure>
  <img alt="A digram showing our customer’s ultimate break-out of their traffic across two apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="judoscale-s-setup">Judoscale’s Setup</h2>

<p>We mentioned earlier that <a href="/">Judoscale</a> also runs a dual-prod-app setup, but we arrived at our split for different reasons — and with a different emphasis. We’re sharing that to underscore there isn’t one “right” pattern. For us, it was more about cost and UX than isolating slow paths&hellip; most of our endpoints are already fast!</p>

<p>Rather than sending volatile endpoints to a second app, we split by human interface. Our main app (<code>app.judoscale.com</code>) is the customer dashboard, so we tune it for UX: snappy, steady, predictable. Our API app (<code>api.judoscale.com</code>) serves the bulk of our traffic, but it’s non-human-facing and can tolerate small, occasional latency blips. The machines don’t mind! But people do. It’s not the fast-vs-volatile split we describe above (which is still the right path for this customer), but it delivers similar benefits: each runtime is optimized for what matters most to it.</p>

<p>Practically, this lets us fine-tune the API runtime for throughput and cost (concurrency, process counts, aggressive autoscaling) while keeping the main app conservative for a consistently great feel. The net effect: a calmer UX and lower hosting spend (more on cost below&hellip;). For many, the canonical split paradigm might be “fast vs. volatile” but for us it was “UX vs. cost”. It’s a different motive but the same playbook: split out a second prod app.</p>

<h2 id="a-caveat-on-cookies-auth-and-subdomains">A Caveat on Cookies, Auth, and Subdomains</h2>

<p>If you’re going to use a second app for a fully separate API or fully segmented authentication mechanism (like Judoscale did), feel free to skip this section. If instead you’ll be cleverly (and carefully) shuttling users between the two apps, we need to discuss shared authentication across subdomains.</p>

<p>The simplest way to accomplish this is to set up both applications with the <em>exact</em> same secret key base (or equivalent) so that cookie and session cryptographic signing validates to/from <em>both</em>. That is, if you log in on the main app, a subsequent request to the second app will see that you’re logged in. This strategy upholds the “keep both apps the <em>exact</em> same” principle by keeping sessions transparent between them. Both apps will read and write to the same session/cookie.</p>

<p>Once both applications are running the same <em>keys</em>, you’ll need to ensure that the actual cookie policies are set up correctly for both apps. Essentially, we need to make sure both apps emit cookies with the same sharing configuration so that browsers will send the same cookie to both apps. In Rails that might look something like this (for session storage via cookie):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">config</span><span class="p">.</span><span class="nf">session_store</span><span class="p">(</span>
  <span class="ss">:cookie_store</span><span class="p">,</span>
  <span class="ss">key: </span><span class="s2">"_my_app_shared_session_key"</span><span class="p">,</span>
  <span class="ss">domain: </span><span class="s2">".example.com"</span><span class="p">,</span>      <span class="c1"># explicit eTLD+1; covers example.com + subdomains</span>
  <span class="ss">expire_after: </span><span class="mi">1</span><span class="p">.</span><span class="nf">year</span><span class="p">,</span>
  <span class="ss">secure: </span><span class="kp">true</span><span class="p">,</span>                <span class="c1"># if this fails in specs/tests, switch to `!Rails.env.test?`</span>
  <span class="ss">same_site: :lax</span><span class="p">,</span>             <span class="c1"># mitigates CSRF while allowing subdomains</span>
  <span class="ss">httponly: </span><span class="kp">true</span>
<span class="p">)</span>
</code></pre></div>
<p>But, as with all things security-related, make sure you understand <em>every</em> config component here and are confident in your security strategy amidst sharing cookies between the two apps. YMMV.</p>

<h2 id="magic-p95-s-and-threads">Magic, P95’s, and Threads</h2>

<p>It’s worth taking a little detour here to assess the magic of what we’re presenting: there isn’t any. There’s no real magic at play here — this is just simple queueing theory with friendlier furniture. We’ve talked about queueing theory broadly in <a href="/blog/request-queue-time">“Queue Time: The Metric that Matters”</a> but the mechanism at play in scaling sideways isn’t radical. When slow requests leave the main thread pool, fast requests stop waiting behind them. That means lower overall variance in request speeds (e.g. lower p95’s) and an app that users will probably describe as feeling “snappier”.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Of course the slowness has to go <em>somewhere</em>&hellip; but we can be much more relaxed around the variance and volatility of our second app. When the slowness is going somewhere <em>made</em> to be slow, it feels much better.</p>

  </div>
</div>

<p>In fact, we can use our “keep the fast app fast” and “keep the slow app slow” mindset in tweaking our thread counts in each app. For a main app we recommend three Puma threads. That’s Rails’ <a href="/blog/puma-default-threads-changed">new standard</a> and proves to be an excellent tradeoff: increased throughput with a reasonable, low tail-latency increase (especially after you move all of the slow requests to the second app!). That said, we recommend deliberately choosing a higher thread count on the second app. Maybe five, maybe six. Your mileage will vary on specifics, but when we design and spin up an application <em>specifically</em> to handle our slower (likely I/O-bound) requests, <em>especially</em> when we aren’t as worried about response times, we can really leverage the power of a large thread pool. This should allow us to keep our instance-count low — a single server instance running five or six threads should be able to handle quite a bit of stuff! </p>
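<p>In Puma terms, that’s a one-line difference per app. A sketch of each app’s <code>config/puma.rb</code>, using the counts discussed above:</p>
<div class="highlight"><pre class="highlight ruby"><code># config/puma.rb on the MAIN app: Rails' current default of 3 threads
threads 3, 3

# config/puma.rb on the SECOND app: a deliberately larger pool for the
# slower, I/O-bound workload (your mileage will vary)
threads 6, 6
</code></pre></div>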

<h2 id="autoscaling-two-applications">Autoscaling Two Applications</h2>

<p>Finally, the last major topic to cover for scaling sideways is indeed autoscaling. First, you should use <a href="/">Judoscale (👋)</a>. Okay, obvious plug aside, there’s a little nuance here: you’re going to want both apps to autoscale. But they’re going to do so with different parameters and goals.</p>

<p><strong>Main app:</strong> now that variance is down and your endpoints are consistently performant, we’ll want to clamp our queue-time thresholds a bit tighter. The target is a flat, boring queue-time line very near zero. In Judoscale, you should see low enough metrics that an upscale threshold between 5 and 10ms feels very stable and scales nicely with your actual traffic curves (not erratically)!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>If your app has burstable traffic loads at known times, you should still define <a href="/docs/leveraging-schedules">a schedule</a> for your autoscaling. If it has burstable traffic loads at <em>unknown</em> times, consider autoscaling to guarantee a <a href="/blog/introducing-proactive-autoscaling">certain level of headroom</a>.</p>

  </div>
</div>

<p><strong>Second app:</strong> still scale on queue time but <em>expect</em> volatility and small spikes that self-resolve. We’d recommend a moderately high upscale threshold like 80ms as well as reducing upscale sensitivity to 20 seconds so brief jitters don’t cause thrashing (AKA ping-pong scaling, which we discussed <a href="/blog/autoscale-tuning-part-3-settings">here</a>). We want to upscale when necessary, but wait a moment to be sure that upscaling is, in fact, necessary.</p>

<p>So, all of that to say, queue time is still absolutely the metric to watch for scaling on both applications. And Judoscale is still absolutely the tool to use. But refining our scaling parameters for each app in their own context is the real path to success here! We want tight bounds and strict expectations on the main app with looser, workload-aware settings on the second.</p>

<h2 id="a-note-on-cost">A Note on Cost</h2>

<p>To address the potential elephant in the room: scaling sideways this way <em>may</em> cost a little more in your overall hosting bill. That’s true. But keep in mind that our first goal here was to optimize and speed up a sector of an application without refactoring the whole application. This <em>is</em> a “Can we throw money at the problem?” solution.</p>

<p>But there’s actually better news: <u>it’s likely that this strategy won’t actually cost much more than your base hosting level now</u>. Remember that the main app is likely going to run fewer instances the more surface area you move away from it. That’s savings. And the second app should make broader use of multi-threading, so it too may need fewer instances than you expect. That’s cheap!</p>

<p>At the end of the day, snappier user experiences tend to convert better, and more conversions mean more sales, which means you probably have a little more room in your hosting budget. We’re not advocating for going wild here — you should still autoscale both applications to keep things efficient — but this strategy is a reasonable cost-path forward for powerful performance gains.</p>

<h2 id="scale-sideways">Scale Sideways</h2>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We started with a simple ask: “optimize the public site for SEO”, and a familiar constraint: one app serving very different kinds of traffic. That’s why we reached for the often-overlooked move of running a second production app. It squarely addressed this customer’s need: keep the public face fast and predictable while letting portals and internal tools be as spiky and complex as they need to be. We should know, we do the same thing (though not for SEO purposes)!</p>

<p>The path there doesn’t require a big‑bang migration. Stand up the second app, put guardrails in place, and move traffic in slices. Begin with front-end calls, shift over some API actions, then gradually migrate entire user-portals when you’re confident in your URL sharing&hellip; all while feature-flagging shifts to build confidence.</p>

<p>What you get for that incremental effort is real performance gain with little added domain complexity or cost. The main app’s thread pool narrows to the fast paths, queue time flattens, and p95s stop lurching. The second app absorbs the messy variance without leaking it into the public experience. Same codebase, two runtimes, each excellent at a different job. If your intro sounds like our customer’s (“we’re optimizing public performance for SEO”), or ours (“we really need to optimize our API for throughput and reliability”), this is the strategy that keeps the promise without rewriting the product or doubling your spend.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Dealing With Heroku Memory Limits and Background Jobs</title>
      <description>Learn how to isolate heavy jobs on Heroku with a dedicated, autoscaled worker to avoid costly Performance dynos for all workers.</description>
      <pubDate>Thu, 30 Oct 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/priced-out-of-heroku</link>
      <guid>https://judoscale.com/blog/priced-out-of-heroku</guid>
      <author>Adam McCrea</author>
      <content:encoded>
        <![CDATA[<blockquote><p>I added one background job and now I&rsquo;m priced out of Heroku.</p>
</blockquote>
<p>I&rsquo;ve heard some variation of this too many times to count. Your app hums along fine on Standard dynos…until you add video encoding, giant imports, or some other memory‑hungry job. Suddenly your worker needs a bigger box, and upgrading every worker to Performance dynos feels like buying a school bus because you might carpool once.</p>

<p>There&rsquo;s a simple pattern that keeps your bill sane and your architecture <a href="/blog/boring-software">boring</a> (the good kind): Put the heavy job on its own queue, give it a dedicated worker process, and autoscale that process to zero when it&rsquo;s idle. The rest of your app stays on Standard dynos.</p>

<p>This post focuses on a real example from Justin Searls on his podcast, <a href="https://justin.searls.co/casts/breaking-change/" target="_blank" rel="noopener">Breaking Change</a>—and exactly how I&rsquo;d set this up on Heroku.</p>

<h2 id="justins-story-4k-video-meets-1-gb-dynos">Justin&rsquo;s story: 4K video meets 1 GB dynos</h2>

<p>Justin&rsquo;s adding support for Instagram Stories (and soon Facebook) to <a href="https://posseparty.com" target="_blank" rel="noopener">POSSE Party</a>, his tool for syndicating your own content to social media. The shape of the problem:</p>

<ul>
<li>A minute of 4K HDR video is 700–800 MB.</li>
<li>Instagram only accepts 1080p within strict codec limits.</li>
<li>Custom server-side re‑encoding produces a compliant, compressed file.</li>
</ul>

<p>Everything worked fine, until a real 4K file hit the server. On Heroku Standard dynos (1 GB RAM), FFmpeg spikes over 1.2 GB during 4K→1080p. Heroku starts swinging the OOM hammer. Sometimes the encode finishes just in time before Heroku shuts it down. Sometimes not. Either way, it&rsquo;s &ldquo;no way to live.&rdquo; (Justin&rsquo;s words)</p>
<div class="highlight"><pre class="highlight plaintext"><code>2025-10-29:29:13.606617+00:00 heroku[worker.1]: Process running mem=844M(165.0%)
2025-10-29:29:13.608658+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2025-10-29:29:38.911996+00:00 heroku[worker.1]: Process running mem=810M(158.3%)
2025-10-29:29:38.913195+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2025-10-29:30:03.936646+00:00 heroku[worker.1]: Process running mem=1197M(233.8%)
2025-10-29:30:03.942490+00:00 heroku[worker.1]: Error R15 (Memory quota vastly exceeded)
2025-10-29:30:03.944352+00:00 heroku[worker.1]: Stopping process with SIGKILL
2025-10-29:30:04.133323+00:00 heroku[worker.1]: Process exited with status 137
2025-10-29:30:04.179846+00:00 heroku[worker.1]: State changed from up to crashed
</code></pre></div>
<p>The apparent choices:</p>

<ol>
<li>Upgrade to Performance dynos (ouch, $$$, especially if you upgrade all workers), or</li>
<li>Break out encoding to a separate service you run elsewhere (more moving parts, Active Storage integration gets annoying), or</li>
<li>Do it client‑side with WebCodecs (promising, but HDR tone‑mapping and codec constraints are tricky).</li>
</ol>

<p>There&rsquo;s a fourth option that&rsquo;s dead simple and keeps everything on Heroku:</p>

<p><strong>Isolate the heavy job on its own queue, back it with a dedicated worker that uses a Performance dyno, and autoscale that worker from 0→1 only when needed.</strong></p>

<p>The perf dyno runs for a few minutes a month, all automated, costing almost nothing.</p>

<h2 id="the-dedicated-worker-pattern-at-a-glance">The &ldquo;dedicated worker&rdquo; pattern (at a glance)</h2>

<ol>
<li>Create a dedicated queue for heavy jobs, e.g. <code>memory_hog</code>.</li>
<li>Run a dedicated worker process that only monitors <code>memory_hog</code> with concurrency = 1.</li>
<li>Set that process&rsquo;s dyno type to a Performance size. Leave the quantity at 0.</li>
<li>Autoscale that worker based on queue latency (queue time).</li>
<li>Enqueue jobs; let autoscaling do the rest.</li>
</ol>

<p>If you&rsquo;ve read our post on <a href="/blog/planning-sidekiq-queues">planning your Sidekiq queues</a>, you know I&rsquo;m a huge advocate for latency‑based queue names. This is the exception. When memory is the constraint, name the queue accordingly so its purpose is obvious and you can add similar jobs later.</p>

<h2 id="step-by-step-setup">Step by step setup</h2>

<p>Let&rsquo;s walk through the actual implementation of this pattern. I&rsquo;m focusing on Sidekiq and Heroku here, but you can apply the same concepts to any job/task queue and cloud hosting platform.</p>

<p>1. Point the heavy job at a dedicated queue</p>
<div class="highlight"><pre class="highlight plaintext"><code># app/jobs/encode_video_job.rb

class EncodeVideoJob
  include Sidekiq::Job
  sidekiq_options queue: :memory_hog

  ...
end
</code></pre></div>
<p>2. Add a dedicated worker process</p>
<div class="highlight"><pre class="highlight plaintext"><code># Procfile

worker: bundle exec sidekiq -c 5 -q within_5_seconds -q within_5_minutes
memory_hog_worker: bundle exec sidekiq -c 1 -q memory_hog
</code></pre></div>
<p>Keep it single-threaded (<code>-c 1</code>) to avoid multiplying memory usage. If you truly need parallel encodes later, raise carefully.</p>

<p><figure>
  <img alt="A diagram showing a normal worker (standard dyno) and a separate memory_hog worker (perf dyno)" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b6ce97ef-20c1-440c-fc90-e370fed4b100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>3. Set the dyno type, not the dyno count</p>

<p>In the Heroku dashboard, open the <code>memory_hog_worker</code> process and choose a Performance dyno size (whatever meets your memory needs). Leave the quantity at 0. This only tells Heroku what kind of dyno to use when you scale up later.</p>

<p><figure>
  <img alt="Heroku process settings for memory_hog" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/12346da8-148b-4e55-5787-87b38c1a2700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
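<p>If you prefer the CLI, the equivalent looks something like this (the dyno size and app name here are just examples; pick whatever fits your memory needs):</p>
<div class="highlight"><pre class="highlight plaintext"><code>heroku ps:type memory_hog_worker=performance-l -a your-app
heroku ps:scale memory_hog_worker=0 -a your-app
</code></pre></div>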

<p>4. Wire up autoscaling in Judoscale</p>

<p>You can use any autoscaler that can scale your workers based on job queues. This is the Judoscale blog, so we&rsquo;re using Judoscale.</p>

<ul>
<li>Process: <code>memory_hog_worker</code></li>
<li>Scale range 0–1 dynos (maybe more?)</li>
<li>Scale up when queue latency &gt;= 1 second (anything in the queue)</li>
<li>Scale down when queue latency drops below 1 second (essentially idle)</li>
</ul>

<p><figure>
  <img alt="Judoscale rule screenshot — “Scale 0–1 when Queue Time hits 1 second.”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/17dd7d56-64c2-4252-90a6-318e6d8be500/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Now when you enqueue a video, queue latency rises, Judoscale starts one Performance dyno for <code>memory_hog_worker</code>, FFmpeg runs, and the worker scales back to 0 when the queue drains.</p>

<p>5. Test it end‑to‑end</p>

<ol>
<li>Enqueue a big video.</li>
<li>Watch queue latency bump.</li>
<li>See Judoscale scale <code>memory_hog_worker</code> 0→1.</li>
<li>Encode finishes; dyno scales back to 0.</li>
<li>Celebrate not paying for a big box 24/7.</li>
</ol>

<h2 id="gracefully-handling-long-running-jobs">Gracefully handling long-running jobs</h2>

<p>Jobs that consume a lot of memory also tend to be long-running jobs. We consider a &ldquo;long-running job&rdquo; to be any job that takes longer than the shutdown timeout for the given job processor, which is usually 25 seconds.</p>

<p>Out-of-the-box, Judoscale will downscale a worker service as soon as the queue is empty. As long as your jobs complete within the shutdown timeout, this is fine—the worker will receive the shutdown signal, finish the job, then shut down. If the job takes longer, it&rsquo;ll be killed before it can finish 👎.</p>

<p>Judoscale handles this scenario with an opt-in configuration to <a href="/docs/long-running-jobs">prevent downscaling when jobs are busy</a>.</p>

<p><figure>
  <img alt="Screenshot: Option to prevent downscaling when jobs are busy" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/2e31ea1e-e3dc-45e1-1432-e92e53f1dd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>To see this option in the UI, you must enable &ldquo;busy job&rdquo; tracking in your code. Check out <a href="/docs/long-running-jobs">the docs</a> for details.</p>
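<p>Here&rsquo;s a rough sketch of what that looks like for Sidekiq with the judoscale-ruby gem. Treat the exact option name as an assumption and defer to the docs linked above:</p>
<div class="highlight"><pre class="highlight plaintext"><code># config/initializers/judoscale.rb

Judoscale.configure do |config|
  # Report busy jobs so the "prevent downscaling when jobs
  # are busy" option appears in the Judoscale UI.
  config.sidekiq.track_busy_jobs = true
end
</code></pre></div>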

<h2 id="alternative-approaches">Alternative approaches</h2>

<p>Hopefully it&rsquo;s obvious why this beats the &ldquo;just upgrade everything&rdquo; approach: upgrading every worker is completely unnecessary! If you only need a perf dyno for one occasional job, there&rsquo;s no reason to pay for it 24/7.</p>

<p>The other alternative I often hear (and Justin mentioned in the podcast) is extracting this work into a separate service outside of Heroku. It&rsquo;s cliche at this point to talk about how cheap hardware is if you get closer to the metal. Yes, you pay a substantial tax to use a PaaS like Heroku, but that tax buys your time. There&rsquo;s simply no reason to waste time spinning up new infra when simple (and cheap!) solutions exist on your current platform.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>For more general advice on reducing memory usage in Rails, check out our other posts: How to Use Less Memory, <a href="/blog/rails-on-heroku-use-less-memory-pt-1">Part 1</a> and <a href="/blog/rails-on-heroku-use-less-memory-pt-2">Part 2</a>.</p>

  </div>
</div>

<h2 id="take-action">Take action</h2>

<p>Judoscale was built for exactly this: autoscaling workers by queue latency, including scaling to zero. Turn it on, ship your feature, and leave the school bus at the dealership.</p>

<p>If you want help wiring it up, <a href="mailto:hello@judoscale.com">email us</a> or <a href="/chat">call us</a>.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>How to Choose a Node.js Framework</title>
      <description>The popularity of Node means we have lots of framework options. Dig in and learn how to choose between the most common Node.js frameworks.</description>
      <pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/which-node-framework</link>
      <guid>https://judoscale.com/blog/which-node-framework</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Building a Node.js server without a full framework is technically possible, but leaning on a framework will certainly improve your development speed and likely your app&rsquo;s quality. Frameworks give us reusable tools that help us not have to reinvent the wheel for every little thing. Choosing a Node framework (and picking the right one) will ensure you spend less time on basic functionality and more time on your actual application logic.</p>

<p><figure>
  <img alt="Comparing Node.js frameworks" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29815dfe-d245-4e18-9dc0-8249e09ada00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The popularity of JavaScript and Node.js means we get a handful of really incredible frameworks to choose from and plenty of <a href="/blog/heroku-alternatives">great hosting options</a>. In this article, I&rsquo;ll explain what makes each framework unique and how you can choose between them. Ready to jump in with me?</p>

<h2 id="using-express-js">Using Express.js</h2>

<p>Express is the common standard for Node.js web frameworks. It&rsquo;s the Node framework that nearly every Node developer has used or at least heard of. It’s a small framework that doesn’t impose a strict structure. This makes it pretty <strong>flexible</strong>. You can use Express to build everything from simple JSON APIs to full-blown web applications. Even if you don&rsquo;t choose to build on Express, the framework you do choose may include it under the hood.</p>

<p><figure>
  <img alt="Expressjs is a common framework meme" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/553d93e2-9fcf-4d96-06e6-e8359340b800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Getting started is very easy, as Express intentionally has a very shallow learning curve. If you’re new to Node, Express is often the first recommendation from the JS community. I put a HUGE emphasis on developer experience. It&rsquo;s why I love Rails. The Express developer experience is straightforward, but it&rsquo;s not an all-inclusive framework like Rails or even Laravel.</p>

<p>One of Express’s biggest strengths is its huge ecosystem of middleware and plugins. Because it’s been the most popular Node framework for years, almost any feature you need has an Express middleware available. This cuts both ways, though: Express doesn’t do all that much without middleware, so you&rsquo;ll be forced to make lots of choices for your app!</p>

<p>One important thing to keep in mind is that <strong>Express is not the fastest framework out there</strong>. It’s known to have a bit of overhead in its request handling. Other frameworks like Fastify or Koa can beat Express on throughput. Express is ideal when you want maximum flexibility and a quick start to a project.</p>

<p><strong>Express has stood the test of time. It is mature, stable, and proven in production.</strong> Many high-traffic websites have been built with Express, and an Express app can certainly scale to handle very large loads. Since Express leaves structure to you, it can scale <em>organizationally</em> if you apply good conventions. But you won’t get as much built-in support for scaling as you would from a more structured framework like Nest.</p>

<h2 id="using-fastify">Using Fastify</h2>

<p>Fastify is a newer player than Express, but it has gained a lot of traction due to its performance. As the name implies, Fastify was <strong>designed for fast Node apps</strong>. This raw speed comes from an event-driven, highly optimized internal architecture with minimal overhead per request.</p>

<p><figure>
  <img alt="Fastify performance benchmarks" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/79d9ecea-74c3-4b32-60f5-4332663b0400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Fastify’s API and concepts will feel familiar to you if you already know Express. You still register routes and handlers, and you can also use middleware (though in Fastify they’re called <em>hooks</em> or plugins). The learning curve is moderate, certainly more than Express’s. This is because it’s a bit more structured than Express, pushing you to define JSON schemas for your requests/replies that can be used for validation and serialization. This schema-based approach is optional but strongly recommended by the framework, as it helps catch errors and improve performance.</p>

<p>Fastify does have an official plugin system with many plugins available to add features. The documentation is solid (I read tons of documentation), and the community is growing fast.</p>

<p>Performance is Fastify’s headline claim to fame. It consistently ranks among the top Node frameworks in terms of throughput and low latency. In a real-world context, this means a Fastify server can handle more concurrent requests on the same hardware than an Express server, which gives you room to grow.</p>

<p>If <a href="/node">you&rsquo;re autoscaling your Node app</a> (you should be), Fastify’s low overhead is even more beneficial. Each instance can do more work, potentially reducing the number of instances you need and certainly making each scale event more granular. There’s no special magic beyond this. Because each process is more efficient, your scaling is more cost-effective.</p>

<p>A great use case is <strong>building APIs</strong> where performance is the most important requirement. Fastify stands out in use cases where every millisecond counts and in resource-constrained environments. The trade-offs are relatively minimal. Fastify isn’t as universally known as Express, and its community, while enthusiastic, is smaller than Express’s. That said, it’s not <em>obscure</em> – plenty of big companies use Fastify.</p>

<h2 id="using-koa">Using Koa</h2>

<p>Koa is often described as the successor to Express. It was actually created by the same team behind Express, with the goal of making a smaller, more modern framework. Koa is even more minimalist than Express, shipping without any middleware (not even a router!). This sounds daunting if you&rsquo;re used to something batteries-included like Ruby on Rails, but it’s by design.</p>

<p>Koa aims to strip the core down to just the essentials, leaving it up to you to add only what you decide you need. Koa’s core library is tiny, only spanning around 600 lines of code. It leverages modern JavaScript features like async/await, which allows Koa to avoid the legacy of callback-based middleware that Express 4 uses. </p>

<p>If you already know Express, picking up Koa is straightforward (conceptually, at least). You still create a server, define routes, and use your chosen middleware. Many developers find Koa’s usage of <code>async</code> functions makes middleware composition and error handling simpler than Express. The learning curve is moderate, more of a challenge than the other Node frameworks we&rsquo;ve discussed so far: you will need to assemble your own stack of middleware, which means reading the documentation and wiring up each piece properly. Koa intentionally does <em>not</em> maintain compatibility with Express-style middleware, so you can’t just drop in an Express middleware and expect it to work.</p>

<p>Koa’s lean design does have performance benefits. With less overhead, Koa can handle more requests per second than Express in comparable scenarios. In practice, Koa’s performance is <em>pretty good</em> for a Node framework, but not as good as Fastify’s.</p>

<p>Koa is a great choice if you want a <strong>modern, minimal framework</strong> and are comfortable assembling <em>all</em> the pieces you need. It’s well-suited for building APIs or web services where you might otherwise use Express, but you’d prefer a cleaner async/await-based flow. It’s like Express but faster and a bit more refined, at the cost of doing a little more setup on your own.</p>

<h2 id="using-nest-js">Using Nest.js</h2>

<p>NestJS is a different paradigm from Express, Fastify, or even Koa. <strong>Nest is a full-featured, batteries-included framework</strong>. The idea behind Nest is to give Node applications a structured architecture out of the box. It adopts a modular approach, with support for TypeScript and features that will remind you of fuller frameworks like Rails or Laravel.</p>

<p><figure>
  <img alt="Nest.js features" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b21dd93d-fda1-41f2-7955-3ed4be3faa00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Working with Nest can be a mix of excitement at what&rsquo;s included and dread at all there is to learn. The developer experience is driven by the fact that NestJS <em>forces</em> a structure. Personally, I love this, but I&rsquo;m already a huge fan of convention over configuration. It also makes Nest great for large teams since everyone knows where to put their code and how things should be structured.</p>

<p>The framework comes with a CLI that can generate boilerplate, just like Rails scaffolds. However, the <strong>learning curve is steep</strong>. You need to get comfortable with lots of concepts before becoming productive, since they affect the overall flow of Nest apps. </p>

<p>A NestJS application uses the Express framework under the hood by default. This means that Nest’s performance will be somewhere in the ballpark of Express, maybe even a bit slower due to the added abstraction. Still, you can swap out the underlying HTTP adapter to use Fastify instead of Express. With a one-line change, your Nest app can run on Fastify and instantly get a big boost in throughput.</p>

<p>That said, pure performance is usually not the deciding factor in choosing to use Nest. Most people who choose Nest do so for its organization and convention. It shines in large, long-term projects where maintainability is key.</p>

<p>NestJS <strong>does not include a frontend framework</strong>. It is purely a backend framework, but it can serve a frontend if you want it to. Since it’s just a Node app (built on Express or Fastify), you can add a view engine like Handlebars to render server-side HTML templates.</p>

<p>NestJS does not ship with a built-in ORM, which makes me sad! It does, however, cleanly integrate with whatever data access layer you prefer. The community has produced official and semi-official integrations, including TypeORM and Prisma.</p>

<p>If your project will be worked on by many developers, or you foresee it growing significantly in scope, Nest provides the architecture to manage that complexity. It’s also great if you prefer strongly typed code and are a fan of Angular or Java-style frameworks. Many companies choose Nest for mission-critical apps because it imposes a consistent structure and has many reliability features built in.</p>

<h2 id="using-next-js">Using Next.js</h2>

<p>Node.js is also used in full-stack frameworks that integrate the frontend and backend together. Next.js is a framework for building web applications with React (frontend) <em>and</em> Node.js (backend) <em>together</em>, but it&rsquo;s best described as a frontend-first option. With Next, you can render React pages on the server (SSR), generate static sites, and define serverless API routes all in the same project. </p>

<p><figure>
  <img alt="Next.js is a Node.js framework" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/01c2f688-32f0-4bc9-e2ba-74c452425900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>I have a love/hate relationship with Next. It has a generally good developer experience, especially for frontend engineers who want to expand into backend without leaving the React ecosystem. Next handles all the build tooling, code splitting, etc. The local development experience is genuinely great, boasting hot reloading, actually useful errors, and so on. Still, the learning curve for Next is moderate. You need to know React, and then learn Next’s conventions. That sounds simple, except those <strong>conventions change in breaking ways relatively often</strong>. It&rsquo;s not uncommon for me to visit an old Next project and find that it needs lots of work to bring it up to date.</p>

<p>Next.js is best for scenarios where you need to build a frontend interface along with your backend logic. It allows for server-side rendering (SSR), which means pages are pre-rendered on the server for fast initial load and SEO, as well as static site generation (SSG) where pages can be precomputed at build time and served as static files.</p>

<p>Since Next is a frontend-first framework, it does not provide an ORM either. If you need an ORM, you add it yourself just like you would in a plain Node.js app.</p>

<p>It blurs the lines between a frontend framework and a backend framework, which may be a selling point or a problem depending on your perspective. That said, for many teams, the productivity gain outweighs these concerns. Next.js is often recommended for full-stack apps because it offers great performance to the end user and fairly easy scalability (especially if you deploy to a platform like Vercel, where it scales automatically).</p>

<h2 id="how-to-make-the-right-decision">How to make the right decision</h2>

<p>Choosing the best Node framework for your project is hard. Of course, it depends on what you’re building and what your priorities are. It’s not one-size-fits-all. Still, I&rsquo;ll do my best to give you a straightforward decision-making framework.</p>

<p><figure>
  <img alt="Node frameworks compared" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e673f215-ed72-4cd3-ba1c-42bd6eea7800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>For quick prototypes of simple APIs, just use <strong>Express</strong>. Its simplicity is unbeatable for getting something off the ground fast. You won’t waste time fighting the framework or learning new conventions.    </p>

<p>For high-performance APIs, use <strong>Fastify</strong>. If you expect heavy load or just want to maximize efficiency, Fastify will give you more headroom. It’s a great choice for building JSON APIs or microservices where throughput is critical. The trade-off is a slightly smaller community and middleware options, but the core features are pretty good by themselves.</p>

<p><strong>Koa</strong> is an option if you like to keep things lightweight and are interested in using something super modern. It’s a nice upgrade from Express when you want to embrace async/await and have more control. Choose Koa only when you are comfortable assembling your own toolkit and you want a framework that stays out of your way.</p>

<p>For large-scale, enterprise APIs, <strong>NestJS</strong> is often the best choice. If you’re building something that might grow to dozens of modules, with several teams working on it, Nest provides the architecture to keep it maintainable. It’s the go-to if you prefer strong conventions, TypeScript, and a full-featured backend framework that comes with everything included. It scales just like any other backend option and has a great structure that&rsquo;s easy to lean on.</p>

<p>For applications that need a strong frontend component or you simply like having both parts of a project in one framework, <strong>Next.js</strong> is hard to beat. Naturally, the full-stack nature of the platform gives you plenty more options for your work than Nest or one of the backend options. If your project is essentially a React app that needs server-side rendering or will benefit from static site generation, Next.js will make your life much easier.</p>

<p>I couldn&rsquo;t help but add Rails and Laravel to this chart to make the comparison even more clear. I&rsquo;m quite attached to those frameworks&rsquo; batteries-included approach to building web apps. With either framework, you can build fully-featured CRUD apps that support user authentication and a dozen other common features, all without adding a single library. Next is the only Node framework that comes close to this amount of support.</p>

<p>That being said, all of these frameworks are perfectly capable of building useful, scalable applications. Just keep in mind that they each offer a different balance of convenience, performance, and structure. Good luck choosing!</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Autoscaling Insights: What Nearly A Decade Of Autoscaling Your Apps Has Revealed To Us</title>
      <description>Learn autoscaling best practices: handle noisy neighbors, set headroom, tune queue times, scale workers to zero, and avoid thrashing.</description>
      <pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/autoscaling-insights-what-nearly-a-decade-of-autoscaling-your-apps-has-revealed-to-us</link>
      <guid>https://judoscale.com/blog/autoscaling-insights-what-nearly-a-decade-of-autoscaling-your-apps-has-revealed-to-us</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>We’ve been autoscaling apps for a long time — almost a decade! That’s long enough to see patterns repeat across Rails, Node, and Python; Redis and Postgres; Heroku, AWS, Render, and more. From these experiences and insights we wanted to put together a compendium of mistakes, misconceptions, less-intuitive ideas, and “oh wow that <em>does</em> matter” moments as best we could.</p>

<p>We’ll organize this like a listicle, but we’ll connect the dots as we go so it reads like one story: what queue time really tells you, why shared hardware is noisy, how scheduling fits in, and how to keep your scaler from thrashing.</p>

<h2 id="1-shared-hardware-is-noisy">1) Shared hardware is noisy</h2>

<p>Multi-tenant machines are cheaper because they’re shared, but providers <em>overprovision</em> their shared machines. It’s not “well we have 10 cores so we can host 10 apps, each getting 1 core of CPU”. Depending on how much the hosting provider wants to <del>profit</del> cram applications into a single host, it could be a lot more like, “well we have 10 cores and apps tend to underutilize what’s available so let’s stuff 30 apps onto those 10 cores” 😬</p>

<p><figure>
  <img alt="Illustration graphic showing two example servers with eight cores of compute; the first having only eight tenants across those cores, the second having tens of tenants, with the banner text on the top of the image reading ‘Shared Hardware is Overprovisioned’" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9c1f5147-951d-4b3a-4482-81366c538000/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>When your app shares CPU with other tenants, one neighbor’s burst is another neighbor’s tail‑latency blip. We’ve talked about this <a href="/blog/shared-hardware-how-bad-can-it-get">several times</a>, but the “noisy neighbor” effect is not groundbreaking or new. It’s also not a single-provider issue: every hosting platform that operates on shared hardware is subject to some degree of neighbor noise!</p>

<p><strong>The takeaway:</strong> get familiar with what noisy neighbors look like in your app. Learn to read your charts so you can tell app‑wide issues from single‑dyno outliers (the noisy‑neighbor tell).</p>

<p><strong>The action item:</strong> enable Judoscale’s <a href="/blog/noisy-neighbors-fixed">Dyno Sniper</a> (by-request only as of September 2025) feature to automatically detect and restart services that fall prey to a noisy neighbor’s delay. It’s free. It’s magic. It works everywhere. There’s really no downside.</p>

<h2 id="2-autoscaling-for-headroom-is-hard-most-teams-miss-it">2) Autoscaling “for headroom” is hard (most teams miss it)</h2>

<p><em>Many</em> teams use autoscaling specifically to keep a certain level of headroom available for unknown bursts of traffic ahead. It’s not the most efficient way to run an app, but the premise can prevent downtime for highly burst-prone traffic loads. That said, actually configuring your autoscaler to do that correctly is <em>extremely</em> difficult. Most teams end up either constantly over-provisioned (e.g. wasting money! 💰) or without the headroom they wanted, complete with alerts when bursts arrive 🚨.</p>

<p>Most autoscalers operate on what we’ve come to call “<a href="/blog/introducing-proactive-autoscaling">reactive metrics</a>”. These reactive metrics are excellent. They’ve always been excellent. When you’re using an autoscaler that’s watching them and reacts quickly (like Judoscale!), reactive metrics are absolutely the right answer for 90+% of applications. That said, if you’re in the other 10% (specifically looking for autoscaling-<em>with-headroom</em>), reactive metrics aren’t the right tool. If you need to maintain a proportional overhead of capacity relative to your scale as it changes, you’ll need Judoscale’s custom Utilization-based autoscaling. It allows you to say “Keep me at about 70% capacity utilization so that I always have 30% overhead”.</p>
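<p>The arithmetic behind that idea is simple enough to sketch. This is illustrative only, not Judoscale&rsquo;s actual algorithm:</p>
<div class="highlight"><pre class="highlight plaintext"><code># Illustrative only: pick an instance count that keeps utilization
# near the target, leaving the remainder as headroom for bursts.
def desired_instances(current_instances, current_utilization, target = 0.70)
  busy_capacity = current_instances * current_utilization
  (busy_capacity / target).ceil
end

desired_instances(10, 0.90) # returns 13 (about 69% utilization, 31% headroom)
</code></pre></div>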

<p><strong>The takeaway</strong>: to our knowledge, Judoscale is the only autoscaler out there that offers true <a href="/blog/how-utilization-works">proactive, headroom-prioritized autoscaling</a> via custom “Utilization” measurement 24/7. If you need that kind of behavior, get Judoscale installed and activated. You’ll be surprised at how useful extra headroom can be in high-burst loads!</p>

<p><figure>
  <img alt="An illustrated diagram showing a statically-scaled app failing to have capacity to handle requests as traffic load rises" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/7363ea95-9b8f-4477-eaa2-9a3773125800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="3-queue-time-ranges-for-healthy-apps-are-lower-than-we-expected">3) Queue time ranges for healthy apps are lower than we expected</h2>

<p>After several years of watching both customer queue time data as well as our own, we decided to lower the default queue time threshold for new apps on Judoscale. While that change could be worth an article of its own, suffice it to say that our realization was based around the stability of shared and dedicated hardware. Queue time thresholds for dedicated hardware (think Heroku’s <code>Perf-</code> series and/or Fir platform) can be <em>very</em> low. As in, “scale up if queue time hits 5ms”. And that’s ultimately a reflection of a very stable and operational stack — a request hitting Heroku, getting routed, and hitting your dyno consistently and predictably. The moment that dyno begins queuing requests for more than just a few milliseconds, we can be sure there’s a capacity problem.</p>

<p>Our realization came regarding <em>shared</em> hardware (Heroku’s <code>Std-</code> series). We’d long considered a higher default queue time for those tiers since shared hardware <em>can</em> experience blips of queueing even though you don’t have a capacity issue (yet) — those are the moments your neighbor’s code is running instead of yours.</p>

<p>What we found is that shared hardware <em>can</em> actually operate on the same degree of stability and low queue time threshold as long as you’re carefully and deliberately staying within the bounds of your own “Dyno Load” or overall “shared slice” lane. We’ll write about this more in the coming months, but here’s <strong>the takeaway</strong>: keep an eye on, and be careful about, how much of <em>your</em> slice of the shared hardware you’re really using. On Heroku, this means being careful about that “Dyno Load” metric. If you consistently push for more Dyno Load than you’re actually allocated, you’re going to have a bad time.</p>

<h2 id="4-scaling-to-zero-is-a-super-power-for-workers">4) Scaling to zero is a super‑power (for workers)</h2>

<p>Event‑driven workers don’t need to idle. If there’s no work to do, you shouldn’t be paying for workers. Sure, keep at least one of your low-latency queue workers around all the time — we do too. But if you’re following our “<a href="/blog/planning-sidekiq-queues">Opinionated Guide to Planning Your Sidekiq Queues</a>”, you should allow both your <code>less_than_five_minutes</code> and especially your <code>less_than_five_hours</code> queues to scale to zero. Cold-start times once jobs actually hit those queues are around 30-45 seconds (YMMV) so you’ll be well within your queue time SLA&hellip; while saving free money 💰.</p>
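<p>For that to work, each queue tier needs its own process so each can scale independently. Here&rsquo;s a minimal Procfile sketch following the guide&rsquo;s queue-naming convention (the process names, fast-queue name, and concurrency values are just examples; adapt them to your app):</p>
<div class="highlight"><pre class="highlight plaintext"><code># Procfile

worker_fast: bundle exec sidekiq -c 5 -q less_than_five_seconds
worker_medium: bundle exec sidekiq -c 10 -q less_than_five_minutes
worker_slow: bundle exec sidekiq -c 10 -q less_than_five_hours
</code></pre></div>
<p>In Judoscale, give <code>worker_fast</code> a scale range with a minimum of one, and let <code>worker_medium</code> and <code>worker_slow</code> range all the way down to zero.</p>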

<p><strong>The takeaway</strong>: scaling background job workers to zero when there’s no work in the queue is free cash in your pocket. Set up your <a href="/">Judoscale</a> schedule and scaling range to allow for zero-scale and enjoy the free money you save 😎. Then, while you kick back, give our <a href="/blog/planning-sidekiq-queues">Sidekiq Queues Guide</a> a read!</p>

<p><figure>
  <img alt="Background job dynos scaling up as more background jobs are kicked off" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/5607cb01-7bb3-4940-08f0-dccb95fedc00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="5-y-all-have-too-many-job-queues">5) Y’all have too many job queues</h2>

<p>We’ve seen it all: the “one queue per job class” approach, the “every feature gets its own queue” approach, and, of course, the “we’re just using this queue one time and we’ll clean it up right after” approach. All roads lead to more queues. The thing is, lots of queues create real issues:</p>

<ul>
<li>More queues means more queues to watch, and that means lots of polling and overhead from your job system. Sidekiq’s own <a href="https://github.com/sidekiq/sidekiq/wiki/Advanced-Options" target="_blank" rel="noopener">docs</a> stress that they “don&rsquo;t recommend having more than a handful of queues per Sidekiq process” and</li>
</ul>
<blockquote><p>Lots of queues makes for a more complex system and Sidekiq Pro cannot reliably handle multiple queues without polling&hellip; slamming Redis.</p>
</blockquote>
<ul>
<li><p>Long term maintenance of jobs, orchestration, and priorities is <em>terrible</em> when you have that many queues to think about. It’s more than you can comfortably hold in your head.</p></li>
<li><p>Setting up autoscaling policies across 10, 15, or even 20+ queues is a <em>massive</em> headache. Trying to keep them all reasonably in-sync is worse. It’s not worth it.</p></li>
</ul>

<p>You end up here:</p>

<p><figure>
  <img alt="A visual depicting lots of queues leading to worker processes that are on fire" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/29aed8ef-fe26-4a37-1396-38f11cc72e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Trust us, you don’t want to end up there.</p>

<p>Simply put, if you’ve got more than about five queues, you’re probably headed in the wrong direction. We strongly recommend having only three! And we recommend naming them based on an expected queue-time SLA, then setting up your autoscaling for each queue to reflect that. It’s a beautiful, job-agnostic way of handling queues!</p>
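<p>In practice, that means each job declares the SLA it needs rather than living in a queue named after itself. A quick sketch (the job classes here are hypothetical):</p>
<div class="highlight"><pre class="highlight plaintext"><code># Hypothetical jobs mapped onto SLA-named queues

class SendReceiptJob
  include Sidekiq::Job
  sidekiq_options queue: :less_than_five_minutes
end

class NightlyExportJob
  include Sidekiq::Job
  sidekiq_options queue: :less_than_five_hours
end
</code></pre></div>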

<p><strong>The takeaway</strong>: read <a href="/blog/planning-sidekiq-queues">this guide</a> and audit your background job queues. If you’ve got more than five, determine why and what should be merged. KISS!</p>

<h2 id="6-on-the-web-side-downscale-by-one-almost-always">6) On the web side, downscale by one (almost always)</h2>

<p>Several of our larger-app customers have run into interesting headaches and unforeseen issues by downscaling by more than one at-a-time. Judoscale’s highly configurable nature <em>does</em> allow you to do this, but we’ve learned over time that you <em>probably shouldn’t</em>.</p>

<p>The reasoning is simple: cost vs. benefit. The benefit, in short, is that you’ll shave a little off your bill. If you’re going to shed that dyno load anyway, doing it in bigger steps means slightly larger savings accumulated by the end of the month! ‘Slightly’ is the key word there. The cost, however, can be less pleasant. You can end up downscaling too far! At that point your users could experience slowness, you could spark alerts, and you’ll inevitably upscale again soon. There’s just no need for all the thrashing.</p>

<p><strong>The takeaway</strong>: downscaling by more than one at-a-time, for web services, isn’t worth the marginal gains. The risk of downscaling too far is real! Stick to downscaling by just one-at-a-time.</p>

<p><strong>Our next step</strong>: we actually feel pretty compelled to remove this option altogether in the future. It’s exceedingly rare that downscaling by more than one service at a time is the <em>right</em> move. TBD, but this lever may disappear!</p>

<h2 id="7-understand-intra-dyno-concurrency">7) Understand intra-dyno concurrency</h2>

<p>The short version of this story is based on understanding the interplay between request <em>routing</em> and how requests are handled within a single service. Many PaaS’s use simple random-based request routing: a new request can go to any active service/instance in the cluster. It’s not intelligent or load-based. So a single service could receive multiple requests in a row, even while still processing earlier ones, while another service may get none!</p>

<p>It’s important, then, that each single service is able to handle multiple requests concurrently. Otherwise those randomly-routed new requests will be queued and you’ll have a consistently higher, but sporadic, average queue time for your app. 😬</p>

<p>In each of the runtime languages we support autoscaling for (Ruby, Python, and Node.js), a single web process <em>cannot</em> handle requests in true parallel — it can only interleave their execution asynchronously. That’s a mouthful, but we recently wrote a post that walks through the idea with great diagrams; give it a read <a href="/blog/puma-default-threads-changed">here</a>! The key is that a single <em>process</em> isn’t capable of true parallelism, but running multiple processes within your service <em>does</em> yield intra-service parallelism (the ability for a single service to handle more than one request at the same instant).</p>

<p><strong>The takeaway</strong>: wherever possible, run more than one process per web service instance (“dyno”, “service”, etc.). This is a particular challenge on <code>Std-1x</code>-style, single-CPU-core, service tiers. But, if all other variables are held constant, it’s better to run a single service with two processes than two services that each have a single process!</p>
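<p>On a multi-core service, a couple of lines in <code>config/puma.rb</code> buy you that intra-service concurrency. A sketch; tune the counts to your cores and memory budget:</p>
<div class="highlight"><pre class="highlight plaintext"><code># config/puma.rb
# Two processes per service for true parallelism, with a few
# threads in each to interleave I/O-bound work.

workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads 3, 3
preload_app!
</code></pre></div>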

<h2 id="8-scheduling-boring-obvious-and-wildly-effective">8) Scheduling: boring, obvious, and wildly effective</h2>

<p>Scheduled scaling is the easiest money‑saver with the least risk. If your traffic has a weekly rhythm, tell Judoscale <a href="/blog/autoscale-tuning-part-2-scheduling">about it</a>. Keep your minimum scale higher on weekday business hours, then drop it during nights and weekends (for example). You can even leverage a tighter schedule to pre-scale up before large releases, big events, and other known spikes!</p>

<p>You can even get the best of both worlds by running a schedule <em>and</em> autoscaling together. Instead of scheduling hard-locked service counts, you can schedule the <em>range</em> of scale that you want to tweak at a given time. That gives you the flexibility to respond to unknown load changes while still controlling baselines in accordance with known load changes! 🎉</p>

<p>Of course, if you choose to <em>not</em> build a schedule, autoscaling itself will ensure that your capacity grows and shrinks to meet need. But depending on how sharp your traffic changes can be at known times, you might either end up with: 1) a few minutes of slow service as autoscaling ramps up your capacity amidst a large traffic burst, or 2) wasted dollars as your baseline (minimum) service/dyno count stays higher than it needs to be on off hours!</p>

<p><strong>The takeaway</strong>: take a couple of minutes to think about your app’s weekly traffic patterns by day (even by hours!). Also consider any weekly rhythms your app has in terms of events, releases, or other in-domain things which drive influxes of users! Then bake all of those into a dynamic autoscaling schedule with Judoscale ✨</p>

<p><figure>
  <img alt="A screenshot of the Judoscale configuration UI showing an example schedule operating on a live application, dynamically shifting the scale range depending on the time of day." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/8a4f6d7b-9147-4956-244d-ddda23cf8500/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>If there’s a single throughline in all of this, it’s that autoscaling is a feedback loop living in a noisy world. The mechanics aren’t mystical: measure what users feel (queue time), add capacity when you’re near the edge, and give your system enough time to absorb changes before you decide again. Do that consistently and the chaos of shared hardware, random routing, and spiky traffic stops feeling like chaos.</p>

<p>Choose boring on purpose. Keep your queues few and meaningful. Let workers scale to zero when there’s nothing to do. Downscale web by one so you don’t saw off the branch you’re sitting on. Run real in‑dyno concurrency so random routing doesn’t turn little bursts into instant queue time. And if a single dyno is having a uniquely bad day, assume a noisy neighbor before you assume a rewrite (and let <a href="/blog/noisy-neighbors-fixed">Dyno Sniper</a> handle the whack‑a‑mole so your team doesn’t have to).</p>

<p>That’s it. No silver bullets—just a handful of defaults that make your platform feel calmer and your costs feel saner. After nearly a decade of watching this stuff in the wild, the “secret” is that stability isn’t a hero move; it’s a series of small, boring, repeatable decisions. Make those decisions once, encode them in Judoscale, and let the loop hum 🔁</p>
]]>
      </content:encoded>
    </item>
  </channel>
</rss>
