Scaling Sideways: Why You Might Want To Run Two Production Apps


Jon Sully

@jon-sully

We’re really trying to optimize for our public website’s performance for SEO reasons…

…was the core theme of our meetings with one of our customers a few weeks ago. They run a Rails application with several different ‘sectors’ — a public website, two different user portals, and an admin ‘backend’ with several internal tools. It’s not an extremely complex application, but it is diverse in its traffic. After chatting with them for a few hours, we had a great solution ready for them — one we use ourselves but feel isn’t talked about enough! Running a second prod app.

A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both

👀 Note

Did you know that we love meeting and chatting about performance, strategies, and scaling? Whether you’re a Judoscale customer or not, we’d love to hop on a call, screen-share, or whatever, and chat it out. Just set up a call with us! Totally free.

We’re going to dive into that story and our clever suggestions for scaling sideways, but before we do, let’s clarify some terms so this doesn’t all become terribly confusing! We’ll use “main app” to describe the existing, single production application instance. We’ll then use “second app” to describe the new, separate clone of the main app — an instance still running all of the production app code (with all the same environment configs, etc.) but which is separate (more on that in a moment). Alright, let’s dive in!

What We’re Solving For Here

This particular customer has a very SEO-driven business. That means that their public website, which is served by their core Rails application, needs to be excellent: fast, steady, predictable, burst-ready. But the app houses several other sectors which are older, slower, and less performance-friendly — we all have ’em!

A diagram of the customer app showing each ‘sector’ as its own box with emojis representing the sort of desired speed of each sector; freight truck for “internal tools”, typical consumer cars for “user portal”, and a race car for “public website”.
We see you, Google!

Unfortunately, in a multi-threaded world (hello, Puma), those slower endpoints don’t just take longer for the people who hit them; they raise the waterline for everyone by occupying threads that subsequent would-be-faster requests must wait on. The result is a p50 (median) request time that looks pretty reasonable… but a p95 that’s much worse and erratic. Oh, and a support channel that pings for performance issues when there seemingly aren’t any.

From a telemetry and metrics standpoint, we’ve seen this issue plenty of times: CPU saturation is nonexistent and database resources look boring, but request queue time (the metric that matters) spikes randomly and p95s are all over. In the case of our customer, it’s not that their public website got slower, per se; it’s that the requests for those public site pages had to wait. Thus we’ve met an old truth: multi-threading increases throughput but amplifies latency (something we dissected in “Why Did Rails’ Puma Config Change?!”). Boil it way down and it’s hosting costs vs. p95s.

A screenshot of a chart showing spiky, erratic p95 response times while the average is much lower
Spiky p95s and a WAY lower p50

But the reality for this customer is that they needed to tame and stabilize their p95 response times for their public website. Appeasing the finicky beast that is Google Search Ranking is a broadly unknown game, but stable performance does seem to be a factor.

The good news here is that we’ve got a creative solution. We call it “scaling sideways”: slightly different from ‘horizontal scaling’, yet still horizontal in concept. You run a second, subdomain-separated instance of your production application.

Scaling Sideways

Let’s expand on the specifics of this strategy, since “scaling” can be a bit of an overloaded term. What we’re describing here isn’t “scaling” in the sense we’re likely all used to these days: changing the number of webserver or worker instances your production application is running at any given time (the core premise of Judoscale itself). Instead we’re talking about “scaling” in a much slower and more methodical sense: running a second production application, which is essentially a clone of the main app, on a separate subdomain with separate infrastructure. It’s still the same codebase, same deployment branch, and really should have all of the same environment and configuration variables… just a different place to request the same data and/or pages.

A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains

The key to this strategy is offloading traffic bound for slower and less consistent endpoints to the second app (via its subdomain) so that your main app can handle its own traffic more consistently and quickly. The main app becomes the home for predictable, latency-sensitive endpoints; the second app absorbs the messier stuff without letting it bleed into the public experience.

Luckily, we don’t need a microservices migration plan to do this. We’re not decomposing the domain model; we’re just decomposing our runtime. One deliberate split is enough: the fast path (main) and the heavy/volatile path (second). The payoff is that your main app’s thread pool stops babysitting slow requests and blocking higher-priority endpoints. Queue time stabilizes. Tails compress. (…Engineers stop arguing about whether going single-threaded everywhere is “worth it.”)

When Is It The Right Move?

We should recognize first that this strategy isn’t perfect for every application. It shines when at least one of a few conditions is true:

A visual depiction of the four cases given below
Really channeling my inner XKCD here…

Your traffic has distinct “shapes.” If one slice of your app is bursty, slow, or just unpredictable (admin pages, CSV exports, report builders, portals, ‘real time’ (polling) dashboards), while another slice must feel instant and boring (marketing site, signup flow, product pages), you’re a great candidate. Sideways scaling lets you build a fast-lane for the steady stuff and a truck/carpool-lane (or two) for everything else.

You have different SLAs for different routes. Some requests just matter more. If a public route missing its p95 target is business-critical (SEO, ad landing pages, checkout, conversions), prioritize it on the main app and give it a calmer thread pool. If an authenticated portal can tolerate higher p95s without harming KPIs or other business targets, move it to the second app.

You can influence where traffic goes. This sounds obvious, but you need a lever. Many teams already have it: front-end fetch() calls, Turbo Frames/Streams, HTMX targets, or API clients you control. If you can change hostnames in those calls, you can steer traffic to the second app with minimal risk and no user-visible disruption, especially when these calls never touch the browser’s address bar (there’s a sketch of this lever just below).

SEO is part of the story. If Google’s crawlers matter a great deal to your business, you might consider splitting your public site from your other application chunks. Instead of the classic “let’s just rewrite the marketing site to static”, you get a lot of the benefits of a dedicated marketing site system (the main app) while retaining all of the comforts of a unified code base and singular mental/domain model.
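
To make that third condition concrete, here’s a minimal sketch of the kind of lever we mean. It assumes a hypothetical SECOND_APP_HOST env var (e.g. “2.example.com”) and a plain Rails helper; your lever might just as easily live in a JavaScript API client.

# app/helpers/second_app_helper.rb
module SecondAppHelper
  # Build an absolute URL against the second app for a given path.
  # SECOND_APP_HOST is an assumed env var like "2.example.com".
  def second_app_url(path)
    "https://#{ENV.fetch("SECOND_APP_HOST")}#{path}"
  end
end

Point a form action, a Turbo Frame src, or a front-end fetch() at second_app_url(...) and the request lands on the second app while the address bar never changes. One caveat: sibling subdomains are separate origins, so cross-origin fetch() calls will need an appropriate CORS setup.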

Judoscale Does It, Too!

As it turns out, Judoscale itself satisfies three of those bolded conditions above. The Judoscale architecture is built around customers installing the Judoscale package, which is essentially just a lightweight monitor for request and job queues within the app. Those metrics ultimately get POSTed back to Judoscale servers for processing and aggregation. Nice! But those POSTs happen every ten seconds for every process across thousands of applications. We have a ton of API traffic. As in, 3,000-3,500 requests per second, 24/7.

Then, of course, there’s the Judoscale dashboard and user UI where you can see your metric charts, tune your scaling configuration, and do standard SaaS things. While those charts do have automatic 10-second update polling built-in, the traffic for that entire sector of the app trends much closer to about 50 RPS.

So… we (1) definitely have different ‘shapes’: our API traffic is tiny-payload and ultra-fast, whereas our dashboard traffic is small-to-medium payload with variable response times. Additionally, we (2) definitely have different SLAs for these two shapes. Our API needs to be available, but response times can fluctuate (there’s no human waiting)… whereas our dashboard needs to be as fast as possible since it’s customer-facing. Finally, we (3) can control where the majority of our traffic goes by tweaking the client packages to POST somewhere else (and/or some smart routing with Cloudflare).
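
That third lever is visible in our own adapter packages. As a hedged sketch: the judoscale-ruby adapter reads its reporting target from ENV["JUDOSCALE_URL"] and exposes it as a configurable api_base_url, so steering the metric POSTs at a dedicated API host is a one-line config change (check your adapter’s docs for the exact setting before relying on this).

# config/initializers/judoscale.rb
Judoscale.configure do |config|
  # The adapter defaults to ENV["JUDOSCALE_URL"]; setting it explicitly
  # just makes the lever visible: metric POSTs go wherever this points.
  config.api_base_url = ENV["JUDOSCALE_URL"]
end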

A diagram showing a high level split of Judoscale’s two applications; the second app handling the massive volume of API traffic

We’ll get to the implementation specifics below, but hopefully this gives you an idea of the versatility of scaling sideways: applications completely non-SEO focused can still benefit greatly from segmenting traffic in this style.

How You Actually Do It

Spin up a clone of your main prod app. Same repo, same deploy pipeline, same environment variables (with a couple exceptions we’ll note). Point it at a sibling subdomain: ww2.example.com, api2.example.com, or simply 2.example.com all work. The goal is sameness: both apps should boot the same code and talk to the same primary dependencies (database, cache, queue, file storage [S3 et al.]). Differences should be intentional and minimal: web process counts, thread counts, and possibly instance sizes.

From there:

  1. DNS & routing. Create the new subdomain and point it to the second app’s router/load balancer/DNS target.
  2. Environment parity. Duplicate secrets and env vars (including SECRET_KEY_BASE/equivalents so session cookies work across hosts if necessary — more on this below). Consider different Puma thread counts between apps (more on this below too!).
  3. Traffic split. Start by moving non-navigational traffic: API calls from your front-end, background polling, Turbo Frames/Streams targets. These won’t change the URL in the address bar, so the move is low-risk.
  4. Progressively offload. Next, migrate heavier, authenticated pages and long-running endpoints to the second app. Be deliberate around what addresses users might see in their browser’s address bar!
  5. SEO guardrails. Add canonicals on anything public your second app might serve, ensure robots blocking is in place for that host, and keep sitemaps + social meta rooted on the main app (see the sketch after this list).
  6. Observability. Watch queue time and p95s on both apps. You should see the main app flatten out quickly.
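
Here’s a minimal sketch of step 5’s guardrails, assuming a hypothetical 2.example.com second host and www.example.com main host. Since both apps boot the same code, we can simply branch on the request host at runtime:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  SECOND_APP_HOST = "2.example.com" # assumed second-app host

  before_action :noindex_on_second_app

  private

  # Ask crawlers not to index anything served from the second host.
  def noindex_on_second_app
    if request.host == SECOND_APP_HOST
      response.set_header("X-Robots-Tag", "noindex, nofollow")
    end
  end
end

# app/helpers/seo_helper.rb
module SeoHelper
  # Canonical URLs always point at the main app, no matter which
  # app happened to serve the page:
  #   <link rel="canonical" href="<%= canonical_url %>">
  def canonical_url
    "https://www.example.com#{request.path}"
  end
end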

Most importantly, treat this like a runtime composition change, not an architecture rewrite. You can ship it safely in small patches and keep rolling forward.


What Actually Moves

A practical rule of thumb:

  • Stays on the main app: canonical public pages, sitemaps/robots, OpenGraph/Twitter cards, landing pages, docs/blog, marketing flows, and any route that shapes your public narrative or crawlability.
  • Moves to the second app: authenticated portals, JSON APIs, front-end-driven fragments (Turbo/HTMX/Stimulus/etc.), polling endpoints, file uploads/exports, batchy or I/O-heavy controllers, and admin tooling.

For navigations, you have options but need to be intentional. Keep in mind that browser address bars remain highly useful for users copying or pasting URLs in/out and potentially sharing those URLs with others. For intra-portal / authenticated endpoints it may not matter that a user sends a colleague https://2.example.com/portal/book/5 (especially if the colleague would’ve ended up forced over to the second app to log in to the portal anyway!).

But for resources and endpoints where the goal is speed and public accessibility, we’ll want to keep those URLs pointed at the main app.

The good news is that we can be clever. For instance, if an endpoint is slow and synchronous (not recommended but we get it, it happens) yet must result in a public URL, we can still POST to the second app and do the work synchronously in that controller. We just need to make sure the response from the second app redirects back to the first. And since they share the same database, you can fluidly (for example) do an expensive create operation in the second app then immediately redirect to the now-existing record on the main app with confidence. There’s no delay in data propagation between the two applications!
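
As a concrete (and hypothetical) sketch of that dance, imagine an expensive report creation that we’ve routed to the second app. Rails’ allow_other_host option makes the cross-host redirect explicit:

# A controller on the second app, reached via
# POST https://2.example.com/reports (hosts are assumed examples)
class ReportsController < ApplicationController
  MAIN_APP_HOST = "https://www.example.com"

  def create
    # The expensive, synchronous work happens here, on the app built
    # to tolerate slow requests...
    report = Report.create!(report_params)

    # ...then the user lands back on the fast app. Both apps share one
    # database, so the record is already visible when the redirect fires.
    redirect_to "#{MAIN_APP_HOST}/reports/#{report.id}", allow_other_host: true
  end

  private

  def report_params
    params.require(:report).permit(:name, :range)
  end
end

The allow_other_host: true bit is required in Rails 7+, which blocks cross-host redirects by default; a nice guardrail for exactly this kind of setup.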

In the case of our customer, this meant offloading most of their user portals and internal admin tools to the second app. Their public marketing site stayed put and immediately got calmer metrics. Problem solved!

A diagram showing our customer’s ultimate break-out of their traffic across two apps

Judoscale’s Setup

We mentioned earlier that Judoscale also runs a dual-prod-app setup, but we arrived at our split for different reasons — and with a different emphasis. We’re sharing that to underscore there isn’t one “right” pattern. For us, it was more about cost and UX than isolating slow paths… most of our endpoints are already fast!

Rather than sending volatile endpoints to a second app, we split by human interface. Our main app (app.judoscale.com) is the customer dashboard, so we tune it for UX: snappy, steady, predictable. Our API app (api.judoscale.com) serves the bulk of our traffic, but it’s non-human-facing and can tolerate small, occasional latency blips. The machines don’t mind! But people do. It’s not the fast-vs-volatile split we describe above (which is still the right path for this customer), but it delivers similar benefits: each runtime is optimized for what matters most to it.

Practically, this lets us fine-tune the API runtime for throughput and cost (concurrency, process counts, aggressive autoscaling) while keeping the main app conservative for a consistently great feel. The net effect: a calmer UX and lower hosting spend (more on cost below). For many, the canonical split paradigm might be “fast vs. volatile”, but for us it was “UX vs. cost”. It’s a different motive but the same playbook: split out a second prod app.

A Caveat on Cookies, Auth, and Subdomains

If you’re going to use a second app for a disparate, separate API or fully segmented authentication mechanism (like Judoscale did), feel free to skip this section. If instead you’ll be cleverly (and carefully) shuttling users between the two apps, we need to discuss shared authentication across subdomains.

The simplest way to accomplish this is to set up both applications with the exact same secret key base (or equivalent) so that cookie and session cryptographic signing validates to/from both. That is, if you log in on the main app, a subsequent request to the second app will see that you’re logged in. This strategy upholds the “keep both apps the exact same” principle by keeping sessions transparent between them. Both apps will read and write the same session cookie.

Once both applications are running the same keys, you’ll need to ensure that the actual cookie policies are set up correctly for both apps. Essentially, we need to make sure that both apps emit cookies with the same sharing configuration so that browsers will send the same cookie to both apps. In Rails, that might look something like this (for session storage via cookie):

Rails.application.config.session_store(
  :cookie_store,
  key: "_my_app_shared_session_key",
  domain: ".example.com",      # explicit eTLD+1; covers example.com + subdomains
  expire_after: 1.year,
  secure: true,                # if this fails in specs/tests, switch to `!Rails.env.test?`
  same_site: :lax,             # mitigates CSRF while allowing subdomains
  httponly: true
)

But, as with all things security-related, make sure you understand every config component here and are confident in your security strategy amidst sharing cookies between the two apps. YMMV.

Magic, P95s, and Threads

It’s worth taking a little detour here to assess the magic of what we’re presenting: there isn’t any. This is just simple queueing theory with friendlier furniture. We’ve talked about queueing theory broadly in “Queue Time: The Metric that Matters”, but the mechanism at play in scaling sideways isn’t radical. When slow requests leave the main thread pool, fast requests stop waiting behind them. That means lower overall variance in request speeds (e.g. lower p95s) and an app that users will probably describe as feeling “snappier”.

👀 Note

Of course the slowness has to go somewhere… but we can be much more relaxed around the variance and volatility of our second app. When the slowness is going somewhere made to be slow, it feels much better.

In fact, we can use our “keep the fast app fast” and “keep the slow app slow” mindset in tweaking our thread counts in each app. For a main app we recommend three Puma threads. That’s Rails’ new standard and proves to be an excellent tradeoff: increased throughput with a reasonable, low tail-latency increase (especially after you move all of the slow requests to the second app!). That said, we recommend deliberately choosing a higher thread count on the second app. Maybe five, maybe six. Your mileage will vary on specifics, but when we design and spin up an application specifically to handle our slower (likely I/O-bound) requests, especially when we aren’t as worried about response times, we can really leverage the power of a large thread pool. This should allow us to keep our instance-count low — a single server instance running five or six threads should be able to handle quite a bit of stuff!
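
Since both apps deploy from the same repo, the cleanest way to express that difference is an environment-driven Puma config. Here’s a sketch using the conventional RAILS_MAX_THREADS knob:

# config/puma.rb (shared by both apps)
# Main app:   RAILS_MAX_THREADS=3 (Rails' current default tradeoff)
# Second app: RAILS_MAX_THREADS=5 or 6 (I/O-bound, latency-tolerant)
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads threads_count, threads_count

workers Integer(ENV.fetch("WEB_CONCURRENCY", 1))
preload_app!

port ENV.fetch("PORT", 3000)

Same file, two tunings; only the env vars differ between the apps.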

Autoscaling Two Applications

Finally, the last major topic to cover for scaling sideways is indeed autoscaling. First, you should use Judoscale (👋). Okay, obvious plug aside, there’s a little nuance here: you’re going to want both apps to autoscale. But they’re going to do so with different parameters and goals.

Main app: now that variance is down and your endpoints are consistently performant, we’ll want to clamp our queue-time thresholds a bit tighter. The target is a flat, boring queue-time line very near zero. In Judoscale, you should see low enough metrics that an upscale threshold of 5-10ms feels very stable and scales nicely with your actual traffic curves (not erratically)!

✅ Tip

If your app has burstable traffic loads at known times, you should still define a schedule for your autoscaling. If it has burstable traffic loads at unknown times, consider autoscaling to guarantee a certain level of headroom.

Second app: still scale on queue time but expect volatility and small spikes that self-resolve. We’d recommend a moderately high upscale threshold like 80ms as well as reducing upscale sensitivity to 20 seconds so brief jitters don’t cause thrashing (AKA ping-pong scaling, which we discussed here). We want to upscale when necessary, but wait a moment to be sure that upscaling is, in fact, necessary.

So, all of that to say, queue time is still absolutely the metric to watch for scaling on both applications. And Judoscale is still absolutely the tool to use. But refining our scaling parameters for each app in their own context is the real path to success here! We want tight bounds and strict expectations on the main app with looser, workload-aware settings on the second.

A Note on Cost

To address the potential elephant in the room: scaling sideways this way may cost a little more in your overall hosting bill. That’s true. But keep in mind that our first goal here was to optimize and speed up a sector of an application without refactoring the whole application. This is a “Can we throw money at the problem?” solution.

But there’s actually better news: it’s likely that this strategy won’t actually cost much more than your base hosting level now. Remember that the main app is likely going to run fewer instances the more surface area you move away from it. That’s savings. And the second app should make broader use of multi-threading, so it too may need fewer instances than you expect. That’s cheap!

At the end of the day, snappier user experiences and better conversions tend to yield more sales, and more sales means you probably have a little more space in your hosting budget. We’re not advocating for going wild here (you should still autoscale both applications to keep things efficient), but this strategy is a reasonable cost-path forward for powerful performance gains.

Scale Sideways

A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both

We started with a simple ask (“optimize the public site for SEO”) and a familiar constraint: one app serving very different kinds of traffic. That’s why we reached for the often-overlooked move of running a second production app. It squarely addressed this customer’s need: keep the public face fast and predictable while letting portals and internal tools be as spiky and complex as they need to be. We should know: we do the same thing (though not for SEO purposes)!

The path there doesn’t require a big-bang migration. Stand up the second app, put guardrails in place, and move traffic in slices. Begin with front-end calls, shift over some API actions, then gradually migrate entire user portals once you’re confident in how shared URLs will behave… all while feature-flagging each shift to build confidence.

What you get for that incremental effort is real performance gain with little added domain complexity or cost. The main app’s thread pool narrows to the fast paths, queue time flattens, and p95s stop lurching. The second app absorbs the messy variance without leaking it into the public experience. Same codebase, two runtimes, each excellent at a different job. If your intro sounds like our customer’s (“we’re optimizing public performance for SEO”), or ours (“we really need to optimize our API for throughput and reliability”), this is the strategy that keeps the promise without rewriting the product or doubling your spend.