Process Utilization: How We Actually Track That

Jon Sully
@jon-sully

Over the last few months we’ve published a couple of articles talking about our new “Utilization”-based autoscaling option. The first talked through the use-cases for this new option — when it’s useful and who it’s for (“Autoscaling: Proactive vs. Reactive”). The second was a bit more nitty-gritty, explaining the high-level concept for how we’re tracking this ‘utilization’ metric (“How Judoscale’s Utilization-Based Autoscaling Works”)…
This post is the nerdy sequel to the latter: the actual boots-on-the-ground / nuts-and-bolts of how we first attempted to track process utilization, how that proved to be a bad setup, and the clever idea that led us to a way better v2. This is the story of low-level measurement with sampling, thread safety, and lackluster results leading to new ideas 😅.
The job to be done
As per our second post in this saga, our definition of ‘utilization’ is based around an idle-state. Paraphrased, it’s essentially:
Measure the fraction of time a web-server process is handling at least one request, then aggregate that across all processes over time.
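Or, as a back-of-the-napkin formula (our shorthand here, nothing official): utilization = time_handling_requests / wall_clock_time for each process, averaged across every process.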
Two constraints forced us to think carefully:
- Extremely low overhead. Judoscale is a performance tool; it’s an autoscaler that’s intended to help your application soar. It is not something whose client code should impact your application! The Judoscale package should have a perceivably invisible performance impact on the app running it. Full stop. No compromises.
- Correct values in a multi-threaded world. While Ruby, Python, and Node can operate in an asynchronous fashion, and that asynchronicity can be valuable for serving many web requests at once, we need to be very careful in collecting values. It’s easy to accidentally collect thread-level metrics which then overlap and become very confusing. We need to be careful to stay up at the process level.
So… now we need to actually write some code: how do you actually capture the idyllic “idle time” of a process in a real application receiving real traffic?
Attempt #1: Background Sampling
Our first proof-of-concept was built around running a mostly dormant background thread. It would essentially wake up every few hundred milliseconds, ask “is this process handling any requests right now?”, record that yes-or-no, then go back to sleep. Voilà: utilization!
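In rough Ruby, that first pass looked something like this (an illustrative sketch only, not our actual v1 code; the class name, the 250 ms interval, and the handling_request flag are stand-ins):

```ruby
# Illustrative sketch of the sampling approach - not Judoscale's actual v1 code.
class SamplingTracker
  SAMPLE_INTERVAL = 0.25 # seconds (a stand-in value)

  def initialize
    @busy_samples  = 0
    @total_samples = 0
  end

  # Assumed to be flipped true/false by the request middleware elsewhere.
  attr_accessor :handling_request

  def start!
    Thread.new do
      loop do
        sleep SAMPLE_INTERVAL
        @total_samples += 1
        @busy_samples  += 1 if handling_request
      end
    end
  end

  def utilization_estimate
    return 0.0 if @total_samples.zero?
    @busy_samples.fdiv(@total_samples)
  end
end
```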
It was easy to ship, but it had issues. Notably…
Aliasing difficulties. Bursty traffic and short requests can fall between samples. Imagine a process that handles a flurry of 30–50 ms requests. With a 250 ms sample rate, many bursts are invisible; you under‑count busyness simply because you looked away at the wrong moments. Whoops!
Jitter vs. overhead trade-off. If we increase the sampling rate to reduce aliasing, we immediately hike CPU wakeups, heap churn, and lock contention (on every process, 24/7!) even when your app is idle. Oof ☹️
Low signal-to-noise. Inherently, sampling produces a staircase approximation of the underlying signal. Real utilization is a continuous busy/idle timeline; our samples were a blurry thumbnail of the scene that actually mattered.
I personally tend to visualize this, oddly enough, as a mathematical curve on a chart (oh how my high-school math teacher would be proud). Imagine we have some real curve of data, perhaps like this:
Okay, great. Now let’s pretend we don’t actually know what that curve looks like and we’re taking a sampling-based approach to figuring it out. What we end up with is a bunch of samples. That might look like this:
Which might be fine for some cases, but we’ve clearly lost several details from the original curve — the fast spikes and drops, in particular. This is the core problem with sampling rates: sample too slowly relative to how fast your data actually changes and you won’t capture a high-detail picture. Sample too quickly…
You end up with a great representation of the curve, but you took up way too much horsepower constantly waking up and reading those samples. It’s hard for an app to actually serve its requests when the thread scheduler is constantly switching back to a background thread asking “HEY ARE YOU SERVING A REQUEST?!” (“I’M FREAKING TRYING TO, THANK YOU VERY MUCH!!!”).
When we’re talking about requests that might take 5ms, 50ms, or 150ms to fully handle and deliver, a sample rate of 250+ms just doesn’t capture the details. And a faster sample rate feels heavy-handed. This wasn’t going to work…
Attempt #2: Event edges + a tiny counter
Okay, to be fair, the line curve I gave above was a little misleading about the actual type of data we’re trying to track. Utilization, as we’ve defined it, isn’t a curve with smooth radii and roller-coaster-esque waves: instantaneous utilization is either a zero or a one. A process is either busy, or it is not. If we were to plot that on a chart, it would actually look more like this:
That is, a square wave representing a binary signal. Unfortunately, a square wave signal can actually make sampling results even worse. Check out how wrong an ill-timed sampling pattern can get:
If you believed your sample data in that case, you’d think the signal is almost always “on”, but that’s not true.
👀 Note
Fun math fact: the fewer possible points on a Y-axis there are, the worse the infrequent-sampling-effect (observing statistically incorrect data because you’re sampling too infrequently) can become. When your Y-axis range is just 0-1 you actually need to sample far more frequently to capture the binary signal with any real integrity. It’s much harder than a flowing curve!
If you’re curious for more of the math here, read up on Bernoulli distributions and binomial variance 🤓
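If you’d rather poke at this than read the math, here’s a quick toy simulation (purely illustrative, nothing from the Judoscale codebase) comparing the true busy ratio of a bursty 0/1 signal against what 250 ms sampling reports:

```ruby
# Toy simulation: how well does 250 ms sampling estimate a bursty 0/1 signal?
# Illustrative only; not from the Judoscale codebase.

duration_ms = 60_000
busy = Array.new(duration_ms, false)

# Scatter 200 short (~30 ms) requests across one minute of wall-clock time.
200.times do
  start = rand(duration_ms - 30)
  30.times { |i| busy[start + i] = true }
end

true_ratio = busy.count(true).fdiv(duration_ms)

# Now "sample" the signal every 250 ms and estimate the ratio from that.
samples = (0...duration_ms).step(250).map { |t| busy[t] }
sampled_ratio = samples.count(true).fdiv(samples.size)

puts "true busy ratio:    #{true_ratio.round(3)}"
puts "sampled busy ratio: #{sampled_ratio.round(3)}"
# Run it a few times: the sampled estimate bounces around noticeably
# while the true ratio barely moves.
```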
Anyway, the novel idea ended up being beautifully boring: don’t poll at all, just record state transitions cleverly. If we simply track the timestamps of when a process leaves and returns to idle, we can recover the real, true answer to “how much time was it non-idle?” That looks like this:
And once we have the blue blocks, we can simply add them all together for a given timespan, then say utilization = blue_block_total / total_time. Sum the rectangles! Boom!
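To make that concrete with made-up numbers: if a process’s busy blocks within a 10-second window add up to 1.2 s + 0.8 s + 2.0 s = 4.0 s, that window’s utilization is 4.0 / 10 = 0.40, leaving 0.60 of it idle.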
The Benefits of Edge-Tracking
Tracking the state-changes (we’ll call them “edges” for math’s sake) has some really fantastic benefits over polling.
- Computational cost: instead of constantly waking up a background thread to check in on current requests (which means extra context switches, scheduler wakeups, lock contention, and so on), we can instead simply read and/or write a process-global timestamp register when any request starts or ends.
- Correctness: instead of hoping a reasonable sample rate provides a decent guess at the actual curve being modeled, we instead know the exact amount of time that a given process is non-idle! There’s no guess.
- Reliable for all traffic shapes: Sudden request waves, thin bursts, long I/O waits — they all work. If a worker is non‑idle, it gets counted correctly and appropriately.
Once we saw this route, we quickly understood that it was all upside. There’s no catch here! It’s a purely better approach, born of the realization that we’re tracking a binary signal, not an actual curve.
Let’s See Some Code
✅ Tip
Just a note before we dive into the code: we developed our utilization-based tracking and scaling in Ruby first, so these examples are going to be in Ruby. But since this new approach is agnostic to any language specifics, we have the same implementations for Node and Python 🎉. It’s all the same when you’re just tracking edges!
The great news with this new approach is that it’s so simple I can share the real code that implements it here in a blog post. This code is taken straight from the judoscale-ruby GitHub repository, which houses all of the Ruby packages Judoscale publishes.
👀 Note
One caveat in this code: while my diagram and example above focused on showing that we track “busy time”, our actual implementation is inverted: we track “idle time” rather than “busy time”.
Tracking “busy time” is slightly easier to grok (and build diagrams for!), but in reality our code does this:
It’s the inverse, so the math still all checks out (for any window, busy ratio = 1 - idle ratio), but having both the “busy time” and “idle time” framings is useful for us! We just went with idle-side tracking for our code because it ended up slightly simpler. Check it out!
First, we have a Judoscale::UtilizationTracker class. It has a few methods and helpers in it, but the important parts start with the incr method (short for “increment”):
module Judoscale
  class UtilizationTracker
    # ...

    def incr
      @mutex.synchronize do
        if @active_request_counter == 0 && @idle_started_at
          # We were idle and now we're not - add to total idle time
          @total_idle_time += get_current_time - @idle_started_at
          @idle_started_at = nil
        end

        @active_request_counter += 1
      end
    end

    # ...
  end
end
First, keep in mind that this method is going to run every time a request comes in (starts). So, since we’re going to be updating a request counter and an idle-time total across multiple threads, we do need to use a simple Mutex (@mutex is simply a Mutex.new, Ruby’s built-in mutual-exclusion lock). Once we’re certain that we can safely update our process-level variables, we need to do two things: mark that our “idle time” has ended, and increment our active-requests counter.
Pretty straightforward, there! Since this block may run as a multi-threaded application server picks up a request on thread #2 or #3, we’re careful to only end our “idle” timer if there aren’t already any requests being processed (if @active_request_counter == 0).
On the flip side, we have a decr method that runs every time a request finishes (ends):
module Judoscale
  class UtilizationTracker
    # ...

    def decr
      @mutex.synchronize do
        @active_request_counter -= 1

        if @active_request_counter == 0
          # We're now idle - start tracking idle time
          @idle_started_at = get_current_time
        end
      end
    end

    # ...
  end
end
This one’s even simpler: decrement the count of active requests by one and, if that was the last request in flight, mark that our “idle time” has begun — the process is now idle!
The end result of these two functions working together is an accurate running total stored in @total_idle_time which, at any moment, tells us how long the process has been idle so far in the current reporting cycle.
The last piece of the puzzle, then, is to report that ratio and reset that variable/register! We do that in one last method on Judoscale::UtilizationTracker:
module Judoscale
  class UtilizationTracker
    # ...

    def get_idle_ratio
      @mutex.synchronize do
        total_report_cycle_time = get_current_time - @report_cycle_started_at

        # Capture remaining idle time
        if @idle_started_at
          @total_idle_time += get_current_time - @idle_started_at
          @idle_started_at = get_current_time
        end

        idle_ratio = @total_idle_time / total_report_cycle_time
        @total_idle_time = 0.0

        idle_ratio
      end
    end

    # ...
  end
end
Some background here: Judoscale packages report back to Judoscale servers every 10 seconds (via a background POST that stays off the request path) with a handful of capacity metrics about the application. In this case, @report_cycle_started_at represents the timestamp at the start of that 10-second bucket. Since we’re trying to figure out the idle ratio, we need to divide the idle time by the total time, and “the beginning of the bucket until now” is that total time.
Once we have that, we handle a special case for when this code runs while the process is currently idle, so that we neither over-count nor under-count idle time. Since our “report cycle” observation window might start or end during an idle period, we need to handle that carefully. Visually, that’d look like this:
Finally, we compute the idle ratio (a decimal, like 0.88 or 0.37), reset @total_idle_time back to 0.0, and return that idle ratio as the result. ✨
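To make the flow concrete, here’s a minimal sketch of how a reporter loop could consume that method every 10 seconds (illustrative only; report_metrics is a hypothetical stand-in, not the actual Judoscale reporter):

```ruby
# Illustrative sketch only - not the real Judoscale reporter.
Thread.new do
  tracker = Judoscale::UtilizationTracker.instance

  loop do
    sleep 10 # one report cycle

    idle_ratio  = tracker.get_idle_ratio # e.g. 0.57
    utilization = 1.0 - idle_ratio       # e.g. 0.43

    # Hypothetical helper that POSTs this bucket's metrics in the background.
    report_metrics(idle_ratio: idle_ratio, utilization: utilization)
  end
end
```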
The last piece of code I’ll highlight is a layer up — the request middleware itself. This class, Judoscale::RequestMiddleware, is essentially what wraps every Rack request before and after it’s handed down to the Rack application itself. I’m chopping out a lot here, but the bits pertinent to our discussion remain:
module Judoscale
  class RequestMiddleware
    # ...

    def call(env)
      # ...
      tracker = UtilizationTracker.instance # Singleton
      tracker.incr

      # ... lots of other code
    ensure
      tracker.decr
    end

    # ...
  end
end
Essentially we’ve created a two-part contract:
- Every time a request starts, we guarantee we’re going to call #incr on the process-level singleton instance of UtilizationTracker
- Every time a request ends, regardless of how or why it ends, we guarantee we’re going to call #decr on that same singleton instance (thanks, ensure!)
This is the glue that ensures our data inside of UtilizationTracker is consistent and accurate over the lifespan of the process. Isn’t it great?!
Aggregate It Together
Zooming out a little bit, we’ll conclude the deep-dive with a sense of how the aggregation works beyond a single process. Let’s say that you’ve got 2 production web services/dynos/containers/etc. running, and each runs 4 web processes. Since each process POSTs back its own metrics every 10 seconds, that means our back-end is going to get 8 data-points about your application’s overall web-process idleness/busyness. Maybe for a given 10-second bucket, process #1 on server #1 showed an idle ratio of 0.66 (that is, it was idle for two-thirds of that 10-second window), while process #4 on server #2 read a ratio of 0.22 (meaning it was handling at least one request for about 78% of the bucket).
Once we have all of the data points, the aggregate is actually simple: we average them together. For example, then, if we received these data points:
| Server | Process | Idle Ratio |
|---|---|---|
| 1 | 1 | 0.56 |
| 1 | 2 | 0.77 |
| 1 | 3 | 0.48 |
| 1 | 4 | 0.39 |
| 2 | 1 | 0.81 |
| 2 | 2 | 0.44 |
| 2 | 3 | 0.52 |
| 2 | 4 | 0.62 |
For that bucket, our average idle ratio would be:
(0.56 + 0.77 + 0.48 + 0.39 + 0.81 + 0.44 + 0.52 + 0.62)/8
Which comes out to roughly 0.57. So then, that application was idle about 57% of the time (for that bucket) and, inversely, busy about 43% of the time. Thus, that’d be a 43% utilization metric for that bucket, as we’ve defined it. Gathered, collected, and aggregated simply.
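In code, that aggregation step is about as boring as it sounds. A minimal sketch using the numbers from the table above (not our actual back-end code):

```ruby
# Aggregate one 10-second bucket of per-process idle ratios.
# Illustrative sketch only; this happens on our back-end, not in the client library.
idle_ratios = [0.56, 0.77, 0.48, 0.39, 0.81, 0.44, 0.52, 0.62]

average_idle = idle_ratios.sum / idle_ratios.size # => 0.57375
utilization  = 1.0 - average_idle                 # => 0.42625

puts "idle: #{(average_idle * 100).round}%, utilization: #{(utilization * 100).round}%"
# => idle: 57%, utilization: 43%
```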
Wrapping It Up
If there’s a theme to this little blog-post saga, it’s that the simplest model that matches reality tends to win. We started by trying to guess at busyness with background sampling, only to discover all the usual traps: aliasing, jitter, and overhead. Then we reframed the problem to match the truth on the ground: a process is either idle or it isn’t. Record the edges. Sum the rectangles. Report the ratio. Done.
That shift gave us three things you actually feel in production: lower overhead, correctness across weird traffic shapes (long I/O, tiny bursts, mixed workloads), and numbers you can trust enough to automate against. When an autoscaler acts on a metric, the worst feeling in the world is, “ehh, it’s probably fine.” Edge-tracking turns “probably” into confidence.
And the aggregation story is intentionally boring, too. Each process tells us how idle it was in the last 10 seconds; we average those into an application-level picture. No fancy weighting, no black-box magic. If your fleet spends 57% of a bucket idle, that’s 43% utilized. That’s a number you can reason about, chart, alert on, and scale from.
So if you’ve been skeptical of utilization-based autoscaling because it felt hand-wavey or weird, we hope this demystifies it. The implementation is small on purpose, tested in the sharp edges of real apps (including our own!), and designed to vanish into the background until you need it. Watch your utilization settle into patterns you recognize, set the thresholds that reflect your own tolerance for headroom vs. cost, then enable utilization autoscaling.
In other words: measure what matters, measure it honestly, and keep the math simple enough that you’ll actually use it.