Small Teams Need PaaS-Ops, Not DevOps
![Jon Sully headshot](/assets/authors/jon-sully-981ff138.jpg)
Jon Sully
@jon-sully
Picture this with me: you’re running a small, scrappy startup. You’ve got a team of just four developers — each one an experienced full-stack engineer — just trying to get features out the door for customers. Your runway is limited, your roadmap is… ambitious, and every feature you ship is one hopeful step towards survival. Survival is important.
Now imagine that one of your devs starts grumbling, “We really need to sort out our DevOps….” They want to build an extensive CI/CD pipeline, automate infrastructure setup with Terraform, ensure everything is repeatably containerized with Docker images and registries, something about orchestration with Kubernetes, and also a bunch of “IAM” setup, whatever that is? Something-something “security roles”…
Your eyes glaze over. As you stare deeply down the pipe of endless YAML files and system tooling configuration, you remember your actual product backlog. What about the features that needed to ship yesterday? What about the bug reports that keep coming in? What about… marketing!? Ugh.
This is where DevOps fails small teams.
Because, let’s be honest, when you’re a small team trying to ship a product, you can’t afford one or more of your team spending all of their time tuning Helm charts or tweaking AWS auto-scaling groups. And you certainly can’t afford to hire another engineer to manage your infrastructure headaches! There’s just not enough time and not enough budget.
No, you need developers that can push new features and bug fixes, keep the app running smoothly day-to-day, and avoid falling into giant rabbit-holes of systems infrastructure or maintenance. You need infrastructure that Just Works(™), without requiring a master’s degree in cloud architecture.
That’s where PaaS-Ops comes in.
PaaS-Ops
So, What’s PaaS-Ops?
Unlike DevOps, which somewhere-along-the-way turned into a job title and catch-all for all things infra, PaaS-Ops is simply an ethos — some guidelines for handling infrastructure needs as a small team. It’s a set of guide-rails.
PaaS-Ops is the idea of embracing Platform-as-a-Service (PaaS) providers — Heroku, Render, Fly.io, Railway, Vercel, etc — to handle your infrastructure for you. It’s about outsourcing the complexity of modern cloud architecture and focusing on the parts that matter: your product, your users, your business. Now, PaaS providers can’t do everything for you; there are decisions that need to be made by your team and knobs that do need to be turned occasionally. PaaS-Ops is all about knowing enough to make those decisions and turn those knobs confidently while avoiding the deep rabbit-holes and time-sinks of lower-level infra providers.
✅ Tip
For example, PaaS-Ops is choosing to quickly set up Github Actions to run a basic CI test flow in your repository rather than reaching for a DIY setup involving self-hosted runners, IAM policies, S3 buckets for artifact storage, custom VPCs for network isolation, EC2 instances for build servers, CloudWatch for logging, and the soul-draining experience of debugging a broken pipeline through a tangle of services that were supposed to make your life easier. Whoops!
In a small team the person running PaaS-Ops isn’t a DevOps specialist. And, ideally, it’s not just one person at all. Every “full-stack” developer, while not a DevOps specialist, should be PaaS-Ops fluent. That is, they shouldn’t need to care about the inner workings of Kubernetes, but they should be confident in knowing how to manage a PaaS dashboard to keep an app running smoothly.
Essentially, PaaS-Ops is all about skipping the “cloud plumbing” nightmare and instead getting to “it just works” as much as possible. That doesn’t mean everything is always perfectly smooth, but it means sticking as closely as possible to PaaS systems so your developers can focus on your product instead.
For small teams, PaaS-Ops isn’t a compromise. It’s the smarter path.
Anti-DevOps
Before diving into what PaaS-Ops really covers, we should take a quick aside here to clarify that PaaS-Ops is not anti-DevOps. We’ve framed the argument here as “one vs. the other” but that’s not exactly accurate or fair. The “DevOps” we’ve been describing is more of what we’d consider to be the current dev-culture “DevOps” definition: dedicated resources with titles including or around the term “DevOps”, which help to manage and/or build complex, cloud-based infrastructure… complex enough to require said dedicated resources.
But if we step back a few years, the original idea behind “DevOps” was simply that developers would be involved in the “ops” side of things. It was the contrasting response to having fully separate development teams and deployment teams that often didn’t communicate enough.
So while we could say that PaaS-Ops and DevOps are diametrically opposed, that’s actually not true. PaaS-Ops is actually a subset of DevOps, at least in the classical definition of DevOps. Modern “DevOps” has just gotten… bigger.
Thus, PaaS-Ops is just right-sized DevOps for the small team. It’s about saying:
- “No, we don’t have time to build our own CI/CD from scratch.”
- “No, we don’t need to learn AWS’s 200+ services.”
- “Yes, we’d rather push a button on Render and call it a day.”
These things all still involve your development team in your operations and deployment workflows, so it’s still DevOps. They just encourage a smaller team to choose workflows suited to the team size rather than diving into the hot-new-thing in the ops world.
Some examples:
| Task | DevOps / AWS | PaaS-Ops |
|---|---|---|
| Style | DIY and customize everything | Batteries included |
| Deployment | Build CI/CD pipelines from scratch | Push to Git |
| Scaling | Configure ELBs / ALBs / provisioning | Add more dynos |
| Monitoring | Build a Grafana stack | Use built-in dashboards |
| Secrets | Configure and integrate AWS Secrets Manager or SSM | Use ENV or config vars |
| Extendability | 200+ AWS tools to integrate and configure | Add-on marketplace |
But let’s dive in further. What exactly does being “PaaS-Ops Fluent” mean? What all is under the “PaaS-Ops” umbrella? We’ll try to highlight some specifics.
What ‘PaaS-Ops Fluent’ Engineers Should Know (and What They Shouldn’t!)
Now that we’ve introduced and advocated for this “PaaS-Ops” thing, we should probably cover what we think falls under the PaaS-Ops umbrella. This isn’t doctrine — we certainly aren’t the authority on defining terms! But we do want to explore the common themes of operations, infrastructure management, and tooling to better illustrate the things a PaaS-Ops fluent developer should know. And “PaaS-Ops Fluent” just means someone who’s fully capable of managing a PaaS stack application’s infrastructure top-to-bottom — knowing what all the knobs do!
1. ⚙️ CI/CD Pipelines: Fast and Simple
In concept, a Continuous Integration / Continuous Deployment pipeline automates the process of integrating code changes, running tests, and deploying your application. When we push new code to our repositories, be it on a branch or main/master, our test suite should automatically run and validate our changes. Our changes should then be deployed where appropriate — likely to our production environment if the changes are on main/master, otherwise maybe to a test environment if our changes are in a PR.
In a DevOps world we’d build this from scratch: setting up runners, managing artifacts, configuring network access, and tweaking YAML files endlessly. Let’s avoid that. Actually, let’s ask the most important question for a small team:
💭 Can We Skip This?
The answer here is… no, we really shouldn’t. Just as we shouldn’t ever skip writing tests, we shouldn’t skip running them either. We also shouldn’t skip automating the deployment process. Getting code out to production should be as friction-free as possible! Automation is the key to that.
But the good news is that there are several simple, off-the-shelf solutions for CI/CD.
The PaaS-Ops Approach
Even though they’re often joined together by a slash, CI and CD are two different things. For CI, our preferred solution is Github Actions. Since we already use Github for our source control, and many teams get a large swath of free Github Actions credits, this totally fits the PaaS-Ops ‘keep it simple’ ethos. Having your test results in the same place as your code is particularly nice when you encounter failures — often the failure can be directly linked to the code and you can start debugging without leaving your browser!
✅ Tip
Pro Tip: Leverage the smarts of ChatGPT or Claude to help build your Github Actions YAML file, or at least to help you search the web for tangible examples for your application stack. We’ve used several CI solutions over the years and the YAML configuration is always a pain in the butt. Modern LLM tools can make it quite a bit easier and should get you up and running in less than an hour! Nonetheless, this is still a much easier path forward than building your own CI from scratch on something like AWS.
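To make that concrete, here’s a minimal sketch of what such a workflow could look like for a Rails app. The linter, test command, and action versions are assumptions, so adapt them to your stack (most real apps will also need a database service in CI, which the LLM trick above can help you wire up):

```sh
mkdir -p .github/workflows
cat > .github/workflows/ci.yml <<'YAML'
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true    # runs bundle install and caches gems
      - run: bundle exec rubocop # quick checks first: fail fast on lint
      - run: bin/rails test      # then the full suite
YAML
```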
On the other hand, we use Heroku for our hosting, which essentially covers our CD. Heroku Pipelines let you set up automatic CD for a repository, which both deploys your main/master branch as the production application automatically and gives you the ability to automatically spin up test environments / applications for PRs or branches. This feature is incredible.
Heroku Pipelines can take an hour or two to set up properly, but there’s great power in getting a hands-on, testable clone of your application up and running for each PR you open. Beyond that, it’s also a great feeling knowing that merging something to main/master will push it out to production — no extra buttons to click, no servers to manually provision.
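If you like the terminal, the initial wiring can also be done from the Heroku CLI. A rough sketch with made-up app and pipeline names (review apps and automatic deploys are then enabled from the pipeline’s dashboard once you connect the Github repo):

```sh
# Create the pipeline around the existing production app...
heroku pipelines:create my-app --app my-app-production --stage production
# ...and attach a staging app to it
heroku pipelines:add my-app --app my-app-staging --stage staging
```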
What to Ignore
In short, we want to stay simple, off-the-shelf, and hosted with our CI and CD workflows. Small teams should avoid “self-hosted runners” (or really “self-hosted” anything), custom artifact storage solutions (like pushing test screenshots to S3 buckets, etc.), and complex multi-step pipelines with every possible edge-case accounted for. CI should simply run your linter and your tests, then fail quickly if something doesn’t pass. We don’t need perfect edge-case coverage or extensive tooling, we just want our tests to run automatically!
What to Study
When it comes to CI/CD there are a couple of metrics that a PaaS-Ops fluent engineer should understand and watch: CI runtime and production build times. Keeping both of these fast is critical for tight feedback loops and happy devs.
Luckily we have a couple of knobs to turn in direct response to these metrics. First, a ‘knob’ of our own choosing: what order our CI runs in. This can help us fail fast — just run our linters and other quick-checks first! Don’t run the quick stuff after the giant test suite.
Another CI knob we have is parallelism. Integrating parallel-test running is a powerful knob to increase the speed at which the entire suite completes.
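As one small example of that knob: if you’re on Rails with parallel testing enabled in test_helper.rb, the worker count can be tuned per run (the env var below is Rails-specific; other stacks have their own equivalents):

```sh
# Run the suite with 4 parallel workers instead of the default
PARALLEL_WORKERS=4 bin/rails test
```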
Additionally, the strongest knob you have available is the settings panel for your PaaS’s pipeline. You should become very familiar with how your pipeline works. For instance, you should know how to configure your pipeline to automatically build and deploy any new commits to main/master, or to set up a manual-promotion workflow. You should know how to configure automatic PR apps. And you should know the conditions that cause the various environments in your pipeline to (re-)build.
Tool Tip
There are third-party tools out there that fit into the PaaS-Ops ethos and can make your CI experience even better without much extra time investment. Something like Knapsack Pro comes to mind. We’re a Rails shop, so this is right up our alley, but it’s a plug-and-play solution for parallelizing our test suite… with virtually zero setup. It learns our tests’ run-times and figures out the most efficient way to parallelize them on its own! That’s actually about all we can say about it… it works so well that we never needed to dig in deeper. We just enjoy that our test suite runs extremely quickly. Quick feedback loops are a hallmark of agile development!
2. 📈 Scaling: Dynos, Not Nodes
In concept, scaling is the process of ensuring that your application can handle more traffic without crashing. It’s about increasing capacity when demand grows and reducing capacity when demand falls. In theory, it sounds simple: more traffic = more servers = problem solved. But anyone who’s had to manage scaling infrastructure knows that it’s rarely that straightforward.
In the DevOps world, scaling means configuring auto-scaling groups (ASGs), setting up load balancers, provisioning new servers (usually EC2 instances), and possibly orchestrating them with Kubernetes. You have to monitor CPU usage, configure health checks, and balance your nodes across availability zones.
That’s… a lot. So let’s ask the big question:
💭 Can We Skip This?
Negative! Scaling is essential. At some point, your app will need to handle more requests, whether that’s from a product launch, a viral social post, or just normal user growth.
But we don’t need to go full “cloud architect” to solve this. With the right PaaS, scaling is as simple as adjusting a slider or running a single command. No YAML, no Kubernetes, no load balancers. Just more dynos or instances. In the PaaS world scaling is a dial, not a dissertation.
The PaaS-Ops Approach
In a PaaS setup, scaling is about turning a couple simple dials:
- Vertical scaling (making your instances more powerful)
- Horizontal scaling (adding more instances to handle more traffic)
If you’re using Heroku, we recommend reading this article to figure out your app’s best vertical scale (what Dyno type to run). Otherwise, err on the side of smaller instances so that you can scale horizontally in smaller steps; it’s more efficient that way.
When it comes to horizontal scaling, it can be as easy as running a terminal command like:
$> heroku ps:scale web=3
Suddenly you’ve tripled your web-request capacity. Want to scale back down to save money? heroku ps:scale web=1 — problem solved.
What’s great about this approach is that you don’t need to worry about the underlying infrastructure — no configuring load balancers, no managing individual servers, no orchestration tools. Heroku, Fly.io, or Render handle all of that for you.
✅ Tip
Pro Tip: It’s neat that Heroku et al. allow you to change your scale so easily, but we recommend not changing your scale manually this way except in very rare cases. Instead, PaaS’s afford us the option to integrate automatic scaling extremely easily. See our Tool Tip just below for how to get your scale automatically adjusted 24/7 according to your traffic for free.
What to Ignore
When it comes to scaling, small teams should skip anything that requires managing individual servers, writing custom scaling policies, or worrying about custom metrics. Specifically, don’t burn time on:
- Auto-scaling groups (ASGs). These are useful in a highly complex AWS environment but not relevant or useful in a PaaS world.
- Load balancers. Your PaaS provider should handle this for you automatically; you should absolutely not try to integrate your own! Don’t reinvent the wheel here!
- Cluster management tools (like Kubernetes). These are irrelevant in the PaaS world, where the platform will automatically manage your cluster for you.
- Infrastructure-as-Code tools (like Terraform). Terraform and other IaC tools are neat and absolutely important for large, complex, custom builds (e.g. AWS-land). But they’re not day-one tools! Even if you want to add them to your PaaS setup, we recommend delaying that as long as possible. You can always add it later!
Your scaling should be a dial or knob, not a config or project!
Tool Tip
We mentioned automatic scaling up in our Pro Tip above, but the reality of PaaS-Ops scaling is that you essentially should never need to think about scaling. You should run automatic scaling (“autoscaling”) and carry on with your actual product needs. Autoscaling exists to constantly ensure your app is scaled to the right number of instances according to your traffic and load — it’s a beautiful, automatic feedback loop system that keeps everything running smoothly without any intervention. And it works best on PaaS tools.
We recommend Judoscale since it autoscales via queue time (which is the best way) and supports several different PaaS platforms. Judoscale is a plug-and-play add-on that was made to fit the PaaS-Ops ethos. It’s also free 🎉
What to Study
The reality is that with a tool like Judoscale running on your application, you likely won’t worry about scaling day-to-day. Instead, it might be a once-a-month kind of check. Nonetheless, a PaaS-Ops fluent engineer should still know what to watch and what to tweak when the need arises.
In terms of what to watch, by far the most important metric to monitor is queue time — for both web processes and background workers. If web requests are piling up, or background jobs are backing up, queue time will alert you! You should also keep an occasional eye on your CPU saturation, as it may also indicate a need to scale horizontally (though it tends to be less reliable than queue time).
In terms of what to tweak, you should be familiar with how to customize and tailor your autoscaling settings for your app. We have a three-part series on that very topic! Additionally, you should have a good sense of when to change the container size you’re using. A post like this one should serve to help you decide which one is right for your app.
Finally, you should have some familiarity with how to split out background job queues and processes to maximize your parallel performance and efficiency. We previously wrote “An Opinionated Guide to Planning Your Sidekiq Queues” which is Sidekiq/Rails specific, but ultimately applicable to any background job system. A PaaS-Ops fluent dev should be highly familiar with how to organize background jobs into different queues with different latency expectations… and how to use autoscaling to ensure those expectations are met!
3. 🔒 Secrets: Use Config Vars and Move On
In concept, secrets management is the process of storing and accessing sensitive information like API keys, database credentials, and third-party tokens in a secure, organized way. You don’t want your credentials hardcoded into your codebase, and you definitely don’t want them leaked in version control. Secrets should be easy to update, secure from prying eyes, and available to your application at runtime.
In a traditional DevOps setup, secrets management can quickly become a whole thing. You’d use tools like AWS Secrets Manager or HashiCorp Vault, configure IAM policies to manage who can access what, set up rotation policies to change your secrets periodically, and possibly even integrate this with Kubernetes so your pods can access their credentials.
It’s a lot of complexity for something that — let’s be honest — most small teams don’t need to overthink. So let’s go back to our tried-and-true question:
💭 Can We Skip This?
Sorry, not this one either 😕. Secrets management is critical, especially as your team grows. Sensitive data should never be stored in your codebase. But we can absolutely skip the complexity of building a custom secrets management system! Secrets management is super easy in the PaaS-Ops world — hardly something we even think about once it’s up and running!
The PaaS-Ops Approach
Ready for the whole solution? It’s beautifully simple. Use environment variables. That’s it! The secret to easy secrets when you need to keep them secret is (secretly?) just using ENV vars.
Heroku, Render, Fly.io, and similar platforms let you manage environment variables directly in their dashboards or via their CLI tools. These variables get injected into your app at runtime, and you don’t have to worry about the underlying security — the platform handles that for you.
Most PaaS providers also make it easy to manage separate sets of config vars for different environments (e.g., development, staging, production).
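For instance, on Heroku each app (and therefore each environment) carries its own set of config vars. A quick sketch with hypothetical app names and keys:

```sh
# Different values of the same var for staging vs. production
heroku config:set STRIPE_SECRET_KEY=sk_test_xxx --app my-app-staging
heroku config:set STRIPE_SECRET_KEY=sk_live_xxx --app my-app-production

heroku config --app my-app-staging                       # list an app's current vars
heroku config:unset OLD_API_TOKEN --app my-app-staging   # remove a stale one
```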
✅ Tip
Pro Tip: If you run Github Actions, you can set up secure, secret ENV vars for your CI / testing workflow too — this should only take a few minutes but allows you to use various tokens and keys in your tests, if needed. NOTE: don’t use your production tokens or keys in your tests… 😅. See more here
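A hedged sketch of that setup using the Github CLI (the secret name is hypothetical; reference it in your workflow as ${{ secrets.TEST_SERVICE_KEY }}):

```sh
# Store a CI-only credential as a repository secret
gh secret set TEST_SERVICE_KEY --body "test-key-goes-here"
```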
What to Ignore
The PaaS-Ops solution here is so simple and easy that we really should ignore anything related to “secrets” altogether. We don’t need a “secrets” tool. Specifically, stay away from:
- AWS Secrets Manager. It’s overkill for anything PaaS-Ops related and doesn’t really integrate with any PaaS systems to begin with!
- HashiCorp Vault. Similar to AWS Secrets Manager, this tool is built for a totally different workflow and has no bearing on PaaS tools.
- Secret Rotation Systems. We don’t need automatic secret rotation or any kind of extensive system here… for small teams, just keep it simple and rotate your secrets of your own accord from time to time. This is one tool too many!
The reality is that if your secrets setup takes more than an hour to configure, you’re probably over-engineering it. Secrets should be simple for small teams.
What to Study
Secrets management is one of those areas that feels like set-it-and-forget-it, but a PaaS-Ops fluent engineer knows that nothing stays secure forever. You have a responsibility to rotate your secrets periodically, both to reduce risk and maintain a healthy app.
The primary metric to watch is secret rotation frequency — meaning, how long it’s been since your last rotation. If you’ve gone months (or, let’s be honest, years) without updating your API keys, database credentials, or third-party tokens, it’s time to refresh them. Stale secrets are risky secrets. Make it part of your process to rotate them on a regular cadence.
As for knobs to turn, start with access control. Ensure that secrets are scoped to the least privilege necessary. For example, your production database credentials should only be accessible in production environments — not in local development or staging. PaaS platforms typically handle environment variable overrides well, so be sure you’re familiar with how to manage different secrets for each environment (e.g., staging vs. production).
Tool Tip
We’re quite hesitant to recommend any tool in this section because, again, secrets should be remarkably simple on a PaaS — just use environment variables! But, if you’re convinced that you need something one (sizable) step up in complexity and configuration, check out Doppler. It’s a secrets management tool that has simple integrations with most PaaS’s.
But really, stick with environment variables for as long as you can. Once you dive into secrets management it can become quite the rabbit hole!
4. 🔎 Monitoring & Observability: Don’t Over-Engineer It
In concept, monitoring and observability give you insight into how your app is running out in production. Are users hitting errors? Is your app’s response time getting sluggish? Did your latest deployment cause memory usage to spike? The answers to these questions are essential for keeping your app healthy and your users happy. But getting those answers also needs to be quick and easy for your team!
In a DevOps world, monitoring often means standing up a whole stack of tools: Prometheus for metrics, Grafana for dashboards, Fluentd or Logstash for log aggregation, and a custom alerting system to wake you up when something breaks. You’d configure your own dashboards to track every metric under the sun — from CPU usage to disk I/O to the number of active database connections. Oof. That’s a lot of work.
But wait! Can we…
💭 Can We Skip This?
Only if you like driving with a blindfold on! Don’t skip this. Your team needs to be aware of how your production application is running and when issues arise. Even the simplest apps need some level of monitoring to catch issues before your users do. But you don’t need to build a custom monitoring stack. There’s an easier, headache-free-er way.
The PaaS-Ops Approach
Most PaaS platforms offer basic monitoring out of the box — a fully ready dashboard webpage with metrics that update in real time. For example:
- Heroku offers a dedicated dashboard that shows response times, memory usage, throughput, system events and/or errors, and even certain language metrics for available languages
- Render also offers their own real-time metrics dashboard that shows similar stats across any services running there
- Fly.io has out-of-the-box regional metrics and performance stats too
And the reality is that these dashboards may be enough! For small teams and early-stage applications, these simple metrics are enough to ensure that everything is running smoothly. Which is neat because we had to do nothing for them — they’re just part of using a PaaS.
✅ Tip
Pro Tip: While running just your native dashboards is fine, we often recommend adding an APM tool just for added visibility into endpoint-specific performance and tracing. Something like Scout APM is typically our first choice, especially since their Heroku add-on has a very generous free tier. We love free!
What to Ignore
Remember: small team, simple setup. Our PaaS’s come with basic monitoring out of the box and we should do our best to avoid integrating any additional / third-party system that isn’t simple plug-and-play. For instance, avoid:
- Prometheus / Grafana anything. These tools are neat and incredibly powerful, but they can be fiddly, they’re lower-level than we prefer with PaaS systems, and they require a lot of manual configuration and setup. This is not out-of-the-box!
- Collecting every metric possible. Again, keep it simple! We’re not after every metric. We just want the metrics that would actually cause us to take action if they’re out of line. Errors, response time, and resource limits, for example.
- Custom alerting systems. As much as possible, stick with the alerts built into your PaaS. And, if doable, wire them up to your existing team communication system (Slack, Teams, etc.) so that you don’t have to proactively go check something else.
We just need to know that our application is online, running as fast as we expect it to, and not in danger of crashing due to resource overload. As long as those three things are true, we probably don’t need to worry about our production app and can get back to developing new features for our product. Do your best to stay out of the weeds here!
What to Study
There are a few metrics that are important to stay aware of when it comes to monitoring. The first is simply your app’s response time. It remains one of the clearest indicators of your app’s health. Spiking response times tend to indicate some kind of issue — but you’ll never know if you never keep an eye on your response times! Next, keep an eye on your 500 response rates. A sudden increase in 500 error responses is a bad sign — something’s going wrong. Lastly, keep an occasional eye on your health checks. Health check tools exist simply to give you third-party reassurance that your app is online… so you should look at / use them!
When it comes to knowing which knobs to turn, your best friend here is alerts. You don’t want to find out your app is down because a customer emailed you. Set up alerts for critical issues like high error rates, long response times, or failed health checks. At the same time, be careful not to create too much noise — alerts should be actionable, not annoying.
You should also know how to customize dashboards to surface the metrics that matter most to your team. Your PaaS provider may have built-in dashboards, but third-party tools can provide more flexibility if needed. Keeping a dashboard in your office can be a handy way to get a sense of confidence around your app’s consistent uptime.
Tool Tip
We already mentioned a great tool above in our Pro Tip, Scout, but we’ll also mention another one here — LogTail (BetterStack). Logs are a big part of observability and debugging. While most PaaS’s will show you logs in real-time, adding a proper log management tool to your PaaS setup is usually both easy and free. These tools allow you to search your logs, generate charts and dashboards from your log data, and typically come out of the box with some helpful defaults. We recommend LogTail — particularly their generous free-tier on their Heroku add-on. LogTail’s default dashboard for Heroku applications is a good example of those helpful defaults.
5. 🛑 Error Management: Knowing is Half the Battle
In concept, error management is about knowing when your app breaks in production — and what to do about it. No matter how well you test locally or in staging, unexpected things happen when real users interact with your app. They hit weird edge cases. APIs you rely on fail. Race conditions sneak in. We need to find out when (and where!) errors happen in our production environment so that we can fix them!
In a DevOps world, error management might mean building a custom solution to catch, aggregate, and alert on exceptions. You’d manage log pipelines, configure alerting systems, and wire up dashboards to track error rates across various environments. That’s a lot of setup for a small team. Especially when off-the-shelf solutions do all of that for you.
Alright, one more time with my favorite question:
💭 Can We Skip This?
It’s going to be a ‘nope’ here too. Ignoring errors in production is a fast track to angry users and churned customers. You need to know when things go wrong… ideally before your users start emailing support! 🔥
But again, we’re not trying to reinvent the wheel here. Skip building anything to track your errors! Use an error tracking service that works out of the box.
The PaaS-Ops Approach
The PaaS-Ops approach to error management is simple: use a third-party error tracking tool like Sentry or Honeybadger. These tools integrate with your app (via SDKs) and automatically capture uncaught exceptions, aggregate them, and notify your team.
That’s it! Mission accomplished. We’ve done this several times for several different apps and it usually takes less than 15 minutes to get error tracking, including notifications back to you, completely set up. Seriously, just:
- Choose a tool and add its package to your app; deploy that to production
- Set up notifications via email, Slack, Teams, etc. so that you know when a new error occurs
- Don’t worry about every little error!
The last point there is key, too. As a small team you simply don’t have time to fix everything. Try to prioritize errors into a few different buckets, if possible. These three are a good starting-point: “fix it now”, “fix it next week”, “not worrying about fixing it yet”.
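If you’re on Rails and pick Sentry, for example, the initial setup is roughly this. It’s a sketch, not gospel; the DSN comes from your Sentry project and should live in a config var:

```sh
bundle add sentry-ruby
bundle add sentry-rails

cat > config/initializers/sentry.rb <<'RUBY'
Sentry.init do |config|
  config.dsn         = ENV["SENTRY_DSN"]   # set via a config var, never hard-coded
  config.environment = ENV.fetch("RAILS_ENV", "development")
end
RUBY
```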
✅ Tip
Pro Tip: Make sure your error tracking tool also captures as much context with the error as possible. It can help when debugging later-on to know who the user was, what they were trying to do, where they came from, and any other transient state at the time of the error. Context matters!
What to Ignore
Since third-party error tracking tools are so simple and out-of-the-box, basically everything beyond these tools can safely be ignored — work here is just wasted time. Try to avoid:
- Custom log pipelines. We discussed logs briefly in the above section and suggested an out-of-the-box system there too. “Custom” is not a word you want around your small team when it comes to infrastructure!
- Low-priority errors. Remember that even with a simple error-tracking system, you’re going to see alerts and notifications for all errors. Try to keep in mind that not all errors are worth your team’s time and efforts.
- DIY error dashboards. Stick to the tooling that comes with your chosen error-tracking system and don’t reinvent this wheel either.
What to Study
When it comes to error management, the most important metric to monitor is simply error frequency. You should be aware (on a fairly regular basis) how often errors are cropping up in your application. An error just arising today with 1,000 events is likely a much bigger problem than one which began three months ago and happens only once per day. Error frequency can help you determine the priority level of your errors!
As for knowing which knobs to turn, the two most important ones are alerts and release tracking. These are both fairly simple — you should know how to configure alerts so that you and your team will be quickly made aware of new errors popping up in production. At the same time, be intentional with your alert rules to avoid unnecessary noise. You want alerts for critical errors that impact users, not for every little hiccup.
Release tracking is a powerful knob that often gets overlooked. In concept, it’s simple — make your error management software aware of when you release new code. In practice, it’s powerful. Release tracking allows your team to quickly identify which release introduced a bug and roll back if needed. Knowing exactly when an error first occurred — and what code changed — can save you hours of debugging time.
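On Heroku, one lightweight way to feed a release identifier to your tracker is the runtime-dyno-metadata labs feature, which exposes the deployed commit SHA as an env var. A sketch (the app name is hypothetical):

```sh
# Expose HEROKU_SLUG_COMMIT (and friends) to the running dynos
heroku labs:enable runtime-dyno-metadata --app my-app-production
# Then, e.g. in your Sentry initializer:
#   config.release = ENV["HEROKU_SLUG_COMMIT"]
```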
Tool Tip
Several error-tracking tools are trying to expand their offerings these days. Sentry, our default choice, included. These tools now offer log management and/or uptime monitoring and/or various other things beyond simply error tracking. Whether or not you want to use those additional features on those platforms is up to you. We tend not to — we like Sentry for error management and leave it at that — but just be aware that when you visit these services’ websites, there may be more happening than you expect! Try to ignore the noise and focus on their core tool.
6. 🛠 Services: Use Managed Databases and Add-Ons
In concept, your app relies on various services to function. The most common ones are databases (like Postgres or MySQL), caches (like Redis or Memcached), and queues (for background jobs). Without these, your app isn’t doing much beyond serving static pages. And that’s no ‘app’ at all! That’s just a website.
But here’s where things get tricky for small teams: running these services yourself is a nightmare of maintenance and troubleshooting. If you’ve ever SSH’d into a server to debug a database outage, you know that sinking feeling when you realize you’re the DBA now. Congratulations, you didn’t sign up for this, but here you are… staring at a terminal, googling furiously, writing esoteric commands like…
echo "select * from pg_ls_waldir()" | psql | grep -E '(archive|partial)$'
…because you’re about to run out of disk space and need to manage your WAL segments — whatever those are. This is madness.
The good news is that you don’t need to end up in that scenario. You don’t have to manage these services yourself at all. Managed services are a hallmark of PaaS setups and their value is unmatched when your team’s time is at a premium (which it always is).
But hey, just for fun, let’s ask my favorite question one last time:
💭 Can We Skip This?
Only if you’re going to run an app with no database, no cache, no queue, and no other services. So… probably not. But we can and should skip managing them ourselves. It’s just not worth the time for a small team. Let someone else handle the backups, updates, and scaling headaches.
The PaaS-Ops Approach
In the PaaS world, using services is as simple as clicking a button to provision a managed database or cache. We worry very little about services beyond that. Here’s what you should know:
- Managed Databases: Use Heroku Postgres, Render Databases, Fly.io’s built-in database clusters, or even third-party managed data services like Crunchy Data. These services handle backups, failover, encryption, and scaling for you. You don’t need to worry about maintaining the database server itself — just connect your app and go. Save time, move on, build your product.
- Caches: For caching, use managed Redis services. Redis.io (especially via their PaaS add-ons) is a great example: click to provision, and you’re up and running. The service handles persistence, automatic failover, and performance monitoring.
- Queues: If your app runs a background job system, like Sidekiq (Ruby on Rails) or BullMQ (NodeJS), go ahead and set up another Redis instance — a dedicated, separate Redis just for your background queues (see the sketch after this list). This instance can use a “noeviction” memory policy so you never lose jobs, while still giving you performance monitoring, alerting, and failover protections.
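As a rough sketch of that on Heroku (the add-on plan and attachment name are assumptions; check your provider’s current offerings):

```sh
# Provision a second Redis, attached under its own name so it gets its own URL
heroku addons:create heroku-redis:mini --app my-app-production --as QUEUE_REDIS
# Make sure the queue Redis never evicts jobs under memory pressure
heroku redis:maxmemory QUEUE_REDIS --app my-app-production --policy noeviction
# Point Sidekiq (or your job system) at QUEUE_REDIS_URL instead of REDIS_URL
```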
✅ Tip
Pro Tip: ⚠️ Pay close attention to connection limits on managed services. Most PaaS databases have limits on the number of concurrent connections, so you’ll need to be a little careful to ensure you won’t hit that limit. This isn’t complicated math but you do need to anticipate how many services/dynos you’re going to run at any given time.
Luckily, we built a database connection calculator to help you figure out this issue. And the good news is, once you’re sure you won’t exceed your limit, you don’t need to think about it again! This is a one-time safety check.
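Here’s the flavor of that one-time check, with hypothetical numbers:

```sh
# 3 web dynos    × 5 Puma threads each    = 15 connections
# 2 worker dynos × 10 Sidekiq threads     = 20 connections
# Total is about 35, comfortably under a (hypothetical) 120-connection plan limit.
heroku pg:info --app my-app-production   # shows your plan's connection limit and current usage
```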
What to Ignore
Small teams should skip self-hosting any core services. Specifically:
- Don’t run your own Postgres server. Seriously, unless you love configuring replication and doing 3 AM restores from backup, just don’t.
- Don’t self-host Redis. Let a managed provider handle the persistence and scaling.
- Avoid DIY queue systems. There are plenty of well-supported job queue libraries that integrate seamlessly with managed Redis.
- Amazon SQS. Redis is just so much simpler. You know how we feel about AWS by this point, right?
Trying to manage these services yourself as a small team is a waste of time when better options exist!
What to Study
A PaaS-Ops fluent engineer should keep a close eye on a couple of metrics in this category. Most simply, that’s connection counts and overall performance metrics. Connection limits end up being a bottleneck for many applications as they grow, so being aware of your real-time connection counts can help you avoid that roadblock before it hits. Know your provider’s limits and plan accordingly.
In terms of knobs to turn, the first is simply your service tier. Managed services often have different pricing tiers that affect available resources, connection limits, and performance capabilities. If your app is outgrowing its current tier, a quick upgrade can solve a lot of performance headaches without requiring any code changes.
The second important knob is query optimization for your database. While most PaaS-managed databases handle backups, scaling, and failover for you, bad queries are still your responsibility. Be familiar with basic query performance tools (like EXPLAIN for Postgres) so you can spot slow queries and optimize your indexes where necessary.
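For a quick spot-check against your production database, something like this works (the query is illustrative; heroku pg:psql just wraps psql, so you can pipe SQL into it):

```sh
echo "EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;" \
  | heroku pg:psql --app my-app-production
```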
Lastly, for background job queues, make sure you understand how to split your queues to balance your load. Creating separate queues for high-priority jobs and low-priority jobs helps ensure that critical tasks aren’t blocked by less important ones. This organizational knob can dramatically improve your app’s performance under heavy load. We even wrote a whole article on just that topic.
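If you’re on Sidekiq, that organizational knob lives in its config file. A minimal sketch with hypothetical queue names and weights (a higher weight means the queue is checked more often):

```sh
cat > config/sidekiq.yml <<'YAML'
:concurrency: 10
:queues:
  - [critical, 4]   # user-facing, latency-sensitive jobs
  - [default, 2]
  - [low, 1]        # reports, cleanup, anything that can wait
YAML
```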
Tool Tip
One of the best parts of Heroku (and other PaaS platforms) is their add-ons marketplace. You can provision Postgres, Redis, Elasticsearch, email services, error monitoring tools, and more with a single click. These services get added to your app with zero configuration and give you direct access via your app dashboard! Some of the newer PaaS’s don’t quite have a marketplace system up and running yet, but if yours does, it’s worth the time savings to use it.
The Myth of the Perfect Infrastructure
Now that we’ve covered six core pieces of PaaS-Ops Fluency, allow us to tangent here just a little bit…
There’s something funny about engineering-brained people — particularly in the early days of a product. We’ve all been there: you’re hacking away at a fresh new project, and before long, you’ve abstracted your way into a beautiful, flexible architecture that’s ready for every possible future use case. Your app is containerized, your CI pipeline is pristine, and you’ve got environment-specific configs for production, staging, and that one demo environment you set up for investors (nice 🤑).
You start thinking, “We should really automate more of this. It’ll make scaling easier.” So you open a new tab, start reading about Terraform, Kubernetes, and multi-cloud failover, and suddenly you’re deep into the DevOps rabbit hole.
And why wouldn’t you be? The product is still taking shape. Features are half-baked, and the roadmap keeps shifting. So instead of fully committing to product development, you hedge your bets by abstracting the infrastructure — because if everything is abstracted, you’ll be ready for whatever changes come your way.
But here’s the thing: The perfect infrastructure doesn’t exist. And trying to build one is a trap.
The Trap of Flexibility for Flexibility’s Sake
Early-stage teams often chase flexibility in their infrastructure because they don’t know what the product will look like six months from now. It feels prudent to prepare for change.
So you build for scenarios that haven’t happened yet. You optimize for traffic that hasn’t materialized. You containerize services that aren’t even separate. You add layers of abstraction and redundancy to future-proof your app — until you wake up one day and realize that your team is spending more time on infrastructure than on building the product.
The irony is that the more flexibility you bake into your infrastructure, the more rigid things become. The more knobs you add, the more decisions your team has to make. The more layers you abstract, the harder it is to troubleshoot. And suddenly, that flexibility you were chasing? It’s slowing you down.
Building a hyper-flexible, can-do-anything infrastructure setup that perfectly meets the needs of a yet-to-be-finalized product overlooks the most important part of a startup: the product itself. The goal isn’t to build perfect infrastructure. The goal is to build a product! Your infrastructure should be just good enough to get you to the next milestone — nothing more.
So What is PaaS-Ops?
For small teams in particular, PaaS-Ops is the rejection of building the perfect infrastructure for your application. Instead, PaaS-Ops represents the choice to use infrastructure that meets the needs of today, by leveraging the power of PaaS’s and out-of-the-box tools. Tomorrow might have different needs, but we’ll figure them out tomorrow.
Don’t know if your application is going to ultimately need a lot of crazy dynamic scaling in the future? Don’t worry about it, just use a PaaS with autoscaling for now.
Is your head of product banking on some crazy feature where errors get routed with custom rules set by each customer? Maybe just start with Sentry today and see if that idea ever really materializes (or, better yet, if customers ever actually ask for it).
Worried that you might end up with lots of environments that need complex secrets management? Worry less and just use ENV vars today.
PaaS-Ops is all about staying focused on your product so that you can get to that complex future somewhere down the road. Shipped code beats perfect infrastructure every single time — PaaS-Ops simply means prioritizing shipping code instead of perfecting infrastructure.
You’re Not Building AWS, and That’s a Good Thing
Ultimately, you’re not a FAANG company — and you probably don’t want to be. You’re a small, scrappy team with a product to build and customers to serve. Your job isn’t to create the next great cloud platform — your job is to get your app in front of users and iterate.
So don’t fall into the trap of chasing perfect infrastructure. Build what you need, skip what you don’t, and let your PaaS tools do the heavy lifting. The future-proofing can wait. Right now, your priority is survival.
And PaaS-Ops? It’s how small teams survive.