From autoscaling to always-on: a self-hosted GitLab CI story

One of our customers runs a payment-gateway aggregator. They asked to stay anonymous, so we'll call them PGWA. Like a lot of teams in payments, they run their own GitLab — self-hosted, on their own infrastructure, for the usual compliance reasons.

That one fact shapes everything about their CI. Self-hosted GitLab means GitLab.com's shared runners are off the table entirely — those only exist for projects hosted on GitLab.com. If you run your own instance, you bring your own runners. So the only real questions are: how fast, and how much.

We spent a good chunk of 2025 answering those two questions the hard way. Here's the path — including the parts where we were wrong — and why PGWA's pipelines have now run on the same boring setup for nine months.

Attempt 1 — AWS, the textbook way

We started where most teams start: the popular cattle-ops terraform-aws-gitlab-runner module on AWS. EC2 Spot workers (m5.large / m5.xlarge), the docker+machine executor, an S3 distributed cache, and an autoscaling schedule — a couple of warm instances during working hours, scale to zero overnight. Later we also tried the newer docker-autoscaler executor.

It worked. It also had four problems that never went away:

Cold starts. docker+machine launches a fresh EC2 instance per burst. The first job in a burst waited 5–10 minutes for a machine to boot and provision before it did any actual work.
An unpredictable bill. Spot is cheap until it isn't. Price swings and interruptions meant the monthly invoice tracked AWS's mood, not PGWA's budget.
Version lock. We had to pin the module to 8.1.0, because the v9 line broke for us with "Preparation failed: exit status 1". That left us frozen on a known-good version and accumulating tech debt.
Cache overhead. S3 caching worked, but it was one more moving part with its own API calls and transfer costs.
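
To put the cold-start problem in numbers, here is a rough sketch of how much waiting a burst of jobs picks up when part of it has to wait for fresh instances. It is a toy model, not a measurement: the burst sizes and the warm-slot counts are made-up inputs, the 5-minute boot time is the low end of the range we saw, and it assumes all new instances boot in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Burst:
    jobs: int            # jobs arriving at roughly the same time (hypothetical input)
    warm_slots: int      # job slots on instances that are already running
    boot_minutes: float  # time for a fresh instance to boot and provision


def cold_start_wait(burst: Burst) -> tuple[int, float]:
    """Return (jobs that wait for a boot, mean added wait in minutes per job)."""
    waiting = max(0, burst.jobs - burst.warm_slots)
    mean_wait = waiting * burst.boot_minutes / burst.jobs if burst.jobs else 0.0
    return waiting, mean_wait


# Overnight the fleet is scaled to zero, so every job in the first burst waits.
overnight = Burst(jobs=8, warm_slots=0, boot_minutes=5.0)
# During working hours a couple of warm instances absorb part of the burst.
daytime = Burst(jobs=8, warm_slots=4, boot_minutes=5.0)

print(cold_start_wait(overnight))  # (8, 5.0)
print(cold_start_wait(daytime))    # (4, 2.5)
```

Even with warm instances during the day, any burst larger than the warm capacity still pays the boot time for the overflow.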

AWS was the slow, expensive part of the stack. So we moved the compute.

Attempt 2 — a Hetzner autoscaler (faster, but…)

We rebuilt the same idea on Hetzner Cloud: the docker-autoscaler executor with the Hetzner fleeting plugin, a cpx51 manager spawning up to six workers on demand — roughly 36 concurrent jobs at peak — out of Helsinki.

Hetzner was genuinely faster than AWS and far cheaper per core. We were happy for about a week. Then the two problems that haunt every autoscaler showed up:

Cache fragmentation. Each ephemeral worker had its own local /cache. Scale up → brand-new machine → cold cache → re-pull base images, re-download dependencies, rebuild layers. The "fast CI" promise leaks out at exactly the moment you scale, which is exactly the moment you need it.
A bill you still can't forecast. Spawn-on-demand means the invoice follows your busiest week. Better than Spot, but finance still couldn't put a number on next month.
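
The cache problem is easy to see in a small model. The sketch below splits a burst of simultaneous jobs into those that land on an already-running worker (warm cache) and those that land on a worker spawned for the burst (cold cache). It is a simplification under stated assumptions: six jobs per autoscaled worker is inferred from "six workers, roughly 36 jobs", and the burst size and the single warm worker are hypothetical.

```python
import math


def split_burst(burst: int, warm_workers: int, jobs_per_worker: int,
                max_workers: int) -> dict[str, int]:
    """Classify a burst of simultaneous jobs as warm-cache, cold-cache or queued.

    Simplification: every job placed on a worker created for this burst
    counts as cold, because that worker's local /cache starts empty.
    """
    warm = min(burst, warm_workers * jobs_per_worker)
    overflow = burst - warm
    spawnable = max(0, max_workers - warm_workers)
    new_workers = min(spawnable, math.ceil(overflow / jobs_per_worker))
    cold = min(overflow, new_workers * jobs_per_worker)
    return {"warm": warm, "cold": cold, "queued": overflow - cold,
            "new_workers": new_workers}


# Autoscaler: one worker already up, up to six in total, six jobs each.
autoscaled = split_burst(burst=30, warm_workers=1, jobs_per_worker=6, max_workers=6)
# Fixed fleet: five boxes, ten jobs each, nothing left to spawn.
fixed = split_burst(burst=30, warm_workers=5, jobs_per_worker=10, max_workers=5)

print(autoscaled)  # {'warm': 6, 'cold': 24, 'queued': 0, 'new_workers': 4}
print(fixed)       # {'warm': 30, 'cold': 0, 'queued': 0, 'new_workers': 0}
```

In this model the autoscaler serves 24 of 30 jobs from a cold cache during the spike, which is the fragmentation described above.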

Faster, cheaper, same shape of pain. So we stopped trying to be clever.

Attempt 3 — the boring answer: always-on

Instead of N ephemeral workers that come and go, we gave PGWA a small number of always-on boxes that never go cold:

5 × Hetzner cpx51 (16 vCPU / 32 GB each), spread across Nuremberg, Helsinki and Falkenstein for redundancy.
10 concurrent jobs per box → 50 parallel jobs, always ready.
A warm local cache — same machine, every job, no re-pull cliff when load spikes.
A plain docker executor. No fleet manager, no scaling rules, no plugin to debug.
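
For a sense of how little configuration this needs, here is a minimal sketch that renders a per-box config.toml and adds up the fleet capacity. The keys (concurrent, executor, volumes) are standard GitLab Runner settings; the box names, URL, token and default image are placeholders we made up for illustration, not PGWA's real values.

```python
def render_runner_config(name: str, url: str, token: str, concurrent: int = 10) -> str:
    """Render a minimal GitLab Runner config.toml for one always-on box."""
    return f'''concurrent = {concurrent}

[[runners]]
  name = "{name}"
  url = "{url}"
  token = "{token}"
  executor = "docker"
  [runners.docker]
    image = "alpine:latest"
    volumes = ["/cache"]
'''


# Hypothetical box names; the real fleet is spread over three locations.
BOXES = ["box-1", "box-2", "box-3", "box-4", "box-5"]
JOBS_PER_BOX = 10

configs = {
    box: render_runner_config(box, "https://gitlab.example.com", "REPLACE_ME", JOBS_PER_BOX)
    for box in BOXES
}
total_slots = len(BOXES) * JOBS_PER_BOX

print(configs["box-1"])
print(total_slots)  # 50
```

The global concurrent value caps how many jobs one gitlab-runner process runs at once, so five boxes at ten each gives the 50 parallel slots above.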

The trade is obvious and we'll say it out loud: you pay for a little idle capacity. In return:

	Autoscaler	Always-on
First-job latency	5–10 min cold start	none — boxes are up
Cache on scale-up	cold, re-pull	warm, persistent
Monthly bill	tracks your busiest week	flat, known number
Ops surface	scaling rules + fleet health	five boxes running gitlab-runner
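
The "Monthly bill" row can be illustrated with a few lines of arithmetic. Everything numeric here is hypothetical and in arbitrary currency units: the weekly worker-hours, the hourly rate and the per-box price are invented to show the shape of the two bills, not what PGWA actually paid.

```python
def autoscaler_bill(weekly_worker_hours: list[float], rate_per_hour: float) -> float:
    """Spawn-on-demand: you pay for every worker-hour actually consumed."""
    return sum(weekly_worker_hours) * rate_per_hour


def always_on_bill(boxes: int, price_per_box: float) -> float:
    """Always-on: a fixed number of boxes at a fixed monthly price."""
    return boxes * price_per_box


# Two hypothetical months that differ only in one busy release week.
quiet_month = [100.0, 120.0, 110.0, 105.0]
release_month = [100.0, 400.0, 110.0, 105.0]

for label, weeks in (("quiet", quiet_month), ("release", release_month)):
    print(label,
          autoscaler_bill(weeks, rate_per_hour=1.0),
          always_on_bill(boxes=5, price_per_box=100.0))
# quiet 435.0 500.0
# release 715.0 500.0
```

In the quiet month the autoscaler is cheaper, and that is the idle capacity you pay for. The point is that one busy week moves the autoscaler's total a lot, while the always-on figure is the same number every month.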

We added a node-exporter on each box for metrics and a nightly cleanup cron, and that was essentially it.
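
As an illustration of what a nightly cleanup can look like, here is a small sketch around docker system prune. This is an assumption about the shape of such a job, not a copy of our script: the seven-day retention window is hypothetical, and the function defaults to a dry run that only prints the command.

```python
import subprocess


def build_prune_command(keep_hours: int = 168) -> list[str]:
    """Build a prune command for stopped containers, unused networks,
    dangling images and build cache older than keep_hours.

    --all is deliberately left out so tagged base images stay on disk
    and the warm cache survives the cleanup.
    """
    if keep_hours <= 0:
        raise ValueError("keep_hours must be positive")
    return ["docker", "system", "prune", "--force",
            "--filter", f"until={keep_hours}h"]


def run_cleanup(keep_hours: int = 168, dry_run: bool = True) -> int:
    """Print the command in dry-run mode, otherwise run it and return its exit code."""
    cmd = build_prune_command(keep_hours)
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


run_cleanup(dry_run=True)
# would run: docker system prune --force --filter until=168h
```

Scheduled from cron with dry_run=False, something like this keeps disk usage bounded without emptying the cache the whole setup depends on.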

Nine months later

We moved PGWA onto the always-on setup in September 2025. It's been running their pipelines ever since — about nine months — through their normal load, without the Spot interruptions, cache misses, or surprise invoices of the autoscaling era.

The most telling number is the one we stopped tracking: how often we think about the runners. We don't. They run.

The lesson — and why Runsetters exists

For a steady, self-hosted GitLab workload, predictability and a warm cache beat dynamic cost-optimisation. Autoscaling sells "pay only for what you use," but the hidden line items are cold starts, cache misses, and a bill you can't plan around. An always-on box trades a sliver of idle capacity for a lot of calm.

That setup — managed, always-on Hetzner runners for self-hosted GitLab — is exactly what we turned into Runsetters. PGWA was the proving ground; the product is that experience without you having to run the Terraform.

If you run your own GitLab and you're tired of either babysitting an autoscaler or paying per-minute somewhere else, that's the itch we built this to scratch. Point the tags: in your .gitlab-ci.yml at our runner, and the box, updates, and capacity are ours to worry about.

Stop buying CI minutes. Start renting machines.