What an honest cloud bill looks like

Most teams who tell us their cloud bill is "out of control" haven't actually sat with it. They've seen the monthly total climb, asked finance for a breakdown, received a tag-grouped CSV, glanced at it, and given up.

That's not a billing problem. It's a half-hour with the right person standing in front of the right view.

The bill we walked into

A Swiss mid-market client — one platform, one cloud, three product teams. Monthly spend had grown 38 percent over twelve months while traffic grew about 9 percent. The CFO had stopped reviewing the line items in March.
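To put those two growth figures on the same footing, here is a minimal sketch that turns them into a change in spend per unit of traffic. The function name is ours, chosen for illustration, and it assumes both percentages cover the same twelve months.

```python
# Growth figures from the engagement described above.
spend_growth = 0.38    # monthly spend, twelve-month change
traffic_growth = 0.09  # traffic, same period (approximate)


def unit_cost_growth(spend_growth: float, traffic_growth: float) -> float:
    """Return the relative change in spend per unit of traffic."""
    return (1 + spend_growth) / (1 + traffic_growth) - 1


print(f"{unit_cost_growth(spend_growth, traffic_growth):.1%}")  # prints 26.6%
```

In other words, each unit of traffic cost roughly a quarter more to serve than it had a year earlier.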

Here's the shape of the spend, before we touched anything.

| Category | Monthly cost | Share | What was actually happening |
| --- | --- | --- | --- |
| Compute (always-on) | CHF 41,200 | 34% | Right-sized to peak, not to average. No autoscaling. |
| Object storage | CHF 28,900 | 24% | Three years of build artefacts on standard tier. |
| NAT egress | CHF 18,400 | 15% | Cross-AZ chatter from a misplaced cache. |
| Managed database | CHF 14,100 | 12% | Multi-AZ on three non-production environments. |
| Observability | CHF 9,700 | 8% | Per-host pricing on a fleet that had quietly tripled. |
| Other | CHF 8,300 | 7% | Long tail. |
| Total | CHF 120,600 | 100% | |

The top four lines explain about 85 percent of the bill. Each one is a decision someone made, then forgot to revisit.
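The concentration is easy to check by hand, but a short sketch makes it repeatable on any tag-grouped export. The figures below are copied from the table above; the function name is hypothetical.

```python
# Monthly cost per category in CHF, copied from the table above.
monthly_cost_chf = {
    "Compute (always-on)": 41_200,
    "Object storage": 28_900,
    "NAT egress": 18_400,
    "Managed database": 14_100,
    "Observability": 9_700,
    "Other": 8_300,
}


def cumulative_share(costs: dict[str, int], top_n: int) -> float:
    """Return the fraction of total spend covered by the top_n largest lines."""
    ranked = sorted(costs.values(), reverse=True)
    return sum(ranked[:top_n]) / sum(ranked)


print(sum(monthly_cost_chf.values()))                    # prints 120600
print(f"{cumulative_share(monthly_cost_chf, 3):.0%}")    # prints 73%
print(f"{cumulative_share(monthly_cost_chf, 4):.0%}")    # prints 85%
```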

The screenshot moment

Sitting with the engineering lead and pulling up the cost explorer, we found this within twenty minutes:

[Figure: Cloud cost breakdown by service. A typical cost-explorer view: most teams have one, few teams open it.]

This isn't a tooling gap. It's a habit gap.

The four moves

We picked four moves and sized each one before touching anything. These weren't clever — they were just decisions that had been deferred.

Compute: right-size to average, not to peak

Compute was provisioned for peak load with no autoscaling. We turned on horizontal autoscaling, added a small reserved-instance commitment for the always-on baseline, and left a buffer on top.

```shell
# Before: a single fleet sized for peak
# After: a baseline group + autoscaled overflow

terraform apply -target=module.compute_baseline   # 30% of peak, reserved
terraform apply -target=module.compute_overflow   # autoscaled, on-demand
```

Roughly 28 percent off the compute line, with the same headroom on a Friday afternoon. No code changes.
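The baseline-plus-overflow split can be sketched in a few lines. This is an illustration under stated assumptions, not the client's sizing model: the demand series is made up, the 30 percent baseline comes from the comment in the commands above, and the sketch ignores both the buffer and the price difference between reserved and on-demand capacity.

```python
import math


def split_capacity(hourly_demand: list[int], baseline_percent: int = 30):
    """Split demand into an always-on baseline and an autoscaled overflow.

    The baseline is baseline_percent of peak demand, rounded up.
    Overflow is whatever each hour needs above that baseline.
    """
    peak = max(hourly_demand)
    baseline = math.ceil(peak * baseline_percent / 100)
    overflow = [max(0, demand - baseline) for demand in hourly_demand]
    return baseline, overflow


# Hypothetical demand in instances per hour, NOT the client's data.
demand = [6, 5, 5, 8, 14, 20, 18, 9]
baseline, overflow = split_capacity(demand)

instance_hours_before = max(demand) * len(demand)               # fleet sized for peak
instance_hours_after = baseline * len(demand) + sum(overflow)   # baseline + overflow

print(baseline, overflow)
print(instance_hours_before, instance_hours_after)
```

Instance-hours are not the same as cost, since the two groups are priced differently, but the shape is the point: capacity follows demand instead of sitting at peak all week.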

Storage: lifecycle the artefacts

Three years of build artefacts on standard storage was the easy one. Anything over ninety days went to a cooler tier; anything over a year went to archive. The lifecycle policy is six lines.

A bucket policy nobody wrote is a bucket policy that defaults to expensive forever. Storage tiering is the smallest amount of engineering for the largest amount of recovered budget.

Saved roughly CHF 19,000 a month. The work took an afternoon.
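The rule itself fits in a function. This is a provider-neutral sketch of the two thresholds described above, not the actual lifecycle policy; the tier names are placeholders, and real providers express the same idea in their own lifecycle configuration format.

```python
def storage_tier(age_days: int) -> str:
    """Pick a storage tier for an artefact based on its age in days.

    Thresholds follow the text: over ninety days goes to a cooler tier,
    over a year goes to archive. Tier names are placeholders.
    """
    if age_days > 365:
        return "archive"
    if age_days > 90:
        return "cool"
    return "standard"


for age in (30, 91, 400):
    print(age, storage_tier(age))
```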

Network: move the cache

The NAT egress was almost entirely one cache that had been deployed in the wrong availability zone. Each cross-AZ hit was billed. Move the cache, save the line.

Observability: tier the agents

Per-host pricing on every node — including ephemeral build runners — was the kind of thing nobody notices until somebody runs the bill against the fleet inventory. We tiered: full observability on production, minimal on staging, nothing on ephemeral runners.
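Running the tiering against a fleet inventory is a small exercise. The sketch below assumes a hypothetical inventory where each host carries an `env` label; the mapping mirrors the three tiers above, and defaulting unknown environments to full coverage is our assumption, chosen so that nothing loses monitoring by accident.

```python
from collections import Counter

# Observability tier per environment, as described in the text.
AGENT_TIER = {
    "production": "full",
    "staging": "minimal",
    "ephemeral": "none",
}


def agent_plan(inventory: list[dict]) -> Counter:
    """Count hosts per observability tier.

    Hosts with an unrecognised environment default to "full" (an assumption).
    """
    return Counter(AGENT_TIER.get(host["env"], "full") for host in inventory)


# Hypothetical inventory, NOT the client's fleet.
inventory = (
    [{"name": f"web-{i}", "env": "production"} for i in range(3)]
    + [{"name": f"stg-{i}", "env": "staging"} for i in range(2)]
    + [{"name": f"runner-{i}", "env": "ephemeral"} for i in range(4)]
)

plan = agent_plan(inventory)
print(plan["full"], plan["minimal"], plan["none"])  # prints 3 2 4
```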

Where it landed

| Category | Before | After | Change |
| --- | --- | --- | --- |
| Compute (always-on) | CHF 41,200 | CHF 29,700 | −28% |
| Object storage | CHF 28,900 | CHF 9,800 | −66% |
| NAT egress | CHF 18,400 | CHF 3,100 | −83% |
| Managed database | CHF 14,100 | CHF 9,200 | −35% |
| Observability | CHF 9,700 | CHF 4,800 | −51% |
| Other | CHF 8,300 | CHF 7,900 | −5% |
| Total | CHF 120,600 | CHF 64,500 | −47% |

A 47 percent reduction. Six weeks of focused work, no architectural rewrite, no headcount change.
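The percentages are worth recomputing from the raw figures, if only because a before-and-after table is the first thing finance will check. A minimal sketch, using the numbers from the table above and a helper name of our own:

```python
# (before, after) monthly cost in CHF, copied from the table above.
before_after_chf = {
    "Compute (always-on)": (41_200, 29_700),
    "Object storage": (28_900, 9_800),
    "NAT egress": (18_400, 3_100),
    "Managed database": (14_100, 9_200),
    "Observability": (9_700, 4_800),
    "Other": (8_300, 7_900),
}


def pct_change(before: int, after: int) -> int:
    """Return the change from before to after as a rounded percentage."""
    return round((after - before) / before * 100)


total_before = sum(b for b, _ in before_after_chf.values())
total_after = sum(a for _, a in before_after_chf.values())

for category, (before, after) in before_after_chf.items():
    print(category, pct_change(before, after))

print("Total", pct_change(total_before, total_after))        # prints Total -47
print("Monthly saving (CHF)", total_before - total_after)    # prints 56100
```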

The pattern under it

The bill wasn't out of control. It was unattended.

The same pattern shows up almost every time we open a client's cost explorer for the first time. Defaults compound. Decisions accumulate. Nobody owns the bill until somebody sits down with it.

If your monthly cloud spend has grown noticeably faster than your traffic, the next move probably isn't a re-architecture. It's an afternoon, the cost explorer, and a person who's allowed to make changes.

That's usually where we start too.
