All posts
Reliabilitybackupsdisaster-recoveryrestore-drills

Restore-drills: proving your backups before you need them

A backup you've never restored is a hypothesis, not a safety net. HostSSH runs scheduled restore-drills on throwaway targets and turns recoverability into something you can actually see.

Sevak Girard· Founder, Girard Media·Jun 19, 2026·3 min read

Almost everyone backs up. Far fewer can tell you, with confidence, that the backup they took last night would actually come back. The gap between those two statements is where outages turn into disasters.

A backup is a claim: "this image contains everything needed to reconstruct the system." The only way to test a claim is to act on it. HostSSH does exactly that, on a schedule, so the answer to "are we recoverable?" is a fact on a dashboard rather than a hope in a runbook.

Why untested backups fail

The failure modes are boringly consistent, and none of them show up until the worst possible moment.

Silent corruption

A backup job that exits 0 only proves the job ran — not that the bytes it wrote are readable. Bit-rot in cold storage, a truncated upload, a half-rotated encryption key: all of these produce an image that looks fine in a bucket listing and is useless when you stream it back.

Incomplete capture

Database dumps are the classic trap. You faithfully back up Postgres every night, then discover on restore day that the uploaded media lived on a volume nobody snapshotted, the reverse-proxy config was hand-edited on the box, and three environment secrets only ever existed in one shell's history. The data came back; the system didn't.

Drift between backup and restore paths

The code that writes a backup and the code that reads it are different paths, and they rot independently. A format change, a renamed field, a new compression codec — any of these can make last year's images unreadable by this year's tooling. You only find out when you reach for an old image under pressure.

What a restore-drill actually does

A restore-drill is the inverse of a backup: instead of capturing a system, it reconstructs one and checks that the result is real. HostSSH runs the full path against a disposable target so the drill can never touch production.

  1. Provision a throwaway target — a fresh, isolated VPS that exists only for the duration of the drill.
  2. Restore the latest .hsi image — brain, every database, object storage, volumes, and secrets custody, exactly as a real recovery would.
  3. Bring the system up — start the Coolify/Docker brain and its sidecars on the new box.
  4. Verify against ground truth — the part that makes it a proof rather than a vibe.
  5. Tear the target down — the drill leaves nothing behind but a result.

Verification that means something

A green check is only worth what it measures. HostSSH drills assert concrete, falsifiable conditions:

  • Row counts match the source within the expected delta — the data is not just present, it's complete.
  • Domains return 200 — the app is wired up and serving, not merely installed.
  • TLS is valid — certificates reissued correctly and chain to a trusted root.
  • Volumes mount and contain expected files — the media and uploads survived the round trip.

If any assertion fails, the drill fails loudly — while you have time to fix it, not while customers are watching.

Reading the results

Each drill resolves to a single card: when it ran, which image it exercised, how long the restore took, and the per-assertion outcome. Two numbers matter most.

  • Time-to-restore is your real recovery-time objective, measured rather than estimated. If it's creeping up, you learn that on a calm Tuesday instead of during an incident.
  • The pass streak is the trust signal. A long unbroken run of green drills is the difference between "we think we can recover" and "we recovered, last night, in eleven minutes, automatically."

Make the proof routine

The house rule is simple: a backup you haven't restored isn't a backup. Schedule drills monthly at minimum — weekly for anything you'd lose sleep over — and treat a failed drill with the same seriousness as a failed deploy.

Recoverability you can see beats recoverability you're hoping for. Prove it before you need it, on a target you can throw away, so the day you actually need it is just another green card.