Restores Are Harder Than Backups
A backup that can't be restored is a zip file you feel good about.
I knew this. Everyone knows this. But it took building the restore side of this tool to understand how many ways a restore can go wrong while looking like it succeeded.
The Restore Flow
A restore moves a backup from storage to a target server and imports it into a database. Sounds simple. Five stages.
Each stage has its own progress range. The dashboard shows a progress bar so I know where a long restore is sitting. An 800 MB backup to a server on a slow connection can take a while - I don't want to stare at a spinner wondering if it's stuck.
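A minimal sketch of what per-stage progress ranges can look like, in Python for illustration. The stage names and percentage boundaries below are placeholders, not the tool's real values; the idea is only that each stage reports its own 0-to-1 completion and gets mapped onto its slice of the overall bar.

```python
# Hypothetical stage names and ranges, chosen for illustration only.
# The real tool's five stages and their boundaries may differ.
STAGE_RANGES = {
    "fetch": (0, 20),
    "decompress": (20, 30),
    "upload": (30, 60),
    "import": (60, 95),
    "verify": (95, 100),
}


def overall_progress(stage: str, fraction: float) -> int:
    """Map a stage's own completion (0.0 to 1.0) onto the overall 0-100 bar."""
    start, end = STAGE_RANGES[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range inputs
    return round(start + (end - start) * fraction)
```

With these placeholder ranges, an upload that is half done would show as 45% overall.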
But every one of those stages has a failure mode I didn't anticipate until I hit it.
The --force Trap
MySQL's import client has a --force flag. It keeps going past per-statement errors instead of stopping at the first one. I enabled it because some dumps include generated column values that the target rejects - harmless rows that shouldn't abort the whole restore.
The problem: --force also means mysql exits with code 0 even when half the import failed. Your restore "succeeds." Your database has empty tables. You don't find out until something breaks in the app.
So now the restore captures stderr separately, then classifies every error line after the import finishes. Known-acceptable errors (like generated column warnings from Laravel Pulse tables) get logged and tolerated. Anything else fails the restore.
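The classification step can be sketched roughly like this in Python. The allowlist is an assumption made for illustration: it tolerates MySQL error 3105 (a value specified for a generated column), and the tool's real list of acceptable errors may look different.

```python
import re

# Assumed allowlist of tolerable errors; illustrative, not the tool's real list.
TOLERATED = [
    re.compile(r"^ERROR 3105\b"),  # value specified for a generated column
]

# The mysql client prefixes failed statements with "ERROR <code>".
ERROR_LINE = re.compile(r"^ERROR \d+")


def classify_stderr(stderr_text: str) -> tuple[list[str], list[str]]:
    """Split captured stderr into (tolerated, fatal) error lines."""
    tolerated, fatal = [], []
    for line in stderr_text.splitlines():
        if not ERROR_LINE.match(line):
            continue  # warnings and blank lines are not import errors
        if any(pattern.search(line) for pattern in TOLERATED):
            tolerated.append(line)
        else:
            fatal.append(line)
    return tolerated, fatal
```

The restore would then log the tolerated lines and fail if the fatal list is non-empty, regardless of the client's exit code.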
The verification step at the end is the safety net. After every import, it counts the tables in the target database. Zero tables means the import silently failed. That's a hard failure, not a successful restore.
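A sketch of that check, assuming a Python DB-API cursor that uses %s placeholders (as PyMySQL and mysqlclient do). The function and exception names are made up for this example.

```python
TABLE_COUNT_SQL = (
    "SELECT COUNT(*) FROM information_schema.tables "
    "WHERE table_schema = %s AND table_type = 'BASE TABLE'"
)


class RestoreVerificationError(Exception):
    """Raised when an import left the target database empty."""


def verify_table_count(cursor, database: str) -> int:
    """Return the number of tables in `database`, failing hard on zero."""
    cursor.execute(TABLE_COUNT_SQL, (database,))
    (count,) = cursor.fetchone()
    if count == 0:
        raise RestoreVerificationError(
            f"import into {database!r} produced no tables"
        )
    return count
```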
Decompress Before You Drop
This one was an ordering mistake. The first version would drop the existing database, then decompress the backup file, then import. Logical order, right?
Except if the compressed file is corrupt, you've already dropped the database. Now you have no data and no backup to restore from.
Now it decompresses first. If the file is corrupt, the restore fails before anything gets dropped. The existing database stays untouched.
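The ordering can be sketched like this in Python. It assumes gzip-compressed dumps, and the drop_database and import_dump callables are hypothetical stand-ins for the real steps.

```python
import gzip
import shutil
import zlib


def decompress_or_fail(src_gz: str, dest_sql: str) -> None:
    """Fully decompress src_gz into dest_sql, raising if the archive is damaged."""
    try:
        with gzip.open(src_gz, "rb") as src, open(dest_sql, "wb") as dest:
            shutil.copyfileobj(src, dest)
    except (OSError, EOFError, zlib.error) as exc:
        # BadGzipFile is an OSError; truncated files raise EOFError.
        raise RuntimeError(f"backup archive is corrupt: {exc}") from exc


def restore(src_gz, dest_sql, drop_database, import_dump):
    """Decompress first, so a corrupt archive fails before anything is dropped."""
    decompress_or_fail(src_gz, dest_sql)  # may raise; database still intact
    drop_database()
    import_dump(dest_sql)
```

Reading the whole archive to the end is what makes this a real check: a truncated file often opens fine and only fails partway through.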
Cross-Engine Compatibility
I have MySQL 8 servers and MariaDB servers. They speak mostly the same dialect. Mostly.
MySQL 8 introduced utf8mb4_0900_ai_ci as the default collation. MariaDB doesn't recognize it. Import a MySQL 8 dump into MariaDB and you get ERROR 1273: Unknown collation. Every table with the default collation fails to create.
The restore detects what the target server actually is before importing. If it's MariaDB, it streams the dump through sed and rewrites MySQL 8's collations to MariaDB equivalents:
utf8mb4_0900_ai_ci → utf8mb4_unicode_ci
utf8mb4_0900_as_cs → utf8mb4_bin
For MySQL targets, no rewrite. For MariaDB-sourced dumps going to MariaDB, no rewrite needed either - they never contain the _0900_ collations in the first place.
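The tool does this with sed; the same rewrite is shown here as a Python sketch so the mapping and the target check are explicit. The function names are illustrative.

```python
import re

# Mapping from the post: MySQL 8 collations and their MariaDB stand-ins.
COLLATION_MAP = {
    "utf8mb4_0900_ai_ci": "utf8mb4_unicode_ci",
    "utf8mb4_0900_as_cs": "utf8mb4_bin",
}

# Word boundaries keep the match to whole collation names.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COLLATION_MAP) + r")\b"
)


def rewrite_collations(line: str) -> str:
    """Replace MySQL 8-only collation names in one line of a dump."""
    return _PATTERN.sub(lambda m: COLLATION_MAP[m.group(1)], line)


def rewrite_stream(lines, target_is_mariadb: bool):
    """Yield dump lines, rewriting collations only for MariaDB targets."""
    for line in lines:
        yield rewrite_collations(line) if target_is_mariadb else line
```

One caveat with any blind text substitution, sed included: it also rewrites row data that happens to contain those collation names.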
There's also MariaDB's sandbox mode comment. MariaDB 10.5+ dumps start with a /*M!999999\- enable the sandbox mode */ comment that MySQL 8 chokes on. That gets stripped for every MySQL-protocol restore.
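Stripping it can be as small as checking the first line of the dump. This Python sketch assumes the comment only ever appears as the very first line and that everything after it is passed through untouched.

```python
SANDBOX_PREFIX = "/*M!999999"


def strip_sandbox_comment(lines):
    """Yield dump lines, dropping a leading MariaDB sandbox-mode comment."""
    it = iter(lines)
    first = next(it, None)
    if first is not None and not first.startswith(SANDBOX_PREFIX):
        yield first  # ordinary first line: keep it
    yield from it  # everything after the first line is untouched
```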
Bandwidth Throttling
Some of my servers are on connections where I can't just blast a 500MB file over SFTP at full speed. It would saturate the link and affect everything else running on that server.
Each server has an optional bandwidth limit in KB/s. When it's set, uploads use SFTP with a throttled progress callback that sleeps when the transfer is running ahead of schedule. The restore stays within the limit without needing external tools.
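A sketch of a throttled callback in Python. The class name is made up, and the two-argument shape (bytes sent so far, total bytes) is an assumption modeled on the progress callback that Paramiko's SFTP put accepts; the tool's own SFTP library may differ.

```python
import time


class Throttle:
    """Progress callback that sleeps to hold an average transfer rate."""

    def __init__(self, limit_kb_per_s: float, clock=time.monotonic, sleep=time.sleep):
        self.limit_bytes_per_s = limit_kb_per_s * 1024
        self.clock = clock
        self.sleep = sleep
        self.started = clock()

    def __call__(self, bytes_sent: int, total_bytes: int) -> None:
        # How long the transfer should have taken so far at the limit.
        expected = bytes_sent / self.limit_bytes_per_s
        elapsed = self.clock() - self.started
        if expected > elapsed:
            self.sleep(expected - elapsed)  # ahead of schedule: wait it out
```

Because the sleep is computed from the running total, short bursts average out: the transfer can briefly exceed the limit between callbacks but not over the whole upload.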
Restore Schedules
Manual restores are fine for emergencies. But the real value is automated verification.
A restore schedule says: after this database backs up, automatically restore it to that server into this database. If the restore succeeds, the backup is verified. If it fails, I hear about it.
There are two trigger types. Cron schedules run restores on a fixed cadence - useful for keeping a staging database refreshed from production. After-backup triggers fire automatically when a backup completes - that's the verification workflow.
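As a rough data-model sketch in Python, a schedule needs little more than a source, a target, and a trigger type. Every field and function name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestoreSchedule:
    source_database: str
    target_server: str
    target_database: str
    trigger: str  # "cron" or "after_backup"
    cron_expression: Optional[str] = None  # only used by cron triggers


def schedules_for_completed_backup(schedules, database: str):
    """Pick the after-backup schedules to fire when `database` finishes backing up."""
    return [
        s for s in schedules
        if s.trigger == "after_backup" and s.source_database == database
    ]
```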
The verification server is a separate machine with its own database credentials. The restore creates (or drops and recreates) the target database, imports the dump, and verifies the table count. If anything fails at any stage, I get a notification.
This is the part that lets me sleep at night. I don't have to trust that backups are good. I can prove it, automatically, every time one runs.
Concurrency
Two restores targeting the same database on the same server at the same time would be bad. One drops the database while the other is importing into it, and each corrupts the other's work.
Each restore acquires a lock keyed to the target server and database name. If a restore is already running for that combination, the new one skips and logs a warning. Same pattern the backup side uses to prevent concurrent backups of the same database.
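The skip-if-locked behavior can be sketched with an in-process registry in Python. This is a simplified stand-in: a real deployment with multiple workers would need a lock shared across processes, and the names here are illustrative.

```python
import threading
from contextlib import contextmanager

_registry_guard = threading.Lock()
_active: set[tuple[str, str]] = set()


@contextmanager
def restore_lock(server: str, database: str):
    """Yield True if this caller holds the lock, False if a restore is already running."""
    key = (server, database)
    with _registry_guard:
        acquired = key not in _active
        if acquired:
            _active.add(key)
    try:
        yield acquired
    finally:
        if acquired:  # only the holder releases the key
            with _registry_guard:
                _active.discard(key)
```

A caller would check the yielded value, and on False log a warning and return without touching the database, which is the non-blocking behavior described above.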
The restore side took longer to build than the backup side. Every edge case was something I discovered by having a restore that looked successful but wasn't. The --force trap, the decompress ordering, the cross-engine collation mismatch - none of these showed up until I was actually restoring to different servers with different database engines.
The next post covers everything else that broke along the way, and how the commit history tells the story of hardening a backup system for production.