VeePeenini, Part 10: Ninety Seconds of Downtime
Vitor Pontual
The plan from Part 9 was good. Running it live, while my friends were mid-tournament, is a different feeling. Here is how it actually went, step by step, because the calm version is the one worth remembering.
Wait for “finished,” not for the whistle
First I waited for a match to be genuinely done, which is not the same as the final whistle. The app grades a match a minute or two after it ends: it settles every prediction, awards the points, and drops the loot packs. If I had taken my copy of the database in that window, I’d have captured a half-graded match and lost the points that landed a moment later.
So I watched the database directly until every prediction for that game had flipped to graded and the running totals stopped moving. Whistle is not finished. Graded and stable is finished. Only then did I move.
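A minimal sketch of that "graded and stable" wait, assuming a hypothetical schema: a `predictions` table with `match_id` and `graded` columns, and a `profiles` table whose `points` column holds the running totals. It uses `sqlite3` only so the example is self-contained. The idea is the rule above: a match counts as finished only when nothing is left ungraded *and* the total stops moving across several consecutive polls.

```python
import sqlite3
import time

def match_snapshot(conn, match_id):
    """Return (ungraded prediction count for this match, total points overall)."""
    # Hypothetical tables/columns: predictions(match_id, graded), profiles(points).
    ungraded = conn.execute(
        "SELECT COUNT(*) FROM predictions WHERE match_id = ? AND graded = 0",
        (match_id,),
    ).fetchone()[0]
    total = conn.execute("SELECT COALESCE(SUM(points), 0) FROM profiles").fetchone()[0]
    return ungraded, total

def wait_until_settled(conn, match_id, stable_polls=3, interval=5.0, timeout=600.0):
    """Return True once every prediction is graded and the points total has been
    unchanged for `stable_polls` consecutive re-checks; False if `timeout` passes."""
    deadline = time.monotonic() + timeout
    last_total = None
    streak = 0
    while time.monotonic() < deadline:
        ungraded, total = match_snapshot(conn, match_id)
        if ungraded == 0 and total == last_total:
            streak += 1
        else:
            streak = 0  # still grading, or the totals just moved
        last_total = total
        if streak >= stable_polls:
            return True
        time.sleep(interval)
    return False
```

The polling interval and the number of stable polls are illustrative guesses; the point is that "done" is defined by the data settling, not by the clock.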
Take the front door down
I asked everyone to hold for a few minutes, then took the front door down: I stopped the app and, separately, the one nightly job that writes straight to the database. With both of those off, nothing on the old box could change. The blackout had started and the clock was running. That pressure is exactly why all the rehearsing was worth it.
The moment the plan lives or dies
I cloned the database to the new box and ran the checksum gate. This is the single moment the whole thing succeeds or fails.
```
source server : profiles=9 preds=453 trades=772 stickers=6244 sum_points=9304
new box       : profiles=9 preds=453 trades=772 stickers=6244 sum_points=9304

ck_profiles  d6a2d8cc == d6a2d8cc
ck_preds     48b6fb52 == 48b6fb52
ck_stickers  276ede58 == 276ede58
ck_trades    1f2f376f == 1f2f376f
all match
```
Every count, every total, and every content fingerprint was identical on both boxes. If even one fingerprint had differed, the plan was to abort on the spot and restart the old box, with nothing lost, since it had only been paused. I didn't have to. They matched.
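The post doesn't spell out how the fingerprints were computed, so here is one plausible way to build such a gate, purely as a sketch: for each table, count the rows and hash every row in primary-key order, then compare both copies. The table names and the `id` key column are assumptions, `sqlite3` stands in for whatever database the app really uses, and the 8-character hex prefix only mimics the look of the log above (a full digest is safer against collisions).

```python
import hashlib
import sqlite3

# Hypothetical schema: table name -> primary-key column used for a stable order.
# These are trusted constants, so interpolating them into SQL is safe here.
TABLES = {"profiles": "id", "predictions": "id", "trades": "id", "stickers": "id"}

def fingerprint(conn, table, key):
    """Return (row_count, short SHA-256 over all rows in primary-key order)."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8") + b"\n")
        count += 1
    return count, digest.hexdigest()[:8]

def checksum_gate(source, target, tables=TABLES):
    """Compare counts and fingerprints table by table; return the mismatched tables."""
    mismatched = []
    for table, key in tables.items():
        src, dst = fingerprint(source, table, key), fingerprint(target, table, key)
        verdict = "==" if src == dst else "!="
        print(f"{table:<12} {src[0]:>6} {src[1]} {verdict} {dst[1]}")
        if src != dst:
            mismatched.append(table)
    return mismatched
```

An empty list means go; anything else means abort and restart the old box. Ordering by primary key matters: without it, two identical tables could hash differently just because rows came back in a different order.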
Flip the address
I brought the scoring jobs up on the new box, turned them off on the old one, and repointed the app’s address to the new dedicated tunnel. Then I watched.
For about a minute the address bounced between healthy and a “bad gateway” error. That’s expected: a change to where an address points takes a short while to spread across the network, so for a moment some requests still arrived at the old, now-stopped box. Then it settled, and stayed settled, a long unbroken run of clean, healthy responses, every one of them served from the new box.
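"Watch until it settles" can be made concrete with a small poller, sketched here under assumptions: a plain HTTP GET against the app's URL, a 200 counting as healthy, a 502 (or a failed connection) counting as not, and "settled" meaning an unbroken run of healthy answers. The run length, interval, and function names are hypothetical, not what was actually used.

```python
import time
import urllib.error
import urllib.request

def probe(url, timeout=5.0):
    """Return the HTTP status of one GET, or 0 if no HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 502 Bad Gateway
        return err.code
    except OSError:  # DNS failure, refused connection, timeout
        return 0

def first_stable_index(statuses, streak_needed):
    """Index where an unbroken run of `streak_needed` 200s begins, or None."""
    run = 0
    for i, status in enumerate(statuses):
        run = run + 1 if status == 200 else 0
        if run == streak_needed:
            return i - streak_needed + 1
    return None

def watch(url, streak_needed=30, interval=2.0, max_probes=300):
    """Poll `url` until it answers 200 `streak_needed` times in a row.
    Returns (index where the stable run began, full status history)."""
    history = []
    for _ in range(max_probes):
        history.append(probe(url))
        start = first_stable_index(history, streak_needed)
        if start is not None:
            return start, history
        time.sleep(interval)
    return None, history
```

Keeping the full history, rather than just the latest status, makes the one-minute flapping period visible afterwards instead of only the final verdict.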
I checked one more thing to be sure it was really the new box answering and not a stale cached copy: the responses came back marked as freshly generated, not from any cache, and each one carried a live database check that only a running server could produce. The old box’s app was stopped, so a live answer could only be coming from the new one.
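The same check can be written down as a heuristic, sketched here with loudly hypothetical details: a health endpoint whose JSON looks like `{"db": "ok", "checked_at": <unix seconds>}`, and cache signals read from the standard `Age` header (which a shared cache adds when it serves a stored copy) plus a CDN status header such as Cloudflare's `CF-Cache-Status`. Whether this setup exposes either header is an assumption. A response passes only if nothing marks it as cached and its database check ran within the last few seconds.

```python
import json
import time

# CDN cache-status values that mean a stored copy was served (assumed CDN).
CACHED_STATUSES = {"HIT", "STALE", "UPDATING", "REVALIDATED"}

def is_live_answer(headers, body, now=None, max_age_s=30):
    """Heuristic: True if a response looks freshly generated by a running server.

    `headers` is a dict of response headers; `body` is the raw text from a
    hypothetical health endpoint shaped like {"db": "ok", "checked_at": 1718000000}.
    """
    now = time.time() if now is None else now
    h = {k.lower(): v.strip() for k, v in headers.items()}
    # A shared cache serving a stored response adds a non-zero Age header.
    if h.get("age", "0") not in ("", "0"):
        return False
    if h.get("cf-cache-status", "").upper() in CACHED_STATUSES:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if not isinstance(payload, dict):
        return False
    # The database check must have run just now, not at some cached moment.
    checked_at = payload.get("checked_at")
    return (
        payload.get("db") == "ok"
        and isinstance(checked_at, (int, float))
        and abs(now - checked_at) <= max_age_s
    )
```

The timestamp is the stronger signal of the two: a stopped server cannot keep producing fresh `checked_at` values, whereas header conventions vary from one proxy to the next.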
The result
Total time my friends could not use the app: around ninety seconds.
The game is now live from the second location, with every point, sticker, and trade intact and verified down to the byte. The old server is sitting there powered off but ready, a fallback in case something surprising shows up. One honest caveat I kept in front of myself: the instant players started acting on the new box, the old box's copy began going stale. The real safety net was that ninety-second blackout, when rolling back would have lost nothing, not the old box afterward. Once people are playing again, the new box is the only truth.
A clean cutover is a good day. What I took away from it, including the things I got lucky on and the things an earlier decision quietly saved me from, is Part 11.