It worked on my machine — that’s the bug.

A companion to the last note. That one was about tests that pass against a stand-in. This one is about a quieter failure — code that genuinely worked, but only because of state nobody had written down. It passes every test, runs fine for months, and then fails the first time someone builds it clean.

01

The migration runner that couldn’t connect

The project pinned exactly one database driver: asyncpg, the async one. But the migration runner, the script that applies schema changes, was written for a synchronous driver. A synchronous code path driving an async driver raises SQLAlchemy's MissingGreenlet error the instant it touches I/O. So, as written, the runner could not connect using the only driver the project installed. The repository contradicted itself.

And yet — earlier migrations had been applied successfully. They existed; the schema was real. The only way that could have happened: at some point they were applied under a synchronous psycopg that someone installed by hand, in the moment, to get unstuck — and never added to the lockfile. By the time I looked, that tool was gone. “It worked before” was completely true. It just depended on a tool that lived nowhere in version control. The fix was to rewrite the runner as async, against the one driver that’s actually pinned — so it reproduces from a clean checkout, every time.
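As a rough illustration of that fix, here is a minimal sketch of an async runner built on asyncpg alone. It is not the project's actual runner: the `schema_migrations` bookkeeping table, the directory of `*.sql` files, and the function names are all assumptions made for the example. The only asyncpg calls it relies on are `connect`, `execute`, `fetch`, `transaction`, and `close`.

```python
import asyncio
from pathlib import Path


async def apply_migrations(conn, migrations_dir: Path) -> list[str]:
    """Apply every *.sql file not yet recorded, in filename order.

    `conn` is expected to behave like an asyncpg connection.
    The table name `schema_migrations` is hypothetical.
    """
    await conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name text PRIMARY KEY)"
    )
    rows = await conn.fetch("SELECT name FROM schema_migrations")
    done = {row["name"] for row in rows}

    applied = []
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in done:
            continue
        # One transaction per migration: the schema change and its
        # bookkeeping row commit together or not at all.
        async with conn.transaction():
            await conn.execute(path.read_text())
            await conn.execute(
                "INSERT INTO schema_migrations (name) VALUES ($1)", path.name
            )
        applied.append(path.name)
    return applied


async def main(dsn: str, migrations_dir: Path) -> None:
    # Imported here so the helper above can be exercised without a database.
    import asyncpg  # the one driver the project pins

    conn = await asyncpg.connect(dsn)
    try:
        for name in await apply_migrations(conn, migrations_dir):
            print(f"applied {name}")
    finally:
        await conn.close()


def run(dsn: str, migrations_dir: str) -> None:
    """Synchronous entry point for a task runner or CLI wrapper."""
    asyncio.run(main(dsn, Path(migrations_dir)))
```

The point of the shape is that nothing in it needs a second driver: a clean checkout with only the pinned dependency installed can run it end to end.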

02

The build that refused on a fresh clone

Different seam, same shape. If you don’t explicitly declare your packages, Python’s setuptools falls back to “flat-layout auto-discovery”, and it refuses to build if it finds more than one top-level directory that looks like a package. Someone had added a second folder, a proof/ directory, next to the application package. Now there were two. A clean build stops with an error and produces no installable artifact.

So why had it built fine for weeks? Because the working environment already had the package installed and cached. The task runner reused that cached build and never rebuilt from scratch, so auto-discovery never had to run. The failure only appeared on a cold build — a fresh git worktree, no cache, building from zero. The fix: declare the package explicitly, so discovery never has to guess.
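The durable fix is the explicit declaration (in setuptools terms, listing the package under `[tool.setuptools]` in pyproject.toml). As a complement, a cheap pre-flight check can flag the condition before a cold build trips on it. The sketch below only approximates what auto-discovery sees: the ignore list is a hypothetical stand-in, and setuptools applies its own, longer set of rules.

```python
from pathlib import Path

# Hypothetical ignore list for this sketch; setuptools keeps its own.
IGNORED = frozenset(
    {"tests", "test", "docs", "doc", "build", "dist",
     "scripts", "examples", "tools", "venv"}
)


def top_level_candidates(root: Path, ignored=IGNORED) -> list[str]:
    """List top-level directories that look like importable packages."""
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir()
        and p.name.isidentifier()  # rules out .git, foo.egg-info, my-dir
        and not p.name.startswith("_")  # rules out __pycache__
        and p.name not in ignored
    )


def check_single_package(root: Path) -> list[str]:
    """Raise if a flat layout would offer discovery more than one candidate."""
    found = top_level_candidates(root)
    if len(found) > 1:
        raise ValueError(f"multiple top-level package candidates: {found}")
    return found
```

Run against a repository root holding both the application package and proof/, `check_single_package` raises instead of letting a warm cache hide the problem.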

The principle

“Works on my machine” means “works in my already-built environment”

Both bugs are the same species: the thing worked, but only because of state that wasn’t captured anywhere — a warm cache, a hand-installed tool, an environment shaped by months of use. The defect isn’t absent; it’s dormant, waiting for the first clean rebuild — a new hire’s laptop, a CI runner, a fresh container, the production box on its first boot.

In the frame from the last note, this is the reproducibility gap — one of the ways a green pipeline lies. A test or a build that only passes on uncaptured state isn’t evidence the system works; it’s evidence the system works here. Two defenses:

  1. Rebuild from zero, on a schedule. A fresh clone, no cache, build and migrate from nothing, in CI: on a timer at minimum, ideally on every change. Then the cold-path failures get caught by a robot today, instead of by a new hire six months from now.
  2. Pin every input. The driver, the build config, the tools. A process that quietly depends on something unpinned isn’t a process — it’s a memory, and memories don’t survive a clean machine.
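One way to make the second defense checkable is to compare what is actually installed against what the lockfile pins. The sketch below assumes a requirements-style lockfile with `name==version` lines; the function names and the bootstrap allowlist are illustrative, not part of any tool. Anything it reports is a candidate for "installed by hand, never written down".

```python
import re
from importlib import metadata

# Assumption: tools a fresh virtualenv ships with do not need a pin.
BOOTSTRAP = frozenset({"pip", "setuptools", "wheel"})


def normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_names(lock_text: str) -> set[str]:
    """Collect package names from 'name==version' lines."""
    names = set()
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        # Strip the version and any extras such as name[extra].
        name = line.split("==", 1)[0].split("[", 1)[0].strip()
        names.add(normalize(name))
    return names


def unpinned_installed(lock_text: str, installed=None) -> list[str]:
    """Return installed distributions that the lockfile does not pin."""
    if installed is None:
        installed = [d.metadata["Name"] for d in metadata.distributions()]
    present = {normalize(n) for n in installed if n}
    return sorted(present - pinned_names(lock_text) - BOOTSTRAP)
```

With a lockfile that pins only asyncpg and an environment that also contains psycopg, `unpinned_installed` returns `["psycopg"]`. The project's own package, if installed in editable mode, will show up too and would need to be added to the allowlist.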

A plumb line only tells the truth because it hangs from a fixed point. Pin your inputs to a fixed point and rebuild against it — or what you’re trusting isn’t a measurement. It’s a memory of one that used to work.

This is the gap we hunt.

Plumbline runs a one-week Drift Audit that finds where your AI-built software passes today but won’t survive a clean rebuild — through a relay you control, zero access to your systems.

plumblinehq.ai