There is a particular silence that arrives around 3:17 in the morning, ninety minutes into a bridge call, when nobody yet wants to say what everybody knows - which is that the thing is much worse than the dashboard suggested and someone in another timezone is about to have a very bad day. I’ve spent nearly two decades inside that silence. I have notes.

Zero Sev Zero is what came of those notes.

I’m T - they/them, eighteen-plus years in incident and problem management, most recently inside a software company you’ve probably heard of, until I wasn’t. The work goes by various names: reliability, ITSM, major incident management, the people you call when the thing won’t stop being on fire. It is a strange profession, full of strange people, and almost nobody outside of it understands what we actually do or why we keep doing it.

That is roughly the territory this publication intends to map.

If you’ve come here for sober frameworks and bullet-pointed retrospectives, the open internet is generously supplied with those and I wish you well. What’s on offer here is something else - dispatches from inside an industry that takes itself far too seriously about all the wrong things and not nearly seriously enough about the rest. Postmortems written as etiquette guides. Layoff notices as Atkinson sketches. The quiet category of incident nobody wants to declare. The form changes; the argument underneath does not. The way an organisation handles failure says more about it than the way it handles success, and most of us have been getting it wrong on purpose.

I have stories. They are not all flattering. This is where I’m putting them.

Reliability thoughts and philosophy for technology and software engineering
