Watermelon

Every metric was green. Nothing was fine.

Jun 15, 2026

[This article is a rewritten version of ‘Your Incident Metrics Are Lying To You’, originally published in February 2026]

The service delivery manager smiled the way a crocodile smiles, with the teeth doing the work and the eyes elsewhere, and shared their screen. Every metric was green. It had been green last month. It would be green next month. Green appeared to be the natural and permanent condition of the service, the way the ocean is wet and the all-hands is long.

A wonderful provider. No problems anywhere. Pages of ticket data to prove it.

You already know this is not where the story ends. We would not have come all this way for a happy one.

The green was real, in the sense that the numbers genuinely were that colour. It was also fiction, in the sense that nothing it described matched what we had been living through. Calls were closed before anything was fixed. Major incidents ran behind a black box where a bridge should have been. Internal teams were neither consulted nor informed. The work was measured by the kilo, not the outcome.

But the metrics were green.

This is the watermelon. Green skin, and underneath, red all the way down. It is not a story about one bad provider. The watermelon comes for everyone with a dashboard, which is to say everyone. The odds are good that yours is sitting in a meeting somewhere right now, looking lovely.

The devil is in the detail, as it always is, and the detail is this: a number with no context is not information. It is a costume. And most incident practices are wearing several.

Take MTTR, the mean time to resolve, possibly the worst number you can love. Most incidents close quickly. Some close slowly. A few never really close at all; they sit open for months, then years, until someone doing a spring clean discovers a ticket old enough to draw a pension and mercifully puts it down. When you take the mean, those ancient zombies shamble back into the calculation and take a bite out of it. A single one can drag the average somewhere it has no business being. They exist in every organisation, and they are lying to you on average.

If you must have a typical value, use the median, and sit a p90 next to it. The median ignores the corpses. The p90 tells you what your bad days actually cost. Between them you get something close to the truth, which is more than the mean has ever offered.
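To see how much one zombie ticket can do, here is a minimal sketch in Python. The durations are invented for illustration: nineteen ordinary incidents and one ticket left open for a year. The function name `summarise` and the choice of the "inclusive" percentile method are assumptions, not a standard.

```python
import statistics

# Hypothetical resolution times in hours: nineteen ordinary incidents
# and one ticket that sat open for a year (8760 hours).
durations_hours = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                   4, 4, 5, 5, 6, 8, 10, 12, 24, 8760]


def summarise(durations):
    """Return the mean, median and 90th percentile of a list of durations."""
    # quantiles(n=10) returns nine cut points; the last one is the p90.
    p90 = statistics.quantiles(durations, n=10, method="inclusive")[-1]
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "p90": p90,
    }


summary = summarise(durations_hours)
for name, value in summary.items():
    print(f"{name}: {value:.1f} hours")
```

On this made-up sample the mean lands above 440 hours, while the median is 3.5 and the p90 is 13.2. One ticket did that.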

Then there is time to detect, which depends entirely on what your organisation thinks detecting means. Measure only when the alert first fired and you will never learn that Bob did not look at it for three hours, being occupied with a coffee and a croissant. Measure only when Bob finally looked, and Bob becomes a hero for noticing a fire the building had been advertising since breakfast. You need both. The gap between the alert and the human is the whole story, and either number alone hides half of it.
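A small sketch of recording both, assuming your tooling can give you three timestamps per incident. The class and field names here are hypothetical, and the moment the fault began is assumed to have been established in review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Detection:
    started_at: datetime       # when the fault began (assumed known from review)
    alert_fired_at: datetime   # when monitoring first raised the alert
    acknowledged_at: datetime  # when a human first looked at it

    def time_to_alert(self) -> timedelta:
        return self.alert_fired_at - self.started_at

    def time_to_human(self) -> timedelta:
        return self.acknowledged_at - self.started_at

    def attention_gap(self) -> timedelta:
        # The part neither number shows on its own.
        return self.acknowledged_at - self.alert_fired_at


# Invented example: the alert fires in five minutes, Bob looks three hours later.
incident = Detection(
    started_at=datetime(2026, 3, 2, 9, 0),
    alert_fired_at=datetime(2026, 3, 2, 9, 5),
    acknowledged_at=datetime(2026, 3, 2, 12, 5),
)
print(incident.time_to_alert(), incident.time_to_human(), incident.attention_gap())
```

Report only the first figure and detection took five minutes. Report only the second and it took over three hours with no clue why. The gap is what tells you where to look.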

Time to resolve has the same disease. Resolved by whom, and meaning what? Service restored, or ticket closed? These are not the same event and are frequently weeks apart. An incident parked in a monitoring state because a team thought it might recur, and then forgot it existed, and then left it gathering dust until someone tripped over it months later, will quietly poison your data the moment that ticket is finally closed. Pick a definition of resolved that means service is back. Make everyone use it. The alternative is fiction with a timestamp.
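One way to hold that line is to store the two events separately and compute the metric from the restoration time only. This is a sketch under that assumption; the field names are hypothetical and the dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    raised_at: datetime
    service_restored_at: datetime  # the agreed meaning of "resolved"
    ticket_closed_at: datetime     # administrative event, tracked separately

    def time_to_restore(self) -> timedelta:
        return self.service_restored_at - self.raised_at

    def closure_lag(self) -> timedelta:
        # How long the ticket lingered after service was already back.
        return self.ticket_closed_at - self.service_restored_at


# Invented example: service back the same morning, ticket closed in March.
parked = Incident(
    raised_at=datetime(2026, 1, 5, 10, 0),
    service_restored_at=datetime(2026, 1, 5, 12, 30),
    ticket_closed_at=datetime(2026, 3, 20, 9, 0),
)
print("time to restore:", parked.time_to_restore())
print("closure lag:", parked.closure_lag())
```

Measured on restoration, this incident took two and a half hours. Measured on closure, it took more than ten weeks, and none of that difference was outage.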

And the deepest cut of all: is any of it true to begin with? Incident records are set down once in the heat of the thing and almost never corrected afterwards. Unless your post-incident review actually checks the times, the durations, the sequence, your metrics are not measurements. They are first drafts that nobody edited, aggregated into a dashboard and presented as fact. You are not reporting on your incidents. You are reporting on your typing.

So fix the typing first. Make the review the place where the data is audited, not merely admired. Check the numbers as part of the root cause work, because a root cause built on a wrong timeline is just a confident guess. Stop assuming the data is correct. Find out.
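The first pass of that audit can be mechanical. The sketch below flags missing or out-of-order timestamps on a single record; the field names and their expected order are assumptions about what your records contain, and a clean result only means the sequence is plausible, not that the times are true.

```python
from datetime import datetime

# Hypothetical field names, listed in the order the events should occur.
EXPECTED_ORDER = [
    "raised_at",
    "alert_acknowledged_at",
    "service_restored_at",
    "ticket_closed_at",
]


def audit_timeline(record):
    """Return a list of problems found in one incident record's timestamps."""
    problems = []
    previous_name, previous_time = None, None
    for name in EXPECTED_ORDER:
        value = record.get(name)
        if value is None:
            problems.append(f"{name} is missing")
            continue
        if previous_time is not None and value < previous_time:
            problems.append(f"{name} is earlier than {previous_name}")
        previous_name, previous_time = name, value
    return problems


# Invented record: restored before anyone acknowledged it, and never closed.
suspect = {
    "raised_at": datetime(2026, 4, 1, 8, 0),
    "alert_acknowledged_at": datetime(2026, 4, 1, 9, 30),
    "service_restored_at": datetime(2026, 4, 1, 9, 0),
}
findings = audit_timeline(suspect)
print(findings)
```

Anything this flags goes to the review for a human to settle. It will not catch a timestamp that is wrong but in the right order, which is why the review still has to read the record.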

Once the numbers are honest, you can finally ask the questions that matter, and here the work has changed. Reading every incident properly, at scale, used to require a room full of people nobody was ever going to hire. A competent model will now do it cheaply, reading the scribed record the way a witness would rather than the way a counter does. It finds the patterns the dashboard cannot: the playbook nobody updated, the handoff that stalls every single time, the decision point where the same forty minutes vanishes in incident after incident. Give it a referencing requirement so it shows its working, check it the way you would check any junior, and it will surface things that were hiding in plain sight. The trends were always there. You just never had the eyes for them.

While you are at it, measure the one number nobody ever mentions: time to assemble. The minutes from the incident being raised to the moment every team you need is actually on the line. It is strange that this is the forgotten metric, because it is one of the very few entirely within your control. If it takes forty-five minutes to get the right network engineer onto the bridge, no tooling on earth will save you, and no clever dashboard will hide it for long. Assembling the team is the first real act of incident management. Get faster at it and time to resolve falls out of the sky as a side effect. The sooner the right people are present, the sooner it is fixed, the sooner Bob gets back to the croissant. Everyone wins, except for the croissant.
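Measuring it needs very little: the time the incident was raised and the time each required team joined. A minimal sketch, assuming you log bridge joins per team; the function name, team names and times are all invented.

```python
from datetime import datetime, timedelta


def time_to_assemble(raised_at, joined_at, required_teams):
    """Time from the incident being raised until the last required team joined.

    Returns None if any required team never joined.
    """
    if any(team not in joined_at for team in required_teams):
        return None
    last_arrival = max(joined_at[team] for team in required_teams)
    return last_arrival - raised_at


# Invented example: everyone is quick except the network engineer.
raised = datetime(2026, 5, 11, 14, 0)
joins = {
    "service desk": datetime(2026, 5, 11, 14, 2),
    "database": datetime(2026, 5, 11, 14, 10),
    "network": datetime(2026, 5, 11, 14, 45),
}
assembled = time_to_assemble(raised, joins, ["service desk", "database", "network"])
print("time to assemble:", assembled)
```

The metric is set by the slowest arrival, not the average one, which is the point. Two teams on the line in ten minutes did not shorten this incident by a second.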

Here is the part worth carrying out of the room. Every metric you measure is an instruction. Show me the number you reward and I will show you the behaviour you are about to get. Punish a team for missing a resolution target and they will not fix things faster. They will close tickets faster. The number will improve and the service will not, and you will have paid for the improvement with the only thing that ever mattered. The metric is the behaviour. It was always the behaviour.

Which brings us back to the watermelon, and to the uncomfortable thing the watermelon is actually for. It was never an accident of measurement. It was a success of avoidance. The provider wants the green, because green means no questions. The manager wants the green, because green means the decision to hire the provider was correct. The room wants the green, because red means work, and conflict, and somebody being wrong out loud. Nobody slices the watermelon open, because slicing it open is how you find out, and finding out was never what the dashboard was for.

The green was the most reliable thing the provider ever shipped. It worked perfectly, every month, without fail. It simply was not measuring the service.

It was measuring how badly everyone in the room wanted to stop looking.
