The Room Where the Playbook Died

The binder was on the table. Spiral-bound, tabbed, two inches thick. Someone had laminated the cover.

I noticed it because nobody was touching it.

It was 11 p.m. on a Tuesday, and I was standing in the makeshift war room of a Fortune 500 company — hundreds of physical locations, billions in annual revenue, a company that had spent real money on security infrastructure. The CISO was there. The CIO was there. Two outside incident response firms had people dialing in. And in the center of the conference table, between the cold coffee and the half-eaten boxes of delivery food, sat the incident response playbook that had taken months to build, review, and approve.

Nobody touched it. Not once. Not that night, and not in the days that followed.


I want to tell you about that binder, because I think it explains something that most post-incident write-ups get wrong. They talk about the breach — how the attackers got in, what they encrypted, what the ransom demand was. They treat the playbook as either a success or a failure. They recommend you build a better one.

What they don't tell you is what happens in the room when the playbook stops being useful — which is usually within the first hour.

The ransomware had detonated across their environment sometime during the previous 48 hours, moving laterally through segments that were supposed to be isolated. By the time anyone knew something was wrong, the encryption had already reached production. The backups — the ones everyone trusted, the ones that had been tested in tabletop exercises — had been sitting on infrastructure that was now compromised. The recovery path that everyone had mentally rehearsed simply did not exist in the form anyone expected.

That's when the binder became a decoration.


Here is what I have learned, across more than two decades of getting called into situations like this: a playbook is a document about a world that no longer exists. It was written before the breach. It assumes a specific topology, a specific failure mode, a specific sequence of events. The moment an actual incident begins, the real environment starts diverging from the documented one — and it diverges faster than anyone wants to admit.

The playbook told the team to fail over to their secondary site. The secondary site was on the same domain. The playbook said to restore from backup. The backup system was already compromised. The playbook had an escalation tree. Half the people on it were unavailable, traveling, or — in one memorable case — on a cruise ship somewhere without reliable satellite internet.

None of this was negligence. The team had done the right things. They had engaged a reputable incident response firm. They had built documentation. They had run exercises. And none of it was worth very much when the actual geometry of the crisis failed to match the geometry of the plan.

What saved them — and this is the part that never makes the case study — was something much older and much less glamorous than any of that. It was architecture.


Somewhere in the infrastructure history of this company, a decision had been made that most modern organizations would consider backward: a set of core operational systems ran on a legacy network segment that was not joined to the corporate domain. It was a technical debt item. It was on the modernization roadmap. People had complained about it in planning meetings.

When the ransomware moved through the environment, it moved along trust relationships. Domain membership. Active Directory. Shared credentials. The paths that connected systems to one another. That legacy segment had none of those connections. It sat there, unreachable by the encryption, running on an architecture that predated the very assumptions the attackers were exploiting.

I will be honest with you: nobody planned for that. It was not a design decision made with this scenario in mind. It was an accident of history that became, in the worst week of that organization's recent memory, the thing that kept them from a complete catastrophe. A significant portion of their revenue ran through those systems. And those systems kept running.

That gap — between the encrypted corporate infrastructure and the still-functioning edge locations — gave us time. Not much, but enough.


The nights that followed were not comfortable. We rebuilt core infrastructure from clean media. We re-architected the data center topology under pressure, making decisions in hours that would normally take months of committee review. We did not pay the ransom. Not because of principle, though I do have views on that, but because we calculated that we could recover faster than we could negotiate and validate a decryption key — and we were right, if barely.

What I kept coming back to, during those long nights, was the question of why the architecture held where the playbook didn't.

A playbook is a procedure. It tells people what to do under assumed conditions. Architecture is a constraint system. It shapes what is possible — and what is impossible — regardless of what anyone planned for or documented. When the conditions of a breach stop matching your assumptions, the playbook becomes fiction. The architecture is still real.

The network segmentation that existed — even the accidental kind — limited blast radius. The offline data that existed — even the stuff people had meant to modernize away — gave us a recovery path. The manual processes that had survived in parallel with automated ones, because no one had gotten around to eliminating them, gave us operational continuity while we worked.

Every piece of resilience that mattered was structural. None of it was procedural.


I have been in enough of these situations now to know that this pattern is not unique. The organizations that recover fastest from major incidents are almost never the ones with the most sophisticated playbooks. They are the ones whose infrastructure had, intentionally or not, diversity built into it. Redundancy that crossed trust boundaries, not just failure modes. Isolation that was genuine, not just documented. Recovery paths that existed at the infrastructure layer, not just in a plan.

The organizations that struggle the longest are the ones that consolidated everything — centralized identity, unified backup systems, converged networks — in the name of efficiency and manageability. Those are real values. I am not arguing against them. But pure consolidation, without deliberate resilience design, means that the blast radius of a major incident expands to fill whatever you have made available to it.

What surprised me, in that war room, was how little the people there understood their own infrastructure. Not because they were incompetent — they weren't. But because the architecture had been built incrementally, documented inconsistently, and understood mostly by people who had since left the organization. The actual behavior of the environment under stress was, in important ways, unknown to the people responsible for it.

That is a different problem than a missing playbook. And it is not solved by a better one.


We eventually got them back. Full recovery took longer than anyone wanted, but the business survived. Revenue continuity during the incident was, by any reasonable measure, better than it had any right to be — because of a network segment that was on the modernization roadmap.

The binder stayed on the table. I don't know what happened to it after we left.

What I do know is this: if you want to understand your actual resilience posture, don't start with your playbooks. Start with your architecture. Ask whether your backup systems exist on infrastructure that shares trust relationships with what they're backing up. Ask whether your network segmentation would survive a credential compromise. Ask what your recovery path looks like if the primary one is unavailable — not in the documentation, but in the actual infrastructure topology.

The answers to those questions will tell you more about your real recovery capability than any tabletop exercise. Because in a crisis, you don't rise to the level of your playbooks. You fall to the level of your architecture.

And sometimes, if you're lucky, what saves you is the legacy system that nobody got around to modernizing.


I keep thinking about that laminated cover. Someone made that decision — the laminating — because they wanted the binder to last. To be durable. To survive repeated use.

It lasted. It just wasn't what we needed.

What we needed had been there all along, unreachable and unglamorous, running on hardware that should have been retired years ago. It didn't look like a resilience strategy. It looked like technical debt.

In the right circumstances, those are the same thing.

Is your architecture ready for the breach your playbook can't handle?

Most mid-market companies discover their real resilience posture during an incident — not before. A focused architectural review can surface the gaps your playbooks are hiding.
