Lesson 1.7

1.7: Incidents

8 minutes

The first five pillars are about reducing the probability and impact of bad events. This pillar is about what happens when one lands on your desk anyway. An incident is the moment where a security program stops being a binder and starts being a series of decisions made with too little information, on too little sleep, in the presence of lawyers.

What lives here

  • Detection. How you find out something is wrong. Usually not a CISO-grade SIEM alert. Usually a customer, a vendor, a journalist, or a law enforcement agent calling.
  • Decision-making. Who has authority to decide what during an incident. Who pays ransoms. Who shuts things down. Who decides a notification is required.
  • The work. Containment, eradication, recovery. The actual hours of running down indicators, rotating credentials, rebuilding systems.
  • Communication. What you say to customers, regulators, investors, staff, press. The narrative, the timing, the tone.
  • The afterwards. Post-incident review. What you learned. What you change. What the lawyers let you write down.

What typically goes wrong

No one knows the IR process. You have an incident response policy because you needed one for SOC 2. It lives in Confluence. The engineer who wrote it has left. The current on-call has never read it. The first real incident is the moment you discover this.

Missing basics surface at exactly the wrong time. There’s no decision-log template — so you’re trying to invent one at 3am. There’s no backup counsel — so your first call is to your commercial lawyer, who doesn’t do breach work. There’s no PR playbook — so your communications strategy is a Slack thread with the CEO at hour four. There’s no regulator matrix — so you’re googling “GDPR 72 hours” while the clock runs.
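Most of these missing basics are just templates. A decision log, for instance, needs only a handful of fields — what was decided, by whom, why, and what you didn’t know at the time. Here is a minimal sketch as a Python dataclass; the field names and example roles are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionLogEntry:
    # Hypothetical fields -- adapt to your own incident process.
    decision: str        # what was decided
    decided_by: str      # a role, not a name, so the log survives turnover
    rationale: str       # why, given what was known at the time
    known_unknowns: str  # information you did NOT have when deciding
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def render(self) -> str:
        """One-line summary suitable for a running incident channel."""
        return f"[{self.timestamp}] {self.decided_by}: {self.decision} ({self.rationale})"


entry = DecisionLogEntry(
    decision="Isolate billing subnet from production",
    decided_by="Incident Commander",
    rationale="Cannot confirm lateral movement; isolation is reversible",
    known_unknowns="Scope of credential compromise",
)
print(entry.render())
```

The `known_unknowns` field is the one most teams omit and most regret omitting: six months later, it’s the record that the decision was reasonable given the information available.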

The wrong person is running point. The person running technical containment is also running executive communications and customer notification and legal coordination. All of it degrades. Severe incidents need at least two coordinators: one on the technical work, one on everything else.

No tabletop history. You’ve never done a practice run. The team’s first experience with the process is a real incident — like playing basketball for the first time in an NBA playoff game.

Over-disclosure or under-disclosure. Panic drives one of two patterns: you say too much publicly before you know the facts, and spend the next six months walking statements back. Or you say too little, assuming silence is safer, and find out during discovery that silence counted as omission.

What mature orgs do differently

Tabletop exercises, at least twice a year. A two-hour facilitated scenario. Ransomware, data exposure, insider threat. Executives in the room. Decisions tracked. Gaps written down. You find the gaps in the tabletop; you don’t want to find them in the real thing.

Pre-negotiated panel counsel. You have a breach counsel firm on retainer, or at least on speed-dial. You’ve done a conflict check. They’ve read your incident response plan. When you call them, they don’t need a ramp-up.

Decision-delegation matrix. A written document, approved by the CEO and legal, that names who can decide what during an incident. Who shuts down production? Who decides on ransom negotiation? Who engages the FBI? Who speaks to media? When the answer to each question is an individual role — not “we’ll figure it out” — speed and judgment both improve.
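Structurally, the matrix is just a lookup from decision to a single accountable role. A minimal sketch — the decisions and role assignments below are illustrative assumptions, not a recommendation of who should own what:

```python
# Decision-delegation matrix as data rather than prose.
# Every entry maps one decision to exactly one accountable role.
DELEGATION_MATRIX = {
    "shut_down_production": "CTO",
    "ransom_negotiation": "CEO",
    "engage_law_enforcement": "General Counsel",
    "media_statement": "Head of Communications",
    "customer_notification": "General Counsel",
}


def who_decides(decision: str) -> str:
    """Return the single accountable role; fail loudly rather than guess."""
    try:
        return DELEGATION_MATRIX[decision]
    except KeyError:
        raise KeyError(
            f"No delegated owner for '{decision}' -- escalate to the CEO, "
            "then add the decision to the matrix afterwards."
        )


print(who_decides("ransom_negotiation"))  # → CEO
```

The design point is the failure mode: an unlisted decision raises an error with a defined escalation path, rather than silently becoming “we’ll figure it out” at 3am.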

Post-incident reviews that inform controls. Not a blameless recounting that ends in a Confluence page nobody reads. A review whose outputs are a small number of specific control changes, each owned, each on a calendar. If the review didn’t produce change, it didn’t happen.
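One way to make “each owned, each on a calendar” concrete is to treat review outputs as data and treat an empty or unowned list as a failed review. A hypothetical sketch, with made-up example actions:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ControlChange:
    # Illustrative fields: each review output needs an owner and a date.
    change: str
    owner: str  # a role or named individual
    due: date   # "on a calendar", not "soon"


def review_produced_change(actions: list[ControlChange]) -> bool:
    """If the review didn't produce owned, dated change, it didn't happen."""
    return len(actions) > 0 and all(a.owner and a.due for a in actions)


actions = [
    ControlChange("Enforce MFA on the VPN", owner="IT Lead", due=date(2025, 7, 1)),
    ControlChange("Retain breach counsel", owner="General Counsel", due=date(2025, 8, 1)),
]
assert review_produced_change(actions)
```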

Anchor: Colonial Pipeline, and MGM vs. Caesars

Colonial Pipeline, May 7, 2021. DarkSide operators deployed ransomware in Colonial’s IT environment. Colonial shut down operational pipeline control as a precaution — they couldn’t confirm the OT environment was clean and couldn’t verify billing systems. The pipeline delivers roughly 45% of the fuel consumed on the U.S. East Coast. Gas stations ran dry. Airlines rerouted. Panic buying made it worse. Colonial paid $4.4M to the attackers to obtain a decryption key, then found the key was too slow to use and fell back to backups. The FBI seized the majority of the bitcoin paid about a month later.

Every decision was made with partial information. Shut down the pipeline? Pay the ransom? Notify whom, when? There was no good version of this incident — only trade-offs between bad outcomes. The Colonial case is the archetype of incident decisions under information gaps.

MGM vs. Caesars, September 2023. The same threat actor — Scattered Spider — ran social engineering against both casino companies within weeks. Both were compromised. Both had ransomware deployed.

Caesars paid a reported ~$15M ransom and restored operations relatively quietly. MGM refused to pay and publicly committed to rebuilding from backups. MGM’s recovery took ~10 days with guest-facing systems down across their Las Vegas properties — slot machines offline, hotel key cards dead, restaurants unable to process checks. Total financial impact to MGM was around $100M. Caesars’ direct cost was lower, but their decision attracted its own regulatory and reputational weight.

Same attacker. Opposite decisions. Both legally defensible. The lesson is not that one was right and the other wrong. It’s that incident response is decision-making under constraint, and the organizations that do it well are the ones that rehearsed these trade-offs before the call came in. These scenarios set up Module 5, where we’ll walk through the decision structure.