“…to ensure we prevent this incident from ever happening again.” Every DevOps engineer has seen this phrase in almost every public incident write-up of the modern tech age. But the terrible truth is, no matter how thorough the action items are, we can’t actually prevent incidents from happening again. Why? Because truly “repeated” incidents within complex systems at scale are about as likely as finding dozens of mosquitoes full of viable dino DNA preserved in amber.
Speaking of which, I’ll use two popular incident documentaries, Jurassic Park (1993) and Jurassic World (2015), to illustrate. While they may appear identical (incidents at amusement parks where dinosaurs get loose and eat people), these are not actually the same incident: the contributing factors, response coordination, and tech platforms are drastically different.
Lessons learned: Making our processes and systems more resilient for the next time a failure happens serves us better than repeatedly trying to “prevent” incidents. (And also, don’t genetically splice dinosaurs with the DNA of other animals… we learn that too.)