Faultless Systems

On March 18, 2018, in Tempe, Arizona, a self-driving vehicle owned by Uber fatally struck a pedestrian. Who is at fault? "Handoff/handover" issues make human-assisted autonomous vehicles (AVs) more accident-prone than fully autonomous systems. People within the automotive industry know this, yet most AV tests currently being conducted rely on a human assistant in the driver's seat, who is given the ridiculous task of taking over the car in an emergency. We get the worst of all worlds: rely on automation to reduce the demands on your attention, but remain vigilant at all times; you may be called upon to make life-saving decisions instantaneously without vital sensory context; and all of this is in service of developing a system that, in theory, will not be nearly as safe as one that is fully automated.

The AV industry suffered a major public relations setback, and as this and similar incidents grab the headlines, the race is on to find out who is at fault. This doesn't help anyone, least of all the engineers building the system. Engineering teams working on complex systems operate at their best within a context that has no valence for fault. This isn't a rejection of responsibility or ethical consequences; quite the opposite. A holistic understanding of complexity transcends arbitrary attributions of blame.

Who is at fault? In a complex system, no one is at fault. To unlock the potential of high-performance teams and organizations (large teams), we need to internalize the fact that no one is at fault. Only then can we make the changes that will have the greatest impact on improving the outcomes of complex systems.

Learning objectives include counterintuitive gems like:

  • Simple systems are not safer; reducing the complexity in a system is usually a bad idea.
  • Complex systems are easier to optimize through verification than through validation; whether a system works is more important to measure than how it works.
  • Testing and experimentation are not the same thing, and we might all want to get more comfortable with experimentation as all systems tend toward increasing complexity.
  • Redundancy and workload margin aren't safety strategies; systems will eat the redundancy in silence, masking hazards as they grow.
  • Efficiency is brittle; high-performance teams operate with inefficiency.
  • Sending people through training as an incident response is an organizational odor; you can't train your way to better alignment.

The most important theme: it really, really isn't anyone's fault when something goes wrong in a complex system. Root cause is a lie; avoiding blame in postmortems isn't a reprieve granted by the grace of management; the whole concept of fault is a cognitive error in situational analysis.