├── LICENSE.md ├── README.md ├── STAMP.md ├── boundary.graffle ├── boundary.png ├── graceful-extensibility.md ├── intro.md ├── laws.md ├── paries-keynote-2015.pptx ├── resilience-doodle.jpg ├── risk-management-framework.graffle ├── risk-management-framework.png ├── topics.md └── topics ├── changing-perspective-on-safety.md ├── common-misconceptions.md ├── human-human-interaction.md ├── human-machine-interaction.md ├── incident-analysis-pragmatics.md ├── nature-of-complex-systems.md ├── the-nature-of-cognitive-work-during-an-incident.md ├── what-can-go-badly-during-an-incident.md └── what-we-mean-by-resilience.md /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## Creative Commons Attribution-ShareAlike 4.0 International Public License 2 | 3 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 4 | 5 | **Section 1 – Definitions.** 6 | 7 | 1. **Adapted Material** means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 8 | 2. **Adapter's License** means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 9 | 3. **BY-SA Compatible License** means a license listed at [creativecommons.org/compatiblelicenses](//creativecommons.org/compatiblelicenses), approved by Creative Commons as essentially the equivalent of this Public License. 10 | 4. **Copyright and Similar Rights** means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section [2(b)(1)-(2)](#s2b) are not Copyright and Similar Rights. 11 | 5. **Effective Technological Measures** means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 12 | 6. **Exceptions and Limitations** means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 13 | 7. **License Elements** means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike. 14 | 8. 
**Licensed Material** means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 15 | 9. **Licensed Rights** means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 16 | 10. **Licensor** means the individual(s) or entity(ies) granting rights under this Public License. 17 | 11. **Share** means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 18 | 12. **Sui Generis Database Rights** means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 19 | 13. **You** means the individual or entity exercising the Licensed Rights under this Public License. **Your** has a corresponding meaning. 20 | 21 | **Section 2 – Scope.** 22 | 23 | 1. **License grant**. 24 | 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 25 | 1. reproduce and Share the Licensed Material, in whole or in part; and 26 | 2. produce, reproduce, and Share Adapted Material. 27 | 2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 28 | 3. Term. The term of this Public License is specified in Section [6(a)](#s6a). 29 | 4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section [2(a)(4)](#s2a4) never produces Adapted Material. 30 | 5. Downstream recipients. 31 | 32 | 1. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 33 | 2. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. 34 | 3. No downstream restrictions. 
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 35 | 36 | 6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section [3(a)(1)(A)(i)](#s3a1Ai). 37 | 2. **Other rights**. 38 | 39 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 40 | 2. Patent and trademark rights are not licensed under this Public License. 41 | 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 42 | 43 | **Section 3 – License Conditions.** 44 | 45 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 46 | 47 | 1. **Attribution**. 48 | 49 | 1. If You Share the Licensed Material (including in modified form), You must: 50 | 51 | 1. retain the following if it is supplied by the Licensor with the Licensed Material: 52 | 1. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 53 | 2. a copyright notice; 54 | 3. a notice that refers to this Public License; 55 | 4. a notice that refers to the disclaimer of warranties; 56 | 5. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 57 | 2. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 58 | 3. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 59 | 2. You may satisfy the conditions in Section [3(a)(1)](#s3a1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 60 | 3. If requested by the Licensor, You must remove any of the information required by Section [3(a)(1)(A)](#s3a1A) to the extent reasonably practicable. 61 | 2. **ShareAlike**. 62 | 63 | In addition to the conditions in Section [3(a)](#s3a), if You Share Adapted Material You produce, the following conditions also apply. 64 | 65 | 1. The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License. 66 | 2. You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. 
You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 67 | 3. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. 68 | 69 | **Section 4 – Sui Generis Database Rights.** 70 | 71 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 72 | 73 | 1. for the avoidance of doubt, Section [2(a)(1)](#s2a1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 74 | 2. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section [3(b)](#s3b); and 75 | 3. You must comply with the conditions in Section [3(a)](#s3a) if You Share all or a substantial portion of the contents of the database. 76 | 77 | For the avoidance of doubt, this Section [4](#s4) supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 78 | 79 | **Section 5 – Disclaimer of Warranties and Limitation of Liability.** 80 | 81 | 1. **Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.** 82 | 2. **To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.** 83 | 84 | 3. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 85 | 86 | **Section 6 – Term and Termination.** 87 | 88 | 1. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 89 | 2. Where Your right to use the Licensed Material has terminated under Section [6(a)](#s6a), it reinstates: 90 | 91 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 92 | 2. upon express reinstatement by the Licensor. 
93 | 94 | For the avoidance of doubt, this Section [6(b)](#s6b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 95 | 3. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 96 | 4. Sections [1](#s1), [5](#s5), [6](#s6), [7](#s7), and [8](#s8) survive termination of this Public License. 97 | 98 | **Section 7 – Other Terms and Conditions.** 99 | 100 | 1. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 101 | 2. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 102 | 103 | **Section 8 – Interpretation.** 104 | 105 | 1. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 106 | 2. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 107 | 3. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 108 | 4. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 109 | 110 | > Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the [CC0 Public Domain Dedication](//creativecommons.org/publicdomain/zero/1.0/legalcode). Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](//creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. 111 | > 112 | > Creative Commons may be contacted at [creativecommons.org](//creativecommons.org/). 113 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Resilience engineering papers 2 | 3 | ## Overview 4 | 5 | Alias: (thanks to [John Allspaw](https://twitter.com/allspaw)). 
6 | 7 | This doc contains notes about people active in resilience engineering, as well as some influential 8 | researchers who are no longer with us, organized alphabetically. It also includes people and papers 9 | from related fields, such as cognitive systems engineering and naturalistic decision-making. 10 | 11 | If you're not sure what to read first, check out [Resilience engineering: Where do I start?](intro.md) 12 | 13 | ## Annotations 14 | 15 | A [BH](https://safety177496371.wordpress.com/) link indicates Ben Hutchinson's [Safety & Performance Research Summaries](https://safety177496371.wordpress.com/) blog. 16 | Ben writes summaries of safety papers, posting them to his blog as well as LinkedIn. 17 | 18 | A [TWRR](http://resilienceroundup.com) link indicates Thai Wood's [Resilience Roundup](http://resilienceroundup.com). Thai publishes a newsletter that 19 | summarizes resilience engineering papers. 20 | 21 | ## Other interesting links 22 | 23 | [resilienceinsoftware.org](https://resilienceinsoftware.org) is the Resilience in Software Foundation, a community of software people who are interested in resilience engineering. 24 | 25 | 26 | For a collection of talks, check out the [Resilience Engineering, Cognitive Systems 27 | Engineering, and Human Factors Concepts in Software 28 | Contexts](https://www.youtube.com/playlist?list=PLb1aZTnPf3-OMChMkrr6WsokRI6LOnuem) 29 | YouTube playlist maintained by John Allspaw. 30 | 31 | You might also be interested in my [notes on David Woods's Resilience Engineering short course](https://github.com/lorin/res-eng-short-course-notes). 32 | 33 | The papers linked here are also in the [zotero res-eng group](https://www.zotero.org/groups/2335189/res-eng/items). 34 | 35 | ## People 36 | 37 | For each person, I list concepts that they reference in their writings, along 38 | with some publications. The publications lists aren't comprehensive: 39 | they're ones I've read or have added to my to-read list. 40 | 41 | * [John Allspaw](#john-allspaw) 42 | * [Lisanne Bainbridge](#lisanne-bainbridge) 43 | * [Andrea Baker](#andrea-baker) 44 | * [E. Asher Balkin](#e-asher-balkin) 45 | * [Johan Bergström](#johan-bergström) 46 | * [Matthieu Branlat](#matthieu-branlat) 47 | * [Sheuwen Chuang](#sheuwen-chuang) 48 | * [Todd Conklin](#todd-conklin) 49 | * [Richard I. Cook](#richard-i-cook) 50 | * [Sidney Dekker](#sidney-dekker) 51 | * [John C. Doyle](#john-c-doyle) 52 | * [Bob Edwards](#bob-edwards) 53 | * [Anders Ericsson](#anders-ericsson) 54 | * [Paul Feltovich](#paul-feltovich) 55 | * [Pedro Ferreira](http://www.resilience-engineering-association.org/user/pedro/) 56 | * [Meir Finkel](#meir-finkel) 57 | * [Marisa Grayson](#marisa-grayson) 58 | * [Ivonne Andrade Herrera](#ivonne-andrade-herrera) 59 | * [Robert Hoffman](#robert-hoffman) 60 | * [Erik Hollnagel](#erik-hollnagel) 61 | * [Leila Johannesen](#leila-johannesen) 62 | * [Gary Klein](#gary-klein) 63 | * [Elizabeth Lay](#elizabeth-lay) 64 | * [Jean-Christophe Le Coze](#jean-christophe-le-coze) 65 | * [Nancy Leveson](#nancy-leveson) 66 | * [Carl Macrae](#carl-macrae) 67 | * [Laura Maguire](#laura-maguire) 68 | * [Christopher Nemeth](#christopher-nemeth) 69 | * [Anne-Sophie Nyssen](#anne-sophie-nyssen) 70 | * [Elinor Ostrom](#elinor-ostrom) 71 | * [Jean Pariès](#jean-paries) 72 | * [Emily Patterson](#emily-patterson) 73 | * [Charles Perrow](#charles-perrow) 74 | * [Shawna J.
Perry](#shawna-j-perry) 75 | * [Jens Rasmussen](#jens-rasmussen) 76 | * [Mike Rayo](#mike-rayo) 77 | * [James Reason](#james-reason) 78 | * [J. Paul Reed](#j-paul-reed) 79 | * [Emilie M. Roth](#emilie-m-roth) 80 | * [Nadine Sarter](#nadine-sarter) 81 | * [James C. Scott](#james-c-scott) 82 | * [Steven Shorrock](#steven-shorrock) 83 | * [Barry Turner](#barry-turner) 84 | * [Diane Vaughan](#diane-vaughan) 85 | * [Robert L. Wears](#robert-l-wears) 86 | * [David Woods](#david-woods) 87 | * [John Wreathall](#john-wreathall) 88 | 89 | ## Some big ideas 90 | 91 | * [The adaptive universe](#the-adaptive-universe) (David Woods) 92 | * [Dynamic safety model](#dynamic-safety-model) (Jens Rasmussen) 93 | * [Safety-II](#safety-i-vs-safety-ii) (Erik Hollnagel) 94 | * [Graceful extensibility](#graceful-extensibility) (David Woods) 95 | * [ETTO: Efficiency-thoroughness trade-off principle](#etto-principle) (Erik Hollnagel) 96 | * [Drift into failure](#drift-into-failure) (Sidney Dekker) 97 | * Robust yet fragile (John C. Doyle) 98 | * [STAMP: Systems-Theoretic Accident Model & Processes](#stamp) (Nancy Leveson) 99 | * Polycentric governance (Elinor Ostrom) 100 | 101 | Note: there are now [multiple contributors](https://github.com/lorin/resilience-engineering/graphs/contributors) to this repository. 102 | 103 | ## John Allspaw 104 | 105 | Allspaw is the former CTO of Etsy. He applies concepts from resilience engineering to the tech industry. 106 | He is one of the founders of [Adaptive Capacity Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering consultancy. 107 | 108 | Allspaw tweets as [@allspaw](https://twitter.com/allspaw). 109 | 110 | ### Selected publications 111 | 112 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/) 113 | * [Trade-Offs Under Pressure: Heuristics and Observations Of Teams Resolving Internet Service Outages](https://www.researchgate.net/publication/295011072_Trade-Offs_Under_Pressure_Heuristics_and_Observations_Of_Teams_Resolving_Internet_Service_Outages) 114 | * [Etsy Debrief Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) 115 | * [Blameless PostMortems and a Just Culture](https://codeascraft.com/2012/05/22/blameless-postmortems/) (blog) 116 | * [Resilience engineering: learning to embrace failure](https://doi.org/10.1145/2366316.2366331) 117 | * [Fault Injection in Production: Making the case for resiliency testing](http://queue.acm.org/detail.cfm?id=2353017) 118 | * [Technical Debt: Challenges and Perspectives](https://doi.org/10.1109/MS.2017.99) 119 | * [Revealing the Critical Role of Human Performance in Software](https://queue.acm.org/detail.cfm?id=3380776) 120 | * [SRE Cognitive Work] in [Seeking SRE] 121 | * [The infinite hows: An argument against the Five Whys and an alternative approach you can apply](https://www.oreilly.com/radar/the-infinite-hows/) 122 | 123 | [SRE Cognitive Work]: https://www.researchgate.net/publication/343430302_SRE_Cognitive_Work 124 | [Seeking SRE]: https://www.oreilly.com/library/view/seeking-sre/9781491978856/ 125 | 126 | ### Selected talks 127 | 128 | * [Resilience Engineering: The What and How](https://devopsdays.org/events/2019-washington-dc/program/john-allspaw/) 129 | * [Incidents as we Imagine Them Versus How They Actually Are](https://www.youtube.com/watch?v=8DtzmV1jiyQ) 130 | * [How your systems keep running day after day](https://www.youtube.com/watch?v=xA5U85LSk0M) 131 | * [Problem detection (papers we
love)](https://www.youtube.com/watch?v=NxctiGRI2y8) 132 | (presentation of [Problem detection] paper) 133 | * [Common Ground and Coordination in Joint Activity (papers we love)](https://paperswelove.org/2016/video/john-allspaw-common-ground/) (presentation of [Common Ground and Coordination in Joint Activity] paper) 134 | * [Amplifying sources of resilience](https://www.infoq.com/presentations/resilience-thinking-paradigm/) (presentation about applying Resilience Engineering thinking & paradigms to the world of software engineering) 135 | * [Incidents: What Is Often Missed & What Can Be Done About That](https://www.adaptivecapacitylabs.com/blog/2020/03/30/incidents-what-is-often-missed-what-can-be-done-about-that/#fvp_10,1s) 136 | * [Incident Analysis: How *Learning* is Different Than *Fixing*](https://www.adaptivecapacitylabs.com/blog/2020/05/06/how-learning-is-different-than-fixing/) 137 | 138 | 139 | ## Lisanne Bainbridge 140 | 141 | Bainbridge is a psychology researcher. She has a website at http://www.complexcognition.co.uk/ 142 | 143 | ### Contributions 144 | 145 | #### Ironies of automation 146 | 147 | Bainbridge is famous for her 1983 [Ironies of automation] paper, which continues to 148 | be frequently cited. 149 | 150 | ### Concepts 151 | * automation 152 | * design errors 153 | * human factors/ergonomics 154 | * cognitive modelling 155 | * cognitive architecture 156 | * mental workload 157 | * situation awareness 158 | * cognitive error 159 | * skill and training 160 | * interface design 161 | 162 | ### Selected publications 163 | * [Ironies of automation] ([TWRR](https://resilienceroundup.com/issues/35/)) 164 | 165 | 166 | [Ironies of automation]: https://www.sciencedirect.com/science/article/abs/pii/0005109883900468 167 | 168 | ## Andrea Baker 169 | 170 | [Baker](https://www.thehopmentor.com/) is a practitioner who provides 171 | training services in human and organizational performance (HOP) and learning 172 | teams. 173 | 174 | Baker tweets as [@thehopmentor](https://twitter.com/thehopmentor). 175 | 176 | ### Concepts 177 | 178 | * Human and organizational performance (HOP) 179 | * Learning teams 180 | * Industrial empathy 181 | 182 | ### Selected publications 183 | 184 | * [A bit about HOP](https://docs.wixstatic.com/ugd/1a0149_21bcf20f158540098d3d7987ffbf3f58.pdf) (editorial) 185 | * [A short introduction to human and organizational performance (hop) and learning teams](http://www.safetydifferently.com/a-short-introduction-to-human-and-organizational-performance-hop-and-learning-teams/) (blog post) 186 | 187 | ## E. Asher Balkin 188 | 189 | ### Selected publications 190 | 191 | * [Resiliency Trade Space Study: The Interaction of Degraded C2 Link and Detect and Avoid Autonomy on Unmanned Aircraft](https://www.researchgate.net/publication/330222613_Resiliency_Trade_Space_Study_The_Interaction_of_Degraded_C2_Link_and_Detect_and_Avoid_Autonomy_on_Unmanned_Aircraft) 192 | * [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations] 193 | 194 | ### Selected talks 195 | 196 | * [Root cause and the wrong path](https://www.youtube.com/watch?v=kK6t-gttsJw) 197 | 198 | ## Johan Bergström 199 | 200 | [Bergström](http://www.jbsafety.se/p/about-me.html) is a safety researcher and 201 | consultant. He runs the [Master Program of Human Factors and Systems 202 | Safety](http://www.humanfactors.lth.se/msc-programme/) at Lund University. 203 | 204 | Bergström tweets as [@bergstrom_johan](https://twitter.com/bergstrom_johan).
205 | 206 | ### Concepts 207 | 208 | * Analytical traps in accident investigation 209 | - Counterfactual reasoning 210 | - Normative language 211 | - Mechanistic reasoning 212 | * Generic competencies 213 | 214 | ### Selected publications 215 | 216 | * [Resilience engineering: Current status of the research and future challenges](https://www.sciencedirect.com/science/article/pii/S0925753516306130) 217 | * [Rule- and role retreat: An empirical study of procedures and resilience](https://www.researchgate.net/publication/50917226_Rule-_and_role_retreat_An_empirical_study_of_procedures_and_resilience) 218 | * [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation] 219 | 220 | [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation]: https://portal.research.lu.se/ws/files/1376441/3014838.pdf 221 | 222 | ### Selected talks 223 | 224 | * [Three analytical traps in accident investigation](https://www.youtube.com/watch?v=TqaFT-0cY7U) 225 | * [Two Views on Human Error](https://www.youtube.com/watch?v=rHeukoWWtQ8) 226 | * [What, Where and When is Risk in System Design?](https://www.youtube.com/watch?v=BtJIumyCrtE&feature=youtu.be) (Velocity 2013) 227 | 228 | ## Matthieu Branlat 229 | 230 | ### Selected publications 231 | 232 | * [Basic patterns in how adaptive systems fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) ([TWRR](https://resilienceroundup.com/issues/34/)) 233 | * [A practitioner’s experiences operationalizing Resilience Engineering] 234 | * [Noticing Brittleness, Designing for Resilience] 235 | 236 | [A practitioner’s experiences operationalizing Resilience Engineering]: https://www.sciencedirect.com/science/article/abs/pii/S0951832015000812 237 | [Noticing Brittleness, Designing for Resilience]: https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605708-18/noticing-brittleness-designing-resilience-elizabeth-lay-matthieu-branlat 238 | 239 | ## Sheuwen Chuang 240 | 241 | ### Selected publications 242 | 243 | * [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion] 244 | * [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/)) 245 | 246 | [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion]: https://doi.org/10.1016/j.burns.2018.12.003 247 | [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion]: https://www.researchgate.net/publication/335366770_Coping_With_a_Mass_Casualty_Insights_into_a_Hospital's_Emergency_Response_and_Adaptations_After_the_Formosa_Fun_Coast_Dust_Explosion 248 | 249 | 250 | 251 | ## Todd Conklin 252 | 253 | Conklin's books are on my reading list, but I haven't read anything by him 254 | yet. I have listened to his great [Preaccident investigation 255 | podcast](https://preaccidentpodcast.podbean.com/). 256 | 257 | Conklin tweets as [@preaccident](https://twitter.com/preaccident). 
258 | 259 | ### Selected publications 260 | * [Pre-accident investigations: an introduction to organizational safety](https://www.amazon.com/Pre-Accident-Investigations-Todd-Conklin/dp/1409447820) 261 | * [Pre-accident investigations: better questions - an applied approach to 262 | operational learning](https://www.amazon.com/gp/product/1472486137) 263 | * [Do Safety Differently](https://www.amazon.com/Do-Safety-Differently-Sidney-Dekker/dp/B09RM3Z17V) 264 | 265 | ### Selected talks 266 | 267 | Quanta - [Risk and Safety Conf 2019](https://www.youtube.com/watch?v=5WTbeFj2kJY&feature=youtu.be) 268 | 269 | ## Richard I. Cook 270 | 271 | [Cook](https://en.wikipedia.org/wiki/Richard_Cook_(safety_researcher)) was an anesthesiologist who studied failures in complex systems. He was one of the founders of [Adaptive Capacity Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering consultancy. 272 | He tweeted as [@ri_cook](https://twitter.com/ri_cook). 273 | 274 | ### Concepts 275 | * how complex systems fail 276 | * degraded mode 277 | * sharp end (cf. Reason's blunt end) 278 | * Going solid 279 | * Cycle of error 280 | * "new look" 281 | * first vs second stories 282 | 283 | ### Selected publications 284 | 285 | * [A celebration of the work of Richard Cook, MD: A pioneer in understanding accidents, safety, human factors, and resilience](https://www.researchgate.net/publication/371403498_A_celebration_of_the_work_of_Richard_Cook_MD_A_pioneer_in_understanding_accidents_safety_human_factors_and_resilience) 286 | * [How complex systems fail](https://www.adaptivecapacitylabs.com/HowComplexSystemsFail.pdf) ([BH](https://safety177496371.wordpress.com/2022/11/04/how-complex-systems-fail-a-classic-from-richard-cook/)) 287 | * [A brief look at the New Look in complex system failure, error, safety, and resilience](https://www.adaptivecapacitylabs.com/BriefLookAtTheNewLook.pdf) 288 | * [void \*: Incidents as Untyped Pointers.
*Where* complex systems fail](https://www.snafucatchers.com/single-post/2017/11/14/void-Incidents-as-Untyped-Pointers) 289 | * [Distancing through differencing: An obstacle to organizational learning following accidents](https://www.researchgate.net/publication/292504703_Distancing_through_differencing_An_obstacle_to_organizational_learning_following_accidents) 290 | * [Being bumpable](http://csel.eng.ohio-state.edu/productions/woodscta/media/beingbump.pdf) ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-being-bumpable-issue-33-177340)) 291 | * [Behind Human Error] 292 | * [Incidents - markers of resilience or brittleness?](https://www.researchgate.net/publication/292504952_Incidents_-_markers_of_resilience_or_brittleness) 293 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130) ([TWRR](https://resilienceroundup.com/issues/going-solid-a-model-of-system-dynamics-and-consequences-for-patient-safety/)) 294 | * [Operating at the Sharp End: The Complexity of Human Error](https://www.researchgate.net/publication/313407259_Operating_at_the_Sharp_End_The_Complexity_of_Human_Error) 295 | * [Patient boarding in the emergency department as a symptom of complexity-induced risks](https://www.researchgate.net/publication/312624891_Patient_boarding_in_the_emergency_department_as_a_symptom_of_complexity-induced_risks) 296 | * [Sensemaking, Safety, and Cooperative Work in the Intensive Care Unit](https://www.researchgate.net/publication/220579381_Sensemaking_Safety_and_Cooperative_Work_in_the_Intensive_Care_Unit) 297 | * [Medication Reconciliation Is a Window into “Ordinary” Work](https://www.taylorfrancis.com/books/e/9781317164777/chapters/10.1201/9781315572529-4) 298 | * [Cognitive consequences of clumsy automation on high workload, high consequence human performance] 299 | * [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)] 300 | * [The Messy Details: Insights From the Study of Technical Work in Healthcare] 301 | * [Nosocomial automation: technology-induced complexity and human performance] 302 | * [The New Look at Error, Safety, and Failure: A Primer for Health Care] 303 | * [Grounding explanations in evolving, diagnostic situations] 304 | * [A Tale of Two Stories: Contrasting Views of Patient Safety] (appendix B, starting on page 64 (numbered 52) contains the talk by Charles Billings, MD, Chief Scientist (retired), NASA Ames on the lessons learned from incident reporting in aviation. Dr. 
Billings designed, started, and managed the Aviation Safety Reporting System) 305 | * ["Those found responsible have been sacked": some observations on the usefulness of error](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.623.5749&rep=rep1&type=pdf) ([BH](https://safety177496371.wordpress.com/2025/01/26/those-found-responsible-have-been-sacked-some-observations-on-the-usefulness-of-error/)) 306 | * [Perspectives on Human Error: Hindsight Biases and Local Rationality] 307 | * [Mistaking Error] 308 | * [Adapting to new technology in the operating room] 309 | * [Verite, Abstraction, and Ordinateur Systems in the Evolution of Complex Process Control](https://www.researchgate.net/publication/3657912_Verite_abstraction_and_ordinateur_systems_in_the_evolution_of_complex_process_control) 310 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/)) 311 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems] 312 | * [The Role of Automation in Complex System Failures] 313 | * [Thinking about accidents and systems](https://www.researchgate.net/publication/228352596_Thinking_about_accidents_and_systems) 314 | * [The Stockholm blizzard of 2012](https://www.taylorfrancis.com/books/e/9781315605739/chapters/10.1201/9781315605739-11) 315 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 316 | * [Dissenting Statement: Health IT Is a Class III Medical Device](https://www.nap.edu/read/13269/chapter/14) 317 | * [Nine Steps to Move Forward From Error] ([BH](https://safety177496371.wordpress.com/2022/11/03/nine-steps-to-move-forward-from-error/)) 318 | * [Gaps in the continuity of care and progress on patient safety] 319 | * [Above the Line, Below the Line](https://queue.acm.org/detail.cfm?id=3380777) ([TWRR](https://resilienceroundup.com/issues/68/)) 320 | * [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/)) 321 | * [SRE Cognitive Work] in [Seeking SRE] 322 | * [Building and revising adaptive capacity sharing for technical incident response: A case of resilience engineering](https://www.sciencedirect.com/science/article/pii/S0003687020301903) ([TWRR](https://resilienceroundup.com/issues/building-and-revising-adaptive-capacity-sharing-for-technical-incident-response-a-case-of-resilience-engineering/)) 323 | * [Automation, interaction, complexity, and failure: A case study] 324 | * [Human Performance in Anesthesia] 325 | * [Two years before the mast: Learning how to learn about patient safety](https://www.researchgate.net/publication/285346573_Two_years_before_the_mast_Learning_how_to_learn_about_patient_safety) 326 | * [Resilience is not control: healthcare, crisis management, and ICT] 327 | * [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances] 328 | * [Human Performance in Anesthesia: A Corpus of Cases] 329 | * [Minding the Gaps: Creating Resilience in Health Care] 330 | * [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety] 332 | * [Behind Human Error: Taming Complexity to Improve Patient Safety] 333 | * [The Illusion of Explanation] 334 | 335 | 336 | [Behind Human Error]: https://www.amazon.com/Behind-Human-Error-David-Woods/dp/0754678342 337 | [Cognitive
consequences of clumsy automation on high workload, high consequence human performance]: https://ntrs.nasa.gov/search.jsp?R=19910011398 338 | [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)]: https://doi.org/10.1016/S0952-8180(96)90009-4 339 | [The Messy Details: Insights From the Study of Technical Work in Healthcare]: https://doi.org/10.1109%2FTSMCA.2004.836802 340 | [Nosocomial automation: technology-induced complexity and human performance]: https://www.researchgate.net/profile/David_Woods11/publication/224649052_Nosocomial_automation_technology-induced_complexity_and_human_performance/links/59399b1da6fdcc58ae902c49/Nosocomial-automation-technology-induced-complexity-and-human-performance.pdf 341 | [The New Look at Error, Safety, and Failure: A Primer for Health Care]: https://pdfs.semanticscholar.org/67f7/53ec089e5a8879f241e2be867dad0a2026fb.pdf 342 | [Grounding explanations in evolving, diagnostic situations]: https://pdfs.semanticscholar.org/1bed/356b5aa67c701f5bad6d943768622095f418.pdf 343 | [A Tale of Two Stories: Contrasting Views of Patient Safety]: https://www.researchgate.net/publication/245102691_A_Tale_of_Two_Stories_Contrasting_Views_of_Patient_Safety 344 | [Perspectives on Human Error: Hindsight Biases and Local Rationality]: https://www.nifc.gov/PUBLICATIONS/acc_invest_march2010/speakers/Perspectives%20on%20Human%20Error.pdf 345 | [Mistaking Error]: https://www.researchgate.net/publication/328149714_Mistaking_Error 346 | [Adapting to new technology in the operating room]: https://www.researchgate.net/publication/14230576_Adapting_to_New_Technology_in_the_Operating_Room 347 | [Collaborative Cross-Checking to Enhance Resilience]: https://www.researchgate.net/publication/220579448_Collaborative_Cross-Checking_to_Enhance_Resilience 348 | [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]: https://pdfs.semanticscholar.org/a0d3/9cc66adc64e297048a32b71aeee209a451af.pdf 349 | [The Role of Automation in Complex System Failures]: https://www.researchgate.net/publication/232191704_The_Role_of_Automation_in_Complex_System_Failures 350 | [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise]: https://www.researchgate.net/publication/2484621_New_Arctic_Air_Crash_Aftermath_Role-Play_Simulation_Orchestrating_a_Fundamental_Surprise 351 | [Nine Steps to Move Forward From Error]: http://csel.eng.ohio-state.edu/productions/pexis/readings/submod4/nine%20steps%20CTW2002.pdf 352 | [Gaps in the continuity of care and progress on patient safety]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1117777/ 353 | [Automation, interaction, complexity, and failure: A case study]: https://doi.org/10.1016/j.ress.2006.01.009 354 | [Human Performance in Anesthesia]: http://dx.doi.org/10.13140/RG.2.2.29675.36648 355 | [Resilience is not control: healthcare, crisis management, and ICT]: https://www.researchgate.net/profile/Robert-Wears/publication/225108705_Resilience_is_Not_Control_Healthcare_Crisis_Management_and_ICT/links/00b49532b2c7f3ed62000000/Resilience-is-Not-Control-Healthcare-Crisis-Management-and-ICT.pdf 356 | [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances]: https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605685-19/taking-things-one-stride-cognitive-features-two-resilient-performances-richard-cook-christopher-nemeth 357 | [Human Performance in Anesthesia: A Corpus of Cases]: 
https://www.researchgate.net/publication/347964304_Human_Performance_in_Anesthesia_Human_Performance_in_Anesthesia_Human_Performance_in_Anesthesia 358 | [Minding the Gaps: Creating Resilience in Health Care]: https://europepmc.org/article/NBK/nbk43670 359 | [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=ffe74633027ee354ebbf0ff9a6418e75f3b7a047 361 | [Behind Human Error: Taming Complexity to Improve Patient Safety]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=15f31969c4e1f4f599c5c68aa63f3bf930e0406f 362 | [The Illusion of Explanation]: https://onlinelibrary.wiley.com/doi/pdf/10.1197/j.aem.2004.07.001 363 | 364 | ### Selected talks 365 | * [How Complex Systems Fail](https://www.youtube.com/watch?v=2S0k12uZR14) (Velocity 2012) 366 | * [Resilience in Complex Adaptive Systems: Operating at the Edge of Failure](https://www.youtube.com/watch?v=PGLYEDpNu60&feature=youtu.be) (Velocity 2013) 367 | * [Lectures on the study of cognitive work](https://www.youtube.com/playlist?list=PLb1aZTnPf3-OEU1by77zZQQYckvXUGmNY) (Graduate student lecture-discussions at The Royal Institute of Technology, Huddinge, Sweden, in 2012) 368 | * [Panel discussion: Safety Culture, Lean, and DevOps] (DOES 2017) 369 | * [Working at the center of the Cyclone](https://www.youtube.com/watch?v=3ZP98stDUf0&feature=youtu.be) (DOES 2018) 370 | * [A Few Observations on the Marvelous Resilience of Bone & Resilience Engineering](https://www.youtube.com/watch?v=8LbePBiOvZ4) (REdeploy 2019) 371 | 372 | [Panel discussion: Safety Culture, Lean, and DevOps]: https://www.youtube.com/watch?v=gtxtb9z_4FY&feature=youtu.be 373 | 374 | 375 | ## Jean-Christophe Le Coze 376 | 377 | Le Coze is a research director at INERIS (National Institute for the Industrial Environment and Risks) in France. 378 | He frequently writes on historical views of safety. 379 | 380 | Le Coze tweets as [@JcLeCoze](https://twitter.com/JcLeCoze). 381 | 382 | ### Selected publications 383 | 384 | * [Managing the Unexpected](https://www.academia.edu/36790092/Managing_the_unexpected) 385 | * [The 'new view' of human error. Origins, ambiguities, success and critiques](https://www.sciencedirect.com/science/article/abs/pii/S0925753522001928) 386 | * [1984-2014. Normal Accident. Was Charles Perrow right for the wrong reasons?](https://www.academia.edu/15301538/1984_2014_Normal_Accident_Was_Charles_Perrow_right_for_the_wrong_reasons) 387 | * [Good and bad reasons: The Swiss cheese model and its critics](https://dx.doi.org/10.1016/j.ssci.2020.104660) 388 | * [Recurring themes in the legacy of Jens Rasmussen](https://doi.org/10.1016/j.apergo.2016.10.002) 389 | * [Reflecting on Jens Rasmussen’s legacy. A strong program for a hard problem](https://www.sciencedirect.com/science/article/pii/S0925753514000848) 390 | * [Reflecting on Jens Rasmussen's legacy (2) behind and beyond, a ‘constructivist turn’](https://www.sciencedirect.com/science/article/abs/pii/S0003687015300429) 391 | 392 | ## Sidney Dekker 393 | 394 | Dekker is a human factors and safety researcher with a background in aviation. 395 | His books aimed at a lay audience (Drift Into Failure, Just Culture, The Field Guide to 'Human Error' Investigations) 396 | have been enormously influential.
He was a founder of the MSc programme in Human Factors & Systems Safety at Lund University. 397 | His PhD advisor was [David Woods](#david-woods). 398 | 399 | Dekker tweets as [@sidneydekkercom](https://twitter.com/sidneydekkercom). 400 | 401 | ### Contributions 402 | 403 | #### Drift into failure 404 | 405 | Dekker developed the theory of *drift*, characterized by five concepts: 406 | 407 | 1. Scarcity and competition 408 | 1. Decrementalism, or small steps 409 | 1. Sensitive dependence on initial conditions 410 | 1. Unruly technology 411 | 1. Contribution of the protective structure 412 | 413 | #### Just Culture 414 | 415 | Dekker examines how cultural norms defining justice can be re-oriented to minimize the negative impact and maximize learning when things go wrong. 416 | 417 | 1. Retributive justice as society's traditional idea of justice: distributing punishment to those responsible based on severity of the violation 418 | 2. Restorative justice as an improvement for both victims and practitioners: distributing obligations of rebuilding trust to those responsible based on who is hurt and what they need 419 | 3. First, second, and third victims: an incident's negative impact is felt by more than just the obvious victims 420 | 4. Learning theory: people break rules when they have learned there are no negative consequences, and there are actually positive consequences - in other words, they break rules to get things done to meet production pressure 421 | 5. Reporting culture: contributing to reports of adverse events is meant to help the organization understand what went wrong and how to prevent recurrence, but accurate reporting requires appropriate and proportionate accountability actions 422 | 6. Complex systems: normal behavior of practitioners and professionals in the context of a complex system can appear abnormal or deviant in hindsight, particularly in the eyes of non-expert juries and reviewers 423 | 7. The nature of practitioners: professionals want to do good work, and therefore want to be held accountable for their mistakes; they generally want to help similarly-situated professionals avoid the same mistake. 424 | 425 | #### Safety Differently 426 | 427 | - There is a difference between the organization's prescribed processes for completing work and how work is actually completed. (work as imagined vs work as done) 428 | - The difference between work as imagined and work as done is the result of the expertise that exists in your workers from contact with real-life pressures, heuristics, and unexpected conditions. 429 | - Old View: People are the problem to control with process 430 | - They did something wrong 431 | - They need more rules and enforcement 432 | - They need to try harder 433 | - We need to get rid of "bad apples" 434 | - Focus on the "sharp end" of the organization - the people closest to the work 435 | - New View: Work is done adaptively in an uncertain world 436 | - Things go wrong all the time 437 | - Workers often detect and correct these problems 438 | - Local adaptations are a source of organizational expertise 439 | - "What conditions existed that made the selected course of action seem correct to the people involved?" 440 | - Traditional safety interventions have diminishing yields with increasing overhead. Accumulated compliance burden and "safety clutter" make it harder to get work done *and* to do so safely.
441 | - Safety Clutter serves safety bureaucracy and compliance rather than the safety of the workers or the process 442 | - Safety Clutter is produced by the "blunt end" of the organization without local expertise of what is practicable or practical in-situ 443 | - Safety Clutter represents a broader "deprofessionalization" - a removal of trust and confidence in professionals to do their job well, removing their pride, autonomy, and achievement. 444 | - Paradoxically, Safety Clutter can result from government deregulation - organizations need to self-impose risk controls in the absence of external guidelines. 445 | - Sadly for organizations with Safety Clutter, more internal rules do not equal better legal protection. 446 | - When a process is relatively safe or stable, measurements of bad outcomes lack statistical significance to understand trends or tie trends to interventions. 447 | - Fundamental Regulator Paradox: regulating a system so well that there are no useful measurements left to understand how the system is performing 448 | - Zero Paradox: A study of construction contractors showed more fatal accidents in firms with "goal zero" safety policies than in those without. Non-fatal accidents were similar. 449 | - Risk Secrecy: "goal zero" commitments result in injury underreporting and hiding of incidents which prevents learning, particularly when tied to financial incentives for leadership. 450 | - There are patterns (capacities) that help things go well 451 | - _Diversity of opinion_ - possibility to voice dissent 452 | - _Keeping the discussion on risk alive_ even when things go well 453 | - _Deference to expertise_ that already exists in people at the sharp end 454 | - _Psychological safety_ / "stop" ability 455 | - _Low barriers_ to interaction between organizational groups 456 | - _Sharp end improvements_ to existing systems based on local expertise 457 | - _Pride in work_ - process and results 458 | - Rapid problem-solving can prevent effective problem-understanding 459 | - Leadership buy-in and practice of New View safety is imperative to its success. It's also difficult to foster. 460 | - Worker buy-in is rapid and fits their existing mental model 461 | - Leadership must abandon the mental model that has governed their past work and decision-making - difficult for anyone.
462 | - Peer discussions are especially helpful for leadership 463 | - Highlighting how local adaptations helped things go well also helps 464 | 465 | ### Concepts 466 | * Drift into failure 467 | * Safety differently 468 | * New view vs old view of human performance & error 469 | * Just culture 470 | * complexity 471 | * broken part 472 | * Newton-Descartes 473 | * diversity 474 | * systems theory 475 | * unruly technology 476 | * decrementalism 477 | * generic competencies 478 | * work as imagined vs work as done 479 | 480 | ### Selected publications 481 | 482 | * [Drift into failure](https://www.amazon.com/Drift-into-Failure-Sidney-Dekker/dp/1409422216) 483 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.411.4985&rep=rep1&type=pdf) 484 | * [The field guide to understanding 'human error'](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058) 485 | * [Behind Human Error] 486 | * [Rule- and role retreat: An empirical study of procedures and resilience](https://www.researchgate.net/publication/50917226_Rule-_and_role_retreat_An_empirical_study_of_procedures_and_resilience?enrichId=rgreq-23625e555a0d8e5250c74f24b5fd01ca-XXX&enrichSource=Y292ZXJQYWdlOzUwOTE3MjI2O0FTOjk3MzU5NjY5MjM1NzQ1QDE0MDAyMjM3NjI5NDY%3D&el=1_x_2&_esc=publicationCoverPdf) 487 | * [Anticipating the effects of technological change: A new era of dynamics for human factors](https://www.researchgate.net/publication/247512351_Anticipating_the_effects_of_technological_change_A_new_era_of_dynamics_for_human_factors) 488 | * [Why do things go right?](http://www.safetydifferently.com/why-do-things-go-right/) 489 | * [Six stages to the new view of human error](http://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker/articles/2007/SafetyScienceMonitor.pdf) 490 | * [Employees: A Problem to Control or Solution to Harness?](http://sidneydekker.com/wp-content/uploads/2014/08/DekkerPS2014.pdf) 491 | * [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation] 492 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems] 494 | * [Failure to adapt or adaptations that fail: contrasting models on procedures and safety](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.606.3361&rep=rep1&type=pdf) 495 | * [Human factors and folk models] 496 | * [The High Reliability Organization Perspective] ([TWRR](https://resilienceroundup.com/issues/09/)) 497 | * [Illusions of explanation: A critical essay on error classification](http://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker/articles/2003_and_before/Illusions_of_explanation.pdf) ([TWRR](https://resilienceroundup.com/issues/42/)) 498 | * [Safety II professionals: How resilience engineering can transform safety practice] ([TWRR](https://resilienceroundup.com/issues/64/)) 499 | * [The complexity of failure: implications of complexity theory for safety investigation](https://static1.squarespace.com/static/53b78765e4b0949940758017/t/5722beb0d51cd4d11675a69c/1461894833950/Dekker%2C+Cilliers+and+Hofmeyr+-+The+Complexity+of+Failure.pdf) 500 | * [The Safety
Anarchist](https://www.amazon.com/Safety-Anarchist-innovation-bureaucracy-compliance/dp/1138300462) 501 | * [Compliance Capitalism](https://www.amazon.com/Compliance-Capitalism-Overregulated-Management-Neoliberalism/dp/1032012366) 503 | * [Drifting into failure: Complexity theory and the management of risk](https://maritimesafetyinnovationlab.org/wp-content/uploads/2021/03/DekkerDriftRiskChapter2013.pdf) ([BH 1](https://safety177496371.wordpress.com/2025/05/03/drifting-into-failure-complexity-theory-and-the-management-of-risk/) [BH 2](https://safety177496371.wordpress.com/2025/05/03/complex-systems-and-drifting-into-failure-further-extracts-from-dekker-2013/)) 504 | 505 | [Human factors and folk models]: https://link.springer.com/article/10.1007%2Fs10111-003-0136-9 506 | [The High Reliability Organization Perspective]: http://sidneydekker.com/wp-content/uploads/2013/01/CH005.pdf 507 | [Safety II professionals: How resilience engineering can transform safety practice]: https://doi.org/10.1016/j.ress.2019.106740 508 | 509 | ### Selected talks 510 | 511 | * [Panel discussion: Safety Culture, Lean, and DevOps] 512 | 513 | 514 | ## John C. Doyle 515 | 516 | [Doyle](http://www.cds.caltech.edu/~doyle/wiki/index.php?title=Main_Page) is a 517 | control systems researcher. He is seeking to identify the universal laws that capture the 518 | behavior of resilient systems, and is concerned with the architecture of such 519 | systems. 520 | 521 | ### Concepts 522 | * Robust yet fragile 523 | * layered architectures 524 | * constraints that deconstrain 525 | * protocol-based architectures 526 | * emergent constraints 527 | * Universal laws and architectures 528 | * conservation laws 529 | * universal architectures 530 | * Highly optimized tolerance 531 | * Doyle's catch 532 | 533 | #### Doyle's catch 534 | 535 | *Doyle's catch* is a term introduced by David Woods, but attributed to John Doyle. Here's how 536 | [Woods quotes Doyle](https://www.researchgate.net/publication/303832480_The_Risks_of_Autonomy_Doyles_Catch): 537 | 538 | > Computer-based simulation and rapid prototyping tools are now broadly available and powerful enough that it is 539 | > relatively easy to demonstrate almost anything, provided that conditions are made sufficiently idealized. 540 | > However, the real world is typically far from idealized, and thus a system must have enough robustness in order to close 541 | > the gap between demonstration and the real thing. 542 | 543 | 544 | ### Selected publications 545 | 546 | * [Universal Laws and Architectures](http://www.cis.upenn.edu/~ngns/docs/Review_2010/Doyle%20MURI%202010.pdf) (slides) 547 | * [Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures](http://dx.doi.org/10.1109/TSMCA.2010.2048027) 548 | * [Architecture, constraints, and behavior](https://www.pnas.org/content/108/Supplement_3/15624) 549 | * [The “robust yet fragile” nature of the Internet](https://doi.org/10.1073/pnas.0501426102) 550 | * [Highly Optimized Tolerance: Robustness and Design in Complex Systems](http://dx.doi.org/10.1103/physrevlett.84.2529) 551 | * [Robust efficiency and actuator saturation explain healthy heart rate control and variability](https://doi.org/10.1073/pnas.1401883111) 552 | 553 | ## Bob Edwards 554 | 555 | [Edwards](http://hopcoach.net/) is a practitioner who provides 556 | training services in human and organizational performance (HOP).
557 |
558 | Edwards tweets as [@thehopcoach](https://twitter.com/thehopcoach).
559 |
560 | ## Anders Ericsson
561 |
562 | Ericsson introduced the idea of *deliberate practice* as a mechanism for
563 | achieving a high level of expertise.
564 |
565 | Ericsson isn't directly associated with the field of resilience engineering.
566 | However, Gary Klein's work is informed by Ericsson's, and I have a particular
567 | interest in how people improve in expertise, so I'm including him here.
568 |
569 | ### Concepts
570 |
571 | * Expertise
572 | * Deliberate practice
573 | * Protocol analysis
574 |
575 | ### Selected publications
576 |
577 | * [Peak: secrets from the new science of expertise](https://www.amazon.com/Peak-Secrets-New-Science-Expertise/dp/1531864880/)
578 | * [Protocol analysis: verbal reports as data](https://www.amazon.com/Protocol-Analysis-Revd-Verbal-Reports/dp/0262550237)
579 |
580 | ## Paul Feltovich
581 |
582 | [Feltovich](https://www.ihmc.us/groups/pfeltovich/) is a retired Senior Research Scientist at the Florida Institute for Human & Machine Cognition (IHMC),
583 | who has done extensive research on human expertise.
584 |
585 | ### Selected publications
586 |
587 | * [Common Ground and Coordination in Joint Activity]
588 | * [Issue of expert flexibility in contexts characterized by complexity and change](https://www.researchgate.net/publication/232465540_Issue_of_expert_flexibility_in_contexts_characterized_by_complexity_and_change)
589 | * [A rose by any other name...would probably be given an acronym]
590 | * [Learners' (mis)understanding of important and difficult concepts: a challenge to smart machines in education](https://www.researchgate.net/publication/234818797_Learners'_misunderstanding_of_important_and_difficult_concepts_a_challenge_to_smart_machines_in_education)
591 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/ten-challenges-for-making-automation-a-team-player-in-joint-human-agent-activity/))
593 |
594 | [Common Ground and Coordination in Joint Activity]: http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf
595 | [A rose by any other name...would probably be given an acronym]: https://www.researchgate.net/publication/3454029_A_rose_by_any_other_namewould_probably_be_given_an_acronym
596 | [Ten challenges for making automation a team player]: https://ieeexplore.ieee.org/abstract/document/1363742
597 |
598 | ## Meir Finkel
599 |
600 | Finkel is a Colonel in the Israeli Defense Force (IDF) and the Director of the IDF's Ground Forces Concept Development and Doctrine Department.
601 |
602 | ### Selected publications
603 | * [On Flexibility: Recovery from Technological and Doctrinal Surprise on the Battlefield](https://www.amazon.com/Flexibility-Recovery-Technological-Doctrinal-Battlefield/dp/0804774897/ref=sr_1_3?ie=UTF8&qid=1546046916&sr=8-3&keywords=on+flexibility)
604 |
605 | ## Marisa Grayson
606 |
607 | [Grayson](https://www.linkedin.com/in/marisa-grayson/) is a cognitive systems engineer at Mile Two, LLC.
608 |
609 | ### Selected Publications
610 |
611 | * [Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems](https://www.researchgate.net/publication/333091997_Approaching_Overload_Diagnosis_and_Response_to_Anomalies_in_Complex_and_Automated_Production_Software_Systems)
612 | * [Cognitive Work of Hypothesis Exploration During Anomaly Response](https://queue.acm.org/detail.cfm?id=3380778)
613 |
614 | ## Ivonne Andrade Herrera
615 |
616 | [Herrera](https://www.ntnu.edu/employees/ivonne.a.herrera) is an associate professor in
617 | the department of industrial economics and technology management at NTNU and a
618 | senior research scientist at SINTEF. Her areas of expertise include safety management and
619 | resilience engineering in avionics and air traffic management.
620 |
621 | ### Selected publications
622 |
623 | * [Organisational accidents and resilient organisations: six perspectives](https://www.sintef.no/globalassets/upload/teknologi_og_samfunn/sikkerhet-og-palitelighet/rapporter/sintef-a17034-organisational-accidents-and-resilience-organisations-six-perspectives.-revision-2.pdf) (SINTEF A17034 report)
624 |
625 | See also: [list of publications](https://wo.cristin.no/as/WebObjects/cristin.woa/wa/fres?sort=ar&pnr=30556&action=sok)
626 |
627 |
628 | ## Robert Hoffman
629 |
630 | [Hoffman](https://www.ihmc.us/groups/rhoffman/) is a senior research scientist at Florida Institute for Human & Machine Cognition (IHMC),
631 | who has done extensive research on human expertise.
632 |
633 | ### Selected publications
634 |
635 | * [Measuring resilience](https://journals.sagepub.com/doi/abs/10.1177/0018720816686248)
636 | * [Myths of automation and their implications for military procurement]
637 | * [The Seven Deadly Myths of "Autonomous Systems"]
638 | * [A rose by any other name...would probably be given an acronym]
639 | * [Seeing the invisible: perceptual-cognitive aspects of expertise](https://cmapspublic3.ihmc.us/rid=1G9NSY15K-N7MJMZ-LC5/SeeingTheInvisible.pdf)
640 | * [Toward a Theory of Complex and Cognitive Systems]
641 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
642 |
643 | [Myths of automation and their implications for military procurement]: https://www.researchgate.net/publication/326000581_Myths_of_automation_and_their_implications_for_military_procurement
644 |
645 | [The Seven Deadly Myths of "Autonomous Systems"]: https://www.researchgate.net/publication/260304859_The_Seven_Deadly_Myths_of_Autonomous_Systems
646 |
647 | [Toward a Theory of Complex and Cognitive Systems]: https://www.researchgate.net/publication/3454245_Toward_a_Theory_of_Complex_and_Cognitive_Systems
648 |
649 | [Macrocognition]: https://pdfs.semanticscholar.org/df74/b2909f54b41a485cd4c0189fc4aa19d176d0.pdf
650 |
651 |
652 | ### Concepts
653 |
654 | #### Seven deadly myths of autonomous systems
655 |
656 | 1. "Autonomy" is unidimensional.
657 | 2. The conceptualization of "levels of autonomy" is a useful scientific grounding for the development of autonomous system roadmaps.
658 | 3. Autonomy is a widget.
659 | 4. Autonomous systems are autonomous.
660 | 5. Once achieved, full autonomy obviates the need for human-machine collaboration.
661 | 6. As machines acquire more autonomy, they will work as simple substitutes (or multipliers) of human capability.
662 | 7. "Full autonomy" is not only possible, but is always desirable.
663 |
664 | ## Erik Hollnagel
665 |
Hollnagel is a safety researcher and one of the founders of the field of resilience engineering.

666 | ### Contributions
667 |
668 | #### ETTO principle
669 |
670 | Hollnagel proposed that there is always a fundamental tradeoff between
671 | efficiency and thoroughness, which he called the *ETTO principle*.
672 |
673 | #### Safety-I vs. Safety-II
674 |
675 | Safety-I: avoiding things that go wrong
676 | * looking at what goes wrong
677 | * bimodal view of work and activities (acceptable vs unacceptable)
678 | * find-and-fix approach
679 | * prevent transition from 'normal' to 'abnormal'
680 | * causality credo: the belief that adverse outcomes happen because something goes
681 | wrong (they have causes that can be found and treated)
682 | * it either works or it doesn't
683 | * systems are decomposable
684 | * functioning is bimodal
685 |
686 | Safety-II: performance variability rather than bimodality
687 | * the system’s ability to succeed under varying conditions, so that the number
688 | of intended and acceptable outcomes (in other words, everyday activities) is
689 | as high as possible
690 | * performance is always variable
691 | * performance variation is ubiquitous
692 | * things that go right
693 | * focus on frequent events
694 | * remain sensitive to possibility of failure
695 | * be thorough as well as efficient
696 |
697 | #### FRAM
698 |
699 | Hollnagel proposed the Functional Resonance Analysis Method (FRAM) for modeling
700 | complex socio-technical systems.
701 |
702 |
703 | #### Four abilities necessary for resilient performance
704 | * respond
705 | * monitor
706 | * learn
707 | * anticipate
708 |
709 | ### Concepts
710 | * ETTO (efficiency thoroughness tradeoff) principle
711 | * FRAM (functional resonance analysis method)
712 | * Safety-I and Safety-II
713 | * things that go wrong vs things that go right
714 | * causality credo
715 | * performance variability
716 | * bimodality
717 | * emergence
718 | * work-as-imagined vs.
work-as-done
719 | * joint cognitive systems
720 | * systems of the first, second, third, fourth kind
721 |
722 | ### Selected publications
723 |
724 | * [The ETTO Principle: Efficiency-Thoroughness Trade-Off: Why Things That Go Right Sometimes Go Wrong](https://www.amazon.com/ETTO-Principle-Efficiency-Thoroughness-Trade-Off-Sometimes/dp/0754676781/ref=sr_1_1?s=books&ie=UTF8&qid=1545965837&sr=1-1&keywords=etto+principle)
725 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf)
726 | * [Safety-II in Practice](https://www.amazon.com/Safety-II-Practice-Developing-Resilience-Potentials/dp/1138708925)
727 | * [Safety-I and Safety-II: The past and future of safety management](https://www.amazon.com/gp/product/1472423089/ref=dbs_a_def_rwt_bibl_vppi_i0)
728 | * [FRAM: The Functional Resonance Analysis Method: Modelling Complex Socio-technical System](https://www.amazon.com/gp/product/B010WIDYE8/ref=dbs_a_def_rwt_bibl_vppi_i15)
729 | * [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/gp/product/0849339332/ref=x_gr_w_bb?ie=UTF8&tag=x_gr_w_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0849339332&SubscriptionId=1MGPYB6YW3HWK55XCGG2)
730 | * [Resilience Engineering: Concepts and Precepts]
731 | * [I want to believe: some myths about the management of industrial safety](http://dx.doi.org/10.1007/s10111-012-0237-4)
732 | * [Resilience engineering – Building a Culture of Resilience](http://www.ptil.no/getfile.php/1325150/PDF/Seminar%202013/Integrerte%20operasjoner/Hollnagel_RIO_presentation.pdf) (slides)
733 | * [Anomaly Response]
734 | * [Cognitive Systems Engineering: New wine in new bottles] ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-cognitive-systems-engineering-new-wine-in-new-bottles-issue-32-175912))
735 | * [Epilogue: Resilience Engineering Precepts](https://www.researchgate.net/publication/265074845_Epilogue_Resilience_Engineering_Precepts)
736 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]
737 | * [Resilience Engineering](https://erikhollnagel.com/ideas/resilience-engineering.html) (web essay)
738 | * [RAG - Resilience Analysis Grid](http://erikhollnagel.com/onewebmedia/RAG%20Outline%20V2.pdf)
739 | * [Resilience engineering in practice: a guidebook]
740 | * [Mapping Cognitive Demands in Complex Problem-Solving Worlds] (mentions disturbance management)
741 | * [Human factors and folk models]
742 | * [Designing for joint cognitive systems](https://www.researchgate.net/publication/4213914_Designing_for_joint_cognitive_systems)
743 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
744 | * [A day when (Almost) nothing happened](https://www.sciencedirect.com/science/article/abs/pii/S0925753521004719)
745 | * [Minding the Gaps: Creating Resilience in Health Care]
747 | * [Understanding Accidents - From Root Causes to Performance Variability](https://www.researchgate.net/publication/3973687_Understanding_accidents-from_root_causes_to_performance_variability) ([BH](https://safety177496371.wordpress.com/2025/03/12/understanding-accidents-from-root-causes-to-performance-variability/))
748 |
749 |
750 | [Resilience Engineering: Concepts and Precepts]:
https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2 751 | [Anomaly Response]: https://docs.wixstatic.com/ugd/3ad081_f46dda684154447583c8a5b282b60cc2.pdf 752 | [Cognitive Systems Engineering: New wine in new bottles]: https://www.ida.liu.se/~729A15/mtrl/CSEnew.pdf?utm_campaign=Resilience%20Roundup&utm_medium=email&utm_source=Revue%20newsletter 753 | [Resilience Roundup]: https://resilienceroundup.com/ 754 | [Mapping Cognitive Demands in Complex Problem-Solving Worlds]: https://www.researchgate.net/publication/220108174_Mapping_Cognitive_Demands_in_Complex_Problem-Solving_Worlds 755 | 756 | ## Leila Johannesen 757 | 758 | [Johannesen](https://www.linkedin.com/in/leilajohannesen/) is currently a UX researcher and community advocate at IBM. 759 | Her PhD dissertation work examined how humans cooperate, including studies of anesthesiologists. 760 | 761 | ### Concepts 762 | 763 | * common ground 764 | 765 | ### Selected publications 766 | 767 | * [Grounding explanations in evolving, diagnostic situations] 768 | * [Maintaining common ground: an analysis of cooperative communication in the operating room](https://www.abdn.ac.uk/iprc/documents/Communication%20Book%20Chapter.pdf) 769 | * [Behind Human Error] 770 | 771 | 772 | ## Gary Klein 773 | 774 | Klein studies how experts are able to quickly make effective decisions in high-tempo situations. 775 | 776 | Klein tweets as [@KleInsight](https://twitter.com/KleInsight). 777 | 778 | ### Concepts 779 | 780 | * naturalistic decision making (NDM) 781 | * intuitive expertise 782 | * cognitive task analysis 783 | * common ground 784 | * problem detection 785 | * automation as a "team player" 786 | 787 | ### Selected publications 788 | 789 | * [Sources of power: how people make decisions](https://www.amazon.com/gp/product/0262534290/ref=dbs_a_def_rwt_bibl_vppi_i0) 790 | * [Common Ground and Coordination in Joint Activity] 791 | * [Working minds: a practitioner's guide to cognitive task analysis](https://www.amazon.com/gp/product/0262532816/ref=dbs_a_def_rwt_bibl_vppi_i5) 792 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 793 | * [Can We Trust Best Practices? 
Six Cognitive Challenges of Evidence-Based Approaches]
794 | * [Conditions for intuitive expertise: a failure to disagree](http://dx.doi.org/10.1037/a0016755)
795 | * [Problem detection]
796 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/66))
797 | * [Decision making in action: models and methods](http://www.macrocognition.com/documents/Decision-Making-in-Action-Models-and-Methods-0316.pdf)
798 | * [Critical decision method for eliciting knowledge](https://ieeexplore.ieee.org/document/31053)
799 | * [A recognition-primed decision (RPD) model of rapid decision making](https://pdfs.semanticscholar.org/0672/092ecc507fb41d81e82d2986cf86c4bff14f.pdf)
800 | * [Seeing the invisible: perceptual-cognitive aspects of expertise](https://cmapspublic3.ihmc.us/rid=1G9NSY15K-N7MJMZ-LC5/SeeingTheInvisible.pdf)
802 | * [The strengths and limitations of teams for detecting problems](https://link.springer.com/article/10.1007/s10111-005-0024-6)
803 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
804 |
805 | [Problem detection]: https://www.researchgate.net/publication/220579480_Problem_detection
806 | [Patterns in Cooperative Cognition]: https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition
807 | [Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches]: https://journals.sagepub.com/doi/abs/10.1177/1555343416637520?journalCode=edma
808 |
809 | ### Selected talks
810 |
811 | * [Problem detection](https://www.youtube.com/watch?v=UXx51qK4ItQ&feature=emb_title)
812 |
813 | ## Elizabeth Lay
814 |
815 | Elizabeth Lay is a resilience engineering practitioner. She is currently a director of safety and human performance at Lewis Tree Service.
816 |
817 | ### Selected publications
818 |
819 | * [Noticing Brittleness, Designing for Resilience]
820 | * [A practitioner’s experiences operationalizing Resilience Engineering]
821 |
822 | ## Nancy Leveson
823 |
824 | Nancy Leveson is a computer science researcher with a focus on software safety.
825 |
826 | ### Contributions
827 |
828 | #### STAMP
829 |
830 | Leveson developed the accident causality model known as STAMP: the Systems-Theoretic Accident Model and Processes.
831 |
832 | See [STAMP](STAMP.md) for some more detailed notes of mine. A toy sketch of the STAMP control-loop view appears below.
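To make the control-loop view concrete, here is a minimal sketch (my own illustration under simplified assumptions, not Leveson's formalism; the tank, thresholds, and delay are invented). It exercises several of the concepts listed below: a safety constraint, a control structure in which the controller acts on its process model, and a feedback delay that produces dysfunctional interactions.

```python
from dataclasses import dataclass, field

# Toy sketch (not Leveson's formalism): safety as a control problem.
# The controller enforces a safety constraint using its *process model*,
# which is only as current as the feedback that updates it.

@dataclass
class Tank:
    level: float = 0.0
    LIMIT: float = 100.0  # safety constraint: level must stay at or below LIMIT

@dataclass
class Controller:
    feedback_delay: int                # measurements "in flight" before arrival
    believed_level: float = 0.0        # the controller's process model
    in_flight: list = field(default_factory=list)

    def control_action(self) -> float:
        # The constraint is enforced against the model, not the actual state.
        return 10.0 if self.believed_level < 80.0 else 0.0

    def receive_feedback(self, actual_level: float) -> None:
        self.in_flight.append(actual_level)
        if len(self.in_flight) > self.feedback_delay:
            self.believed_level = self.in_flight.pop(0)

tank, controller = Tank(), Controller(feedback_delay=3)
for step in range(15):
    tank.level += controller.control_action()  # control action: pump more in
    controller.receive_feedback(tank.level)    # delayed measurement
    if tank.level > tank.LIMIT:
        print(f"step {step}: safety constraint violated (level={tank.level})")
        break
```

With `feedback_delay=0` the controller shuts the pump off in time; with the delay, the process model lags the process and the controller keeps issuing unsafe control actions. That is the shape of accident STAMP looks for: not a broken part, but inadequate enforcement of a safety constraint by the control structure.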
833 |
834 | ### Concepts
835 |
836 | * Software safety
837 | * STAMP (systems-theoretic accident model and processes)
838 | * STPA (system-theoretic process analysis) hazard analysis technique
839 | * CAST (causal analysis based on STAMP) accident analysis technique
840 | * Systems thinking
841 | * hazard
842 | * interactive complexity
843 | * system accident
844 | * dysfunctional interactions
845 | * safety constraints
846 | * control structure
847 | * dead time
848 | * time constants
849 | * feedback delays
850 |
851 | ### Selected publications
852 | * [A New Accident Model for Engineering Safer Systems](http://sunnyday.mit.edu/accidents/safetyscience-single.pdf)
853 | * [Engineering a safer world](https://mitpress.mit.edu/books/engineering-safer-world)
854 | * [STPA Handbook](http://psas.scripts.mit.edu/home/get_file.php?name=STPA_handbook.pdf)
855 | * [Safeware](https://www.amazon.com/Safeware-Computers-Nancy-G-Leveson/dp/0201119722)
856 | * [Resilience Engineering: Concepts and Precepts](https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2)
857 | * [High-pressure steam engines and computer software](http://dx.doi.org/10.1145/143062.143076)
859 |
860 | ## Carl Macrae
861 |
862 | [Macrae](https://www.nottingham.ac.uk/business/people/lizcjm.html) is a social psychology
863 | researcher who has done safety research in multiple domains, including aviation
864 | and healthcare. He helped set up the new healthcare investigation agency in
865 | England. He is currently a professor of organizational behavior and psychology
866 | at the Nottingham University Business School.
867 |
868 | Macrae tweets at [@CarlMacrae](https://twitter.com/CarlMacrae).
869 |
870 | ### Concepts
871 |
872 | * risk resilience
873 |
874 | ### Selected publications
875 |
876 | * [Close calls](http://www.closecalls.cc/)
877 | * [Early warnings, weak signals and learning from healthcare disasters](https://qualitysafety.bmj.com/content/23/6/440)
878 |
879 | ## Laura Maguire
880 |
881 | [Maguire](https://www.linkedin.com/in/lauramaguire/) is a cognitive systems
882 | engineering researcher with a PhD from Ohio State
883 | University. Maguire has done safety work in multiple domains, including
884 | forestry, avalanches, and software services. She currently works as a researcher
885 | at [jeli.io](https://jeli.io).
886 |
887 | Maguire tweets as [@LauraMDMaguire](https://twitter.com/lauramdmaguire).
888 |
889 | ### Selected publications
890 |
891 | * [Managing the Hidden Costs of Coordination](https://queue.acm.org/detail.cfm?id=3380779)
892 | * [Controlling the Costs of Coordination in Large-scale Distributed Software Systems](http://rave.ohiolink.edu/etdc/view?acc_num=osu1593661547087969) (PhD dissertation)
893 | * [Howie: The Post-Incident Guide](https://www.jeli.io/howie-the-post-incident-guide/)
894 |
895 | ### Selected talks
896 |
897 | * [How Many Is Too Much?
Exploring Costs of Coordination During Outages](https://www.infoq.com/presentations/incident-command-system/)
898 | * [Mental models – why saying “I didn’t know it worked that way” is a sign of expertise not incompetence](https://www.youtube.com/watch?v=VEprjLtHzg0)
899 | * [Operating at the edge of the envelope](https://re-deploy.io/videos/27-maguire.html)
900 |
901 | ## Christopher Nemeth
902 |
903 | [Nemeth](https://www.linkedin.com/in/christopher-nemeth-6651204) is a principal scientist at Applied Research Associates, Inc.
904 |
905 | ### Selected publications
906 |
907 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures]
908 | * [Resilience is not control: healthcare, crisis management, and ICT]
909 | * [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances]
910 | * [Minding the Gaps: Creating Resilience in Health Care]
911 |
912 |
913 | [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.7283&rep=rep1&type=pdf
914 |
915 | ## Anne-Sophie Nyssen
916 |
917 | [Nyssen](http://www.lecit.ulg.ac.be/equipe/anne-sophie-nyssen/) is a psychology professor at the University of Liège,
918 | who does research on human error in complex systems, in particular in medicine.
919 |
920 | A list of publications can be found on her website linked above.
921 |
922 |
923 | ## Elinor Ostrom
924 |
925 | [Ostrom](http://www.elinorostrom.com/) was a Nobel Prize-winning researcher in economics and
926 | political science.
927 |
928 | ### Selected publications
929 | * [Coping with tragedies of the commons](https://www.annualreviews.org/doi/abs/10.1146/annurev.polisci.2.1.493)
930 | * [Governing the Commons: The Evolution of Institutions for Collective Action](https://www.amazon.com/Governing-Commons-Evolution-Institutions-Collective/dp/1107569788)
931 |
932 | ### Concepts
933 |
934 | * tragedy of the commons
935 | * polycentric governance
936 | * social-ecological system framework
937 |
938 | ## Jean Pariès
939 |
940 | Pariès is the president of [Dédale](http://www.dedale.net/dedale_en/), a safety and human factors consultancy.
941 |
942 | ### Selected publications
943 | * [Resilience engineering in practice: a guidebook]
944 |
945 |
946 | [Resilience engineering in practice: a guidebook]: https://www.crcpress.com/Resilience-Engineering-in-Practice-A-Guidebook/Paries-Wreathall-Hollnagel/p/book/9781472420749

947 | ### Selected talks
948 |
949 | * [Predicting The fatal flaws: The challenge of The unpredictable...](paries-keynote-2015.pptx)
950 |
951 | ## Emily Patterson
952 |
953 | [Patterson](https://hrs.osu.edu/faculty-and-staff/faculty-directory/patterson-emily)
954 | is a researcher who applies human factors engineering to improve patient safety
955 | in healthcare.
956 | 957 | ### Selected publications 958 | 959 | * [Patient boarding in the emergency department as a symptom of complexity-induced risks](https://www.researchgate.net/publication/312624891_Patient_boarding_in_the_emergency_department_as_a_symptom_of_complexity-induced_risks) 960 | * [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies] 961 | * [Voice Loops as Coordination Aids in Space Shuttle Mission Control] 962 | * [Functionally distributed coordination during anomaly response in space shuttle mission control] 963 | * [Patterns in Cooperative Cognition] 964 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/)) 965 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 966 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56)) 967 | * [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands] ([TWRR](https://resilienceroundup.com/issues/how-unexpected-events-produce-an-escalation-of-cognitive-and-coordinative-demands/)) 968 | * [Communication Strategies from High-reliability Organizations: Translation is Hard Work](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876978/) ([TWRR](https://resilienceroundup.com/issues/communication-strategies-from-high-reliability-organizations-translation-is-hard-work/)) 969 | * [Understanding rigor in information analysis] 970 | * [Behind Human Error: Taming Complexity to Improve Patient Safety] 971 | 972 | [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies]: https://www.researchgate.net/profile/Emily_Patterson2/publication/237138704_USING_OBSERVATIONAL_STUDY_AS_A_TOOL_FOR_DISCOVERY_UNCOVERING_COGNITIVE_AND_COLLABORATIVE_DEMANDS_AND_ADAPTIVE_STRATEGIES/links/0deec52c8e310b385a000000.pdf 973 | 974 | [Voice Loops as Coordination Aids in Space Shuttle Mission Control]: https://www.semanticscholar.org/paper/Voice-Loops-as-Coordination-Aids-in-Space-Shuttle-Patterson-Watts-Perotti/068dfee1a859a63fa2ef82f008d239e6a81ed004 975 | 976 | [Functionally distributed coordination during anomaly response in space shuttle mission control]: https://www.researchgate.net/publication/3657906_Functionally_distributed_coordination_during_anomaly_response_inspace_shuttle_mission_control 977 | 978 | [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands]: http://csel.eng.ohio-state.edu/productions/laws/laws_mediapaper/2_4_escalation.pdf 979 | 980 | [Handoff strategies in settings with high consequences for failure: lessons for health care operations]: https://www.researchgate.net/publication/8648890_Handoff_strategies_in_settings_with_high_consequences_for_failure_Lessons_for_health_care_operations 981 | 982 | [Understanding rigor in information analysis]: https://www.researchgate.net/publication/228809190_Understanding_rigor_in_information_analysis 983 | 984 | 985 | 986 | ## Charles Perrow 987 | 988 | Perrow is a sociologist who studied the Three Mile Island disaster. "Normal Accidents" is cited by numerous other influential systems engineering publications such as [Vaughan's](#diane-vaughan) "The Challenger Launch Decision". 
989 |
990 | ### Concepts
991 | * Complex systems: A system of tightly-coupled components with common mode connections that is prone to unintended feedback loops, complex controls, low observability, and poorly-understood mechanisms. They are not always high-risk, and thus their failure is not always catastrophic.
992 | * Normal accidents: Complex systems with many components exhibit unexpected interactions in the face of inevitable component failures. When these components are tightly-coupled, failed parts cannot be isolated from other parts, resulting in unpredictable system failures. Crucially, adding more safety devices and automated system controls often makes these coupling problems worse.
993 | * Common-mode: The failure of one component that serves multiple purposes results in multiple associated failures, often with high interactivity and low linearity - both ingredients for unexpected behavior that is difficult to control.
994 | * Production pressures and safety: Organizations adopt processes and devices to improve safety and efficiency, but production pressure often defeats any safety gained from the additions: the safety devices allow or encourage more risky behavior. As an unfortunate side-effect, the system is now also more complex.
995 |
996 | ### Selected publications
997 | * [Normal Accidents: Living With High-Risk Technologies](https://www.amazon.com/Normal-Accidents-Living-Technologies-Updated-ebook/dp/B00CHRINUI)
998 |
999 | ## Shawna J. Perry
1000 |
1001 | Perry is a medical researcher who studies emergency medicine.
1002 |
1003 | ### Concepts
1004 | * Underground adaptations
1005 | * Articulated functions vs. important functions
1006 | * Unintended effects
1007 | * Apparent success vs real success
1008 | * Exceptions
1009 | * Dynamic environments
1010 |
1011 | ### Selected publications
1012 |
1013 | * [Underground adaptations: case studies from health care](https://doi.org/10.1007/s10111-011-0207-2)
1014 | * [Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches]
1015 | * [The Role of Automation in Complex System Failures]
1016 | * [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare] ([TWRR](https://resilienceroundup.com/issues/55))
1017 | * [Automation, interaction, complexity, and failure: A case study]
1018 |
1019 | ### Other
1020 |
1021 | * [Interview on Naturalistic Decision Making podcast](https://open.spotify.com/episode/7lHcgt2KuDoLyvTP9wMbEn?si=nPIyk9L8QB2Iuck2fKKrNA)
1022 |
1023 |
1024 |
1025 | [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare]: https://pdfs.semanticscholar.org/1423/f18530599b9de186af0eee4852bb7e619384.pdf
1026 |
1027 | ## Jens Rasmussen
1028 |
1029 | Jens Rasmussen was an enormously influential researcher in human factors and safety systems. In particular, you can see his influence in the work of Sidney Dekker, Nancy Leveson, and David Woods.
1030 |
1031 | ### Contributions
1032 |
1033 | #### Skill-rule-knowledge (SRK) model
1034 |
1035 | Rasmussen proposed a model that distinguishes three types of human performance.
1036 |
1037 | **Skill-based** behavior doesn't require conscious attention. The prototypical example is riding a bicycle.
1038 |
1039 | **Rule-based** behavior is based on a set of rules that we have internalized in
1040 | advance. We select which rule to use based on experience, and then carry it
1041 | out. An example would be: if threads are blocked, restart the server. You can think of rule-based behavior as a memorized runbook (see the sketch below).
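To make the "memorized runbook" framing concrete, here is a minimal sketch (my own illustration, not Rasmussen's; the symptoms and actions are invented): rule-based performance matches a recognized situation to a stored response, and falls back to the knowledge-based mode, described next, when no rule applies.

```python
# Illustrative sketch only (mine, not Rasmussen's): rule-based performance
# as a memorized runbook. The symptoms and actions are invented examples.

RUNBOOK = {
    "threads_blocked": "restart the server",
    "disk_full": "rotate and compress the logs",
    "replica_lag_high": "shed read traffic to the standby",
}

def respond(symptom: str) -> str:
    # Rule-based: recognize the situation, then apply the stored rule.
    if symptom in RUNBOOK:
        return RUNBOOK[symptom]
    # No rule matches: drop into knowledge-based reasoning, where the
    # operator builds a model of the situation and generates candidate plans.
    return "escalate: reason from first principles"

print(respond("threads_blocked"))  # -> restart the server
print(respond("novel_anomaly"))    # -> escalate: reason from first principles
```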
1042 |
1043 | **Knowledge-based** behavior comes into play when facing an unfamiliar
1044 | situation. The person generates a set of plans based on their understanding of
1045 | the environment, and then selects which one to use. The challenging incidents
1046 | are the ones that require knowledge-based behavior to resolve.
1047 |
1048 | He also proposed three types of information that humans process as they perform work.
1049 |
1050 | **Signals**. Example: weather vane
1051 |
1052 | **Signs**. Example: stop sign
1053 |
1054 | **Symbols**. Example: written language
1055 |
1056 | #### Abstraction hierarchy
1057 |
1058 | Rasmussen proposed a model, called the *abstraction hierarchy*, of how operators reason about the behavior of a
1059 | system they are supervising.
1060 | The levels in the hierarchy are:
1061 |
1062 | 1. functional purpose
1063 | 2. abstract functions
1064 | 3. general functions
1065 | 4. physical functions
1066 | 5. physical form
1067 |
1068 | The hierarchy forms a means-ends relationship: proper function is described top-down (ends), and problems are explained bottom-up (means).
1069 |
1070 |
1071 | #### Dynamic safety model
1072 |
1073 | Rasmussen proposed a state-based model of a socio-technical system as a system
1074 | that moves within a region of a state space. The region is surrounded by
1075 | different boundaries:
1076 |
1077 | * economic failure
1078 | * unacceptable work load
1079 | * functionally acceptable performance
1080 |
1081 | ![Migration to the boundary](boundary.png)
1082 |
1083 | Source: [Risk management in a dynamic society: a modelling problem]
1084 |
1085 | Incentives push the system towards the boundary of acceptable performance:
1086 | accidents happen when the boundary is exceeded. (A toy simulation sketch of this drift appears after the AcciMaps note below.)
1087 |
1088 |
1089 | #### AcciMaps
1090 |
1091 | The AcciMaps approach is a technique for reasoning about the causes of an accident, using a diagram.
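Returning to the dynamic safety model above, here is the toy simulation sketch promised there (my own illustration, not Rasmussen's; the numbers are arbitrary). The operating point starts with some margin from the boundary of functionally acceptable performance; efficiency gradients supply a steady drift toward that boundary, everyday performance variability supplies noise, and the economic boundary caps how much margin the organization will pay for.

```python
import random

# Toy simulation (not Rasmussen's; all numbers arbitrary) of drift
# toward the boundary of functionally acceptable performance.

PERFORMANCE_BOUNDARY = 0.0  # crossing this boundary is an accident
ECONOMIC_BOUNDARY = 10.0    # margin beyond this is economically unacceptable

def steps_until_accident(steps: int = 10_000, seed: int = 1) -> int:
    rng = random.Random(seed)
    margin = 5.0  # current distance from the performance boundary
    for t in range(steps):
        drift = -0.02               # pressure toward efficiency / least effort
        noise = rng.gauss(0, 0.1)   # everyday performance variability
        margin = min(margin + drift + noise, ECONOMIC_BOUNDARY)
        if margin <= PERFORMANCE_BOUNDARY:
            return t  # the boundary was exceeded: an accident
    return -1  # no accident within the simulated window

print(steps_until_accident())
```

Nothing in a run like this looks like a discrete "error": the margin simply erodes, which is why Rasmussen argued for making the boundary visible to operators rather than counting violations.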
1092 |
1093 |
1094 | #### Risk management framework
1095 |
1096 | Rasmussen proposed a multi-layer view of socio-technical systems:
1097 |
1098 | ![Risk management framework](risk-management-framework.png)
1099 |
1100 | Source: [Risk management in a dynamic society: a modelling problem]
1101 |
1102 | ### Concepts
1103 | * Dynamic safety model
1104 | * Migration toward accidents
1105 | * Risk management framework
1106 | * Boundaries:
1107 |   - boundary of functionally acceptable performance
1108 |   - boundary to economic failure
1109 |   - boundary to unacceptable work load
1110 | * Cognitive systems engineering
1111 | * Skill-rule-knowledge (SRK) model
1112 | * AcciMaps
1113 | * Means-ends hierarchy
1114 | * Ecological interface design
1115 | * Systems approach
1116 | * Control-theoretic
1117 | * decisions, acts, and errors
1118 | * hazard source
1119 | * anatomy of accidents
1120 | * energy
1121 | * systems thinking
1122 | * trial and error experiments
1123 | * defence in depth (fallacy)
1124 | * Role of managers
1125 |   - Information
1126 |   - Competency
1127 |   - Awareness
1128 |   - Commitment
1129 | * Going solid
1130 | * observability
1131 |
1132 | ### Selected publications
1133 | * [Mental procedures in real-life tasks: a case study of electronic trouble shooting](https://www.tandfonline.com/doi/abs/10.1080/00140137408931355) (1974)
1134 | * [Coping with complexity](https://orbit.dtu.dk/en/publications/coping-with-complexity)
1135 | * [Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models](https://www.iwolm.com/wp-content/downloads/SkillsRulesAndKnowledge-Rasmussen.pdf)
1136 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130)
1137 | * [Human error and the problem of causality in analysis of accidents](https://www.ida.liu.se/~729A71/Literature/Human%20Error_T/Rasmussen_1990.pdf) ([TWRR](https://resilienceroundup.com/issues/human-error-and-the-problem-of-causality-in-analysis-of-accidents/))
1138 | * [Human Errors: A Taxonomy for Describing Human Malfunction in Industrial Installations](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158020073/ERTAX1.PDF)
1139 | * [Ecological interfaces: A technological imperative in high‐tech systems](https://core.ac.uk/download/pdf/13788397.pdf)
1140 | * [Information processing and human-machine interaction: an approach to cognitive engineering](https://www.amazon.com/Information-Processing-Human-Machine-Interaction-North-Holland/dp/0444009876)
1141 | * [The role of hierarchical knowledge representation in decisionmaking and system management](https://backend.orbit.dtu.dk/ws/files/158019622/HISMC.PDF)
1142 | * [A Model of Human Decision Making in Complex Systems and its Use for Design of System Control Strategies](https://core.ac.uk/download/pdf/13777954.pdf)
1143 | * [The role of error in organizing behaviour](https://qualitysafety.bmj.com/content/qhc/12/5/377.full.pdf) ([TWRR](https://resilienceroundup.com/issues/the-role-of-error-in-organizing-behaviour/))
1144 | * [Information processing and human-machine interaction](https://www.osti.gov/biblio/7011990-information-processing-human-machine-interaction)
1145 | * [Risk management in a dynamic society: a modelling problem]
1146 | * [Proactive risk management in a dynamic society](https://rib.msb.se/Filer/pdf/16252.pdf)
1147 | * [Trends in Human Reliability Analysis](https://backend.orbit.dtu.dk/ws/portalfiles/portal/137294535/TREND.PDF)
1151 | * [Coping with human errors through system design: implications for ecological interface design](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5fb7644d205b342aa52c594b7982a9e208086238)
1152 | * [Graphic representation of accident scenarios: mapping system structure and the causation of accidents](https://www.sciencedirect.com/science/article/abs/pii/S0925753500000369)
1153 | * [Diagnostic reasoning in action](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158017532/DIAACT.PDF)
1154 | * [A framework for cognitive task analysis in systems design](https://orbit.dtu.dk/en/publications/a-framework-for-cognitive-task-analysis-in-systems-design)
1155 | * [Analysis of human errors in industrial incidents and accidents for improvement of work safety](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158019864/LEPRAS.PDF)
1156 | * [Why do complex organizational systems fail?](https://documents1.worldbank.org/curated/ru/535511468766200820/pdf/multi0page.pdf)
1157 | * [Notes on human error analysis and prediction](https://orbit.dtu.dk/en/publications/notes-on-human-error-analysis-and-prediction)
1158 |
1159 | (These are written by others about Rasmussen's work.)
1160 | * [Recurring themes in the legacy of Jens Rasmussen](https://www.sciencedirect.com/science/article/abs/pii/S0003687016302150?via%3Dihub) - special issue of Applied Ergonomics
1161 | * [Reflecting on Jens Rasmussen’s legacy. A strong program for a hard problem](https://doi.org/10.1016/j.ssci.2014.03.015) ([my notes](https://github.com/lorin/booknotes/blob/master/papers/Reflecting-on-Jens-Rasmussens-Legacy.md))
1162 | * [Reflecting on Jens Rasmussen's legacy (2) behind and beyond, a ‘constructivist turn’](https://doi.org/10.1016/j.apergo.2015.07.013)
1163 | * [Musings on Models and the Genius of Jens Rasmussen](https://www.sciencedirect.com/science/article/abs/pii/S0003687015301009?via%3Dihub)
1164 |
1165 | [Risk management in a dynamic society: a modelling problem]: https://doi.org/10.1016/S0925-7535(97)00052-0
1166 |
1167 | ## Mike Rayo
1168 |
1169 | Rayo is the Director of the Cognitive Systems Engineering Laboratory at the Ohio State University.
1170 |
1171 | ### Concepts
1172 |
1173 | * SCAD (Systemic Contributors and Adaptations Diagramming)
1174 |
1175 | ### Selected Publications
1176 |
1177 | * [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations]
1178 | * [Multiple Systemic Contributors versus Root Cause: Learning from a NASA Near Miss](https://www.researchgate.net/publication/308194080_Multiple_Systemic_Contributors_versus_Root_Cause_Learning_from_a_NASA_Near_Miss)
1179 |
1180 | [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations]: https://journals.sagepub.com/doi/10.1177/1071181322661334
1181 |
1182 | ## James Reason
1183 |
1184 | Reason is a psychology researcher who did work on understanding and categorizing human error.
1185 |
1186 | ### Contributions
1187 |
1188 | #### Accident causation model (Swiss cheese model)
1189 |
1190 | Reason developed an accident causation model that is sometimes known as the *Swiss cheese* model of accidents.
1191 | In this model, Reason introduced the terms "sharp end" and "blunt end".
1192 |
1193 | #### Human error model: slips, lapses and mistakes
1194 |
1195 | Reason developed a model of the types of errors that humans make:
1196 |
1197 | * slips (the action performed is not the action that was intended)
1198 | * lapses (a memory failure: an intended action is omitted or forgotten)
1199 | * mistakes (the action goes as planned, but the plan itself is inadequate)
1200 |
1201 | ### Concepts
1202 |
1203 | * Blunt end
1204 | * Human error
1205 | * Slips, lapses and mistakes
1206 | * Swiss cheese model
1207 |
1208 | ### Selected publications
1209 |
1210 | * [Human error]
1211 | * [Organizational Accidents Revisited](https://www.amazon.com/Organizational-Accidents-Revisited-James-Reason/dp/1472447689)
1212 |
1213 | [Human error]: https://www.amazon.com/gp/product/0521314194/ref=dbs_a_def_rwt_bibl_vppi_i0
1214 |
1215 | ## J. Paul Reed
1216 |
1217 | [Reed](https://jpaulreed.com/) is a Senior Applied Resilience engineer at Netflix and runs [REdeploy](https://re-deploy.io), a conference focused on Resilience Engineering in the software development and operations industry.
1218 |
1219 | Reed tweets as [@jpaulreed](https://twitter.com/jpaulreed).
1220 |
1221 | ### Selected Publications
1222 |
1223 | * [Maps, Context, and Tribal Knowledge: On the Structure and Use of Post-Incident Analysis Artifacts in Software Development and Operations](https://lup.lub.lu.se/student-papers/search/publication/8966930)
1224 | * [Beyond the "Fix-it" Treadmill](https://queue.acm.org/detail.cfm?id=3380780)
1225 |
1226 |
1227 | ### Concepts
1228 |
1229 | * [Blame "Aware"](https://jpaulreed.com/blame-aware) (versus "Blameless") Culture
1230 | * Postmortem Artifact _Archetypes_
1231 |
1232 | ## Emilie M. Roth
1233 |
1234 | [Roth](http://www.rothsite.com/resume.html) is a cognitive psychologist who
1235 | serves as the principal scientist at [Roth Cognitive Engineering](http://www.rothsite.com/), a small
1236 | company that conducts research and application in the areas of human factors
1237 | and applied cognitive psychology (cognitive engineering).
1238 |
1239 | ### Selected publications
1240 |
1241 | * [Uncovering the Requirements of Cognitive Work](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.564.2044) ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-uncovering-the-requirements-of-cognitive-work-issue-30-173410))
1242 | * [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies]
1243 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56))
1244 | * [Bootstrapping multiple converging cognitive task analysis techniques for system design] ([TWRR](https://resilienceroundup.com/issues/70))
1245 |
1246 | ### Other
1247 |
1248 | * [Interview on Naturalistic Decision Making podcast](https://open.spotify.com/episode/3XqAhdpyrszLoB59VcRJWG)
1249 |
1250 | ## Nadine Sarter
1251 |
1252 | [Sarter](https://ioe.engin.umich.edu/people/nadine-sarter/) is a researcher in industrial and operations engineering.
1253 | She is the director of the Center for Ergonomics at the University of Michigan.
1254 |
1255 | ### Concepts
1256 |
1257 | * cognitive ergonomics
1258 | * organization safety
1259 | * human-automation/robot interaction
1260 | * human error / error management
1261 | * attention / interruption management
1262 | * design of decision support systems
1263 |
1264 |
1265 | ### Selected publications
1266 |
1267 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf)
1268 | * [Behind Human Error]
1269 | * [Design-Induced Error and Error-Informed Design: A Two-Way Street](https://www.amazon.com/Cognitive-Systems-Engineering-Expertise-Applications-ebook/dp/B076TDR6H9/ref=sr_1_1?keywords=cognitive+systems+engineering&qid=1554075974&s=gateway&sr=8-1)
1270 | * [The Critical Incident Technique: A Method for Identifying System Strengths and Weaknesses Based on Observational Data](https://www.taylorfrancis.com/books/e/9780429134845)
1271 | * [Myths of automation and their implications for military procurement]
1272 | * [Automation surprises]
1273 | * [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study] ([TWRR](https://resilienceroundup.com/issues/team-play-with-a-powerful-and-independent-agent-a-full-mission-simulation-study/))
1274 |
1275 | [Bootstrapping multiple converging cognitive task analysis techniques for system design]: https://www.researchgate.net/publication/313737506_Bootstrapping_multiple_converging_cognitive_task_analysis_techniques_for_system_design
1276 | [Automation surprises]: https://www.researchgate.net/publication/270960170_Automation_surprises
1277 | [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study]: https://www.researchgate.net/publication/12195752_Team_Play_with_a_Powerful_and_Independent_Agent_A_Full-Mission_Simulation_Study
1278 |
1279 | ## James C. Scott
1280 |
1281 | Scott is an anthropologist who also does research in political science. While
1282 | Scott is not a member of the resilience engineering community, his book *Seeing
1283 | like a state* has long been a staple of the cognitive systems engineering and
1284 | resilience engineering communities.
1285 |
1286 | ### Concepts
1287 |
1288 | * authoritarian high-modernism
1289 | * legibility
1290 | * mētis
1291 |
1292 | ### Selected publications
1293 |
1294 | * [Seeing like a state: how certain schemes to improve the human condition have failed](https://www.amazon.com/Seeing-like-State-Certain-Condition/dp/0300078153/ref=sr_1_1)
1295 |
1296 |
1297 | ## Steven Shorrock
1298 |
1299 | Shorrock is a chartered psychologist and a chartered ergonomist and human
1300 | factors specialist. He is the editor-in-chief of EUROCONTROL
1301 | [HindSight](https://www.skybrary.aero/index.php/HindSight_-_EUROCONTROL)
1302 | magazine. He runs the excellent [Humanistic Systems](https://humanisticsystems.com/) blog.
1303 |
1304 | Shorrock tweets as [@StevenShorrock](https://twitter.com/StevenShorrock).
1305 |
1306 | ### Selected publications
1307 |
1308 | * [Systems Thinking for Safety: Ten Principles. A White Paper. Moving towards Safety-II](https://skybrary.aero/sites/default/files/bookshelf/2882.pdf)
1309 | * [Human Factors and Ergonomics in Practice: Improving System Performance and Human Well-Being in the Real World](https://www.crcpress.com/Human-Factors-and-Ergonomics-in-Practice-Improving-System-Performance-and/Shorrock-Williams/p/book/9781472439253) (book)
1310 | * [State of science: evolving perspectives on ‘human error’](https://doi.org/10.1080/00140139.2021.1953615)
1311 |
1312 | ### Selected talks
1313 |
1314 | * [Life After Human Error](https://www.youtube.com/watch?v=STU3Or6ZU60) (Velocity Europe 2014 keynote)
1315 |
1316 | ## Diane Vaughan
1317 |
1318 | Vaughan is a sociology researcher who did a famous study of the NASA Challenger accident, concluding that it was the result of organizational failure rather than a technical failure. Specifically, production pressure overrode the rigorous scientific safety culture in place at NASA.
1319 |
1320 | ### Concepts
1321 |
1322 | * Structural Secrecy: Organizational structure, processes, and information exchange patterns can systematically undermine the ability to "see the whole picture" and conceal risky decisions.
1323 | * Social Construction of Risk: Out of the necessity to balance risk with the associated reward, any group of people will develop efficient heuristics to solve the problems they face. The understanding of risk that faces one subgroup may not match that of another subgroup or of the whole group. The ability of an individual to change a social construction of risk, formed over years with good intentions and often with evidence, is limited. (Though the evidence is usually accurate, the conclusion might not be, leading to an inadvertent scientific paradigm.)
1324 | * Normalization of Deviance: During operation of a complex system, inadvertent deviations from system design may occur and not result in a system failure. Because the initial construction of risk is usually conservative, the deviation is seen as showing that the system and its redundancies "worked", leading to a new accepted safe operating envelope.
1325 | * Signals of potential danger: Information gained through the operation of a system that may indicate the system does not work as designed. Most risk constructions are based on a comprehensive understanding of the operation of the system, so information to the contrary is a sign that the system could leave the safe operation envelope in unexpected ways - a danger.
1326 | * Weak signals, mixed signals, missed signals: signals of potential danger that have been interpreted as non-threats or acceptable risk because at the time they didn't represent a clear and present danger sufficient to overcome the Social Construction of Risk. Often, post-hoc, these are seen as causes due to cherry-picking - such signals were ignored before with no negative consequences.
1327 | * Competition for Scarce Resources: An ongoing need to justify investment to customers leads to Efficiency-Thoroughness Tradeoffs (ETTOs). In NASA's case, justifying the cost of the Space Shuttle program to taxpayers and their congressional representatives meant pressure to quickly develop payload delivery capability at the lowest cost possible.
1328 | * Belief in Redundancy: Constructing risk from a signal of potential danger such that a redundant subsystem becomes part of the normal operating strategy for a primary subsystem.
In NASA's case, signals that the primary O-ring assembly did not operate as expected formed an acceptable risk because a secondary O-ring would contain a failure. Redundancy was eliminated from the design in this construction of risk - the secondary system now became part of the primary system, eliminating system redundancy.
1329 |
1330 | ### Selected publications
1331 |
1332 | * [The Challenger Launch Decision: Risky Technology, Culture, and Deviance at
1333 | NASA](https://www.amazon.com/Challenger-Launch-Decision-Technology-Deviance/dp/022634682X/ref=sr_1_1?ie=UTF8&qid=1545966442&sr=8-1&keywords=diane+vaughan)
1334 |
1335 | ## Barry Turner
1336 |
1337 | [Turner](https://www.tandfonline.com/doi/pdf/10.1080/10245289508523441) was a sociologist who greatly influenced the field of organization studies.
1338 |
1339 | ### Selected publications
1340 |
1341 | * [Man-made disasters](https://www.amazon.com/Man-Made-Disasters-Second-Barry-Turner/dp/0750620870/ref=sr_1_1)
1342 |
1343 | ## Robert L. Wears
1344 |
1345 | [Wears](https://en.wikipedia.org/wiki/Robert_Wears) was a medical researcher who also had a PhD in industrial safety.
1346 |
1347 | ### Concepts
1348 |
1349 | * Underground adaptations
1350 | * Articulated functions vs. important functions
1351 | * Unintended effects
1352 | * Apparent success vs real success
1353 | * Exceptions
1354 | * Dynamic environments
1355 | * Systems of care are intrinsically hazardous
1356 |
1357 | ### Selected publications
1358 |
1359 | * [The error of counting "errors"](https://linkinghub.elsevier.com/retrieve/pii/S0196064408006070) [BH](https://safety177496371.wordpress.com/2023/09/20/the-error-of-counting-errors/)
1360 | * [Underground adaptations: case studies from health care](https://doi.org/10.1007/s10111-011-0207-2)
1361 | * [Fundamental On Situational Surprise: A Case Study With Implications For Resilience](https://books.openedition.org/pressesmines/1122)
1362 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures] [BH](https://safety177496371.wordpress.com/2024/02/26/replacing-hindsight-with-insight-toward-better-understanding-of-diagnostic-failures/)
1363 | * [Seeing patient safety ‘Like a State’](http://dx.doi.org/10.1016%2Fj.ssci.2014.02.007)
1365 | * [The Role of Automation in Complex System Failures]
1366 | * [Exploring the Dynamics of Resilient Performance](https://pastel.archives-ouvertes.fr/pastel-00664145/document)
1367 | * [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare] ([TWRR](https://resilienceroundup.com/issues/55))
1368 | * [Automation, interaction, complexity, and failure: A case study]
1369 | * [Resilience is not control: healthcare, crisis management, and ICT]
1370 | * [The Secret Life of Policies](https://www.annemergmed.com/article/S0196-0644(17)30874-0/fulltext)
1371 | * [The tragedy of adaptability](https://www.annemergmed.com/article/S0196-0644(13)01554-0/abstract) [BH](https://safety177496371.wordpress.com/2021/04/19/the-tragedy-of-adaptability/)
1372 | * [Relying on resilience: too much of a good thing?](https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605722-11/relying-resilience-much-good-thing-robert-wears-charles-vincent) [BH](https://safety177496371.wordpress.com/2024/03/20/relying-on-resilience-too-much-of-a-good-thing/)
1374 | * [The science of human factors: separating fact from fiction](https://safety177496371.wordpress.com/2024/10/29/the-science-of-human-factors-separating-fact-from-fiction/)
1375 | * [Resilience skills as emergent phenomena: A study of emergency departments in Brazil and the United States](https://doi.org/10.1016/j.apergo.2016.02.012) [BH](https://safety177496371.wordpress.com/2023/01/20/resilience-skills-as-emergent-phenomena-a-study-of-emergency-departments-in-brazil-and-the-united-states/)
1376 | * [Our current approach to root cause analysis: is it contributing to our failure to improve patient safety?](https://qualitysafety.bmj.com/content/26/5/381) [BH](https://safety177496371.wordpress.com/2021/03/18/our-current-approach-to-root-cause-analysis-is-it-contributing-to-our-failure-to-improve-patient-safety/)
1377 | * [Error Reduction and Performance Improvement in the Emergency Department through Formal Teamwork Training: Evaluation Results of the MedTeams Project](https://pmc.ncbi.nlm.nih.gov/articles/PMC1464040/)
1378 | * [In situ simulation: detection of safety threats and teamwork training in a high risk emergency department](https://www.academia.edu/download/85660593/468.full.pdf)
1379 | * [“Safeware”: Safety-Critical Computing and Health Care Information Technology](https://europepmc.org/article/nbk/nbk43774)
1380 | * [The Illusion of Explanation]
1381 | * [Thick Versus Thin: Description Versus Classification in Learning From Case Reviews](https://www.annemergmed.com/article/S0196-0644(07)01451-5/fulltext)
1382 | * [Safety, Error, and Resilience: a Meta-narrative Review](https://www.resilience-engineering-association.org/download/resources/symposium/symposium_2015/Wears_R.-Sutcliffe_K.-Safety-error-and-resilience-a-meta-narrative-review-Paper.pdf)
1383 |
1384 | ### Selected talks
1385 |
1386 | * [Design of resilient systems](https://www.youtube.com/watch?v=nV52yh6GDMg)
1387 |
1388 |
1389 | ## David Woods
1390 |
1391 | [Woods](https://u.osu.edu/csel/member-directory/david-woods/) has a research background in cognitive systems engineering and did work
1392 | researching NASA accidents. He is one of the founders of [Adaptive Capacity
1393 | Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering
1394 | consultancy.
1395 |
1396 | Woods tweets as [@ddwoods2](https://twitter.com/ddwoods2).
1397 |
1398 | ### Contributions
1399 |
1400 | Woods has contributed an enormous number of concepts.
1401 |
1402 | #### The adaptive universe
1403 |
1404 | Woods uses *the adaptive universe* as a lens for understanding the behavior of
1405 | all different kinds of systems.
1406 |
1407 | All systems exist in a dynamic environment, and must adapt to change.
1408 |
1409 | A successful system will need to adapt by virtue of its success.
1410 |
1411 | Systems can be viewed as units of adaptive behavior (UAB) that interact. UABs
1412 | exist at different scales (e.g., cell, organ, individual, group, organization).
1413 |
1414 | All systems have competence envelopes, which are constrained by boundaries.
1415 |
1416 | The resilience of a system is determined by how it behaves when it comes near
1417 | to a boundary.
1418 |
1419 | See [Resilience Engineering Short Course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt) for more details.
1420 |
1421 | #### Charting adaptive cycles
1422 |
1423 | * Trigger
1424 | * Units of adaptive behavior
1425 | * Goals and goal conflicts
1426 | * Pressure points
1427 | * Subcycles
1428 |
1429 | #### Graceful extensibility
1430 |
1431 | From [The theory of graceful extensibility: basic rules that govern adaptive systems]:
1432 |
1433 | (Longer wording)
1434 |
1435 | 1. Adaptive capacity is finite
1436 | 2. Events will produce demands that challenge boundaries on the adaptive
1437 | capacity of any UAB
1438 | 3. Adaptive capacities are regulated to manage the risk of saturating capacity for manoeuvre (CfM)
1439 | 4. No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone
1440 | 5. Some UABs monitor and regulate the CfM of other UABs in response to changes
1441 | in the risk of saturation
1442 | 6. Adaptive capacity is the potential for adjusting patterns of action to
1443 | handle future situations, events, opportunities and disruptions
1444 | 7. Performance of a UAB as it approaches saturation is different from the
1445 | performance of that UAB when it operates far from saturation
1446 | 8. All UABs are local
1447 | 9. There are bounds on the perspective of any UAB, but these limits are overcome
1448 | by shifts and contrasts over multiple perspectives.
1449 | 10. Reflective systems risk mis-calibration
1450 |
1451 | (Shorter wording)
1452 |
1453 | 1. Boundaries are universal
1454 | 2. Surprise occurs, continuously
1455 | 3. Risk of saturation is monitored and regulated
1456 | 4. Synchronization across multiple units of adaptive behavior in a network is necessary
1457 | 5. Risk of saturation can be shared
1458 | 6. Pressure changes what is sacrificed when
1459 | 7. Pressure for optimality undermines graceful extensibility
1460 | 8. All adaptive units are local
1461 | 9. Perspective contrast overcomes bounds
1462 | 10. Mis-calibration is the norm
1463 |
1464 | For more details, see [summary of graceful extensibility theorems](graceful-extensibility.md).
1465 |
1466 | #### SCAD (Systemic Contributors and Adaptations Diagramming)
1467 |
1468 | (tbd)
1469 |
1470 | ### Concepts
1471 |
1472 | Many of these are mentioned in Woods's [short course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt).
1473 |
1474 | * adaptive capacity
1475 | * adaptive universe
1476 | * unit of adaptive behavior (UAB), adaptive unit
1477 | * continuous adaptation
1478 | * graceful extensibility
1479 | * sustained adaptability
1480 | * Tangled, layered networks (TLN)
1481 | * competence envelope
1482 | * adaptive cycles/histories
1483 | * precarious present (unease)
1484 | * resilient future
1485 | * tradeoffs, five fundamental
1486 | * efflorescence: the degree that changes in one area tend to recruit or open up
1487 | beneficial changes in many other aspects of the network - which opens new
1488 | opportunities across the network ...
1489 | * reverberation 1490 | * adaptive stalls 1491 | * borderlands 1492 | * anticipate 1493 | * synchronize 1494 | * proactive learning 1495 | * initiative 1496 | * reciprocity 1497 | * SNAFUs 1498 | * robustness 1499 | * surprise 1500 | * dynamic fault management 1501 | * software systems as "team players" 1502 | * multi-scale 1503 | * brittleness 1504 | * how adaptive systems fail (see: [How do systems manage their adaptive capacity to successfully handle disruptions? A resilience engineering perspective]) 1505 | - decompensation 1506 | - working at cross-purposes 1507 | - getting stuck in outdated behaviors 1508 | * proactive learning vs getting stuck 1509 | * oversimplification 1510 | * fixation 1511 | * fluency law, veil of fluency 1512 | * capacity for manoeuvre (CfM) 1513 | * crunches 1514 | * turnaround test 1515 | * sharp end, blunt end 1516 | * adaptive landscapes 1517 | * law of stretched systems: Every system is continuously stretched to operate at capacity. 1518 | * cascades 1519 | * adapt how to adapt 1520 | * unit working hard to stay in control 1521 | * you can monitor how hard you're working to stay in control (monitor risk of saturation) 1522 | * reality trumps algorithms 1523 | * stand down 1524 | * time matters 1525 | * Properties of resilient organizations 1526 | - Tangible experience with surprise 1527 | - uneasy about the precarious present 1528 | - push initiative down 1529 | - reciprocity 1530 | - align goals across multiple units 1531 | * goal conflicts, goal interactions (follow them!) 1532 | * to understand system, must study it under load 1533 | * adaptive races are unstable 1534 | * adaptive traps 1535 | * roles, nesting of 1536 | * hidden interdependencies 1537 | * net adaptive value 1538 | * matching tempos 1539 | * tilt toward florescence 1540 | * linear simplification 1541 | * common ground 1542 | * problem detection 1543 | * joint cognitive systems 1544 | * automation as a "team player" 1545 | * "new look" 1546 | * sacrifice judgment 1547 | * task tailoring 1548 | * substitution myth 1549 | * observability 1550 | * directability 1551 | * directed attention 1552 | * inter-predictability 1553 | * error of the third kind: solving the wrong problem 1554 | * buffering capacity 1555 | * context gap 1556 | * Norbert's contrast 1557 | * anomaly response 1558 | * automation surprises 1559 | * disturbance management 1560 | * Doyle's catch 1561 | * Cooperative advocacy 1562 | 1563 | ### Selected publications 1564 | 1565 | * [Resilience Engineering: Concepts and Precepts](https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2) 1566 | * [Prologue: Resilience Engineering Concepts](http://erikhollnagel.com/onewebmedia/Prologue.pdf) 1567 | * [Epilogue: Resilience Engineering Precepts](https://www.researchgate.net/publication/265074845_Epilogue_Resilience_Engineering_Precepts) 1568 | * [Resilience is a verb](https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb) 1569 | * [Four concepts for resilience and the implications for the future of resilience engineering](https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering) ([TWRR](https://resilienceroundup.com/issues/65)) 1570 | * [Basic patterns in how adaptive systems fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) 
([TWRR](https://resilienceroundup.com/issues/34/)) 1571 | * [Resilience and the ability to anticipate](https://www.researchgate.net/publication/285487326_Resilience_and_the_ability_to_anticipate) ([TWRR](https://resilienceroundup.com/issues/resilience-and-the-ability-to-anticipate/)) 1572 | * [Distancing through differencing: An obstacle to organizational learning following accidents](https://www.researchgate.net/publication/292504703_Distancing_through_differencing_An_obstacle_to_organizational_learning_following_accidents) 1573 | * [Essential characteristics of resilience](https://www.researchgate.net/publication/284328979_Essential_characteristics_of_resilience) 1574 | * [Essentials of resilience, revisited](https://www.researchgate.net/publication/330116587_4_Essentials_of_resilience_revisited) ([TWRR](https://resilienceroundup.com/issues/71/)) 1575 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf) 1576 | * [Behind Human Error] 1577 | * [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/gp/product/0849339332/ref=x_gr_w_bb?ie=UTF8&tag=x_gr_w_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0849339332&SubscriptionId=1MGPYB6YW3HWK55XCGG2) 1578 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 1579 | * [Origins of cognitive systems engineering](https://www.researchgate.net/publication/298793082_Origins_of_Cognitive_Systems_Engineering) 1580 | * [Incidents - markers of resilience or brittleness?](https://www.researchgate.net/publication/292504952_Incidents_-_markers_of_resilience_or_brittleness) [BH](https://safety177496371.wordpress.com/2023/12/18/incidents-markers-of-resilience-or-brittleness/) 1581 | * [The alarm problem and directed attention in dynamic fault management](https://www.researchgate.net/publication/40961767_The_Alarm_problem_and_directed_attention_in_dynamic_fault_management) 1582 | * [Can We Trust Best Practices? 
Six Cognitive Challenges of Evidence-Based Approaches]
1583 | * [Operating at the Sharp End: The Complexity of Human Error](https://www.researchgate.net/publication/313407259_Operating_at_the_Sharp_End_The_Complexity_of_Human_Error)
1584 | * [The theory of graceful extensibility: basic rules that govern adaptive systems]
1585 | * [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems](https://www.researchgate.net/publication/220628177_Beyond_Simon%27s_Slice_Five_Fundamental_Trade-Offs_that_Bound_the_Performance_of_Macrocognitive_Work_Systems) ([TWRR](https://resilienceroundup.com/issues/five-fundamental-trade-offs-in-cognitive-work/))
1586 | * [Anticipating the effects of technological change: A new era of dynamics for human factors](https://www.researchgate.net/publication/247512351_Anticipating_the_effects_of_technological_change_A_new_era_of_dynamics_for_human_factors)
1587 | * [Common Ground and Coordination in Joint Activity]
1588 | * [Resilience as Graceful Extensibility to Overcome Brittleness](https://www.irgc.org/wp-content/uploads/2016/04/Woods-Resilience-as-Graceful-Extensibility-to-Overcome-Brittleness-1.pdf)
1589 | * [Resilience Engineering: Redefining the Culture of Safety and Risk Management](http://ordvac.com/soro/library/Aviation/Aviation%20Safety/General%20Safety%20Articles/resilience%20engineering%20bulletin.pdf)
1590 | * [Problem detection]
1591 | * [Cognitive consequences of clumsy automation on high workload, high consequence human performance]
1592 | * [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)]
1593 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/66))
1594 | * [The Messy Details: Insights From the Study of Technical Work in Healthcare]
1595 | * [Nosocomial automation: technology-induced complexity and human performance]
1596 | * [Human-centered software agents: Lessons from clumsy automation](http://www.ifp.illinois.edu/nsfhcs/abstracts/woods.txt)
1597 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/)
1598 | * [The New Look at Error, Safety, and Failure: A Primer for Health Care]
1599 | * [Grounding explanations in evolving, diagnostic situations]
1601 | * [A Tale of Two Stories: Contrasting Views of Patient Safety]
1602 | * [Voice Loops as Coordination Aids in Space Shuttle Mission Control]
1603 | * [The Critical Incident Technique: 40 Years Later](https://journals.sagepub.com/doi/abs/10.1177/154193129403801702)
1604 | * [Functionally distributed coordination during anomaly response in space shuttle mission control]
1605 | * [Cooperative Advocacy: An Approach for Integrating Diverse Perspectives in Anomaly Response](https://www.researchgate.net/publication/225211285_Cooperative_Advocacy_An_Approach_for_Integrating_Diverse_Perspectives_in_Anomaly_Response)
1606 | * [Visual momentum: A concept to improve the cognitive coupling of person and computer](https://www.researchgate.net/publication/222737388_Visual_Momentum_A_Concept_to_Improve_the_Cognitive_Coupling_of_Person_and_Computer)
1607 | * [Cognitive demands and activities in dynamic fault management: abductive reasoning and disturbance management](https://www.researchgate.net/publication/262401824_Cognitive_demands_and_activities_in_dynamic_fault_management_abductive_reasoning_and_disturbance_management)
1608 | * [Coping with
complexity: the psychology of human behaviour in complex systems](https://www.researchgate.net/publication/238727732_Coping_with_Complexity_The_psychology_of_human_behavior_in_complex_systems) ([TWRR](https://resilienceroundup.com/issues/coping-with-complexity/))
1609 | * [Process Tracing Methods for The Study of Cognition Outside of the Experimental Laboratory. In Klein GA, Orasanu J, Calderwood R, Zsambok CE, eds. Decision making in action: Models and methods](https://www.researchgate.net/profile/David_Woods11/publication/232513565_Process-tracing_methods_for_the_study_of_cognition_outside_of_the_experimental_psychology_laboratory/links/00b7d53988a2f7a7f8000000.pdf)
1610 | * [Towards a theoretical base for representation design in the computer medium: ecological perception and aiding human cognition](https://www.researchgate.net/publication/239059408_Towards_a_theoretical_base_for_representation_design_in_the_computer_medium_ecological_perception_and_aiding_human_cognition)
1611 | * [Perspectives on Human Error: Hindsight Biases and Local Rationality]
1612 | * [Anomaly Response]
1613 | * [The Risks of Autonomy: Doyle's Catch](https://www.researchgate.net/publication/303832480_The_Risks_of_Autonomy_Doyles_Catch) ([TWRR](https://resilienceroundup.com/issues/the-risks-of-autonomy-doyles-catch/))
1614 | * [Mistaking Error]
1615 | * [Adapting to new technology in the operating room]
1616 | * [The Strategic Agility Gap: How organizations are slow and stale to adapt in turbulent worlds](https://www.researchgate.net/publication/330196218_The_Strategic_Agility_Gap_How_organizations_are_slow_and_stale_to_adapt_in_turbulent_worlds)
1617 | * [Resiliency Trade Space Study: The Interaction of Degraded C2 Link and Detect and Avoid Autonomy on Unmanned Aircraft](https://www.researchgate.net/publication/330222613_Resiliency_Trade_Space_Study_The_Interaction_of_Degraded_C2_Link_and_Detect_and_Avoid_Autonomy_on_Unmanned_Aircraft)
1618 | * [Cognitive Technologies: The Design of Joint Human-Machine Cognitive Systems](https://www.researchgate.net/publication/220604613_Cognitive_Technologies_The_Design_of_Joint_Human-Machine_Cognitive_Systems)
1619 | * [Cognitive Systems Engineering: New wine in new bottles] ([TWRR](https://resilienceroundup.com/issues/32/))
1620 | * [The Seven Deadly Myths of "Autonomous Systems"]
1623 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/))
1624 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]
1625 | * [A rose by any other name...would probably be given an acronym]
1626 | * [How do systems manage their adaptive capacity to successfully handle disruptions?
A resilience engineering perspective](https://www.researchgate.net/publication/286581322_How_do_systems_manage_their_adaptive_capacity_to_successfully_handle_disruptions_A_resilience_engineering_perspective) 1627 | * [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands] ([TWRR](https://resilienceroundup.com/issues/how-unexpected-events-produce-an-escalation-of-cognitive-and-coordinative-demands/)) 1628 | * [How to Make Automated Systems Team Players](https://www.researchgate.net/profile/David_Woods11/publication/2483863_How_to_Make_Automated_Systems_Team_Players/links/5a4f829eaca272940bf8202c/How-to-Make-Automated-Systems-Team-Players.pdf) 1629 | * [Toward a Theory of Complex and Cognitive Systems] 1630 | * [Multiple systemic contributors versus root cause: learning from a NASA Near Miss](https://www.researchgate.net/publication/308194080_Multiple_Systemic_Contributors_versus_Root_Cause_Learning_from_a_NASA_Near_Miss) 1631 | * [Bootstrapping multiple converging cognitive task analysis techniques for system design] ([TWRR](https://resilienceroundup.com/issues/70)) 1632 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 1633 | * [Mapping Cognitive Demands in Complex Problem-Solving Worlds] (mentions disturbance management) 1634 | * [Fixation Errors: Failures to Revise Situation Assessment in Dynamic and Risky Systems](https://www.researchgate.net/publication/290071190_Fixation_Errors_Failures_to_Revise_Situation_Assessment_in_Dynamic_and_Risky_Systems) 1635 | * [Nine Steps to Move Forward From Error] [BH](https://safety177496371.wordpress.com/2022/11/03/nine-steps-to-move-forward-from-error/) 1636 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56)) 1637 | * [The High Reliability Organization Perspective] ([TWRR](https://resilienceroundup.com/issues/09/)) 1638 | * [Automation surprises] 1639 | * [Safety II professionals: How resilience engineering can transform safety practice] ([TWRR](https://resilienceroundup.com/issues/64/)) 1640 | * [Gaps in the continuity of care and progress on patient safety] 1641 | * [Systems with human monitors: a signal detection analysis](https://www.researchgate.net/publication/250890631_Systems_with_Human_Monitors_A_Signal_Detection_Analysis) 1642 | * [On taking human performance seriously](https://www.sciencedirect.com/science/article/abs/pii/095183209090022F), 1990 1643 | * [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion] 1644 | * [Designing for Expertise](https://www.researchgate.net/publication/284173210_Designing_for_Expertise) 1645 | * [Steering the Reverberations of Technology Change on Fields of Practice: Laws that Govern Cognitive Work](https://www.researchgate.net/publication/334267822_Steering_the_Reverberations_of_Technology_Change_on_Fields_of_Practice_Laws_that_Govern_Cognitive_Work) ([TWRR](https://resilienceroundup.com/issues/steering-the-reverberations-of-technology-change-on-fields-of-practice-laws-that-govern-cognitive-work/)) 1646 | * [Distant Supervision–Local Action Given the Potential for Surprise](https://www.researchgate.net/profile/David_Woods11/publication/225921479_Distant_Supervision-Local_Action_Given_the_Potential_for_Surprise/links/0a85e53baa4009ad8e000000/Distant-Supervision-Local-Action-Given-the-Potential-for-Surprise.pdf) ([TWRR](https://resilienceroundup.com/issues/75/)) 1647 | * [Coping 
With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/))
1648 | * [A Shared Pilot-Autopilot Control Architecture for Resilient Flight](http://aaclab.mit.edu/resources/FarjadianAnnaswamyWoods2019.pdf) ([TWRR](https://resilienceroundup.com/issues/a-shared-pilot-autopilot-control-architecture-for-resilient-flight/))
1649 | * [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study] ([TWRR](https://resilienceroundup.com/issues/team-play-with-a-powerful-and-independent-agent-a-full-mission-simulation-study/))
1650 | * [How Not to Have to Navigate Through Too Many Displays](https://www.researchgate.net/publication/239030256_How_Not_to_Have_to_Navigate_Through_Too_Many_Displays)
1651 | * [Discovering How Distributed Cognitive Systems Work](https://www.researchgate.net/publication/251196422_Discovering_How_Distributed_Cognitive_Systems_Work)
1652 | * [Human Performance in Anesthesia]
1653 | * [Creating Foresight: Lessons for Enhancing Resilience from Columbia](https://www.researchgate.net/profile/David-Woods-19/publication/255648297_Creating_Foresight_Lessons_for_Enhancing_Resilience_from_Columbia/links/542becf50cf29bbc126ac095/Creating-Foresight-Lessons-for-Enhancing-Resilience-from-Columbia.pdf)
1654 | * [Inventing the Future of Cognitive Work: Navigating the "Northwest Passage"](http://faculty.washington.edu/roesler/publications/design_cycle2005.pdf)
1655 | * [A practitioner’s experiences operationalizing Resilience Engineering]
1656 | * [Understanding rigor in information analysis]
1657 | * [Human Performance in Anesthesia: A Corpus of Cases]
1658 | * [Minding the Gaps: Creating Resilience in Health Care]
1659 | * [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety]
1661 | * [Behind Human Error: Taming Complexity to Improve Patient Safety]
1662 | * [Escaping failures of foresight](https://www.researchgate.net/publication/239357782_Escaping_failures_of_foresight)
1663 |
1664 | [The theory of graceful extensibility: basic rules that govern adaptive systems]: https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems
1665 | [How do systems manage their adaptive capacity to successfully handle disruptions?
A resilience engineering perspective]: https://www.researchgate.net/publication/286581322_How_do_systems_manage_their_adaptive_capacity_to_successfully_handle_disruptions_A_resilience_engineering_perspective
1666 |
1667 | ### Selected talks
1668 |
1669 | * [Overview of resilience engineering](https://www.youtube.com/watch?v=GnVXfgC-5Jw&feature=youtu.be)
1670 | * [Creating safety by engineering resilience](https://vimeo.com/104759707)
1671 | * [The Mystery of Sustained Adaptability](https://www.youtube.com/watch?v=7STcaWjJoww)
1672 | * [Resilience is a verb](https://www.youtube.com/watch?v=V2qj5gMsjrU)
1673 | * [Complexity workshop keynote](https://www.youtube.com/watch?v=KJJ2NCjc2Wg)
1674 | * [De-Confounding Reliability, Robustness, and Resilience](https://www.youtube.com/watch?v=QSiXEZLZ1y0&t=6s)
1675 | * [2003 Senate Hearing testimony](https://www.c-span.org/video/?c4531343/user-clip-david-woods-senate-hearing)
1676 | * [Shock and Resilience](https://www.youtube.com/watch?v=ZuLUp94wki4)
1677 | * [Hedging bets](https://www.youtube.com/watch?v=vlYtd-eUjY8)
1678 | * [REA 2021](https://youtu.be/OwdMgEf2MsA)
1679 | * [Adobe Summit Talk: Why do reliable systems fail?](https://www.youtube.com/watch?v=fbwDnpuys7w)
1680 |
1681 | ### Online courses
1682 |
1683 | * [Cognitive Systems Engineering Laboratory's (CSEL) Resilience Engineering 101 Series](https://resiliencefoundations.github.io/video-1-introduction-pt-1-it's-all-about-viability.html)
1684 | * [Resilience Engineering: An Introductory Short Course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt)
1685 |
1686 | ## John Wreathall
1687 |
1688 | Wreathall is an expert in human performance in safety. He works at the
1689 | [WreathWood Group](http://www.wreathall.com/), a risk and safety studies
1690 | consultancy.
1691 | Wreathall tweets as [@wreathall](https://twitter.com/wreathall).
1692 |
1693 | ### Selected publications
1694 | * [Resilience engineering in practice: a guidebook](https://www.crcpress.com/Resilience-Engineering-in-Practice-A-Guidebook/Paries-Wreathall-Hollnagel/p/book/9781472420749)
1695 |
-------------------------------------------------------------------------------- /STAMP.md: --------------------------------------------------------------------------------
1 | # STAMP
2 |
3 | ## Introduction
4 |
5 | STAMP (Systems-Theoretic Accident Model and Processes) is an accident model
6 | developed by Prof. Nancy Leveson of MIT. It is intended for use in designing
7 | safety-critical systems.
8 |
9 | STAMP views safety as a control problem. Safety is managed by a control
10 | structure embedded in an adaptive socio-technical system. **The goal of the
11 | control structure is to enforce constraints on system development and operation
12 | that result in safe behavior**.
13 |
14 | In STAMP, systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control.
15 |
16 | Safety is an emergent property that is achieved when appropriate constraints on behavior of the system and its components are satisfied.
17 |
18 | **In STAMP, accidents and losses result from not enforcing safety constraints on behavior.**
19 |
20 | Basic concepts in STAMP:
21 |
22 | 1. Safety constraints
23 | 1. Hierarchical safety control structures
24 | 1. Process models
25 |
26 |
27 | ## Main concepts
28 | ### Safety constraints
29 | A constraint is the most basic concept in STAMP.
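For software readers, a system-level safety constraint can be read as an invariant that some controller in the system is responsible for keeping true. A minimal sketch (the deployment example, names, and thresholds are invented, not from Leveson's work):

```python
# Hypothetical system-level safety constraints for a deployment pipeline,
# expressed as predicates that some controller must keep true at all times.

def no_deploys_during_active_incident(active_incidents: int) -> bool:
    """Constraint: deploys must not proceed while an incident is open."""
    return active_incidents == 0

def capacity_headroom_maintained(fleet_utilization: float) -> bool:
    """Constraint: the fleet must retain 30% headroom (threshold invented)."""
    return fleet_utilization <= 0.7

# A controller enforces constraints by blocking actions that would violate them.
def may_deploy(active_incidents: int, fleet_utilization: float) -> bool:
    return (no_deploys_during_active_incident(active_incidents)
            and capacity_headroom_maintained(fleet_utilization))
```

The rest of the model is about who is responsible for enforcing such constraints, and what control loop lets them do so.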
30 |
31 | The cause of an accident is viewed as:
32 | - the result of a lack of constraints imposed on the system design and on operations
33 | - inadequate enforcement of constraints on behavior at each level of a socio-technical system
34 |
35 | System-level constraints must be identified.
36 |
37 | Responsibility for enforcing constraints must be divided up and allocated to appropriate groups.
38 |
39 | ### Hierarchical safety control structure
40 | Systems are viewed as hierarchical structures.
41 |
42 | Each level imposes constraints on the activity beneath it.
43 |
44 | Control processes operate between levels to control processes at lower levels in the hierarchy.
45 |
46 | Control processes enforce safety constraints.
47 |
48 | Accidents occur when processes provide inadequate control & safety constraints are violated in the behavior of the lower-level components.
49 |
50 | By describing accidents in terms of a hierarchy of control based on adaptive feedback mechanisms, adaptation plays a central role in the understanding and prevention of accidents.
51 |
52 | Inadequate control may result from:
53 | - Missing constraints
54 | - Inadequate safety control commands
55 | - Commands that were not executed correctly at a lower level
56 | - Inadequately communicated or processed feedback about constraint enforcement
57 |
58 | Between hierarchical levels, need:
59 | - Downward *reference channel* providing info necessary to impose safety constraints on the level below
60 | - Upward *measuring channel* to provide feedback about how effectively constraints are being satisfied
61 |
62 | *Time lags* may affect flow of control actions and feedback and may impact effectiveness of the control loop in enforcing safety constraints.
63 |
64 | ### Process models
65 |
66 | Four conditions to control a process:
67 |
68 | 1. Goal: safety constraints that must be enforced by each controller in the hierarchical safety control structure
69 | 1. Action condition: implemented in the downward control channels
70 | 1. Observability condition: embodied in the upward feedback or measuring channels
71 | 1. Model condition: any controller needs a model of the process being controlled to control it effectively
72 |
73 | Component interaction accidents can usually be explained in terms of incorrect process models.
74 |
75 | In general, accidents often occur when the process model used by the controller does not match the process and, as a result:
76 |
77 | 1. Incorrect or unsafe control commands are given
78 | 1. Required control actions (for safety) are not provided
79 | 1. Potentially correct control commands are provided at the wrong time (too early or too late), or
80 | 1. Control is stopped too soon or applied too long
81 |
82 | Process models play an important role:
83 |
84 | 1. In understanding why accidents occur and why humans provide inadequate control over safety-critical systems
85 | 1. In designing safer systems.
86 |
87 | ## Accidents
88 | Accidents in STAMP are the result of a complex process that results in the system behavior violating the safety constraints. The safety constraints are enforced by the control loops between the various levels of the hierarchical control structure that are in place during design, development, manufacturing, and operations.
89 |
90 | Using the STAMP causality model, if there is an accident, one or more of the following must have occurred:
91 |
92 | 1. The safety constraints were not enforced by the controller.
93 | a.
The control actions necessary to enforce the associated safety constraint at each level of the sociotechnical control structure for the system were not provided.
94 | b. The necessary control actions were provided but at the wrong time (too early or too late) or stopped too soon
95 | c. Unsafe control actions were provided that caused a violation of the safety constraints.
96 | 2. Appropriate control actions were provided but not followed.
97 |
98 | Classification of accident causal factors starts by examining each of the basic components of a control loop and determining how their improper operation may contribute to the general types of inadequate control.
99 |
100 | Causal factors in accidents can be divided into three general categories:
101 |
102 | 1. The controller operation
103 | 1. The behavior of actuators and controlled processes
104 | 1. Communication and coordination among controllers and decision makers
105 |
106 | When humans are involved in the control structure, context and behavior-shaping mechanisms also play an important role in causality.
107 |
108 | ### Controller operation
109 | Three primary parts:
110 |
111 | 1. Control inputs and other relevant external information sources
112 | 1. Control algorithms
113 | 1. Process model
114 |
115 | Inadequate, ineffective or missing control actions necessary to enforce safety constraints and ensure safety can stem from flaws in each of these parts.
116 |
117 | For human controllers and actuators, context is also an important factor.
118 |
119 | #### Unsafe inputs
120 | Control actions and other info required for safe behavior may be missing or wrong.
121 |
122 | #### Unsafe control algorithms
123 | Control algorithms may not enforce safety constraints because:
124 |
125 | - Algorithms are inadequately designed originally
126 | - Process may change and algorithms become unsafe
127 | - Control algorithms may be inadequately modified by maintainers if the algorithms are automated or through various types of natural adaptation if they are implemented by humans
128 |
129 | **Time delays** are an important consideration in designing control algorithms.
130 | Feedback delays generate requirements to predict when a prior control action
131 | has taken effect and when resources will be available again. When time delays
132 | are not adequately considered in the control algorithm, accidents can result.
133 |
134 | Many accidents relate to *asynchronous evolution*, where one part of the system
135 | changes without the related necessary changes in the other parts.
136 |
137 | Communication is a critical factor here, as well as monitoring for changes that may occur and feeding back this information to the higher-level control. For example, the safety analysis process that generates constraints always involves some basic assumptions about the operating environment of the process. When the environment changes such that those assumptions are no longer true, the constraints may no longer be adequate to ensure safe behavior.
138 |
139 | #### Inconsistent, incomplete or incorrect process models
140 |
141 | Accidents, particularly component interaction accidents, most often result from inconsistencies between the models of the process used by the controllers (both human and automated) and the actual process state. When the controller's model of the process (either the human mental model or the software or hardware model) diverges from the process state, erroneous control commands (based on the incorrect model) can lead to an accident.
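To make model divergence concrete for software systems, here is a minimal sketch (the tank, its numbers, and all names are hypothetical, my own illustration rather than Leveson's) of a controller whose internal model drifts from the true process state and issues an unsafe command:

```python
from dataclasses import dataclass

@dataclass
class Tank:
    """The actual controlled process: a tank being filled."""
    level: float            # litres actually in the tank
    capacity: float = 100.0

@dataclass
class Controller:
    """A controller that decides based on its *process model*
    (believed_level), not on the true process state.

    Safety constraint to enforce: never command filling once the
    tank is at or above 90% of capacity.
    """
    believed_level: float

    def control_action(self, tank: Tank) -> str:
        # The decision uses the model, because the real level is
        # only observable through the (possibly broken) feedback channel.
        if self.believed_level < 0.9 * tank.capacity:
            return "OPEN_VALVE"
        return "CLOSE_VALVE"

# Missing feedback: the level sensor stopped updating at 40 litres,
# so the process model and the process have diverged.
tank = Tank(level=95.0)
controller = Controller(believed_level=40.0)

# The command is correct relative to the model, unsafe in reality.
print(controller.control_action(tank))  # OPEN_VALVE - constraint violated
```

The command is locally rational given the controller's model; in STAMP terms, the causal explanation lives in the missing feedback that allowed the model and the process to diverge.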
142 |
143 | The most common form of inconsistency occurs when one or more process models is incomplete in terms of not defining appropriate behavior for all possible process states or all possible disturbances, including unhandled or incorrectly handled component failures.
144 |
145 | Inconsistency happens when:
146 | - the process model designed into the system is wrong from the beginning
147 | - feedback for updating the process model as the controlled process changes state is missing or incorrect
148 | - the process model is updated incorrectly
149 | - time lags are not accounted for
150 |
151 | No control system will perform better than its measuring channel.
152 |
153 | Feedback is missing or inadequate when:
154 | - It is not included in the system design
155 | - Flaws exist in the monitoring or feedback communication channel
156 | - Feedback is not timely
157 | - The measuring instrument operates inadequately
158 |
159 | ### Actuators and controlled processes
160 | Problem: the control commands maintain the safety constraints, but the controlled process does not implement the commands.
161 |
162 | Possible reasons:
163 | - Failure/flaw in the reference channel (transmission of control commands)
164 | - Actuator or controlled component fault or failure
165 | - Safety depends on inputs from other system components (e.g., power) for execution of controlled actions, where these inputs are missing or inadequate
166 | - External disturbances not handled by the controller
167 |
168 | ### Coordination and communication among controllers and decision makers
169 | When there are multiple controllers (human and/or automated), control actions may be inadequately coordinated, including unexpected side effects of decisions or actions or conflicting control actions. Communication flaws play an important role here.
170 |
171 | Accidents are most likely in overlap or boundary areas, where two or more controllers (human or automated) control the same process or processes with common boundaries.
172 |
173 | #### Context and environment
174 | Human behavior is greatly impacted by the context and environment in which the human is working. These factors have been called "behavior shaping mechanisms".
175 |
176 |
177 | ## Definitions
178 | ### Accident
179 | An undesired or unplanned event that results in a loss, including loss of human life or human injury, property damage, environmental pollution, mission loss, etc.
180 |
181 | ### Hazard
182 | A system state or set of conditions that, together with a particular set of worst-case environmental conditions, will lead to an accident (loss).
183 |
184 | Hazards may be defined in terms of conditions, as here, or in terms of events, as long as one of these choices is used consistently.
185 |
186 | Hazards are not identical to failures: failures can occur without resulting in a hazard, and a hazard may occur without any precipitating failures.
187 |
188 | - Draw the system boundaries
189 | - Identify high-level system hazards
190 | - Specify system-level safety requirements and design constraints necessary to prevent hazards from occurring
191 |
192 | ## STPA - hazard analysis
193 | STPA (System-Theoretic Process Analysis) is a *hazard analysis* technique. The goal of hazard analysis is to identify potential causes of accidents so they can be eliminated or controlled before damage occurs.
194 |
195 | Goals of STPA:
196 | - Identify accident scenarios that encompass the entire accident process
197 | - Provide guidance to the users in getting good results
198 |
199 | Two main steps:
200 |
201 | 1. Identify the potential for inadequate control of the system that could lead to a hazardous state.
202 |
203 |    Hazardous states result from inadequate control or enforcement of the safety constraints, which can occur because:
204 |
205 |    1. A control action required for safety is not provided or not followed.
206 |    1. An unsafe control action is provided.
207 |    1. A potentially safe control action is provided too early or too late, that is, at the wrong time or in the wrong sequence.
208 |    1. A control action required for safety is stopped too soon or applied too long.
209 |
210 |
211 | 2. Determine how each potentially hazardous control action identified in step 1 could occur.
212 |
213 |    a. For each unsafe control action, examine the parts of the control loop to see if they could cause it.
214 |
215 |    Design controls and mitigation measures if they do not already exist, or evaluate existing measures if the analysis is being performed on an existing design.
216 |
217 |    For multiple controllers of the same component or safety constraint, identify conflicts and potential coordination problems.
218 |
219 |    b. Consider how the designed controls could degrade over time and build in protection, including:
220 |
221 |    1. Management of change procedures to ensure safety constraints are enforced in planned changes.
222 |    1. Performance audits, where the assumptions underlying the hazard analysis are the preconditions for the operational audits and controls, so that unplanned changes that violate the safety constraints can be detected.
223 |    1. Accident and incident analysis to trace anomalies to the hazards and to the system design.
224 |
225 |
226 | ## CAST - accident/incident analysis
227 | CAST - causal analysis based on STAMP
228 |
229 | 1. Identify the system(s) and hazard(s) involved in the loss
230 | 2. Identify the system safety constraints and system requirements associated with that hazard.
231 | 3. Document the safety control structure in place to control the hazard and enforce the safety constraints. This structure includes the roles and responsibilities of each component in the structure, as well as the controls provided or created to execute their responsibilities and the relevant feedback provided to them to help them do this. This structure may be completed in parallel with the later steps.
232 | 4. Determine the proximate events leading to the loss.
233 | 5. Analyze the loss at the physical system level. Identify the contribution of each of the following to the events: physical and operational controls, physical failures, dysfunctional interactions, communication and coordination flaws, and unhandled disturbances. Determine why the physical controls in place were ineffective in preventing the hazard.
234 | 6. Moving up the levels of the safety control structure, determine how and why each successive higher level allowed or contributed to the inadequate control at the current level.
235 |
236 |    For each system safety constraint, either the responsibility for enforcing it was never assigned to a component in the safety control structure, or a component or components did not exercise adequate control to ensure their assigned responsibilities (safety constraints) were enforced in the components below them.
237 |
238 | Any human decisions or flawed control actions need to be understood in terms of (at least):
239 |
240 | 1. the information available to the decision maker as well as any required information that was not available
241 | 1. the behavior-shaping mechanisms (the context and influences on the decision-making process)
242 | 1. the value structures underlying the decision
243 | 1. any flaws in the process models of those making the decisions and why those flaws existed.
244 | 7. Examine overall coordination and communication contributors to the loss.
245 | 8. Determine the dynamics and changes in the system and the safety control structure relating to the loss and any weakening of the safety control structure over time.
246 | 9. Generate recommendations.
247 |
248 |
249 |
-------------------------------------------------------------------------------- /boundary.graffle: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/boundary.graffle
-------------------------------------------------------------------------------- /boundary.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/boundary.png
-------------------------------------------------------------------------------- /graceful-extensibility.md: --------------------------------------------------------------------------------
1 | # Theorems of graceful extensibility
2 |
3 | Source: [The theory of graceful extensibility: basic rules that govern adaptive systems]
4 |
5 |
6 | ## Managing risk of saturation
7 | ### Adaptive capacity is finite / boundaries are universal
8 | The location of boundaries to the ability to meet demands is uncertain.
9 |
10 | Given a finite range, there is a general parameter, capacity for maneuver (CfM), which specifies how much of the range the unit has used and what capacity remains to handle upcoming demands.
11 |
12 | ### Events will produce demands that challenge boundaries on the adaptive capacity of any UAB / Surprise occurs, continuously
13 |
14 | There are recurring patterns that model surprise - how events challenge boundaries:
15 |
16 | * Events will occur at some rate and of some size and of some kind that increase the risk of saturation - exhausting the remaining CfM
17 | * Brittleness is how rapidly a unit's performance declines when it nears and reaches its boundaries.
18 | * The range of adaptive behavior of a UAB is a model of fitness.
19 | * Events that occur near or outside a UAB's boundary increase the risk of saturation, and this occurs independent of how well that UAB matches responses to demands.
20 |
21 | ### Adaptive capacities are regulated to manage the risk of saturating CfM / Risk of saturation is monitored and regulated
22 | * The work required to adapt and handle changing demands increases as CfM decreases.
23 | * As risk of saturation increases and CfM approaches exhaustion, UABs need to adapt to stretch or extend their base range of adaptive behavior to accommodate surprises.
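As a software analogy for monitoring the risk of saturation, here is a toy sketch (my own illustration, not from the paper; the on-call example and the thresholds are made up):

```python
class AdaptiveUnit:
    """Toy unit of adaptive behavior (UAB) that tracks its capacity
    for maneuver (CfM): how much of its finite range remains.

    Hypothetical example: an on-call team whose range is the number
    of concurrent incidents it can effectively work.
    """

    def __init__(self, total_range: int):
        self.total_range = total_range
        self.in_use = 0

    @property
    def cfm(self) -> int:
        # Capacity remaining to handle upcoming demands.
        return self.total_range - self.in_use

    def take_on(self, demands: int) -> None:
        self.in_use = min(self.total_range, self.in_use + demands)

    def saturation_risk(self) -> str:
        # Monitoring how hard the unit is working to stay in control.
        used = self.in_use / self.total_range
        if used >= 1.0:
            return "saturated"      # performance declines sharply here
        if used >= 0.8:             # threshold is arbitrary, for illustration
            return "near boundary"  # time to stretch: shed load, recruit help
        return "far from boundary"


unit = AdaptiveUnit(total_range=5)
unit.take_on(4)
print(unit.cfm, unit.saturation_risk())  # 1 near boundary
```

The theorems' point is that some demand will eventually arrive that exhausts this remaining range, no matter how well the unit has matched its responses to past demands.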
24 |
25 | ## Network of adaptive units
26 | ### No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone / Synchronization across multiple units of adaptive behavior in a network is necessary
27 |
28 | UABs exist in and are defined relative to a network of interacting and interdependent UABs at multiple scales → networks with multiple roles, multiple echelons
29 |
30 | ### Some UABs monitor and regulate the CfM of other UABs in response to changes in the risk of saturation / Risk of saturation can be shared
31 |
32 | Misalignment and mis-coordination across UABs increases the risk of saturating control as demands grow and cascade.
33 |
34 | ### Adaptive capacity is the potential for adjusting patterns of action to handle future situations, events, opportunities and disruptions / Pressure changes what is sacrificed when
35 | What architectural properties of the network influence the way units in a network respond to varying pressures on trade-offs?
36 |
37 | ## Outmaneuvering constraints
38 | ### Performance of a UAB as it approaches saturation is different from the performance of that UAB when it operates far from saturation / Pressure for optimality undermines graceful extensibility
39 |
40 | ### All UABs are local / All adaptive units are local
41 |
42 | ### There are bounds on the perspective of any UAB, but these limits are overcome by shifts and contrasts over multiple perspectives / Perspective contrast overcomes bounds
43 |
44 | ### Reflective systems risk mis-calibration / Mis-calibration is the norm
45 |
46 |
47 | [The theory of graceful extensibility: basic rules that govern adaptive systems]: https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems
-------------------------------------------------------------------------------- /intro.md: --------------------------------------------------------------------------------
1 | # Resilience engineering: Where do I start?
2 |
3 | This is an introductory guide to readings in *resilience engineering*, aimed at software engineers.
4 |
5 | Key papers are organized into themes:
6 |
7 |
8 | * [What is resilience?](#what-is-resilience)
9 | * [Changing perspectives on accidents and safety](#changing-perspectives-on-accidents-and-safety)
10 | * [Complex systems](#complex-systems)
11 | * [Coordination](#coordination)
12 | * [Automation](#automation)
13 | * [Boundary as a model (Rasmussen)](#boundary-as-a-model-rasmussen)
14 | * [David Woods](#david-woods)
15 |
16 | The papers linked here should all be accessible to casual readers.
17 |
18 | When you're ready for more, check out [resilience engineering notes](README.md).
19 |
20 | ## What is resilience?
21 |
22 | A *resilient* organization **adapts effectively to surprise**.
23 |
24 | Here I'm using the definition proposed by [David Woods](https://u.osu.edu/csel/member-directory/david-woods/).
25 | Before going into more detail about *resilience*, it's important to distinguish it from
26 | a different concept that Woods calls *robustness*.
27 |
28 | ### Robustness vs. resilience
29 |
30 | ![Resilience vs robustness](resilience-doodle.jpg)
31 |
32 | When we talk about designing highly available systems, we usually cover
33 | techniques such as redundancy, retries, fallbacks, and failovers. We think about
34 | what might go wrong (e.g., server failure, network partition), and design our
35 | system to gracefully handle these situations.
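For example, a typical mechanism of this kind enumerates the anticipated failure modes explicitly (a minimal sketch; the function names and retry policy are hypothetical):

```python
import time

def fetch_with_fallback(fetch_primary, fetch_replica, retries: int = 2):
    """Handle two anticipated failure modes: transient connection
    errors (retry with backoff) and a primary outage (fail over)."""
    for attempt in range(retries):
        try:
            return fetch_primary()
        except ConnectionError:
            time.sleep(0.1 * (attempt + 1))  # simple backoff
    # Primary presumed down: fail over to the replica.
    return fetch_replica()
```

Every branch in a design like this corresponds to a failure the designer foresaw.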
36 | 37 | Woods uses the term **robustness** to refer to systems that are designed to 38 | effectively handle known failure modes. 39 | 40 | **Resilience**, on the other hand, describes how well the system can handle 41 | troubles that were not foreseeable by the designer. You can think of robustness 42 | as being able to deal well with *known unknowns*, and resilience as being able 43 | to deal well with *unknown unknowns*. 44 | 45 | * [Four concepts for resilience and the implications for 46 | the future of resilience 47 | engineering] 48 | by Woods discusses four different common usages of the term *resilience*. 49 | In particular, he describes why he considers *robustness* to be a different concept. 50 | * [Resilience is a verb] is another very readable paper on how Woods defines resilience. 51 | 52 | 53 | [Four concepts for resilience and the implications for the future of resilience engineering]: https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering 54 | [Resilience is a verb]: https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb 55 | 56 | ## Changing perspectives on accidents and safety 57 | 58 | Resilience engineering as a field emerged from the safety science community. 59 | That's why you'll often see examples from aviation and medicine, as well as 60 | other safety critical areas like maritime, space flight, nuclear power, and rail. 61 | 62 | Because of this history, the earlier papers that we associate with resilience 63 | engineering are reactions to previous ways of thinking about accidents in 64 | particular and safety in general. 65 | 66 | Note that traditional approaches to safety often focus on minimizing variance 67 | associated with humans doing work, using techniques such as documented 68 | procedures and enforcement mechanisms for deviating from them. 69 | 70 | ### New look / new view 71 | 72 | The "new look" or "new view" refers to a change in perspective on how accidents 73 | happen, which focuses on understanding how actions taken 74 | by actors involved in the incident were rational, given what information those 75 | actors had at the time that events were unfolding. 76 | 77 | Johan Bergström of Lund University has three excellent short (<10 minute) videos: 78 | 79 | * [Was it technical failure or human error?](https://www.youtube.com/watch?v=Ygx2AI2RtkI) 80 | * [Three analytical traps in accident investigation](https://www.youtube.com/watch?v=TqaFT-0cY7U) 81 | * [Two views on human error](https://www.youtube.com/watch?v=rHeukoWWtQ8) 82 | 83 | Two great introductory papers (alas, 2nd one is paywalled) are: 84 | 85 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://sidneydekker.stackedsite.com/wp-content/uploads/sites/899/2013/01/SafetyResearch.pdf) 86 | by Dekker 87 | * [The error of counting errors](https://doi.org/10.1016/j.annemergmed.2008.03.015) by Robert Wears 88 | 89 | A great book on putting these ideas into practice in incident investigations is: 90 | 91 | * [The Field Guide to Understanding "Human Error"](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058/) by Dekker 92 | 93 | 94 | ### Safety-II 95 | 96 | Safety-II is a perspective on the role that humans play in safety-critical 97 | systems, proposed by Erik Hollnagel. 
In the Safety-II perspective,
98 | it is the everyday, normal work of the humans in the system that creates safety,
99 | as opposed to the errors of humans that erode it.
100 |
101 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf) by Hollnagel is a very readable
102 | introduction to Safety-II concepts.
103 | * [Why do things go right?](http://www.safetydifferently.com/why-do-things-go-right/) by Dekker on the [Safety Differently](http://www.safetydifferently.com) website is another good article.
104 |
105 | ## Complex systems
106 |
107 | Ever wonder why resilience engineering advocates natter on about "no root cause?"
108 |
109 | A recurring theme in resilience engineering is about reasoning holistically
110 | about *systems*, as opposed to breaking things up into components and reasoning
111 | about components separately. This perspective is known as *systems thinking*,
112 | which is a school of thought that has been influential in the resilience
113 | engineering community.
114 |
115 | When you view the world as a system, the idea of *cause* becomes meaningless,
116 | because there's no way to isolate an individual cause. Instead, the world is
117 | a tangled web of influences.
118 |
119 | You'll often hear the phrase *socio-technical system*. This language emphasizes that
120 | systems should be thought of as encompassing both humans and technologies, as opposed to
121 | thinking about technological aspects in isolation.
122 |
123 |
124 | * [How complex systems fail](https://www.adaptivecapacitylabs.com/HowComplexSystemsFail.pdf) by Richard I. Cook is a great starting point. It's a short paper and very easy to read.
125 | * [Drift into failure](https://www.goodreads.com/book/show/10258783) by Sidney Dekker is a book written for a lay audience, so it is also very readable. Dekker draws heavily from systems thinking to propose a theory about how complex systems can evolve into unsafe states.
126 |
127 |
128 | ## Coordination
129 |
130 | The systems we are interested in often involve a collection of people working together
131 | in some way to achieve a task. One particularly relevant example involves a collection of engineers
132 | working together to troubleshoot and repair a system during an ongoing
133 | incident.
134 |
135 | * [Common Ground and Coordination in Joint Activity] is an oft-cited paper on what is required for people
136 | to effectively coordinate when working on tasks together.
137 |
138 | [Common Ground and Coordination in Joint Activity]: http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf
139 |
140 | ## Automation
141 |
142 | One thing we software folk do have in common with the safety-critical world is
143 | the increased adoption of automation. Automation introduces challenges, and
144 | the nature of these challenges is a topic of many resilience engineering papers.
145 |
146 | You might hear the phrase *joint cognitive system* in the context of automation. This term refers to
147 | systems that do cognitive work and that are made up of a combination of humans and software.
148 | There is an entire research discipline that studies joint cognitive systems, called *cognitive systems engineering*, initially
149 | developed by David Woods and Erik Hollnagel, both of whom would later go on to play a significant role in
150 | developing the field of resilience engineering.
151 |
152 | Because resilience engineering researchers like Woods and Hollnagel have their roots in cognitive
153 | systems engineering, and because of the ever-increasing use of software automation in society,
154 | this community is very concerned about the potential *brittleness* associated with poor
155 | use of automation.
156 |
157 |
158 | * [Ironies of automation](https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf) by Lisanne
159 | Bainbridge is a classic paper on the problems that automation can introduce.
160 | The paper was originally written in 1983, and continues to be widely cited.
161 |
162 | * [How to make automated systems team players](https://researchgate.net/publication/2483863)
163 | by Christoffersen and Woods discusses how previous automated systems have been problematic and proposes strategies
164 | for improving automation.
165 |
166 | * [Ten challenges for making automation a team player](https://ieeexplore.ieee.org/abstract/document/1363742)
167 | by Klein et al. is a more recent paper that outlines the requirements for automation to be genuinely effective in
168 | socio-technical systems. This work draws heavily from the theme of *coordination* discussed earlier.
169 |
170 | ## Boundary as a model (Rasmussen)
171 |
172 |
173 | The late Jens Rasmussen is an enormously influential figure in the resilience engineering community.
174 |
175 | * [Risk management in a dynamic society: a modelling problem](https://doi.org/10.1016/S0925-7535(97)00052-0), published in 1997,
176 | is one of Rasmussen's most famous papers, which introduces Rasmussen's *dynamic safety model*.
177 |
178 | In this widely cited paper, Rasmussen advocates for a cross-disciplinary,
179 | systems-based approach to thinking about how accidents occur. He argues that
180 | accidents occur because the system migrates across a dangerous boundary, and
181 | this migration occurs during the course of normal work.
182 |
183 | Here is a depiction of the model from that paper:
184 |
185 | ![boundary](boundary.png)
186 |
187 |
188 |
189 | ## David Woods
190 |
191 | We've already referenced several papers authored or co-authored by
192 | David Woods. Woods is a force of nature in the field of resilience engineering, having
193 | played a key role in creating the field itself. Woods is incredibly prolific,
194 | and has introduced a wide variety of concepts related to resilience
195 | engineering.
196 |
197 | Woods is interested in resilience engineering principles that apply across an
198 | enormous range of different types of systems, from the organs in a biological
199 | organism up to organizations like NASA.
200 |
201 | Because he's interested in general principles, many of his papers are written at
202 | a very abstract level, where he discusses generic concepts such as *units of adaptive
203 | behavior* or *saturation*.
204 |
205 | ### Dragons at the boundary
206 |
207 | David Woods uses the metaphor of a system moving within a boundary in his writings on resilience engineering, but in
208 | a slightly different way than Rasmussen.
209 |
210 | Woods sees the boundary as a *competence envelope*. There are two different regimes of system behavior: far from the boundary and near the boundary.
211 |
212 | When a system is far from the boundary, the system (and its environment) behave as expected. By contrast, when a system
213 | grows near to the boundary, surprises happen.
Woods uses the metaphor of *dragons* to capture the surprises that occur when a system moves near the boundary, and how the system's model of the world is violated when it enters this regime.
214 |
215 | How units within a system adapt when the system moves near the boundary, and how these units deal with the dragons,
216 | is one of Woods's prime concerns.
217 |
218 | Woods's [Essentials of Resilience, revisited](https://www.researchgate.net/profile/David_Woods11/publication/330116587_4_Essentials_of_resilience_revisited/links/5c2e448ba6fdccd6b58f871e/4-Essentials-of-resilience-revisited.pdf?origin=publication_detail) discusses behavior at the boundary, although it doesn't use the *dragon* metaphor.
219 |
220 | ### The adaptive universe
221 |
222 | Woods's idea of the *adaptive universe* is characterized by three properties:
223 |
224 | * Resources are finite
225 | * Surprise is fundamental
226 | * Change never stops
227 |
228 | I haven't found a good introductory paper for the adaptive universe, as it
229 | encompasses an enormous number of topics, including the topic of *dragons at the boundaries*
230 | that we discussed earlier.
231 |
232 | I recommend watching Woods's [Resilience Engineering short
233 | course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt), which
234 | covers this topic. I've written my own [notes on the short
235 | course](https://github.com/lorin/res-eng-short-course-notes), which you might
236 | find useful. In particular, you might be interested in my [summary
237 | notes](https://github.com/lorin/res-eng-short-course-notes/blob/master/summary.md).
238 |
239 | ### Graceful extensibility
240 |
241 | Woods introduced the theory of *graceful extensibility* to capture how successful
242 | systems adapt effectively to surprise. The most relevant paper here is:
243 |
244 | * [The theory of graceful extensibility: basic rules that govern adaptive systems](https://link.springer.com/article/10.1007%2Fs10669-018-9708-3).
245 |
-------------------------------------------------------------------------------- /laws.md: --------------------------------------------------------------------------------
1 | # Laws, tradeoffs and theorems
2 |
3 | Many of these are documented in [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
4 | by Hoffman and Woods.
5 |
6 | [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]: https://www.researchgate.net/publication/220628177_Beyond_Simon%27s_Slice_Five_Fundamental_Trade-Offs_that_Bound_the_Performance_of_Macrocognitive_Work_Systems
7 |
8 | * Laws
9 |   - Law of fluency
10 |   - Law of stretched systems
11 |   - Law of requisite variety
12 |   - Laws of the adaptive universe
13 |   - Law of coordinative entropy
14 |   - Mr. Weasley's Law
15 |   - The Law of the Kludge
16 |   - First law of cooperative systems
17 |   - (Robin) Murphy's Law
18 | * Tradeoffs
19 |   - Efficiency-thoroughness tradeoff
20 |   - Optimality-brittleness tradeoff
21 | * Theorems
22 |   - Theorems of graceful extensibility
23 |
24 | ## Laws
25 |
26 | ### Law of fluency
27 |
28 | Well-adapted cognitive work occurs with a facility that belies the difficulty
29 | of resolving demands and balancing dilemmas.
30 |
31 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
32 |
33 | ### Law of stretched systems
34 |
35 | Every system is stretched to operate at its capacity.
36 |
37 | Sources:
38 |
39 | * [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
40 |
41 | This law is attributed to Lawrence Hirschhorn, and has been popularized by David Woods and Richard Cook.
42 |
43 | ### Law of requisite variety
44 |
45 | The larger the variety of actions available to a control system, the larger the
46 | variety of perturbations it is able to compensate.
47 |
48 | This is also called the first law of cybernetics or Ashby's law.
49 |
50 | Source: W. Ross Ashby, *An Introduction to Cybernetics* (1956)
51 |
52 | ### Laws of the adaptive universe
53 |
54 | * Resources are finite
55 | * Surprise is fundamental
56 | * Change never stops
57 |
58 | Source: Woods's [Resilience Engineering short course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt)
59 |
60 | ### Law of coordinative entropy
61 |
62 | Coordination costs, continuously.
63 |
64 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
65 |
66 | ### Law of systems as surrogates
67 |
68 | Technology reflects the stances, agendas, and goals of those who design and deploy the technology.
69 |
70 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
71 |
72 | ### Mr. Weasley's Law
73 |
74 | Never trust anything that can think for itself if you can’t see where it keeps its brain.
75 |
76 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
77 |
78 | ### The Law of the Kludge
79 |
80 | Work systems always require workarounds, with resultant kludges that attempt
81 | to bridge the gap between the original design objectives and current realities
82 | or to reconcile conflicting goals among workers.
83 |
84 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
85 |
86 | ### First law of cooperative systems
87 |
88 | It's not cooperation, if either you do it all or I do it all.
89 |
90 | Source: David Woods. Not sure where he first wrote this, but it's referenced in *Cognitive Systems Engineering: The Future for a Changing World*
91 |
92 | ### (Robin) Murphy's Law
93 |
94 | Any deployment of robotic systems will fall short of the target level of autonomy, creating or exacerbating a shortfall
95 | in mechanisms for coordination with human stakeholders.
96 |
97 | Source: This is mentioned in [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/Joint-Cognitive-Systems-Patterns-Engineering-ebook/dp/B00918NQOE/ref=sr_1_1?keywords=joint+cognitive+systems&qid=1557092907&s=gateway&sr=8-1), Chapter 10 (Automation Surprises).
98 |
99 | ## Tradeoffs
100 |
101 | ### Optimality vs. resilience
102 |
103 | The pursuit of increases in optimality with respect to some criteria
104 | guarantees an increase in brittleness with respect to changes or variations
105 | that fall outside of those criteria.
106 |
107 | Described in *Beyond Simon's Slice* as:
108 | * bounded ecology
109 | * *optimality-resilience of adaptive capacity trade-off*
110 |
111 |
112 | ### Efficiency vs. thoroughness
113 |
114 | People (and organisations) as part of their activities frequently – or always –
115 | have to make a trade-off between the resources (primarily time and effort) they
116 | spend on preparing to do something and the resources (primarily time and
117 | effort) they spend on doing it.
128 | 129 | ### Revelation vs. reflection 130 | 131 | Because every perspective reveals some details and hides others, we 132 | gain an advantage from reflecting on different perspectives. But this 133 | reflection has a cost: it takes effort. 134 | 135 | (The text itself doesn't describe "revelation", but my sense is that this is an explore/exploit 136 | style tradeoff, where we have to trade off going broader across perspectives against going deeper 137 | into a specific perspective.) 138 | 139 | Described in *Beyond Simon's Slice* as: 140 | * bounded perspectives 141 | * *revelation-reflection on perspectives trade-off* 142 | 143 | ### Acute goal vs. chronic goal 144 | 145 | There are ongoing (chronic) goals that we are always responsible for (e.g., safety), but we often 146 | face some shorter-term deadline (acute) that demands more of our attention. 147 | 148 | Described in *Beyond Simon's Slice* as: 149 | * bounded responsibility 150 | * *acute-chronic goal responsibility trade-off* 151 | 152 | ### Concentrated action vs. distributed action 153 | 154 | Distributing autonomy allows systems to act more quickly, but it makes synchronization across 155 | actions more difficult. 156 | 157 | Described in *Beyond Simon's Slice* as: 158 | * bounded effectiveness 159 | * *concentrated-distributed action trade-off* 160 | 161 | ## Theorems 162 | 163 | ### Theorems of graceful extensibility 164 | 165 | * *UAB* stands for unit of adaptive behavior 166 | * *CfM* stands for capacity for manoeuvre 167 | 168 | 1. Adaptive capacity is finite 169 | 2. Events will produce demands that challenge boundaries on the adaptive 170 | capacity of any UAB 171 | 3. Adaptive capacities are regulated to manage the risk of saturating CfM 172 | 4. No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone 173 | 5. Some UABs monitor and regulate the CfM of other UABs in response to changes 174 | in the risk of saturation 175 | 6. Adaptive capacity is the potential for adjusting patterns of action to 176 | handle future situations, events, opportunities and disruptions 177 | 7. Performance of a UAB as it approaches saturation is different from the 178 | performance of that UAB when it operates far from saturation 179 | 8. All UABs are local 180 | 9. There are bounds on the perspective of any UAB, but these limits are overcome 181 | by shifts and contrasts over multiple perspectives. 182 | 10. Reflective systems risk mis-calibration 183 | 184 | Source: [The Theory of Graceful Extensibility: Basic rules that govern adaptive systems](https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems) 185 |
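Theorem 7 is the one I find easiest to observe in software systems, so here's a minimal queueing sketch of my own (the M/M/1-style model and all the numbers are my assumptions, not from the paper). A unit serving a base load absorbs a small surprise easily when it's far from saturation; near saturation, the same surprise exhausts its remaining capacity for manoeuvre:

```python
# Toy illustration of theorems 1 and 7 (my construction): a unit with
# finite capacity absorbs the same surprise gracefully at low load and
# catastrophically near saturation.
CAPACITY = 100.0  # requests/sec the unit can serve
SURPRISE = 5.0    # extra demand that shows up during an incident

def response_time(load: float) -> float:
    """M/M/1-style mean response time; unbounded once load >= capacity."""
    return float("inf") if load >= CAPACITY else 1.0 / (CAPACITY - load)

for base_load in (50, 80, 90, 95, 99):
    before = response_time(base_load)
    after = response_time(base_load + SURPRISE)
    print(f"load {base_load:3d}: {before:.3f}s -> {after:.3f}s with surprise")
```

The same five units of surprise that barely register at half load saturate the unit completely at high load, which is also why theorem 3 says adaptive capacities have to be regulated with the risk of saturating CfM in mind.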
-------------------------------------------------------------------------------- /paries-keynote-2015.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/paries-keynote-2015.pptx -------------------------------------------------------------------------------- /resilience-doodle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/resilience-doodle.jpg -------------------------------------------------------------------------------- /risk-management-framework.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/risk-management-framework.graffle -------------------------------------------------------------------------------- /risk-management-framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/risk-management-framework.png -------------------------------------------------------------------------------- /topics.md: -------------------------------------------------------------------------------- 1 | # Papers by topic 2 | 3 | These pages cluster notable resilience engineering [papers](README.md) by topic. 4 | 5 | - [The nature of cognitive work during an incident](topics/the-nature-of-cognitive-work-during-an-incident.md) 6 | - [Human-human interaction](topics/human-human-interaction.md) 7 | - [Nature of complex systems](topics/nature-of-complex-systems.md) 8 | - [Changing perspective on safety](topics/changing-perspective-on-safety.md) 9 | - [Common misconceptions](topics/common-misconceptions.md) 10 | - [Human-machine interaction](topics/human-machine-interaction.md) 11 | - [What can go badly during an incident](topics/what-can-go-badly-during-an-incident.md) 12 | - [What we mean by "resilience"](topics/what-we-mean-by-resilience.md) 13 | - [Incident analysis pragmatics](topics/incident-analysis-pragmatics.md) 14 | -------------------------------------------------------------------------------- /topics/changing-perspective-on-safety.md: -------------------------------------------------------------------------------- 1 | # Changing perspective on safety 2 | 3 | ## Concepts 4 | * old view vs.
new view 5 | * safety-I vs safety-II 6 | * safety as "the capacity of people and systems to provide good outcomes" rather than "preventing things from going wrong" 7 | * "work as imagined" vs "work as done" 8 | 9 | ## Readings 10 | 11 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.411.4985&rep=rep1&type=pdf) 12 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf) 13 | * [I want to believe: some myths about the management of industrial safety](http://dx.doi.org/10.1007/s10111-012-0237-4) 14 | * [Do Safety Differently](https://www.amazon.com/Do-Safety-Differently-Sidney-Dekker/dp/B09RM3Z17V) 15 | 16 | -------------------------------------------------------------------------------- /topics/common-misconceptions.md: -------------------------------------------------------------------------------- 1 | # Common misconceptions 2 | 3 | ## Concepts 4 | 5 | * hindsight 6 | * human error 7 | * root cause 8 | * systems thinking 9 | 10 | ## Readings 11 | 12 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures](https://www.semanticscholar.org/paper/Replacing-hindsight-with-insight%3A-toward-better-of-Wears-Nemeth/1bef45cae7375eddc8ee584dff100d200d812a8d) 13 | * [Applying systems thinking to analyze and learn from events](https://dspace.mit.edu/handle/1721.1/108102) 14 | * [The error of counting errors](https://doi.org/10.1016/j.annemergmed.2008.03.015) by Robert Wears 15 | 16 | -------------------------------------------------------------------------------- /topics/human-human-interaction.md: -------------------------------------------------------------------------------- 1 | # Human-human interaction 2 | 3 | ## Concepts 4 | 5 | * Common ground and coordination 6 | * Being bumpable 7 | * Polycentric governance 8 | 9 | ## Readings 10 | 11 | * [Governing the Commons: The Evolution of Institutions for Collective Action](https://www.amazon.com/Governing-Commons-Evolution-Institutions-Collective/dp/1107569788) 12 | * [Common Ground and Coordination in Joint Activity](http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf) 13 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 14 | 15 | -------------------------------------------------------------------------------- /topics/human-machine-interaction.md: -------------------------------------------------------------------------------- 1 | # Human-machine interaction 2 | 3 | ## Concepts 4 | 5 | * ironies of automation 6 | * team player 7 | 8 | ## Readings 9 | 10 | * [Ironies of automation](https://www.ise.ncsu.edu/wp-content/uploads/2017/02/Bainbridge_1983_Automatica.pdf) 11 | * [How to Make Automated Systems Team Players](https://www.researchgate.net/profile/David_Woods11/publication/2483863_How_to_Make_Automated_Systems_Team_Players/links/5a4f829eaca272940bf8202c/How-to-Make-Automated-Systems-Team-Players.pdf) 12 | * [Ten challenges for making automation a team player](https://ieeexplore.ieee.org/abstract/document/1363742) 13 | 14 | -------------------------------------------------------------------------------- /topics/incident-analysis-pragmatics.md: -------------------------------------------------------------------------------- 1 | # Incident analysis pragmatics 2 | 3 | "Nuts and bolts" of incident analysis work.
4 | 5 | * [Etsy Debrief Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) 6 | * [The field guide to understanding 'human error'](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058) 7 | * [Pre-accident investigations: an introduction to organizational safety](https://www.amazon.com/Pre-Accident-Investigations-Todd-Conklin/dp/1409447820) 8 | -------------------------------------------------------------------------------- /topics/nature-of-complex-systems.md: -------------------------------------------------------------------------------- 1 | # Nature of complex systems 2 | 3 | ## Concepts 4 | 5 | * sharp-end vs. blunt-end 6 | * practitioner actions as gambles 7 | * coping with complexity 8 | * robust yet fragile 9 | * drift 10 | * strange loops 11 | * dark debt 12 | * well-adapted, under-adapted, over-adapted 13 | * decompensation, working at cross-purposes, getting stuck in outdated behaviors 14 | 15 | ## Readings 16 | 17 | * [How complex systems fail](http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) 18 | * [Basic Patterns in How Adaptive Systems Fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) 19 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/) 20 | * [Highly Optimized Tolerance: Robustness and Design in Complex Systems](http://dx.doi.org/10.1103/physrevlett.84.2529) 21 | * [Drift into failure](https://www.amazon.com/Drift-into-Failure-Sidney-Dekker/dp/1409422216) 22 | 23 | -------------------------------------------------------------------------------- /topics/the-nature-of-cognitive-work-during-an-incident.md: -------------------------------------------------------------------------------- 1 | # The nature of cognitive work during an incident 2 | 3 | ## Concepts 4 | * problem detection 5 | * anomaly response 6 | 7 | ## Readings 8 | 9 | * [Anomaly Response](https://docs.wixstatic.com/ugd/3ad081_f46dda684154447583c8a5b282b60cc2.pdf) 10 | * [Problem detection](https://www.researchgate.net/publication/220579480_Problem_detection) 11 | * [The strengths and limitations of teams for detecting problems](https://link.springer.com/article/10.1007/s10111-005-0024-6) 12 | 13 | -------------------------------------------------------------------------------- /topics/what-can-go-badly-during-an-incident.md: -------------------------------------------------------------------------------- 1 | # What can go badly during an incident 2 | 3 | ## Concepts 4 | 5 | * Going solid 6 | * Going sour 7 | * Fixation 8 | * Vagabonding 9 | 10 | ## Readings 11 | 12 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130) 13 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf) 14 | 15 | -------------------------------------------------------------------------------- /topics/what-we-mean-by-resilience.md: -------------------------------------------------------------------------------- 1 | # What we mean by "resilience" 2 | 3 | ## Concepts 4 | 5 | * resilience 6 | * robustness 7 | 8 | ## Readings 9 | 10 | * [Resilience is a verb](https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb) 11 | * [Four concepts for resilience and the
implications for the future of resilience engineering](https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering) 12 | 13 | --------------------------------------------------------------------------------