├── LICENSE.md ├── README.md ├── STAMP.md ├── boundary.graffle ├── boundary.png ├── graceful-extensibility.md ├── intro.md ├── laws.md ├── paries-keynote-2015.pptx ├── resilience-doodle.jpg ├── risk-management-framework.graffle ├── risk-management-framework.png ├── topics.md └── topics ├── changing-perspective-on-safety.md ├── common-misconceptions.md ├── human-human-interaction.md ├── human-machine-interaction.md ├── incident-analysis-pragmatics.md ├── nature-of-complex-systems.md ├── the-nature-of-cognitive-work-during-an-incident.md ├── what-can-go-badly-during-an-incident.md └── what-we-mean-by-resilience.md /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## Creative Commons Attribution-ShareAlike 4.0 International Public License 2 | 3 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 4 | 5 | **Section 1 – Definitions.** 6 | 7 | 1. **Adapted Material** means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 8 | 2. **Adapter's License** means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 9 | 3. **BY-SA Compatible License** means a license listed at [creativecommons.org/compatiblelicenses](//creativecommons.org/compatiblelicenses), approved by Creative Commons as essentially the equivalent of this Public License. 10 | 4. **Copyright and Similar Rights** means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section [2(b)(1)-(2)](#s2b) are not Copyright and Similar Rights. 11 | 5. **Effective Technological Measures** means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 12 | 6. **Exceptions and Limitations** means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 13 | 7. **License Elements** means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike. 14 | 8. 
**Licensed Material** means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 15 | 9. **Licensed Rights** means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 16 | 10. **Licensor** means the individual(s) or entity(ies) granting rights under this Public License. 17 | 11. **Share** means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 18 | 12. **Sui Generis Database Rights** means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 19 | 13. **You** means the individual or entity exercising the Licensed Rights under this Public License. **Your** has a corresponding meaning. 20 | 21 | **Section 2 – Scope.** 22 | 23 | 1. **License grant**. 24 | 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 25 | 1. reproduce and Share the Licensed Material, in whole or in part; and 26 | 2. produce, reproduce, and Share Adapted Material. 27 | 2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 28 | 3. Term. The term of this Public License is specified in Section [6(a)](#s6a). 29 | 4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section [2(a)(4)](#s2a4) never produces Adapted Material. 30 | 5. Downstream recipients. 31 | 32 | 1. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 33 | 2. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. 34 | 3. No downstream restrictions. 
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 35 | 36 | 6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section [3(a)(1)(A)(i)](#s3a1Ai). 37 | 2. **Other rights**. 38 | 39 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 40 | 2. Patent and trademark rights are not licensed under this Public License. 41 | 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 42 | 43 | **Section 3 – License Conditions.** 44 | 45 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 46 | 47 | 1. **Attribution**. 48 | 49 | 1. If You Share the Licensed Material (including in modified form), You must: 50 | 51 | 1. retain the following if it is supplied by the Licensor with the Licensed Material: 52 | 1. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 53 | 2. a copyright notice; 54 | 3. a notice that refers to this Public License; 55 | 4. a notice that refers to the disclaimer of warranties; 56 | 5. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 57 | 2. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 58 | 3. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 59 | 2. You may satisfy the conditions in Section [3(a)(1)](#s3a1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 60 | 3. If requested by the Licensor, You must remove any of the information required by Section [3(a)(1)(A)](#s3a1A) to the extent reasonably practicable. 61 | 2. **ShareAlike**. 62 | 63 | In addition to the conditions in Section [3(a)](#s3a), if You Share Adapted Material You produce, the following conditions also apply. 64 | 65 | 1. The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License. 66 | 2. You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. 
You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 67 | 3. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. 68 | 69 | **Section 4 – Sui Generis Database Rights.** 70 | 71 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 72 | 73 | 1. for the avoidance of doubt, Section [2(a)(1)](#s2a1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 74 | 2. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section [3(b)](#s3b); and 75 | 3. You must comply with the conditions in Section [3(a)](#s3a) if You Share all or a substantial portion of the contents of the database. 76 | 77 | For the avoidance of doubt, this Section [4](#s4) supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 78 | 79 | **Section 5 – Disclaimer of Warranties and Limitation of Liability.** 80 | 81 | 1. **Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.** 82 | 2. **To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.** 83 | 84 | 3. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 85 | 86 | **Section 6 – Term and Termination.** 87 | 88 | 1. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 89 | 2. Where Your right to use the Licensed Material has terminated under Section [6(a)](#s6a), it reinstates: 90 | 91 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 92 | 2. upon express reinstatement by the Licensor. 
93 | 94 | For the avoidance of doubt, this Section [6(b)](#s6b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 95 | 3. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 96 | 4. Sections [1](#s1), [5](#s5), [6](#s6), [7](#s7), and [8](#s8) survive termination of this Public License. 97 | 98 | **Section 7 – Other Terms and Conditions.** 99 | 100 | 1. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 101 | 2. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 102 | 103 | **Section 8 – Interpretation.** 104 | 105 | 1. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 106 | 2. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 107 | 3. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 108 | 4. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 109 | 110 | > Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the [CC0 Public Domain Dedication](//creativecommons.org/publicdomain/zero/1.0/legalcode). Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](//creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. 111 | > 112 | > Creative Commons may be contacted at [creativecommons.org](//creativecommons.org/). 113 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Resilience engineering papers 2 | 3 | ## Overview 4 | 5 | Alias: (thanks to [John Allspaw](https://twitter.com/allspaw)). 
6 | 7 | This doc contains notes about people active in resilience engineering, as well as some influential 8 | researchers who are no longer with us, organized alphabetically. It also includes people and papers 9 | from related fields, such as cognitive systems engineering and naturalistic decision-making. 10 | 11 | If you're not sure what to read first, check out [Resilience engineering: Where do I start?](intro.md) 12 | 13 | ## Annotations 14 | 15 | A [BH](https://safety177496371.wordpress.com/) link indicates Ben Hutchinson's [Safety & Performance Research Summaries](https://safety177496371.wordpress.com/) blog. 16 | Ben writes summaries of safety papers, posting them to his blog as well as LinkedIn. 17 | 18 | A [TWRR](http://resilienceroundup.com) link indicates Thai Wood's [Resilience Roundup](http://resilienceroundup.com). Thai publishes a newsletter that 19 | summarizes resilience engineering papers. 20 | 21 | ## Other interesting links 22 | 23 | [resilienceinsoftware.org](https://resilienceinsoftware.org) is the Resilience in Software Foundation, a community of software people who are interested in resilience engineering. 24 | 25 | 26 | For a collection of talks, check out the [Resilience Engineering, Cognitive Systems 27 | Engineering, and Human Factors Concepts in Software 28 | Contexts](https://www.youtube.com/playlist?list=PLb1aZTnPf3-OMChMkrr6WsokRI6LOnuem) 29 | YouTube playlist maintained by John Allspaw. 30 | 31 | You might also be interested in my [notes on David Woods's Resilience Engineering short course](https://github.com/lorin/res-eng-short-course-notes). 32 | 33 | The papers linked here are also in the [zotero res-eng group](https://www.zotero.org/groups/2335189/res-eng/items). 34 | 35 | ## People 36 | 37 | For each person, I list concepts that they reference in their writings, along 38 | with some publications. The publications lists aren't comprehensive: 39 | they're ones I've read or have added to my to-read list. 40 | 41 | * [John Allspaw](#john-allspaw) 42 | * [Lisanne Bainbridge](#lisanne-bainbridge) 43 | * [Andrea Baker](#andrea-baker) 44 | * [E. Asher Balkin](#e-asher-balkin) 45 | * [Johan Bergström](#johan-bergström) 46 | * [Matthieu Branlat](#matthieu-branlat) 47 | * [Sheuwen Chuang](#sheuwen-chuang) 48 | * [Todd Conklin](#todd-conklin) 49 | * [Richard I. Cook](#richard-i-cook) 50 | * [Sidney Dekker](#sidney-dekker) 51 | * [John C. Doyle](#john-c-doyle) 52 | * [Bob Edwards](#bob-edwards) 53 | * [Anders Ericsson](#anders-ericsson) 54 | * [Paul Feltovich](#paul-feltovich) 55 | * [Pedro Ferreira](http://www.resilience-engineering-association.org/user/pedro/) 56 | * [Meir Finkel](#meir-finkel) 57 | * [Marisa Grayson](#marisa-grayson) 58 | * [Ivonne Andrade Herrera](#ivonne-andrade-herrera) 59 | * [Robert Hoffman](#robert-hoffman) 60 | * [Erik Hollnagel](#erik-hollnagel) 61 | * [Leila Johannesen](#leila-johannesen) 62 | * [Gary Klein](#gary-klein) 63 | * [Elizabeth Lay](#elizabeth-lay) 64 | * [Jean-Christophe Le Coze](#jean-christophe-le-coze) 65 | * [Nancy Leveson](#nancy-leveson) 66 | * [Carl Macrae](#carl-macrae) 67 | * [Laura Maguire](#laura-maguire) 68 | * [Christopher Nemeth](#christopher-nemeth) 69 | * [Anne-Sophie Nyssen](#anne-sophie-nyssen) 70 | * [Elinor Ostrom](#elinor-ostrom) 71 | * [Jean Pariès](#jean-paries) 72 | * [Emily Patterson](#emily-patterson) 73 | * [Charles Perrow](#charles-perrow) 74 | * [Shawna J.
Perry](#shawna-j-perry) 75 | * [Jens Rasmussen](#jens-rasmussen) 76 | * [Mike Rayo](#mike-rayo) 77 | * [James Reason](#james-reason) 78 | * [J. Paul Reed](#j-paul-reed) 79 | * [Emilie M. Roth](#emilie-m-roth) 80 | * [Nadine Sarter](#nadine-sarter) 81 | * [James C. Scott](#james-c-scott) 82 | * [Steven Shorrock](#steven-shorrock) 83 | * [Barry Turner](#barry-turner) 84 | * [Diane Vaughan](#diane-vaughan) 85 | * [Robert L. Wears](#robert-l-wears) 86 | * [David Woods](#david-woods) 87 | * [John Wreathall](#john-wreathall) 88 | 89 | ## Some big ideas 90 | 91 | * [The adaptive universe](#the-adaptive-universe) (David Woods) 92 | * [Dynamic safety model](#dynamic-safety-model) (Jens Rasmussen) 93 | * [Safety-II](#safety-i-vs-safety-ii) (Erik Hollnagel) 94 | * [Graceful extensibility](#graceful-extensibility) (David Woods) 95 | * [ETTO: Efficiency-thoroughness trade-off principle](#etto-principle) (Erik Hollnagel) 96 | * [Drift into failure](#drift-into-failure) (Sidney Dekker) 97 | * Robust yet fragile (John C. Doyle) 98 | * [STAMP: Systems-Theoretic Accident Model & Processes](#stamp) (Nancy Leveson) 99 | * Polycentric governance (Elinor Ostrom) 100 | 101 | Note: there are now [multiple contributors](https://github.com/lorin/resilience-engineering/graphs/contributors) to this repository. 102 | 103 | ## John Allspaw 104 | 105 | Allspaw is the former CTO of Etsy. He applies concepts from resilience engineering to the tech industry. 106 | He is one of the founders of [Adaptive Capacity Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering consultancy. 107 | 108 | Allspaw tweets as [@allspaw](https://twitter.com/allspaw). 109 | 110 | ### Selected publications 111 | 112 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/) 113 | * [Trade-Offs Under Pressure: Heuristics and Observations Of Teams Resolving Internet Service Outages](https://www.researchgate.net/publication/295011072_Trade-Offs_Under_Pressure_Heuristics_and_Observations_Of_Teams_Resolving_Internet_Service_Outages) 114 | * [Etsy Debrief Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) 115 | * [Blameless PostMortems and a Just Culture](https://codeascraft.com/2012/05/22/blameless-postmortems/) (blog) 116 | * [Resilience engineering: learning to embrace failure](https://doi.org/10.1145/2366316.2366331) 117 | * [Fault Injection in Production: Making the case for resiliency testing](http://queue.acm.org/detail.cfm?id=2353017) 118 | * [Technical Debt: Challenges and Perspectives](https://doi.org/10.1109/MS.2017.99) 119 | * [Revealing the Critical Role of Human Performance in Software](https://queue.acm.org/detail.cfm?id=3380776) 120 | * [SRE Cognitive Work] in [Seeking SRE] 121 | * [The infinite hows: An argument against the Five Whys and an alternative approach you can apply](https://www.oreilly.com/radar/the-infinite-hows/) 122 | 123 | [SRE Cognitive Work]: https://www.researchgate.net/publication/343430302_SRE_Cognitive_Work 124 | [Seeking SRE]: https://www.oreilly.com/library/view/seeking-sre/9781491978856/ 125 | 126 | ### Selected talks 127 | 128 | * [Resilience Engineering: The What and How](https://devopsdays.org/events/2019-washington-dc/program/john-allspaw/) 129 | * [Incidents as we Imagine Them Versus How They Actually Are](https://www.youtube.com/watch?v=8DtzmV1jiyQ) 130 | * [How your systems keep running day after day](https://www.youtube.com/watch?v=xA5U85LSk0M) 131 | * [Problem detection (papers we
love)](https://www.youtube.com/watch?v=NxctiGRI2y8) 132 | (presentation of [Problem detection] paper) 133 | * [Common Ground and Coordination in Joint Activity (papers we love)](https://paperswelove.org/2016/video/john-allspaw-common-ground/) (presentation of [Common Ground and Coordination in Joint Activity] paper) 134 | * [Amplifying sources of resilience](https://www.infoq.com/presentations/resilience-thinking-paradigm/) (presentation about applying Resilience Engineering thinking & paradigms to the world of software engineering) 135 | * [Incidents: What Is Often Missed & What Can Be Done About That](https://www.adaptivecapacitylabs.com/blog/2020/03/30/incidents-what-is-often-missed-what-can-be-done-about-that/#fvp_10,1s) 136 | * [Incident Analysis: How *Learning* is Different Than *Fixing*](https://www.adaptivecapacitylabs.com/blog/2020/05/06/how-learning-is-different-than-fixing/) 137 | 138 | 139 | ## Lisanne Bainbridge 140 | 141 | Bainbridge is a psychology researcher. She has a website at http://www.complexcognition.co.uk/ 142 | 143 | ### Contributions 144 | 145 | #### Ironies of automation 146 | 147 | Bainbridge is famous for her 1983 [Ironies of automation] paper, which continues to 148 | be frequently cited. 149 | 150 | ### Concepts 151 | * automation 152 | * design errors 153 | * human factors/ergonomics 154 | * cognitive modelling 155 | * cognitive architecture 156 | * mental workload 157 | * situation awareness 158 | * cognitive error 159 | * skill and training 160 | * interface design 161 | 162 | ### Selected publications 163 | * [Ironies of automation] ([TWRR](https://resilienceroundup.com/issues/35/)) 164 | 165 | 166 | [Ironies of automation]: https://www.sciencedirect.com/science/article/abs/pii/0005109883900468 167 | 168 | ## Andrea Baker 169 | 170 | [Baker](https://www.thehopmentor.com/) is a practitioner who provides 171 | training services in human and organizational performance (HOP) and learning 172 | teams. 173 | 174 | Baker tweets as [@thehopmentor](https://twitter.com/thehopmentor). 175 | 176 | ### Concepts 177 | 178 | * Human and organizational performance (HOP) 179 | * Learning teams 180 | * Industrial empathy 181 | 182 | ### Selected publications 183 | 184 | * [A bit about HOP](https://docs.wixstatic.com/ugd/1a0149_21bcf20f158540098d3d7987ffbf3f58.pdf) (editorial) 185 | * [A short introduction to human and organizational performance (hop) and learning teams](http://www.safetydifferently.com/a-short-introduction-to-human-and-organizational-performance-hop-and-learning-teams/) (blog post) 186 | 187 | ## E. Asher Balkin 188 | 189 | ### Selected publications 190 | 191 | * [Resiliency Trade Space Study: The Interaction of Degraded C2 Link and Detect and Avoid Autonomy on Unmanned Aircraft](https://www.researchgate.net/publication/330222613_Resiliency_Trade_Space_Study_The_Interaction_of_Degraded_C2_Link_and_Detect_and_Avoid_Autonomy_on_Unmanned_Aircraft) 192 | * [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations] 193 | 194 | ### Selected talks 195 | 196 | * [Root cause and the wrong path](https://www.youtube.com/watch?v=kK6t-gttsJw) 197 | 198 | ## Johan Bergström 199 | 200 | [Bergström](http://www.jbsafety.se/p/about-me.html) is a safety researcher and 201 | consultant. He runs the [Master Program of Human Factors and Systems 202 | Safety](http://www.humanfactors.lth.se/msc-programme/) at Lund University. 203 | 204 | Bergström tweets as [@bergstrom_johan](https://twitter.com/bergstrom_johan).
205 | 206 | ### Concepts 207 | 208 | * Analytical traps in accident investigation 209 | - Counterfactual reasoning 210 | - Normative language 211 | - Mechanistic reasoning 212 | * Generic competencies 213 | 214 | ### Selected publications 215 | 216 | * [Resilience engineering: Current status of the research and future challenges](https://www.sciencedirect.com/science/article/pii/S0925753516306130) 217 | * [Rule- and role retreat: An empirical study of procedures and resilience](https://www.researchgate.net/publication/50917226_Rule-_and_role_retreat_An_empirical_study_of_procedures_and_resilience) 218 | * [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation] 219 | 220 | [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation]: https://portal.research.lu.se/ws/files/1376441/3014838.pdf 221 | 222 | ### Selected talks 223 | 224 | * [Three analytical traps in accident investigation](https://www.youtube.com/watch?v=TqaFT-0cY7U) 225 | * [Two Views on Human Error](https://www.youtube.com/watch?v=rHeukoWWtQ8) 226 | * [What, Where and When is Risk in System Design?](https://www.youtube.com/watch?v=BtJIumyCrtE&feature=youtu.be) (Velocity 2013) 227 | 228 | ## Matthieu Branlat 229 | 230 | ### Selected publications 231 | 232 | * [Basic patterns in how adaptive systems fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) ([TWRR](https://resilienceroundup.com/issues/34/)) 233 | * [A practitioner’s experiences operationalizing Resilience Engineering] 234 | * [Noticing Brittleness, Designing for Resilience] 235 | 236 | [A practitioner’s experiences operationalizing Resilience Engineering]: https://www.sciencedirect.com/science/article/abs/pii/S0951832015000812 237 | [Noticing Brittleness, Designing for Resilience]: https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605708-18/noticing-brittleness-designing-resilience-elizabeth-lay-matthieu-branlat 238 | 239 | ## Sheuwen Chuang 240 | 241 | ### Selected publications 242 | 243 | * [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion] 244 | * [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/)) 245 | 246 | [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion]: https://doi.org/10.1016/j.burns.2018.12.003 247 | [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion]: https://www.researchgate.net/publication/335366770_Coping_With_a_Mass_Casualty_Insights_into_a_Hospital's_Emergency_Response_and_Adaptations_After_the_Formosa_Fun_Coast_Dust_Explosion 248 | 249 | 250 | 251 | ## Todd Conklin 252 | 253 | Conklin's books are on my reading list, but I haven't read anything by him 254 | yet. I have listened to his great [Preaccident investigation 255 | podcast](https://preaccidentpodcast.podbean.com/). 256 | 257 | Conklin tweets as [@preaccident](https://twitter.com/preaccident). 
258 | 259 | ### Selected publications 260 | * [Pre-accident investigations: an introduction to organizational safety](https://www.amazon.com/Pre-Accident-Investigations-Todd-Conklin/dp/1409447820) 261 | * [Pre-accident investigations: better questions - an applied approach to 262 | operational learning](https://www.amazon.com/gp/product/1472486137) 263 | * [Do Safety Differently](https://www.amazon.com/Do-Safety-Differently-Sidney-Dekker/dp/B09RM3Z17V) 264 | 265 | ### Selected talks 266 | 267 | Quanta - [Risk and Safety Conf 2019](https://www.youtube.com/watch?v=5WTbeFj2kJY&feature=youtu.be) 268 | 269 | ## Richard I. Cook 270 | 271 | [Cook](https://en.wikipedia.org/wiki/Richard_Cook_(safety_researcher)) was an anesthesiologist who studied failures in complex systems. He was one of the founders of [Adaptive Capacity Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering consultancy. 272 | He tweeted as [@ri_cook](https://twitter.com/ri_cook). 273 | 274 | ### Concepts 275 | * how complex systems fail 276 | * degraded mode 277 | * sharp end (cf. Reason's blunt end) 278 | * Going solid 279 | * Cycle of error 280 | * "new look" 281 | * first vs second stories 282 | 283 | ### Selected publications 284 | 285 | * [A celebration of the work of Richard Cook, MD: A pioneer in understanding accidents, safety, human factors, and resilience](https://www.researchgate.net/publication/371403498_A_celebration_of_the_work_of_Richard_Cook_MD_A_pioneer_in_understanding_accidents_safety_human_factors_and_resilience) 286 | * [How complex systems fail](https://www.adaptivecapacitylabs.com/HowComplexSystemsFail.pdf) ([BH](https://safety177496371.wordpress.com/2022/11/04/how-complex-systems-fail-a-classic-from-richard-cook/)) 287 | * [A brief look at the New Look in complex system failure, error, safety, and resilience](https://www.adaptivecapacitylabs.com/BriefLookAtTheNewLook.pdf) 288 | * [void \*: Incidents as Untyped Pointers.
*Where* complex systems fail](https://www.snafucatchers.com/single-post/2017/11/14/void-Incidents-as-Untyped-Pointers) 289 | * [Distancing through differencing: An obstacle to organizational learning following accidents](https://www.researchgate.net/publication/292504703_Distancing_through_differencing_An_obstacle_to_organizational_learning_following_accidents) 290 | * [Being bumpable](http://csel.eng.ohio-state.edu/productions/woodscta/media/beingbump.pdf) ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-being-bumpable-issue-33-177340)) 291 | * [Behind Human Error] 292 | * [Incidents - markers of resilience or brittleness?](https://www.researchgate.net/publication/292504952_Incidents_-_markers_of_resilience_or_brittleness) 293 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130) ([TWRR](https://resilienceroundup.com/issues/going-solid-a-model-of-system-dynamics-and-consequences-for-patient-safety/)) 294 | * [Operating at the Sharp End: The Complexity of Human Error](https://www.researchgate.net/publication/313407259_Operating_at_the_Sharp_End_The_Complexity_of_Human_Error) 295 | * [Patient boarding in the emergency department as a symptom of complexity-induced risks](https://www.researchgate.net/publication/312624891_Patient_boarding_in_the_emergency_department_as_a_symptom_of_complexity-induced_risks) 296 | * [Sensemaking, Safety, and Cooperative Work in the Intensive Care Unit](https://www.researchgate.net/publication/220579381_Sensemaking_Safety_and_Cooperative_Work_in_the_Intensive_Care_Unit) 297 | * [Medication Reconciliation Is a Window into “Ordinary” Work](https://www.taylorfrancis.com/books/e/9781317164777/chapters/10.1201/9781315572529-4) 298 | * [Cognitive consequences of clumsy automation on high workload, high consequence human performance] 299 | * [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)] 300 | * [The Messy Details: Insights From the Study of Technical Work in Healthcare] 301 | * [Nosocomial automation: technology-induced complexity and human performance] 302 | * [The New Look at Error, Safety, and Failure: A Primer for Health Care] 303 | * [Grounding explanations in evolving, diagnostic situations] 304 | * [A Tale of Two Stories: Contrasting Views of Patient Safety] (appendix B, starting on page 64 (numbered 52) contains the talk by Charles Billings, MD, Chief Scientist (retired), NASA Ames on the lessons learned from incident reporting in aviation. Dr. 
Billings designed, started, and managed the Aviation Safety Reporting System) 305 | * ["Those found responsible have been sacked": some observations on the usefulness of error](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.623.5749&rep=rep1&type=pdf) ([BH](https://safety177496371.wordpress.com/2025/01/26/those-found-responsible-have-been-sacked-some-observations-on-the-usefulness-of-error/)) 306 | * [Perspectives on Human Error: Hindsight Biases and Local Rationality] 307 | * [Mistaking Error] 308 | * [Adapting to new technology in the operating room] 309 | * [Verite, Abstraction, and Ordinateur Systems in the Evolution of Complex Process Control](https://www.researchgate.net/publication/3657912_Verite_abstraction_and_ordinateur_systems_in_the_evolution_of_complex_process_control) 310 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/)) 311 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems] 312 | * [The Role of Automation in Complex System Failures] 313 | * [Thinking about accidents and systems](https://www.researchgate.net/publication/228352596_Thinking_about_accidents_and_systems) 314 | * [The Stockholm blizzard of 2012](https://www.taylorfrancis.com/books/e/9781315605739/chapters/10.1201/9781315605739-11) 315 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 316 | * [Dissenting Statement: Health IT Is a Class III Medical Device](https://www.nap.edu/read/13269/chapter/14) 317 | * [Nine Steps to Move Forward From Error] ([BH](https://safety177496371.wordpress.com/2022/11/03/nine-steps-to-move-forward-from-error/)) 318 | * [Gaps in the continuity of care and progress on patient safety] 319 | * [Above the Line, Below the Line](https://queue.acm.org/detail.cfm?id=3380777) ([TWRR](https://resilienceroundup.com/issues/68/)) 320 | * [Coping With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/)) 321 | * [SRE Cognitive Work] in [Seeking SRE] 322 | * [Building and revising adaptive capacity sharing for technical incident response: A case of resilience engineering](https://www.sciencedirect.com/science/article/pii/S0003687020301903) ([TWRR](https://resilienceroundup.com/issues/building-and-revising-adaptive-capacity-sharing-for-technical-incident-response-a-case-of-resilience-engineering/)) 323 | * [Automation, interaction, complexity, and failure: A case study] 324 | * [Human Performance in Anesthesia] 325 | * [Two years before the mast: Learning how to learn about patient safety](https://www.researchgate.net/publication/285346573_Two_years_before_the_mast_Learning_how_to_learn_about_patient_safety) 326 | * [Resilience is not control: healthcare, crisis management, and ICT] 327 | * [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances] 328 | * [Human Performance in Anesthesia: A Corpus of Cases] 329 | * [Minding the Gaps: Creating Resilience in Health Care] 330 | * [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety] 332 | * [Behind Human Error: Taming Complexity to Improve Patient Safety] 333 | * [The Illusion of Explanation] 334 | 335 | 336 | [Behind Human Error]: https://www.amazon.com/Behind-Human-Error-David-Woods/dp/0754678342 337 | [Cognitive
consequences of clumsy automation on high workload, high consequence human performance]: https://ntrs.nasa.gov/search.jsp?R=19910011398 338 | [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)]: https://doi.org/10.1016/S0952-8180(96)90009-4 339 | [The Messy Details: Insights From the Study of Technical Work in Healthcare]: https://doi.org/10.1109%2FTSMCA.2004.836802 340 | [Nosocomial automation: technology-induced complexity and human performance]: https://www.researchgate.net/profile/David_Woods11/publication/224649052_Nosocomial_automation_technology-induced_complexity_and_human_performance/links/59399b1da6fdcc58ae902c49/Nosocomial-automation-technology-induced-complexity-and-human-performance.pdf 341 | [The New Look at Error, Safety, and Failure: A Primer for Health Care]: https://pdfs.semanticscholar.org/67f7/53ec089e5a8879f241e2be867dad0a2026fb.pdf 342 | [Grounding explanations in evolving, diagnostic situations]: https://pdfs.semanticscholar.org/1bed/356b5aa67c701f5bad6d943768622095f418.pdf 343 | [A Tale of Two Stories: Contrasting Views of Patient Safety]: https://www.researchgate.net/publication/245102691_A_Tale_of_Two_Stories_Contrasting_Views_of_Patient_Safety 344 | [Perspectives on Human Error: Hindsight Biases and Local Rationality]: https://www.nifc.gov/PUBLICATIONS/acc_invest_march2010/speakers/Perspectives%20on%20Human%20Error.pdf 345 | [Mistaking Error]: https://www.researchgate.net/publication/328149714_Mistaking_Error 346 | [Adapting to new technology in the operating room]: https://www.researchgate.net/publication/14230576_Adapting_to_New_Technology_in_the_Operating_Room 347 | [Collaborative Cross-Checking to Enhance Resilience]: https://www.researchgate.net/publication/220579448_Collaborative_Cross-Checking_to_Enhance_Resilience 348 | [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]: https://pdfs.semanticscholar.org/a0d3/9cc66adc64e297048a32b71aeee209a451af.pdf 349 | [The Role of Automation in Complex System Failures]: https://www.researchgate.net/publication/232191704_The_Role_of_Automation_in_Complex_System_Failures 350 | [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise]: https://www.researchgate.net/publication/2484621_New_Arctic_Air_Crash_Aftermath_Role-Play_Simulation_Orchestrating_a_Fundamental_Surprise 351 | [Nine Steps to Move Forward From Error]: http://csel.eng.ohio-state.edu/productions/pexis/readings/submod4/nine%20steps%20CTW2002.pdf 352 | [Gaps in the continuity of care and progress on patient safety]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1117777/ 353 | [Automation, interaction, complexity, and failure: A case study]: https://doi.org/10.1016/j.ress.2006.01.009 354 | [Human Performance in Anesthesia]: http://dx.doi.org/10.13140/RG.2.2.29675.36648 355 | [Resilience is not control: healthcare, crisis management, and ICT]: https://www.researchgate.net/profile/Robert-Wears/publication/225108705_Resilience_is_Not_Control_Healthcare_Crisis_Management_and_ICT/links/00b49532b2c7f3ed62000000/Resilience-is-Not-Control-Healthcare-Crisis-Management-and-ICT.pdf 356 | [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances]: https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605685-19/taking-things-one-stride-cognitive-features-two-resilient-performances-richard-cook-christopher-nemeth 357 | [Human Performance in Anesthesia: A Corpus of Cases]: 
https://www.researchgate.net/publication/347964304_Human_Performance_in_Anesthesia_Human_Performance_in_Anesthesia_Human_Performance_in_Anesthesia 358 | [Minding the Gaps: Creating Resilience in Health Care]: https://europepmc.org/article/NBK/nbk43670 359 | [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=ffe74633027ee354ebbf0ff9a6418e75f3b7a047 361 | [Behind Human Error: Taming Complexity to Improve Patient Safety]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=15f31969c4e1f4f599c5c68aa63f3bf930e0406f 362 | [The Illusion of Explanation]: https://onlinelibrary.wiley.com/doi/pdf/10.1197/j.aem.2004.07.001 363 | 364 | ### Selected talks 365 | * [How Complex Systems Fail](https://www.youtube.com/watch?v=2S0k12uZR14) (Velocity 2012) 366 | * [Resilience in Complex Adaptive Systems: Operating at the Edge of Failure](https://www.youtube.com/watch?v=PGLYEDpNu60&feature=youtu.be) (Velocity 2013) 367 | * [Lectures on the study of cognitive work](https://www.youtube.com/playlist?list=PLb1aZTnPf3-OEU1by77zZQQYckvXUGmNY) (Graduate student lecture-discussions at The Royal Institute of Technology, Huddinge, Sweden, in 2012) 368 | * [Panel discussion: Safety Culture, Lean, and DevOps] (DOES 2017) 369 | * [Working at the center of the Cyclone](https://www.youtube.com/watch?v=3ZP98stDUf0&feature=youtu.be) (DOES 2018) 370 | * [A Few Observations on the Marvelous Resilience of Bone & Resilience Engineering](https://www.youtube.com/watch?v=8LbePBiOvZ4) (REdeploy 2019) 371 | 372 | [Panel discussion: Safety Culture, Lean, and DevOps]: https://www.youtube.com/watch?v=gtxtb9z_4FY&feature=youtu.be 373 | 374 | 375 | ## Jean-Christophe Le Coze 376 | 377 | Le Coze is a research director at INERIS (National Institute for the Industrial Environment and Risks) in France. 378 | He frequently writes on historical views of safety. 379 | 380 | Le Coze tweets as [@JcLeCoze](https://twitter.com/JcLeCoze). 381 | 382 | ### Selected publications 383 | 384 | * [Managing the Unexpected](https://www.academia.edu/36790092/Managing_the_unexpected) 385 | * [The 'new view' of human error. Origins, ambiguities, success and critiques](https://www.sciencedirect.com/science/article/abs/pii/S0925753522001928) 386 | * [1984-2014. Normal Accident. Was Charles Perrow right for the wrong reasons?](https://www.academia.edu/15301538/1984_2014_Normal_Accident_Was_Charles_Perrow_right_for_the_wrong_reasons) 387 | * [Good and bad reasons: The Swiss cheese model and its critics](https://dx.doi.org/10.1016/j.ssci.2020.104660) 388 | * [Recurring themes in the legacy of Jens Rasmussen](https://doi.org/10.1016/j.apergo.2016.10.002) 389 | * [Reflecting on Jens Rasmussen’s legacy. A strong program for a hard problem](https://www.sciencedirect.com/science/article/pii/S0925753514000848) 390 | * [Reflecting on Jens Rasmussen's legacy (2) behind and beyond, a ‘constructivist turn’](https://www.sciencedirect.com/science/article/abs/pii/S0003687015300429) 391 | 392 | ## Sidney Dekker 393 | 394 | Dekker is a human factors and safety researcher with a background in aviation. 395 | His books aimed at a lay audience (Drift Into Failure, Just Culture, The Field Guide to 'Human Error' Investigations) 396 | have been enormously influential.
He was a founder of the MSc programme in Human Factors & Systems Safety at Lund University. 397 | His PhD advisor was [David Woods](#david-woods). 398 | 399 | Dekker tweets as [@sidneydekkercom](https://twitter.com/sidneydekkercom). 400 | 401 | ### Contributions 402 | 403 | #### Drift into failure 404 | 405 | Dekker developed the theory of *drift*, characterized by five concepts: 406 | 407 | 1. Scarcity and competition 408 | 1. Decrementalism, or small steps 409 | 1. Sensitive dependence on initial conditions 410 | 1. Unruly technology 411 | 1. Contribution of the protective structure 412 | 413 | #### Just Culture 414 | 415 | Dekker examines how cultural norms defining justice can be re-oriented to minimize the negative impact and maximize learning when things go wrong. 416 | 417 | 1. Retributive justice as society's traditional idea of justice: distributing punishment to those responsible based on severity of the violation 418 | 2. Restorative justice as an improvement for both victims and practitioners: distributing obligations of rebuilding trust to those responsible based on who is hurt and what they need 419 | 3. First, second, and third victims: an incident's negative impact is felt by more than just the obvious victims 420 | 4. Learning theory: people break rules when they have learned there are no negative consequences, and there are actually positive consequences - in other words, they break rules to get things done to meet production pressure 421 | 5. Reporting culture: contributing to reports of adverse events is meant to help the organization understand what went wrong and how to prevent recurrence, but accurate reporting requires appropriate and proportionate accountability actions 422 | 6. Complex systems: normal behavior of practitioners and professionals in the context of a complex system can appear abnormal or deviant in hindsight, particularly in the eyes of non-expert juries and reviewers 423 | 7. The nature of practitioners: professionals want to do good work, and therefore want to be held accountable for their mistakes; they generally want to help similarly-situated professionals avoid the same mistake. 424 | 425 | #### Safety Differently 426 | 427 | - There is a difference between the organization's prescribed processes for completing work and how work is actually completed. (work as imagined vs work as done) 428 | - The difference between work as imagined and work as done is the result of the expertise that exists in your workers from contact with real-life pressures, heuristics, and unexpected conditions. 429 | - Old View: People are the problem to control with process 430 | - They did something wrong 431 | - They need more rules and enforcement 432 | - They need to try harder 433 | - We need to get rid of "bad apples" 434 | - Focus on the "sharp end" of the organization - the people closest to the work 435 | - New View: Work is done adaptively in an uncertain world 436 | - Things go wrong all the time 437 | - Workers often detect and correct these problems 438 | - Local adaptations are a source of organizational expertise 439 | - "What conditions existed that made the selected course of action seem correct to the people involved?" 440 | - Traditional safety interventions have diminishing yields with increasing overhead. Accumulated compliance burden and "safety clutter" make it harder to get work done *and* to do so safely.
441 | - Safety Clutter serves safety bureaucracy and compliance rather than the safety of the workers or the process 442 | - Safety Clutter is produced by the "blunt end" of the organization without local expertise of what is practicable or practical in-situ 443 | - Safety Clutter represents a broader "deprofessionalization" - a removal of trust and confidence in professionals to do their job well, removing their pride, autonomy, and achievement. 444 | - Paradoxically, Safety Clutter can result from government deregulation - organizations need to self-impose risk controls in the absence of external guidelines. 445 | - Sadly for organizations with Safety Clutter, more internal rules do not equal better legal protection. 446 | - When a process is relatively safe or stable, measurements of bad outcomes lack statistical significance to understand trends or tie trends to interventions. 447 | - Fundamental Regulator Paradox: regulating a system so well that there are no useful measurements left to understand how the system is performing 448 | - Zero Paradox: A study of construction contractors showed more fatal accidents in firms with "goal zero" safety policies than in those without. Non-fatal accidents were similar. 449 | - Risk Secrecy: "goal zero" commitments result in injury underreporting and hiding of incidents which prevents learning, particularly when tied to financial incentives for leadership. 450 | - There are patterns (capacities) that help things go well 451 | - _Diversity of opinion_ - possibility to voice dissent 452 | - _Keeping the discussion on risk alive_ even when things go well 453 | - _Deference to expertise_ that already exists in people at the sharp end 454 | - _Psychological safety_ / "stop" ability 455 | - _Low barriers_ to interaction between organizational groups 456 | - _Sharp end improvements_ to existing systems based on local expertise 457 | - _Pride in work_ - process and results 458 | - Rapid problem-solving can prevent effective problem-understanding 459 | - Leadership buy-in and practice of New View safety is imperative to its success. It's also difficult to foster. 460 | - Worker buy-in is rapid and fits their existing mental model 461 | - Leadership must abandon the mental model that has governed their past work and decision-making - difficult for anyone.
462 | - Peer discussions are especially helpful for leadership 463 | - Highlighting how local adaptations helped things go well also helps 464 | 465 | ### Concepts 466 | * Drift into failure 467 | * Safety differently 468 | * New view vs old view of human performance & error 469 | * Just culture 470 | * complexity 471 | * broken part 472 | * Newton-Descartes 473 | * diversity 474 | * systems theory 475 | * unruly technology 476 | * decrementalism 477 | * generic competencies 478 | * work as imagined vs work as done 479 | 480 | ### Selected publications 481 | 482 | * [Drift into failure](https://www.amazon.com/Drift-into-Failure-Sidney-Dekker/dp/1409422216) 483 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.411.4985&rep=rep1&type=pdf) 484 | * [The field guide to understanding 'human error'](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058) 485 | * [Behind Human Error] 486 | * [Rule- and role retreat: An empirical study of procedures and resilience](https://www.researchgate.net/publication/50917226_Rule-_and_role_retreat_An_empirical_study_of_procedures_and_resilience?enrichId=rgreq-23625e555a0d8e5250c74f24b5fd01ca-XXX&enrichSource=Y292ZXJQYWdlOzUwOTE3MjI2O0FTOjk3MzU5NjY5MjM1NzQ1QDE0MDAyMjM3NjI5NDY%3D&el=1_x_2&_esc=publicationCoverPdf) 487 | * [Anticipating the effects of technological change: A new era of dynamics for human factors](https://www.researchgate.net/publication/247512351_Anticipating_the_effects_of_technological_change_A_new_era_of_dynamics_for_human_factors) 488 | * [Why do things go right?](http://www.safetydifferently.com/why-do-things-go-right/) 489 | * [Six stages to the new view of human error](http://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker/articles/2007/SafetyScienceMonitor.pdf) 490 | * [Employees: A Problem to Control or Solution to Harness?](http://sidneydekker.com/wp-content/uploads/2014/08/DekkerPS2014.pdf) 491 | * [Team Coordination in Escalating Situations: An Empirical Study Using Mid-Fidelity Simulation] 492 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems] 494 | * [Failure to adapt or adaptations that fail: contrasting models on procedures and safety](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.606.3361&rep=rep1&type=pdf) 495 | * [Human factors and folk models] 496 | * [The High Reliability Organization Perspective] ([TWRR](https://resilienceroundup.com/issues/09/)) 497 | * [Illusions of explanation: A critical essay on error classification](http://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker/articles/2003_and_before/Illusions_of_explanation.pdf) ([TWRR](https://resilienceroundup.com/issues/42/)) 498 | * [Safety II professionals: How resilience engineering can transform safety practice] ([TWRR](https://resilienceroundup.com/issues/64/)) 499 | * [The complexity of failure: implications of complexity theory for safety investigation](https://static1.squarespace.com/static/53b78765e4b0949940758017/t/5722beb0d51cd4d11675a69c/1461894833950/Dekker%2C+Cilliers+and+Hofmeyr+-+The+Complexity+of+Failure.pdf) 500 | * [The Safety
Anarchist](https://www.amazon.com/Safety-Anarchist-innovation-bureaucracy-compliance/dp/1138300462) 501 | * [Compliance Capitalism](https://www.amazon.com/Compliance-Capitalism-Overregulated-Management-Neoliberalism/dp/1032012366) 503 | * [Drifting into failure: Complexity theory and the management of risk](https://maritimesafetyinnovationlab.org/wp-content/uploads/2021/03/DekkerDriftRiskChapter2013.pdf) ([BH 1](https://safety177496371.wordpress.com/2025/05/03/drifting-into-failure-complexity-theory-and-the-management-of-risk/) [BH 2](https://safety177496371.wordpress.com/2025/05/03/complex-systems-and-drifting-into-failure-further-extracts-from-dekker-2013/)) 504 | 505 | [Human factors and folk models]: https://link.springer.com/article/10.1007%2Fs10111-003-0136-9 506 | [The High Reliability Organization Perspective]: http://sidneydekker.com/wp-content/uploads/2013/01/CH005.pdf 507 | [Safety II professionals: How resilience engineering can transform safety practice]: https://doi.org/10.1016/j.ress.2019.106740 508 | 509 | ### Selected talks 510 | 511 | * [Panel discussion: Safety Culture, Lean, and DevOps] 512 | 513 | 514 | ## John C. Doyle 515 | 516 | [Doyle](http://www.cds.caltech.edu/~doyle/wiki/index.php?title=Main_Page) is a 517 | control systems researcher. He is seeking to identify the universal laws that capture the 518 | behavior of resilient systems, and is concerned with the architecture of such 519 | systems. 520 | 521 | ### Concepts 522 | * Robust yet fragile 523 | * layered architectures 524 | * constraints that deconstrain 525 | * protocol-based architectures 526 | * emergent constraints 527 | * Universal laws and architectures 528 | * conservation laws 529 | * universal architectures 530 | * Highly optimized tolerance 531 | * Doyle's catch 532 | 533 | #### Doyle's catch 534 | 535 | *Doyle's catch* is a term introduced by David Woods, but attributed to John Doyle. Here's how 536 | [Woods quotes Doyle](https://www.researchgate.net/publication/303832480_The_Risks_of_Autonomy_Doyles_Catch): 537 | 538 | > Computer-based simulation and rapid prototyping tools are now broadly available and powerful enough that it is 539 | > relatively easy to demonstrate almost anything, provided that conditions are made sufficiently idealized. 540 | > However, the real world is typically far from idealized, and thus a system must have enough robustness in order to close 541 | > the gap between demonstration and the real thing. 542 | 543 | 544 | ### Selected publications 545 | 546 | * [Universal Laws and Architectures](http://www.cis.upenn.edu/~ngns/docs/Review_2010/Doyle%20MURI%202010.pdf) (slides) 547 | * [Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures](http://dx.doi.org/10.1109/TSMCA.2010.2048027) 548 | * [Architecture, constraints, and behavior](https://www.pnas.org/content/108/Supplement_3/15624) 549 | * [The “robust yet fragile” nature of the Internet](https://doi.org/10.1073/pnas.0501426102) 550 | * [Highly Optimized Tolerance: Robustness and Design in Complex Systems](http://dx.doi.org/10.1103/physrevlett.84.2529) 551 | * [Robust efficiency and actuator saturation explain healthy heart rate control and variability](https://doi.org/10.1073/pnas.1401883111) 552 | 553 | ## Bob Edwards 554 | 555 | [Edwards](http://hopcoach.net/) is a practitioner who provides 556 | training services in human and organizational performance (HOP).
557 |
558 | Edwards tweets as [@thehopcoach](https://twitter.com/thehopcoach).
559 |
560 | ## Anders Ericsson
561 |
562 | Ericsson introduced the idea of *deliberate practice* as a mechanism for
563 | achieving a high level of expertise.
564 |
565 | Ericsson isn't directly associated with the field of resilience engineering.
566 | However, Gary Klein's work is informed by Ericsson's, and I have a particular
567 | interest in how people improve in expertise, so I'm including him here.
568 |
569 | ### Concepts
570 |
571 | * Expertise
572 | * Deliberate practice
573 | * Protocol analysis
574 |
575 | ### Selected publications
576 |
577 | * [Peak: secrets from the new science of expertise](https://www.amazon.com/Peak-Secrets-New-Science-Expertise/dp/1531864880/)
578 | * [Protocol analysis: verbal reports as data](https://www.amazon.com/Protocol-Analysis-Revd-Verbal-Reports/dp/0262550237)
579 |
580 | ## Paul Feltovich
581 |
582 | [Feltovich](https://www.ihmc.us/groups/pfeltovich/) is a retired Senior Research Scientist at the Florida Institute for Human & Machine Cognition (IHMC),
583 | who has done extensive research on human expertise.
584 |
585 | ### Selected publications
586 |
587 | * [Common Ground and Coordination in Joint Activity]
588 | * [Issue of expert flexibility in contexts characterized by complexity and change](https://www.researchgate.net/publication/232465540_Issue_of_expert_flexibility_in_contexts_characterized_by_complexity_and_change)
589 | * [A rose by any other name...would probably be given an acronym]
590 | * [Learners' (mis)understanding of important and difficult concepts: a challenge to smart machines in education](https://www.researchgate.net/publication/234818797_Learners'_misunderstanding_of_important_and_difficult_concepts_a_challenge_to_smart_machines_in_education)
591 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/ten-challenges-for-making-automation-a-team-player-in-joint-human-agent-activity/))
593 |
594 | [Common Ground and Coordination in Joint Activity]: http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf
595 | [A rose by any other name...would probably be given an acronym]: https://www.researchgate.net/publication/3454029_A_rose_by_any_other_namewould_probably_be_given_an_acronym
596 | [Ten challenges for making automation a team player]: https://ieeexplore.ieee.org/abstract/document/1363742
597 |
598 | ## Meir Finkel
599 |
600 | Finkel is a Colonel in the Israeli Defense Force (IDF) and the Director of the IDF's Ground Forces Concept Development and Doctrine Department.
601 |
602 | ### Selected publications
603 | * [On Flexibility: Recovery from Technological and Doctrinal Surprise on the Battlefield](https://www.amazon.com/Flexibility-Recovery-Technological-Doctrinal-Battlefield/dp/0804774897/ref=sr_1_3?ie=UTF8&qid=1546046916&sr=8-3&keywords=on+flexibility)
604 |
605 | ## Marisa Grayson
606 |
607 | [Grayson](https://www.linkedin.com/in/marisa-grayson/) is a cognitive systems engineer at Mile Two, LLC.
608 |
609 | ### Selected Publications
610 |
611 | * [Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems](https://www.researchgate.net/publication/333091997_Approaching_Overload_Diagnosis_and_Response_to_Anomalies_in_Complex_and_Automated_Production_Software_Systems)
612 | * [Cognitive Work of Hypothesis Exploration During Anomaly Response](https://queue.acm.org/detail.cfm?id=3380778)
613 |
614 | ## Ivonne Andrade Herrera
615 |
616 | [Herrera](https://www.ntnu.edu/employees/ivonne.a.herrera) is an associate professor in
617 | the department of industrial economics and technology management at NTNU and a
618 | senior research scientist at SINTEF. Her areas of expertise include safety management and
619 | resilience engineering in avionics and air traffic management.
620 |
621 | ### Selected publications
622 |
623 | * [Organisational accidents and resilient organisations: six perspectives](https://www.sintef.no/globalassets/upload/teknologi_og_samfunn/sikkerhet-og-palitelighet/rapporter/sintef-a17034-organisational-accidents-and-resilience-organisations-six-perspectives.-revision-2.pdf) (SINTEF A17034 report)
624 |
625 | See also: [list of publications](https://wo.cristin.no/as/WebObjects/cristin.woa/wa/fres?sort=ar&pnr=30556&action=sok)
626 |
627 |
628 | ## Robert Hoffman
629 |
630 | [Hoffman](https://www.ihmc.us/groups/rhoffman/) is a senior research scientist at Florida Institute for Human & Machine Cognition (IHMC),
631 | who has done extensive research on human expertise.
632 |
633 | ### Selected publications
634 |
635 | * [Measuring resilience](https://journals.sagepub.com/doi/abs/10.1177/0018720816686248)
636 | * [Myths of automation and their implications for military procurement]
637 | * [The Seven Deadly Myths of "Autonomous Systems"]
638 | * [A rose by any other name...would probably be given an acronym]
639 | * [Seeing the invisible: perceptual-cognitive aspects of expertise](https://cmapspublic3.ihmc.us/rid=1G9NSY15K-N7MJMZ-LC5/SeeingTheInvisible.pdf)
640 | * [Toward a Theory of Complex and Cognitive Systems]
641 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
642 |
643 | [Myths of automation and their implications for military procurement]: https://www.researchgate.net/publication/326000581_Myths_of_automation_and_their_implications_for_military_procurement
644 |
645 | [The Seven Deadly Myths of "Autonomous Systems"]: https://www.researchgate.net/publication/260304859_The_Seven_Deadly_Myths_of_Autonomous_Systems
646 |
647 | [Toward a Theory of Complex and Cognitive Systems]: https://www.researchgate.net/publication/3454245_Toward_a_Theory_of_Complex_and_Cognitive_Systems
648 |
649 | [Macrocognition]: https://pdfs.semanticscholar.org/df74/b2909f54b41a485cd4c0189fc4aa19d176d0.pdf
650 |
651 |
652 | ### Concepts
653 |
654 | #### Seven deadly myths of autonomous systems
655 |
656 | 1. "Autonomy" is unidimensional.
657 | 2. The conceptualization of "levels of autonomy" is a useful scientific grounding for the development of autonomous system roadmaps.
658 | 3. Autonomy is a widget.
659 | 4. Autonomous systems are autonomous.
660 | 5. Once achieved, full autonomy obviates the need for human-machine collaboration.
661 | 6. As machines acquire more autonomy, they will work as simple substitutes (or multipliers) of human capability.
662 | 7. "Full autonomy" is not only possible, but is always desirable.
663 |
664 | ## Erik Hollnagel
665 |
Hollnagel is a safety researcher and one of the founders of the field of resilience engineering.

666 | ### Contributions
667 |
668 | #### ETTO principle
669 |
670 | Hollnagel proposed that there is always a fundamental tradeoff between
671 | efficiency and thoroughness, which he called the *ETTO principle*.
672 |
673 | #### Safety-I vs. Safety-II
674 |
675 | Safety-I: avoiding things that go wrong
676 | * looking at what goes wrong
677 | * bimodal view of work and activities (acceptable vs unacceptable)
678 | * find-and-fix approach
679 | * prevent transition from 'normal' to 'abnormal'
680 | * causality credo: the belief that adverse outcomes happen because something goes
681 | wrong (they have causes that can be found and treated)
682 | * it either works or it doesn't
683 | * systems are decomposable
684 | * functioning is bimodal
685 |
686 | Safety-II: performance variability rather than bimodality
687 | * the system’s ability to succeed under varying conditions, so that the number
688 | of intended and acceptable outcomes (in other words, everyday activities) is
689 | as high as possible
690 | * performance is always variable
691 | * performance variation is ubiquitous
692 | * things that go right
693 | * focus on frequent events
694 | * remain sensitive to possibility of failure
695 | * be thorough as well as efficient
696 |
697 | #### FRAM
698 |
699 | Hollnagel proposed the Functional Resonance Analysis Method (FRAM) for modeling
700 | complex socio-technical systems.
701 |
702 |
703 | #### Four abilities necessary for resilient performance
704 | * respond
705 | * monitor
706 | * learn
707 | * anticipate
708 |
709 | ### Concepts
710 | * ETTO (efficiency thoroughness tradeoff) principle
711 | * FRAM (functional resonance analysis method)
712 | * Safety-I and Safety-II
713 | * things that go wrong vs things that go right
714 | * causality credo
715 | * performance variability
716 | * bimodality
717 | * emergence
718 | * work-as-imagined vs.
work-as-done
719 | * joint cognitive systems
720 | * systems of the first, second, third, fourth kind
721 |
722 | ### Selected publications
723 |
724 | * [The ETTO Principle: Efficiency-Thoroughness Trade-Off: Why Things That Go Right Sometimes Go Wrong](https://www.amazon.com/ETTO-Principle-Efficiency-Thoroughness-Trade-Off-Sometimes/dp/0754676781/ref=sr_1_1?s=books&ie=UTF8&qid=1545965837&sr=1-1&keywords=etto+principle)
725 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf)
726 | * [Safety-II in Practice](https://www.amazon.com/Safety-II-Practice-Developing-Resilience-Potentials/dp/1138708925)
727 | * [Safety-I and Safety-II: The past and future of safety management](https://www.amazon.com/gp/product/1472423089/ref=dbs_a_def_rwt_bibl_vppi_i0)
728 | * [FRAM: The Functional Resonance Analysis Method: Modelling Complex Socio-technical System](https://www.amazon.com/gp/product/B010WIDYE8/ref=dbs_a_def_rwt_bibl_vppi_i15)
729 | * [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/gp/product/0849339332/ref=x_gr_w_bb?ie=UTF8&tag=x_gr_w_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0849339332&SubscriptionId=1MGPYB6YW3HWK55XCGG2)
730 | * [Resilience Engineering: Concepts and Precepts]
731 | * [I want to believe: some myths about the management of industrial safety](http://dx.doi.org/10.1007/s10111-012-0237-4)
732 | * [Resilience engineering – Building a Culture of Resilience](http://www.ptil.no/getfile.php/1325150/PDF/Seminar%202013/Integrerte%20operasjoner/Hollnagel_RIO_presentation.pdf) (slides)
733 | * [Anomaly Response]
734 | * [Cognitive Systems Engineering: New wine in new bottles] ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-cognitive-systems-engineering-new-wine-in-new-bottles-issue-32-175912))
735 | * [Epilogue: Resilience Engineering Precepts](https://www.researchgate.net/publication/265074845_Epilogue_Resilience_Engineering_Precepts)
736 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]
737 | * [Resilience Engineering](https://erikhollnagel.com/ideas/resilience-engineering.html) (web essay)
738 | * [RAG - Resilience Analysis Grid](http://erikhollnagel.com/onewebmedia/RAG%20Outline%20V2.pdf)
739 | * [Resilience engineering in practice: a guidebook]
740 | * [Mapping Cognitive Demands in Complex Problem-Solving Worlds] (mentions disturbance management)
741 | * [Human factors and folk models]
742 | * [Designing for joint cognitive systems](https://www.researchgate.net/publication/4213914_Designing_for_joint_cognitive_systems)
743 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
744 | * [A day when (Almost) nothing happened](https://www.sciencedirect.com/science/article/abs/pii/S0925753521004719)
745 | * [Minding the Gaps: Creating Resilience in Health Care]
747 | * [Understanding Accidents - From Root Causes to Performance Variability](https://www.researchgate.net/publication/3973687_Understanding_accidents-from_root_causes_to_performance_variability) ([BH](https://safety177496371.wordpress.com/2025/03/12/understanding-accidents-from-root-causes-to-performance-variability/))
748 |
749 |
750 | [Resilience Engineering: Concepts and Precepts]:
https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2 751 | [Anomaly Response]: https://docs.wixstatic.com/ugd/3ad081_f46dda684154447583c8a5b282b60cc2.pdf 752 | [Cognitive Systems Engineering: New wine in new bottles]: https://www.ida.liu.se/~729A15/mtrl/CSEnew.pdf?utm_campaign=Resilience%20Roundup&utm_medium=email&utm_source=Revue%20newsletter 753 | [Resilience Roundup]: https://resilienceroundup.com/ 754 | [Mapping Cognitive Demands in Complex Problem-Solving Worlds]: https://www.researchgate.net/publication/220108174_Mapping_Cognitive_Demands_in_Complex_Problem-Solving_Worlds 755 | 756 | ## Leila Johannesen 757 | 758 | [Johannesen](https://www.linkedin.com/in/leilajohannesen/) is currently a UX researcher and community advocate at IBM. 759 | Her PhD dissertation work examined how humans cooperate, including studies of anesthesiologists. 760 | 761 | ### Concepts 762 | 763 | * common ground 764 | 765 | ### Selected publications 766 | 767 | * [Grounding explanations in evolving, diagnostic situations] 768 | * [Maintaining common ground: an analysis of cooperative communication in the operating room](https://www.abdn.ac.uk/iprc/documents/Communication%20Book%20Chapter.pdf) 769 | * [Behind Human Error] 770 | 771 | 772 | ## Gary Klein 773 | 774 | Klein studies how experts are able to quickly make effective decisions in high-tempo situations. 775 | 776 | Klein tweets as [@KleInsight](https://twitter.com/KleInsight). 777 | 778 | ### Concepts 779 | 780 | * naturalistic decision making (NDM) 781 | * intuitive expertise 782 | * cognitive task analysis 783 | * common ground 784 | * problem detection 785 | * automation as a "team player" 786 | 787 | ### Selected publications 788 | 789 | * [Sources of power: how people make decisions](https://www.amazon.com/gp/product/0262534290/ref=dbs_a_def_rwt_bibl_vppi_i0) 790 | * [Common Ground and Coordination in Joint Activity] 791 | * [Working minds: a practitioner's guide to cognitive task analysis](https://www.amazon.com/gp/product/0262532816/ref=dbs_a_def_rwt_bibl_vppi_i5) 792 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 793 | * [Can We Trust Best Practices? 
Six Cognitive Challenges of Evidence-Based Approaches]
794 | * [Conditions for intuitive expertise: a failure to disagree](http://dx.doi.org/10.1037/a0016755)
795 | * [Problem detection]
796 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/66))
797 | * [Decision making in action: models and methods](http://www.macrocognition.com/documents/Decision-Making-in-Action-Models-and-Methods-0316.pdf)
798 | * [Critical decision method for eliciting knowledge](https://ieeexplore.ieee.org/document/31053)
799 | * [A recognition-primed decision (RPD) model of rapid decision making](https://pdfs.semanticscholar.org/0672/092ecc507fb41d81e82d2986cf86c4bff14f.pdf)
800 | * [Seeing the invisible: perceptual-cognitive aspects of expertise](https://cmapspublic3.ihmc.us/rid=1G9NSY15K-N7MJMZ-LC5/SeeingTheInvisible.pdf)
802 | * [The strengths and limitations of teams for detecting problems](https://link.springer.com/article/10.1007/s10111-005-0024-6)
803 | * [Macrocognition] ([TWRR](https://resilienceroundup.com/issues/62/))
804 |
805 | [Problem detection]: https://www.researchgate.net/publication/220579480_Problem_detection
806 | [Patterns in Cooperative Cognition]: https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition
807 | [Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches]: https://journals.sagepub.com/doi/abs/10.1177/1555343416637520?journalCode=edma
808 |
809 | ### Selected talks
810 |
811 | * [Problem detection](https://www.youtube.com/watch?v=UXx51qK4ItQ&feature=emb_title)
812 |
813 | ## Elizabeth Lay
814 |
815 | Elizabeth Lay is a resilience engineering practitioner. She is currently a director of safety and human performance at Lewis Tree Service.
816 |
817 | ### Selected publications
818 |
819 | * [Noticing Brittleness, Designing for Resilience]
820 | * [A practitioner’s experiences operationalizing Resilience Engineering]
821 |
822 | ## Nancy Leveson
823 |
824 | Nancy Leveson is a computer science researcher with a focus on software safety.
825 |
826 | ### Contributions
827 |
828 | #### STAMP
829 |
830 | Leveson developed the accident causality model known as STAMP: the Systems-Theoretic Accident Model and Processes.
831 |
832 | See [STAMP](STAMP.md) for some more detailed notes of mine. A toy sketch of the STAMP control-loop view appears below.
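To make the control-loop view concrete, here is a minimal sketch (my own illustration under simplified assumptions, not Leveson's formalism; the tank, thresholds, and delay are invented). It exercises several of the concepts listed below: a safety constraint, a control structure in which the controller acts on its process model, and a feedback delay that produces dysfunctional interactions.

```python
from dataclasses import dataclass, field

# Toy sketch (not Leveson's formalism): safety as a control problem.
# The controller enforces a safety constraint using its *process model*,
# which is only as current as the feedback that updates it.

@dataclass
class Tank:
    level: float = 0.0
    LIMIT: float = 100.0  # safety constraint: level must stay at or below LIMIT

@dataclass
class Controller:
    feedback_delay: int                # measurements "in flight" before arrival
    believed_level: float = 0.0        # the controller's process model
    in_flight: list = field(default_factory=list)

    def control_action(self) -> float:
        # The constraint is enforced against the model, not the actual state.
        return 10.0 if self.believed_level < 80.0 else 0.0

    def receive_feedback(self, actual_level: float) -> None:
        self.in_flight.append(actual_level)
        if len(self.in_flight) > self.feedback_delay:
            self.believed_level = self.in_flight.pop(0)

tank, controller = Tank(), Controller(feedback_delay=3)
for step in range(15):
    tank.level += controller.control_action()  # control action: pump more in
    controller.receive_feedback(tank.level)    # delayed measurement
    if tank.level > tank.LIMIT:
        print(f"step {step}: safety constraint violated (level={tank.level})")
        break
```

With `feedback_delay=0` the controller shuts the pump off in time; with the delay, the process model lags the process and the controller keeps issuing unsafe control actions. That is the shape of accident STAMP looks for: not a broken part, but inadequate enforcement of a safety constraint by the control structure.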
833 |
834 | ### Concepts
835 |
836 | * Software safety
837 | * STAMP (systems-theoretic accident model and processes)
838 | * STPA (system-theoretic process analysis) hazard analysis technique
839 | * CAST (causal analysis based on STAMP) accident analysis technique
840 | * Systems thinking
841 | * hazard
842 | * interactive complexity
843 | * system accident
844 | * dysfunctional interactions
845 | * safety constraints
846 | * control structure
847 | * dead time
848 | * time constants
849 | * feedback delays
850 |
851 | ### Selected publications
852 | * [A New Accident Model for Engineering Safer Systems](http://sunnyday.mit.edu/accidents/safetyscience-single.pdf)
853 | * [Engineering a safer world](https://mitpress.mit.edu/books/engineering-safer-world)
854 | * [STPA Handbook](http://psas.scripts.mit.edu/home/get_file.php?name=STPA_handbook.pdf)
855 | * [Safeware](https://www.amazon.com/Safeware-Computers-Nancy-G-Leveson/dp/0201119722)
856 | * [Resilience Engineering: Concepts and Precepts](https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2)
857 | * [High-pressure steam engines and computer software](http://dx.doi.org/10.1145/143062.143076)
859 |
860 | ## Carl Macrae
861 |
862 | [Macrae](https://www.nottingham.ac.uk/business/people/lizcjm.html) is a social psychology
863 | researcher who has done safety research in multiple domains, including aviation
864 | and healthcare. He helped set up the new healthcare investigation agency in
865 | England. He is currently a professor of organizational behavior and psychology
866 | at the Nottingham University Business School.
867 |
868 | Macrae tweets at [@CarlMacrae](https://twitter.com/CarlMacrae).
869 |
870 | ### Concepts
871 |
872 | * risk resilience
873 |
874 | ### Selected publications
875 |
876 | * [Close calls](http://www.closecalls.cc/)
877 | * [Early warnings, weak signals and learning from healthcare disasters](https://qualitysafety.bmj.com/content/23/6/440)
878 |
879 | ## Laura Maguire
880 |
881 | [Maguire](https://www.linkedin.com/in/lauramaguire/) is a cognitive systems
882 | engineering researcher with a PhD from Ohio State
883 | University. Maguire has done safety work in multiple domains, including
884 | forestry, avalanches, and software services. She currently works as a researcher
885 | at [jeli.io](https://jeli.io).
886 |
887 | Maguire tweets as [@LauraMDMaguire](https://twitter.com/lauramdmaguire).
888 |
889 | ### Selected publications
890 |
891 | * [Managing the Hidden Costs of Coordination](https://queue.acm.org/detail.cfm?id=3380779)
892 | * [Controlling the Costs of Coordination in Large-scale Distributed Software Systems](http://rave.ohiolink.edu/etdc/view?acc_num=osu1593661547087969) (PhD dissertation)
893 | * [Howie: The Post-Incident Guide](https://www.jeli.io/howie-the-post-incident-guide/)
894 |
895 | ### Selected talks
896 |
897 | * [How Many Is Too Much?
Exploring Costs of Coordination During Outages](https://www.infoq.com/presentations/incident-command-system/)
898 | * [Mental models – why saying “I didn’t know it worked that way” is a sign of expertise not incompetence](https://www.youtube.com/watch?v=VEprjLtHzg0)
899 | * [Operating at the edge of the envelope](https://re-deploy.io/videos/27-maguire.html)
900 |
901 | ## Christopher Nemeth
902 |
903 | [Nemeth](https://www.linkedin.com/in/christopher-nemeth-6651204) is a principal scientist at Applied Research Associates, Inc.
904 |
905 | ### Selected publications
906 |
907 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures]
908 | * [Resilience is not control: healthcare, crisis management, and ICT]
909 | * [Taking Things in One’s Stride: Cognitive Features of Two Resilient Performances]
910 | * [Minding the Gaps: Creating Resilience in Health Care]
911 |
912 |
913 | [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.7283&rep=rep1&type=pdf
914 |
915 | ## Anne-Sophie Nyssen
916 |
917 | [Nyssen](http://www.lecit.ulg.ac.be/equipe/anne-sophie-nyssen/) is a psychology professor at the University of Liège,
918 | who does research on human error in complex systems, in particular in medicine.
919 |
920 | A list of publications can be found on her website linked above.
921 |
922 |
923 | ## Elinor Ostrom
924 |
925 | [Ostrom](http://www.elinorostrom.com/) was a Nobel Prize-winning researcher in economics and
926 | political science.
927 |
928 | ### Selected publications
929 | * [Coping with tragedies of the commons](https://www.annualreviews.org/doi/abs/10.1146/annurev.polisci.2.1.493)
930 | * [Governing the Commons: The Evolution of Institutions for Collective Action](https://www.amazon.com/Governing-Commons-Evolution-Institutions-Collective/dp/1107569788)
931 |
932 | ### Concepts
933 |
934 | * tragedy of the commons
935 | * polycentric governance
936 | * social-ecological system framework
937 |
938 | ## Jean Pariès
939 |
940 | Pariès is the president of [Dédale](http://www.dedale.net/dedale_en/), a safety and human factors consultancy.
941 |
942 | ### Selected publications
943 | * [Resilience engineering in practice: a guidebook]
944 |
945 |
946 | [Resilience engineering in practice: a guidebook]: https://www.crcpress.com/Resilience-Engineering-in-Practice-A-Guidebook/Paries-Wreathall-Hollnagel/p/book/9781472420749

947 | ### Selected talks
948 |
949 | * [Predicting The fatal flaws: The challenge of The unpredictable...](paries-keynote-2015.pptx)
950 |
951 | ## Emily Patterson
952 |
953 | [Patterson](https://hrs.osu.edu/faculty-and-staff/faculty-directory/patterson-emily)
954 | is a researcher who applies human factors engineering to improve patient safety
955 | in healthcare.
956 | 957 | ### Selected publications 958 | 959 | * [Patient boarding in the emergency department as a symptom of complexity-induced risks](https://www.researchgate.net/publication/312624891_Patient_boarding_in_the_emergency_department_as_a_symptom_of_complexity-induced_risks) 960 | * [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies] 961 | * [Voice Loops as Coordination Aids in Space Shuttle Mission Control] 962 | * [Functionally distributed coordination during anomaly response in space shuttle mission control] 963 | * [Patterns in Cooperative Cognition] 964 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/)) 965 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 966 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56)) 967 | * [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands] ([TWRR](https://resilienceroundup.com/issues/how-unexpected-events-produce-an-escalation-of-cognitive-and-coordinative-demands/)) 968 | * [Communication Strategies from High-reliability Organizations: Translation is Hard Work](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876978/) ([TWRR](https://resilienceroundup.com/issues/communication-strategies-from-high-reliability-organizations-translation-is-hard-work/)) 969 | * [Understanding rigor in information analysis] 970 | * [Behind Human Error: Taming Complexity to Improve Patient Safety] 971 | 972 | [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies]: https://www.researchgate.net/profile/Emily_Patterson2/publication/237138704_USING_OBSERVATIONAL_STUDY_AS_A_TOOL_FOR_DISCOVERY_UNCOVERING_COGNITIVE_AND_COLLABORATIVE_DEMANDS_AND_ADAPTIVE_STRATEGIES/links/0deec52c8e310b385a000000.pdf 973 | 974 | [Voice Loops as Coordination Aids in Space Shuttle Mission Control]: https://www.semanticscholar.org/paper/Voice-Loops-as-Coordination-Aids-in-Space-Shuttle-Patterson-Watts-Perotti/068dfee1a859a63fa2ef82f008d239e6a81ed004 975 | 976 | [Functionally distributed coordination during anomaly response in space shuttle mission control]: https://www.researchgate.net/publication/3657906_Functionally_distributed_coordination_during_anomaly_response_inspace_shuttle_mission_control 977 | 978 | [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands]: http://csel.eng.ohio-state.edu/productions/laws/laws_mediapaper/2_4_escalation.pdf 979 | 980 | [Handoff strategies in settings with high consequences for failure: lessons for health care operations]: https://www.researchgate.net/publication/8648890_Handoff_strategies_in_settings_with_high_consequences_for_failure_Lessons_for_health_care_operations 981 | 982 | [Understanding rigor in information analysis]: https://www.researchgate.net/publication/228809190_Understanding_rigor_in_information_analysis 983 | 984 | 985 | 986 | ## Charles Perrow 987 | 988 | Perrow is a sociologist who studied the Three Mile Island disaster. "Normal Accidents" is cited by numerous other influential systems engineering publications such as [Vaughan's](#diane-vaughan) "The Challenger Launch Decision". 
989 |
990 | ### Concepts
991 | * Complex systems: A system of tightly-coupled components with common mode connections that is prone to unintended feedback loops, complex controls, low observability, and poorly-understood mechanisms. They are not always high-risk, and thus their failure is not always catastrophic.
992 | * Normal accidents: Complex systems with many components exhibit unexpected interactions in the face of inevitable component failures. When these components are tightly-coupled, failed parts cannot be isolated from other parts, resulting in unpredictable system failures. Crucially, adding more safety devices and automated system controls often makes these coupling problems worse.
993 | * Common-mode: The failure of one component that serves multiple purposes results in multiple associated failures, often with high interactivity and low linearity - both ingredients for unexpected behavior that is difficult to control.
994 | * Production pressures and safety: Organizations adopt processes and devices to improve safety and efficiency, but production pressure often defeats any safety gained from the additions: the safety devices allow or encourage more risky behavior. As an unfortunate side-effect, the system is now also more complex.
995 |
996 | ### Selected publications
997 | * [Normal Accidents: Living With High-Risk Technologies](https://www.amazon.com/Normal-Accidents-Living-Technologies-Updated-ebook/dp/B00CHRINUI)
998 |
999 | ## Shawna J. Perry
1000 |
1001 | Perry is a medical researcher who studies emergency medicine.
1002 |
1003 | ### Concepts
1004 | * Underground adaptations
1005 | * Articulated functions vs. important functions
1006 | * Unintended effects
1007 | * Apparent success vs real success
1008 | * Exceptions
1009 | * Dynamic environments
1010 |
1011 | ### Selected publications
1012 |
1013 | * [Underground adaptations: case studies from health care](https://doi.org/10.1007/s10111-011-0207-2)
1014 | * [Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches]
1015 | * [The Role of Automation in Complex System Failures]
1016 | * [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare] ([TWRR](https://resilienceroundup.com/issues/55))
1017 | * [Automation, interaction, complexity, and failure: A case study]
1018 |
1019 | ### Other
1020 |
1021 | * [Interview on Naturalistic Decision Making podcast](https://open.spotify.com/episode/7lHcgt2KuDoLyvTP9wMbEn?si=nPIyk9L8QB2Iuck2fKKrNA)
1022 |
1023 |
1024 |
1025 | [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare]: https://pdfs.semanticscholar.org/1423/f18530599b9de186af0eee4852bb7e619384.pdf
1026 |
1027 | ## Jens Rasmussen
1028 |
1029 | Jens Rasmussen was an enormously influential researcher in human factors and safety systems. In particular, you can see his influence in the work of Sidney Dekker, Nancy Leveson, and David Woods.
1030 |
1031 | ### Contributions
1032 |
1033 | #### Skill-rule-knowledge (SRK) model
1034 |
1035 | Rasmussen proposed a model that distinguishes three types of human performance.
1036 |
1037 | **Skill-based** behavior doesn't require conscious attention. The prototypical example is riding a bicycle.
1038 |
1039 | **Rule-based** behavior is based on a set of rules that we have internalized in
1040 | advance. We select which rule to use based on experience, and then carry it
1041 | out. An example would be: if threads are blocked, restart the server. You can think of rule-based behavior as a memorized runbook (see the sketch below).
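To make the "memorized runbook" framing concrete, here is a minimal sketch (my own illustration, not Rasmussen's; the symptoms and actions are invented): rule-based performance matches a recognized situation to a stored response, and falls back to the knowledge-based mode, described next, when no rule applies.

```python
# Illustrative sketch only (mine, not Rasmussen's): rule-based performance
# as a memorized runbook. The symptoms and actions are invented examples.

RUNBOOK = {
    "threads_blocked": "restart the server",
    "disk_full": "rotate and compress the logs",
    "replica_lag_high": "shed read traffic to the standby",
}

def respond(symptom: str) -> str:
    # Rule-based: recognize the situation, then apply the stored rule.
    if symptom in RUNBOOK:
        return RUNBOOK[symptom]
    # No rule matches: drop into knowledge-based reasoning, where the
    # operator builds a model of the situation and generates candidate plans.
    return "escalate: reason from first principles"

print(respond("threads_blocked"))  # -> restart the server
print(respond("novel_anomaly"))    # -> escalate: reason from first principles
```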
1042 |
1043 | **Knowledge-based** behavior comes into play when facing an unfamiliar
1044 | situation. The person generates a set of plans based on their understanding of
1045 | the environment, and then selects which one to use. The challenging incidents
1046 | are the ones that require knowledge-based behavior to resolve.
1047 |
1048 | He also proposed three types of information that humans process as they perform work.
1049 |
1050 | **Signals**. Example: weather vane
1051 |
1052 | **Signs**. Example: stop sign
1053 |
1054 | **Symbols**. Example: written language
1055 |
1056 | #### Abstraction hierarchy
1057 |
1058 | Rasmussen proposed a model, called the *abstraction hierarchy*, of how operators reason about the behavior of a
1059 | system they are supervising.
1060 | The levels in the hierarchy are:
1061 |
1062 | 1. functional purpose
1063 | 2. abstract functions
1064 | 3. general functions
1065 | 4. physical functions
1066 | 5. physical form
1067 |
1068 | The hierarchy forms a means-ends relationship: proper function is described top-down (ends), and problems are explained bottom-up (means).
1069 |
1070 |
1071 | #### Dynamic safety model
1072 |
1073 | Rasmussen proposed a state-based model of a socio-technical system as a system
1074 | that moves within a region of a state space. The region is surrounded by
1075 | different boundaries:
1076 |
1077 | * economic failure
1078 | * unacceptable work load
1079 | * functionally acceptable performance
1080 |
1081 | ![Migration to the boundary](boundary.png)
1082 |
1083 | Source: [Risk management in a dynamic society: a modelling problem]
1084 |
1085 | Incentives push the system towards the boundary of acceptable performance:
1086 | accidents happen when the boundary is exceeded. (A toy simulation sketch of this drift appears after the AcciMaps note below.)
1087 |
1088 |
1089 | #### AcciMaps
1090 |
1091 | The AcciMaps approach is a technique for reasoning about the causes of an accident, using a diagram.
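Returning to the dynamic safety model above, here is the toy simulation sketch promised there (my own illustration, not Rasmussen's; the numbers are arbitrary). The operating point starts with some margin from the boundary of functionally acceptable performance; efficiency gradients supply a steady drift toward that boundary, everyday performance variability supplies noise, and the economic boundary caps how much margin the organization will pay for.

```python
import random

# Toy simulation (not Rasmussen's; all numbers arbitrary) of drift
# toward the boundary of functionally acceptable performance.

PERFORMANCE_BOUNDARY = 0.0  # crossing this boundary is an accident
ECONOMIC_BOUNDARY = 10.0    # margin beyond this is economically unacceptable

def steps_until_accident(steps: int = 10_000, seed: int = 1) -> int:
    rng = random.Random(seed)
    margin = 5.0  # current distance from the performance boundary
    for t in range(steps):
        drift = -0.02               # pressure toward efficiency / least effort
        noise = rng.gauss(0, 0.1)   # everyday performance variability
        margin = min(margin + drift + noise, ECONOMIC_BOUNDARY)
        if margin <= PERFORMANCE_BOUNDARY:
            return t  # the boundary was exceeded: an accident
    return -1  # no accident within the simulated window

print(steps_until_accident())
```

Nothing in a run like this looks like a discrete "error": the margin simply erodes, which is why Rasmussen argued for making the boundary visible to operators rather than counting violations.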
1092 |
1093 |
1094 | #### Risk management framework
1095 |
1096 | Rasmussen proposed a multi-layer view of socio-technical systems:
1097 |
1098 | ![Risk management framework](risk-management-framework.png)
1099 |
1100 | Source: [Risk management in a dynamic society: a modelling problem]
1101 |
1102 | ### Concepts
1103 | * Dynamic safety model
1104 | * Migration toward accidents
1105 | * Risk management framework
1106 | * Boundaries:
1107 |   - boundary of functionally acceptable performance
1108 |   - boundary to economic failure
1109 |   - boundary to unacceptable work load
1110 | * Cognitive systems engineering
1111 | * Skill-rule-knowledge (SRK) model
1112 | * AcciMaps
1113 | * Means-ends hierarchy
1114 | * Ecological interface design
1115 | * Systems approach
1116 | * Control-theoretic
1117 | * decisions, acts, and errors
1118 | * hazard source
1119 | * anatomy of accidents
1120 | * energy
1121 | * systems thinking
1122 | * trial and error experiments
1123 | * defence in depth (fallacy)
1124 | * Role of managers
1125 |   - Information
1126 |   - Competency
1127 |   - Awareness
1128 |   - Commitment
1129 | * Going solid
1130 | * observability
1131 |
1132 | ### Selected publications
1133 | * [Mental procedures in real-life tasks: a case study of electronic trouble shooting](https://www.tandfonline.com/doi/abs/10.1080/00140137408931355) (1974)
1134 | * [Coping with complexity](https://orbit.dtu.dk/en/publications/coping-with-complexity)
1135 | * [Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models](https://www.iwolm.com/wp-content/downloads/SkillsRulesAndKnowledge-Rasmussen.pdf)
1136 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130)
1137 | * [Human error and the problem of causality in analysis of accidents](https://www.ida.liu.se/~729A71/Literature/Human%20Error_T/Rasmussen_1990.pdf) ([TWRR](https://resilienceroundup.com/issues/human-error-and-the-problem-of-causality-in-analysis-of-accidents/))
1138 | * [Human Errors: A Taxonomy for Describing Human Malfunction in Industrial Installations](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158020073/ERTAX1.PDF)
1139 | * [Ecological interfaces: A technological imperative in high‐tech systems](https://core.ac.uk/download/pdf/13788397.pdf)
1140 | * [Information processing and human-machine interaction: an approach to cognitive engineering](https://www.amazon.com/Information-Processing-Human-Machine-Interaction-North-Holland/dp/0444009876)
1141 | * [The role of hierarchical knowledge representation in decisionmaking and system management](https://backend.orbit.dtu.dk/ws/files/158019622/HISMC.PDF)
1142 | * [A Model of Human Decision Making in Complex Systems and its Use for Design of System Control Strategies](https://core.ac.uk/download/pdf/13777954.pdf)
1143 | * [The role of error in organizing behaviour](https://qualitysafety.bmj.com/content/qhc/12/5/377.full.pdf) ([TWRR](https://resilienceroundup.com/issues/the-role-of-error-in-organizing-behaviour/))
1144 | * [Information processing and human-machine interaction](https://www.osti.gov/biblio/7011990-information-processing-human-machine-interaction)
1145 | * [Risk management in a dynamic society: a modelling problem]
1146 | * [Proactive risk management in a dynamic society](https://rib.msb.se/Filer/pdf/16252.pdf)
1147 | * [Trends in Human Reliability Analysis](https://backend.orbit.dtu.dk/ws/portalfiles/portal/137294535/TREND.PDF)
1151 | * [Coping with human errors through system design: implications for ecological interface design](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5fb7644d205b342aa52c594b7982a9e208086238)
1152 | * [Graphic representation of accident scenarios: mapping system structure and the causation of accidents](https://www.sciencedirect.com/science/article/abs/pii/S0925753500000369)
1153 | * [Diagnostic reasoning in action](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158017532/DIAACT.PDF)
1154 | * [A framework for cognitive task analysis in systems design](https://orbit.dtu.dk/en/publications/a-framework-for-cognitive-task-analysis-in-systems-design)
1155 | * [Analysis of human errors in industrial incidents and accidents for improvement of work safety](https://backend.orbit.dtu.dk/ws/portalfiles/portal/158019864/LEPRAS.PDF)
1156 | * [Why do complex organizational systems fail?](https://documents1.worldbank.org/curated/ru/535511468766200820/pdf/multi0page.pdf)
1157 | * [Notes on human error analysis and prediction](https://orbit.dtu.dk/en/publications/notes-on-human-error-analysis-and-prediction)
1158 |
1159 | (These are written by others about Rasmussen's work.)
1160 | * [Recurring themes in the legacy of Jens Rasmussen](https://www.sciencedirect.com/science/article/abs/pii/S0003687016302150?via%3Dihub) - special issue of Applied Ergonomics
1161 | * [Reflecting on Jens Rasmussen’s legacy. A strong program for a hard problem](https://doi.org/10.1016/j.ssci.2014.03.015) ([my notes](https://github.com/lorin/booknotes/blob/master/papers/Reflecting-on-Jens-Rasmussens-Legacy.md))
1162 | * [Reflecting on Jens Rasmussen's legacy (2) behind and beyond, a ‘constructivist turn’](https://doi.org/10.1016/j.apergo.2015.07.013)
1163 | * [Musings on Models and the Genius of Jens Rasmussen](https://www.sciencedirect.com/science/article/abs/pii/S0003687015301009?via%3Dihub)
1164 |
1165 | [Risk management in a dynamic society: a modelling problem]: https://doi.org/10.1016/S0925-7535(97)00052-0
1166 |
1167 | ## Mike Rayo
1168 |
1169 | Rayo is the Director of the Cognitive Systems Engineering Laboratory at the Ohio State University.
1170 |
1171 | ### Concepts
1172 |
1173 | * SCAD (Systemic Contributors and Adaptations Diagramming)
1174 |
1175 | ### Selected Publications
1176 |
1177 | * [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations]
1178 | * [Multiple Systemic Contributors versus Root Cause: Learning from a NASA Near Miss](https://www.researchgate.net/publication/308194080_Multiple_Systemic_Contributors_versus_Root_Cause_Learning_from_a_NASA_Near_Miss)
1179 |
1180 | [Developing Systemic Contributors and Adaptations Diagramming (SCAD): systemic insights, multiple pragmatic implementations]: https://journals.sagepub.com/doi/10.1177/1071181322661334
1181 |
1182 | ## James Reason
1183 |
1184 | Reason is a psychology researcher who did work on understanding and categorizing human error.
1185 |
1186 | ### Contributions
1187 |
1188 | #### Accident causation model (Swiss cheese model)
1189 |
1190 | Reason developed an accident causation model that is sometimes known as the *Swiss cheese* model of accidents.
1191 | In this model, Reason introduced the terms "sharp end" and "blunt end".
1192 |
1193 | #### Human error model: slips, lapses and mistakes
1194 |
1195 | Reason developed a model of the types of errors that humans make:
1196 |
1197 | * slips (the action performed is not the action that was intended)
1198 | * lapses (a memory failure: an intended action is omitted or forgotten)
1199 | * mistakes (the action goes as planned, but the plan itself is inadequate)
1200 |
1201 | ### Concepts
1202 |
1203 | * Blunt end
1204 | * Human error
1205 | * Slips, lapses and mistakes
1206 | * Swiss cheese model
1207 |
1208 | ### Selected publications
1209 |
1210 | * [Human error]
1211 | * [Organizational Accidents Revisited](https://www.amazon.com/Organizational-Accidents-Revisited-James-Reason/dp/1472447689)
1212 |
1213 | [Human error]: https://www.amazon.com/gp/product/0521314194/ref=dbs_a_def_rwt_bibl_vppi_i0
1214 |
1215 | ## J. Paul Reed
1216 |
1217 | [Reed](https://jpaulreed.com/) is a Senior Applied Resilience engineer at Netflix and runs [REdeploy](https://re-deploy.io), a conference focused on Resilience Engineering in the software development and operations industry.
1218 |
1219 | Reed tweets as [@jpaulreed](https://twitter.com/jpaulreed).
1220 |
1221 | ### Selected Publications
1222 |
1223 | * [Maps, Context, and Tribal Knowledge: On the Structure and Use of Post-Incident Analysis Artifacts in Software Development and Operations](https://lup.lub.lu.se/student-papers/search/publication/8966930)
1224 | * [Beyond the "Fix-it" Treadmill](https://queue.acm.org/detail.cfm?id=3380780)
1225 |
1226 |
1227 | ### Concepts
1228 |
1229 | * [Blame "Aware"](https://jpaulreed.com/blame-aware) (versus "Blameless") Culture
1230 | * Postmortem Artifact _Archetypes_
1231 |
1232 | ## Emilie M. Roth
1233 |
1234 | [Roth](http://www.rothsite.com/resume.html) is a cognitive psychologist who
1235 | serves as the principal scientist at [Roth Cognitive Engineering](http://www.rothsite.com/), a small
1236 | company that conducts research and application in the areas of human factors
1237 | and applied cognitive psychology (cognitive engineering).
1238 |
1239 | ### Selected publications
1240 |
1241 | * [Uncovering the Requirements of Cognitive Work](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.564.2044) ([TWRR](https://www.getrevue.co/profile/resilience/issues/resilience-roundup-uncovering-the-requirements-of-cognitive-work-issue-30-173410))
1242 | * [Using observational study as a tool for discovery: uncovering cognitive and collaborative demands and adaptive strategies]
1243 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56))
1244 | * [Bootstrapping multiple converging cognitive task analysis techniques for system design] ([TWRR](https://resilienceroundup.com/issues/70))
1245 |
1246 | ### Other
1247 |
1248 | * [Interview on Naturalistic Decision Making podcast](https://open.spotify.com/episode/3XqAhdpyrszLoB59VcRJWG)
1249 |
1250 | ## Nadine Sarter
1251 |
1252 | [Sarter](https://ioe.engin.umich.edu/people/nadine-sarter/) is a researcher in industrial and operations engineering.
1253 | She is the director of the Center for Ergonomics at the University of Michigan.
1254 |
1255 | ### Concepts
1256 |
1257 | * cognitive ergonomics
1258 | * organization safety
1259 | * human-automation/robot interaction
1260 | * human error / error management
1261 | * attention / interruption management
1262 | * design of decision support systems
1263 |
1264 |
1265 | ### Selected publications
1266 |
1267 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf)
1268 | * [Behind Human Error]
1269 | * [Design-Induced Error and Error-Informed Design: A Two-Way Street](https://www.amazon.com/Cognitive-Systems-Engineering-Expertise-Applications-ebook/dp/B076TDR6H9/ref=sr_1_1?keywords=cognitive+systems+engineering&qid=1554075974&s=gateway&sr=8-1)
1270 | * [The Critical Incident Technique: A Method for Identifying System Strengths and Weaknesses Based on Observational Data](https://www.taylorfrancis.com/books/e/9780429134845)
1271 | * [Myths of automation and their implications for military procurement]
1272 | * [Automation surprises]
1273 | * [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study] ([TWRR](https://resilienceroundup.com/issues/team-play-with-a-powerful-and-independent-agent-a-full-mission-simulation-study/))
1274 |
1275 | [Bootstrapping multiple converging cognitive task analysis techniques for system design]: https://www.researchgate.net/publication/313737506_Bootstrapping_multiple_converging_cognitive_task_analysis_techniques_for_system_design
1276 | [Automation surprises]: https://www.researchgate.net/publication/270960170_Automation_surprises
1277 | [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study]: https://www.researchgate.net/publication/12195752_Team_Play_with_a_Powerful_and_Independent_Agent_A_Full-Mission_Simulation_Study
1278 |
1279 | ## James C. Scott
1280 |
1281 | Scott is an anthropologist who also does research in political science. While
1282 | Scott is not a member of the resilience engineering community, his book *Seeing
1283 | like a state* has long been a staple of the cognitive systems engineering and
1284 | resilience engineering communities.
1285 |
1286 | ### Concepts
1287 |
1288 | * authoritarian high-modernism
1289 | * legibility
1290 | * mētis
1291 |
1292 | ### Selected publications
1293 |
1294 | * [Seeing like a state: how certain schemes to improve the human condition have failed](https://www.amazon.com/Seeing-like-State-Certain-Condition/dp/0300078153/ref=sr_1_1)
1295 |
1296 |
1297 | ## Steven Shorrock
1298 |
1299 | Shorrock is a chartered psychologist and a chartered ergonomist and human
1300 | factors specialist. He is the editor-in-chief of EUROCONTROL
1301 | [HindSight](https://www.skybrary.aero/index.php/HindSight_-_EUROCONTROL)
1302 | magazine. He runs the excellent [Humanistic Systems](https://humanisticsystems.com/) blog.
1303 |
1304 | Shorrock tweets as [@StevenShorrock](https://twitter.com/StevenShorrock).
1305 |
1306 | ### Selected publications
1307 |
1308 | * [Systems Thinking for Safety: Ten Principles. A White Paper. Moving towards Safety-II](https://skybrary.aero/sites/default/files/bookshelf/2882.pdf)
1309 | * [Human Factors and Ergonomics in Practice: Improving System Performance and Human Well-Being in the Real World](https://www.crcpress.com/Human-Factors-and-Ergonomics-in-Practice-Improving-System-Performance-and/Shorrock-Williams/p/book/9781472439253) (book)
1310 | * [State of science: evolving perspectives on ‘human error’](https://doi.org/10.1080/00140139.2021.1953615)
1311 |
1312 | ### Selected talks
1313 |
1314 | * [Life After Human Error](https://www.youtube.com/watch?v=STU3Or6ZU60) (Velocity Europe 2014 keynote)
1315 |
1316 | ## Diane Vaughan
1317 |
1318 | Vaughan is a sociology researcher who did a famous study of the NASA Challenger accident, concluding that it was the result of organizational failure rather than a technical failure. Specifically, production pressure overrode the rigorous scientific safety culture in place at NASA.
1319 |
1320 | ### Concepts
1321 |
1322 | * Structural Secrecy: Organizational structure, processes, and information exchange patterns can systematically undermine the ability to "see the whole picture" and conceal risky decisions.
1323 | * Social Construction of Risk: Out of the necessity to balance risk with the associated reward, any group of people will develop efficient heuristics to solve the problems they face. The understanding of risk that faces one subgroup may not match that of another subgroup or of the whole group. The ability of an individual to change a social construction of risk, formed over years with good intentions and often with evidence, is limited. (Though the evidence is usually accurate, the conclusion might not be, leading to an inadvertent scientific paradigm.)
1324 | * Normalization of Deviance: During operation of a complex system, inadvertent deviations from system design may occur and not result in a system failure. Because the initial construction of risk is usually conservative, the deviation is seen as showing that the system and its redundancies "worked", leading to a new accepted safe operating envelope.
1325 | * Signals of potential danger: Information gained through the operation of a system that may indicate the system does not work as designed. Most risk constructions are based on a comprehensive understanding of the operation of the system, so information to the contrary is a sign that the system could leave the safe operation envelope in unexpected ways - a danger.
1326 | * Weak signals, mixed signals, missed signals: signals of potential danger that have been interpreted as non-threats or acceptable risk because at the time they didn't represent a clear and present danger sufficient to overcome the Social Construction of Risk. Often, post-hoc, these are seen as causes due to cherry-picking - such signals were ignored before with no negative consequences.
1327 | * Competition for Scarce Resources: An ongoing need to justify investment to customers leads to Efficiency-Thoroughness Tradeoffs (ETTOs). In NASA's case, justifying the cost of the Space Shuttle program to taxpayers and their congressional representatives meant pressure to quickly develop payload delivery capability at the lowest cost possible.
1328 | * Belief in Redundancy: Constructing risk from a signal of potential danger such that a redundant subsystem becomes part of the normal operating strategy for a primary subsystem.
In NASA's case, signals that the primary O-ring assembly did not operate as expected formed an acceptable risk because a secondary O-ring would contain a failure. Redundancy was eliminated from the design in this construction of risk - the secondary system now became part of the primary system, eliminating system redundancy.
1329 |
1330 | ### Selected publications
1331 |
1332 | * [The Challenger Launch Decision: Risky Technology, Culture, and Deviance at
1333 | NASA](https://www.amazon.com/Challenger-Launch-Decision-Technology-Deviance/dp/022634682X/ref=sr_1_1?ie=UTF8&qid=1545966442&sr=8-1&keywords=diane+vaughan)
1334 |
1335 | ## Barry Turner
1336 |
1337 | [Turner](https://www.tandfonline.com/doi/pdf/10.1080/10245289508523441) was a sociologist who greatly influenced the field of organization studies.
1338 |
1339 | ### Selected publications
1340 |
1341 | * [Man-made disasters](https://www.amazon.com/Man-Made-Disasters-Second-Barry-Turner/dp/0750620870/ref=sr_1_1)
1342 |
1343 | ## Robert L. Wears
1344 |
1345 | [Wears](https://en.wikipedia.org/wiki/Robert_Wears) was a medical researcher who also had a PhD in industrial safety.
1346 |
1347 | ### Concepts
1348 |
1349 | * Underground adaptations
1350 | * Articulated functions vs. important functions
1351 | * Unintended effects
1352 | * Apparent success vs real success
1353 | * Exceptions
1354 | * Dynamic environments
1355 | * Systems of care are intrinsically hazardous
1356 |
1357 | ### Selected publications
1358 |
1359 | * [The error of counting "errors"](https://linkinghub.elsevier.com/retrieve/pii/S0196064408006070) [BH](https://safety177496371.wordpress.com/2023/09/20/the-error-of-counting-errors/)
1360 | * [Underground adaptations: case studies from health care](https://doi.org/10.1007/s10111-011-0207-2)
1361 | * [Fundamental On Situational Surprise: A Case Study With Implications For Resilience](https://books.openedition.org/pressesmines/1122)
1362 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures] [BH](https://safety177496371.wordpress.com/2024/02/26/replacing-hindsight-with-insight-toward-better-understanding-of-diagnostic-failures/)
1363 | * [Seeing patient safety ‘Like a State’](http://dx.doi.org/10.1016%2Fj.ssci.2014.02.007)
1365 | * [The Role of Automation in Complex System Failures]
1366 | * [Exploring the Dynamics of Resilient Performance](https://pastel.archives-ouvertes.fr/pastel-00664145/document)
1367 | * [Extemporaneous Adaptation to Evolving Complexity: A Case Study of Resilience in Healthcare] ([TWRR](https://resilienceroundup.com/issues/55))
1368 | * [Automation, interaction, complexity, and failure: A case study]
1369 | * [Resilience is not control: healthcare, crisis management, and ICT]
1370 | * [The Secret Life of Policies](https://www.annemergmed.com/article/S0196-0644(17)30874-0/fulltext)
1371 | * [The tragedy of adaptability](https://www.annemergmed.com/article/S0196-0644(13)01554-0/abstract) [BH](https://safety177496371.wordpress.com/2021/04/19/the-tragedy-of-adaptability/)
1372 | * [Relying on resilience: too much of a good thing?](https://www.taylorfrancis.com/chapters/edit/10.1201/9781315605722-11/relying-resilience-much-good-thing-robert-wears-charles-vincent) [BH](https://safety177496371.wordpress.com/2024/03/20/relying-on-resilience-too-much-of-a-good-thing/)
1374 | * [The science of human factors: separating fact from fiction](https://safety177496371.wordpress.com/2024/10/29/the-science-of-human-factors-separating-fact-from-fiction/)
1375 | * [Resilience skills as emergent phenomena: A study of emergency departments in Brazil and the United States](https://doi.org/10.1016/j.apergo.2016.02.012) [BH](https://safety177496371.wordpress.com/2023/01/20/resilience-skills-as-emergent-phenomena-a-study-of-emergency-departments-in-brazil-and-the-united-states/)
1376 | * [Our current approach to root cause analysis: is it contributing to our failure to improve patient safety?](https://qualitysafety.bmj.com/content/26/5/381) [BH](https://safety177496371.wordpress.com/2021/03/18/our-current-approach-to-root-cause-analysis-is-it-contributing-to-our-failure-to-improve-patient-safety/)
1377 | * [Error Reduction and Performance Improvement in the Emergency Department through Formal Teamwork Training: Evaluation Results of the MedTeams Project](https://pmc.ncbi.nlm.nih.gov/articles/PMC1464040/)
1378 | * [In situ simulation: detection of safety threats and teamwork training in a high risk emergency department](https://www.academia.edu/download/85660593/468.full.pdf)
1379 | * [“Safeware”: Safety-Critical Computing and Health Care Information Technology](https://europepmc.org/article/nbk/nbk43774)
1380 | * [The Illusion of Explanation]
1381 | * [Thick Versus Thin: Description Versus Classification in Learning From Case Reviews](https://www.annemergmed.com/article/S0196-0644(07)01451-5/fulltext)
1382 | * [Safety, Error, and Resilience: a Meta-narrative Review](https://www.resilience-engineering-association.org/download/resources/symposium/symposium_2015/Wears_R.-Sutcliffe_K.-Safety-error-and-resilience-a-meta-narrative-review-Paper.pdf)
1383 |
1384 | ### Selected talks
1385 |
1386 | * [Design of resilient systems](https://www.youtube.com/watch?v=nV52yh6GDMg)
1387 |
1388 |
1389 | ## David Woods
1390 |
1391 | [Woods](https://u.osu.edu/csel/member-directory/david-woods/) has a research background in cognitive systems engineering and did work
1392 | researching NASA accidents. He is one of the founders of [Adaptive Capacity
1393 | Labs](http://www.adaptivecapacitylabs.com/), a resilience engineering
1394 | consultancy.
1395 |
1396 | Woods tweets as [@ddwoods2](https://twitter.com/ddwoods2).
1397 |
1398 | ### Contributions
1399 |
1400 | Woods has contributed an enormous number of concepts.
1401 |
1402 | #### The adaptive universe
1403 |
1404 | Woods uses *the adaptive universe* as a lens for understanding the behavior of
1405 | all different kinds of systems.
1406 |
1407 | All systems exist in a dynamic environment, and must adapt to change.
1408 |
1409 | A successful system will need to adapt by virtue of its success.
1410 |
1411 | Systems can be viewed as units of adaptive behavior (UAB) that interact. UABs
1412 | exist at different scales (e.g., cell, organ, individual, group, organization).
1413 |
1414 | All systems have competence envelopes, which are constrained by boundaries.
1415 |
1416 | The resilience of a system is determined by how it behaves when it comes near
1417 | to a boundary.
1418 |
1419 | See [Resilience Engineering Short Course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt) for more details.
1420 |
1421 | #### Charting adaptive cycles
1422 |
1423 | * Trigger
1424 | * Units of adaptive behavior
1425 | * Goals and goal conflicts
1426 | * Pressure points
1427 | * Subcycles
1428 |
1429 | #### Graceful extensibility
1430 |
1431 | From [The theory of graceful extensibility: basic rules that govern adaptive systems]:
1432 |
1433 | (Longer wording)
1434 |
1435 | 1. Adaptive capacity is finite
1436 | 2. Events will produce demands that challenge boundaries on the adaptive
1437 | capacity of any UAB
1438 | 3. Adaptive capacities are regulated to manage the risk of saturating capacity for manoeuvre (CfM)
1439 | 4. No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone
1440 | 5. Some UABs monitor and regulate the CfM of other UABs in response to changes
1441 | in the risk of saturation
1442 | 6. Adaptive capacity is the potential for adjusting patterns of action to
1443 | handle future situations, events, opportunities and disruptions
1444 | 7. Performance of a UAB as it approaches saturation is different from the
1445 | performance of that UAB when it operates far from saturation
1446 | 8. All UABs are local
1447 | 9. There are bounds on the perspective of any UAB, but these limits are overcome
1448 | by shifts and contrasts over multiple perspectives.
1449 | 10. Reflective systems risk mis-calibration
1450 |
1451 | (Shorter wording)
1452 |
1453 | 1. Boundaries are universal
1454 | 2. Surprise occurs, continuously
1455 | 3. Risk of saturation is monitored and regulated
1456 | 4. Synchronization across multiple units of adaptive behavior in a network is necessary
1457 | 5. Risk of saturation can be shared
1458 | 6. Pressure changes what is sacrificed when
1459 | 7. Pressure for optimality undermines graceful extensibility
1460 | 8. All adaptive units are local
1461 | 9. Perspective contrast overcomes bounds
1462 | 10. Mis-calibration is the norm
1463 |
1464 | For more details, see [summary of graceful extensibility theorems](graceful-extensibility.md).
1465 |
1466 | #### SCAD (Systemic Contributors and Adaptations Diagramming)
1467 |
1468 | (tbd)
1469 |
1470 | ### Concepts
1471 |
1472 | Many of these are mentioned in Woods's [short course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt).
1473 |
1474 | * adaptive capacity
1475 | * adaptive universe
1476 | * unit of adaptive behavior (UAB), adaptive unit
1477 | * continuous adaptation
1478 | * graceful extensibility
1479 | * sustained adaptability
1480 | * Tangled, layered networks (TLN)
1481 | * competence envelope
1482 | * adaptive cycles/histories
1483 | * precarious present (unease)
1484 | * resilient future
1485 | * tradeoffs, five fundamental
1486 | * efflorescence: the degree that changes in one area tend to recruit or open up
1487 | beneficial changes in many other aspects of the network - which opens new
1488 | opportunities across the network ...
1489 | * reverberation 1490 | * adaptive stalls 1491 | * borderlands 1492 | * anticipate 1493 | * synchronize 1494 | * proactive learning 1495 | * initiative 1496 | * reciprocity 1497 | * SNAFUs 1498 | * robustness 1499 | * surprise 1500 | * dynamic fault management 1501 | * software systems as "team players" 1502 | * multi-scale 1503 | * brittleness 1504 | * how adaptive systems fail (see: [How do systems manage their adaptive capacity to successfully handle disruptions? A resilience engineering perspective]) 1505 | - decompensation 1506 | - working at cross-purposes 1507 | - getting stuck in outdated behaviors 1508 | * proactive learning vs getting stuck 1509 | * oversimplification 1510 | * fixation 1511 | * fluency law, veil of fluency 1512 | * capacity for manoeuvre (CfM) 1513 | * crunches 1514 | * turnaround test 1515 | * sharp end, blunt end 1516 | * adaptive landscapes 1517 | * law of stretched systems: Every system is continuously stretched to operate at capacity. 1518 | * cascades 1519 | * adapt how to adapt 1520 | * unit working hard to stay in control 1521 | * you can monitor how hard you're working to stay in control (monitor risk of saturation) 1522 | * reality trumps algorithms 1523 | * stand down 1524 | * time matters 1525 | * Properties of resilient organizations 1526 | - Tangible experience with surprise 1527 | - uneasy about the precarious present 1528 | - push initiative down 1529 | - reciprocity 1530 | - align goals across multiple units 1531 | * goal conflicts, goal interactions (follow them!) 1532 | * to understand system, must study it under load 1533 | * adaptive races are unstable 1534 | * adaptive traps 1535 | * roles, nesting of 1536 | * hidden interdependencies 1537 | * net adaptive value 1538 | * matching tempos 1539 | * tilt toward florescence 1540 | * linear simplification 1541 | * common ground 1542 | * problem detection 1543 | * joint cognitive systems 1544 | * automation as a "team player" 1545 | * "new look" 1546 | * sacrifice judgment 1547 | * task tailoring 1548 | * substitution myth 1549 | * observability 1550 | * directability 1551 | * directed attention 1552 | * inter-predictability 1553 | * error of the third kind: solving the wrong problem 1554 | * buffering capacity 1555 | * context gap 1556 | * Norbert's contrast 1557 | * anomaly response 1558 | * automation surprises 1559 | * disturbance management 1560 | * Doyle's catch 1561 | * Cooperative advocacy 1562 | 1563 | ### Selected publications 1564 | 1565 | * [Resilience Engineering: Concepts and Precepts](https://www.amazon.com/gp/product/B009KNDF64/ref=x_gr_w_glide_bb?ie=UTF8&tag=x_gr_w_glide_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B009KNDF64&SubscriptionId=1MGPYB6YW3HWK55XCGG2) 1566 | * [Prologue: Resilience Engineering Concepts](http://erikhollnagel.com/onewebmedia/Prologue.pdf) 1567 | * [Epilogue: Resilience Engineering Precepts](https://www.researchgate.net/publication/265074845_Epilogue_Resilience_Engineering_Precepts) 1568 | * [Resilience is a verb](https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb) 1569 | * [Four concepts for resilience and the implications for the future of resilience engineering](https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering) ([TWRR](https://resilienceroundup.com/issues/65)) 1570 | * [Basic patterns in how adaptive systems fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) 
([TWRR](https://resilienceroundup.com/issues/34/)) 1571 | * [Resilience and the ability to anticipate](https://www.researchgate.net/publication/285487326_Resilience_and_the_ability_to_anticipate) ([TWRR](https://resilienceroundup.com/issues/resilience-and-the-ability-to-anticipate/)) 1572 | * [Distancing through differencing: An obstacle to organizational learning following accidents](https://www.researchgate.net/publication/292504703_Distancing_through_differencing_An_obstacle_to_organizational_learning_following_accidents) 1573 | * [Essential characteristics of resilience](https://www.researchgate.net/publication/284328979_Essential_characteristics_of_resilience) 1574 | * [Essentials of resilience, revisited](https://www.researchgate.net/publication/330116587_4_Essentials_of_resilience_revisited) ([TWRR](https://resilienceroundup.com/issues/71/)) 1575 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf) 1576 | * [Behind Human Error] 1577 | * [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/gp/product/0849339332/ref=x_gr_w_bb?ie=UTF8&tag=x_gr_w_bb-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0849339332&SubscriptionId=1MGPYB6YW3HWK55XCGG2) 1578 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 1579 | * [Origins of cognitive systems engineering](https://www.researchgate.net/publication/298793082_Origins_of_Cognitive_Systems_Engineering) 1580 | * [Incidents - markers of resilience or brittleness?](https://www.researchgate.net/publication/292504952_Incidents_-_markers_of_resilience_or_brittleness) [BH](https://safety177496371.wordpress.com/2023/12/18/incidents-markers-of-resilience-or-brittleness/) 1581 | * [The alarm problem and directed attention in dynamic fault management](https://www.researchgate.net/publication/40961767_The_Alarm_problem_and_directed_attention_in_dynamic_fault_management) 1582 | * [Can We Trust Best Practices? 
Six Cognitive Challenges of Evidence-Based Approaches]
1583 | * [Operating at the Sharp End: The Complexity of Human Error](https://www.researchgate.net/publication/313407259_Operating_at_the_Sharp_End_The_Complexity_of_Human_Error)
1584 | * [The theory of graceful extensibility: basic rules that govern adaptive systems]
1585 | * [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems](https://www.researchgate.net/publication/220628177_Beyond_Simon%27s_Slice_Five_Fundamental_Trade-Offs_that_Bound_the_Performance_of_Macrocognitive_Work_Systems) ([TWRR](https://resilienceroundup.com/issues/five-fundamental-trade-offs-in-cognitive-work/))
1586 | * [Anticipating the effects of technological change: A new era of dynamics for human factors](https://www.researchgate.net/publication/247512351_Anticipating_the_effects_of_technological_change_A_new_era_of_dynamics_for_human_factors)
1587 | * [Common Ground and Coordination in Joint Activity]
1588 | * [Resilience as Graceful Extensibility to Overcome Brittleness](https://www.irgc.org/wp-content/uploads/2016/04/Woods-Resilience-as-Graceful-Extensibility-to-Overcome-Brittleness-1.pdf)
1589 | * [Resilience Engineering: Redefining the Culture of Safety and Risk Management](http://ordvac.com/soro/library/Aviation/Aviation%20Safety/General%20Safety%20Articles/resilience%20engineering%20bulletin.pdf)
1590 | * [Problem detection]
1591 | * [Cognitive consequences of clumsy automation on high workload, high consequence human performance]
1592 | * [Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)]
1593 | * [Ten challenges for making automation a team player] ([TWRR](https://resilienceroundup.com/issues/66))
1594 | * [The Messy Details: Insights From the Study of Technical Work in Healthcare]
1595 | * [Nosocomial automation: technology-induced complexity and human performance]
1596 | * [Human-centered software agents: Lessons from clumsy automation](http://www.ifp.illinois.edu/nsfhcs/abstracts/woods.txt)
1597 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/)
1598 | * [The New Look at Error, Safety, and Failure: A Primer for Health Care]
1599 | * [Grounding explanations in evolving, diagnostic situations]
1601 | * [A Tale of Two Stories: Contrasting Views of Patient Safety]
1602 | * [Voice Loops as Coordination Aids in Space Shuttle Mission Control]
1603 | * [The Critical Incident Technique: 40 Years Later](https://journals.sagepub.com/doi/abs/10.1177/154193129403801702)
1604 | * [Functionally distributed coordination during anomaly response in space shuttle mission control]
1605 | * [Cooperative Advocacy: An Approach for Integrating Diverse Perspectives in Anomaly Response](https://www.researchgate.net/publication/225211285_Cooperative_Advocacy_An_Approach_for_Integrating_Diverse_Perspectives_in_Anomaly_Response)
1606 | * [Visual momentum: A concept to improve the cognitive coupling of person and computer](https://www.researchgate.net/publication/222737388_Visual_Momentum_A_Concept_to_Improve_the_Cognitive_Coupling_of_Person_and_Computer)
1607 | * [Cognitive demands and activities in dynamic fault management: abductive reasoning and disturbance management](https://www.researchgate.net/publication/262401824_Cognitive_demands_and_activities_in_dynamic_fault_management_abductive_reasoning_and_disturbance_management)
1608 | * [Coping with
complexity: the psychology of human behaviour in complex systems](https://www.researchgate.net/publication/238727732_Coping_with_Complexity_The_psychology_of_human_behavior_in_complex_systems) ([TWRR](https://resilienceroundup.com/issues/coping-with-complexity/))
1609 | * [Process Tracing Methods for The Study of Cognition Outside of the Experimental Laboratory. In Klein GA, Orasanu J, Calderwood R, Zsambok CE, eds. Decision making in action: Models and methods](https://www.researchgate.net/profile/David_Woods11/publication/232513565_Process-tracing_methods_for_the_study_of_cognition_outside_of_the_experimental_psychology_laboratory/links/00b7d53988a2f7a7f8000000.pdf)
1610 | * [Towards a theoretical base for representation design in the computer medium: ecological perception and aiding human cognition](https://www.researchgate.net/publication/239059408_Towards_a_theoretical_base_for_representation_design_in_the_computer_medium_ecological_perception_and_aiding_human_cognition)
1611 | * [Perspectives on Human Error: Hindsight Biases and Local Rationality]
1612 | * [Anomaly Response]
1613 | * [The Risks of Autonomy: Doyle's Catch](https://www.researchgate.net/publication/303832480_The_Risks_of_Autonomy_Doyles_Catch) ([TWRR](https://resilienceroundup.com/issues/the-risks-of-autonomy-doyles-catch/))
1614 | * [Mistaking Error]
1615 | * [Adapting to new technology in the operating room]
1616 | * [The Strategic Agility Gap: How organizations are slow and stale to adapt in turbulent worlds](https://www.researchgate.net/publication/330196218_The_Strategic_Agility_Gap_How_organizations_are_slow_and_stale_to_adapt_in_turbulent_worlds)
1617 | * [Resiliency Trade Space Study: The Interaction of Degraded C2 Link and Detect and Avoid Autonomy on Unmanned Aircraft](https://www.researchgate.net/publication/330222613_Resiliency_Trade_Space_Study_The_Interaction_of_Degraded_C2_Link_and_Detect_and_Avoid_Autonomy_on_Unmanned_Aircraft)
1618 | * [Cognitive Technologies: The Design of Joint Human-Machine Cognitive Systems](https://www.researchgate.net/publication/220604613_Cognitive_Technologies_The_Design_of_Joint_Human-Machine_Cognitive_Systems)
1619 | * [Cognitive Systems Engineering: New wine in new bottles] ([TWRR](https://resilienceroundup.com/issues/32/))
1620 | * [The Seven Deadly Myths of "Autonomous Systems"]
1623 | * [Collaborative Cross-Checking to Enhance Resilience] ([TWRR](https://resilienceroundup.com/issues/73/))
1624 | * [Resilience Engineering: New directions for measuring and maintaining safety in complex systems]
1625 | * [A rose by any other name...would probably be given an acronym]
1626 | * [How do systems manage their adaptive capacity to successfully handle disruptions?
A resilience engineering perspective](https://www.researchgate.net/publication/286581322_How_do_systems_manage_their_adaptive_capacity_to_successfully_handle_disruptions_A_resilience_engineering_perspective) 1627 | * [How Unexpected Events Produce An Escalation Of Cognitive And Coordinative Demands] ([TWRR](https://resilienceroundup.com/issues/how-unexpected-events-produce-an-escalation-of-cognitive-and-coordinative-demands/)) 1628 | * [How to Make Automated Systems Team Players](https://www.researchgate.net/profile/David_Woods11/publication/2483863_How_to_Make_Automated_Systems_Team_Players/links/5a4f829eaca272940bf8202c/How-to-Make-Automated-Systems-Team-Players.pdf) 1629 | * [Toward a Theory of Complex and Cognitive Systems] 1630 | * [Multiple systemic contributors versus root cause: learning from a NASA Near Miss](https://www.researchgate.net/publication/308194080_Multiple_Systemic_Contributors_versus_Root_Cause_Learning_from_a_NASA_Near_Miss) 1631 | * [Bootstrapping multiple converging cognitive task analysis techniques for system design] ([TWRR](https://resilienceroundup.com/issues/70)) 1632 | * [New Arctic Air Crash Aftermath Role-Play Simulation Orchestrating a Fundamental Surprise] 1633 | * [Mapping Cognitive Demands in Complex Problem-Solving Worlds] (mentions disturbance management) 1634 | * [Fixation Errors: Failures to Revise Situation Assessment in Dynamic and Risky Systems](https://www.researchgate.net/publication/290071190_Fixation_Errors_Failures_to_Revise_Situation_Assessment_in_Dynamic_and_Risky_Systems) 1635 | * [Nine Steps to Move Forward From Error] [BH](https://safety177496371.wordpress.com/2022/11/03/nine-steps-to-move-forward-from-error/) 1636 | * [Handoff strategies in settings with high consequences for failure: lessons for health care operations] ([TWRR](https://resilienceroundup.com/issues/56)) 1637 | * [The High Reliability Organization Perspective] ([TWRR](https://resilienceroundup.com/issues/09/)) 1638 | * [Automation surprises] 1639 | * [Safety II professionals: How resilience engineering can transform safety practice] ([TWRR](https://resilienceroundup.com/issues/64/)) 1640 | * [Gaps in the continuity of care and progress on patient safety] 1641 | * [Systems with human monitors: a signal detection analysis](https://www.researchgate.net/publication/250890631_Systems_with_Human_Monitors_A_Signal_Detection_Analysis) 1642 | * [On taking human performance seriously](https://www.sciencedirect.com/science/article/abs/pii/095183209090022F), 1990 1643 | * [Beyond surge: Coping with mass burn casualty in the closest hospital to the Formosa Fun Coast Dust Explosion] 1644 | * [Designing for Expertise](https://www.researchgate.net/publication/284173210_Designing_for_Expertise) 1645 | * [Steering the Reverberations of Technology Change on Fields of Practice: Laws that Govern Cognitive Work](https://www.researchgate.net/publication/334267822_Steering_the_Reverberations_of_Technology_Change_on_Fields_of_Practice_Laws_that_Govern_Cognitive_Work) ([TWRR](https://resilienceroundup.com/issues/steering-the-reverberations-of-technology-change-on-fields-of-practice-laws-that-govern-cognitive-work/)) 1646 | * [Distant Supervision–Local Action Given the Potential for Surprise](https://www.researchgate.net/profile/David_Woods11/publication/225921479_Distant_Supervision-Local_Action_Given_the_Potential_for_Surprise/links/0a85e53baa4009ad8e000000/Distant-Supervision-Local-Action-Given-the-Potential-for-Surprise.pdf) ([TWRR](https://resilienceroundup.com/issues/75/)) 1647 | * [Coping 
With a Mass Casualty: Insights into a Hospital’s Emergency Response and Adaptations After the Formosa Fun Coast Dust Explosion] ([TWRR](https://resilienceroundup.com/issues/76/))
1648 | * [A Shared Pilot-Autopilot Control Architecture for Resilient Flight](http://aaclab.mit.edu/resources/FarjadianAnnaswamyWoods2019.pdf) ([TWRR](https://resilienceroundup.com/issues/a-shared-pilot-autopilot-control-architecture-for-resilient-flight/))
1649 | * [Team Play with a Powerful and Independent Agent: A Full-Mission Simulation Study] ([TWRR](https://resilienceroundup.com/issues/team-play-with-a-powerful-and-independent-agent-a-full-mission-simulation-study/))
1650 | * [How Not to Have to Navigate Through Too Many Displays](https://www.researchgate.net/publication/239030256_How_Not_to_Have_to_Navigate_Through_Too_Many_Displays)
1651 | * [Discovering How Distributed Cognitive Systems Work](https://www.researchgate.net/publication/251196422_Discovering_How_Distributed_Cognitive_Systems_Work)
1652 | * [Human Performance in Anesthesia]
1653 | * [Creating Foresight: Lessons for Enhancing Resilience from Columbia](https://www.researchgate.net/profile/David-Woods-19/publication/255648297_Creating_Foresight_Lessons_for_Enhancing_Resilience_from_Columbia/links/542becf50cf29bbc126ac095/Creating-Foresight-Lessons-for-Enhancing-Resilience-from-Columbia.pdf)
1654 | * [Inventing the Future of Cognitive Work: Navigating the "Northwest Passage"](http://faculty.washington.edu/roesler/publications/design_cycle2005.pdf)
1655 | * [A practitioner’s experiences operationalizing Resilience Engineering]
1656 | * [Understanding rigor in information analysis]
1657 | * [Human Performance in Anesthesia: A Corpus of Cases]
1658 | * [Minding the Gaps: Creating Resilience in Health Care]
1659 | * [From Counting Failures to Anticipating Risks: Possible Futures for Patient Safety]
1661 | * [Behind Human Error: Taming Complexity to Improve Patient Safety]
1662 | * [Escaping failures of foresight](https://www.researchgate.net/publication/239357782_Escaping_failures_of_foresight)
1663 |
1664 | [The theory of graceful extensibility: basic rules that govern adaptive systems]: https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems
1665 | [How do systems manage their adaptive capacity to successfully handle disruptions?
A resilience engineering perspective]: https://www.researchgate.net/publication/286581322_How_do_systems_manage_their_adaptive_capacity_to_successfully_handle_disruptions_A_resilience_engineering_perspective
1666 |
1667 | ### Selected talks
1668 |
1669 | * [Overview of resilience engineering](https://www.youtube.com/watch?v=GnVXfgC-5Jw&feature=youtu.be)
1670 | * [Creating safety by engineering resilience](https://vimeo.com/104759707)
1671 | * [The Mystery of Sustained Adaptability](https://www.youtube.com/watch?v=7STcaWjJoww)
1672 | * [Resilience is a verb](https://www.youtube.com/watch?v=V2qj5gMsjrU)
1673 | * [Complexity workshop keynote](https://www.youtube.com/watch?v=KJJ2NCjc2Wg)
1674 | * [De-Confounding Reliability, Robustness, and Resilience](https://www.youtube.com/watch?v=QSiXEZLZ1y0&t=6s)
1675 | * [2003 Senate Hearing testimony](https://www.c-span.org/video/?c4531343/user-clip-david-woods-senate-hearing)
1676 | * [Shock and Resilience](https://www.youtube.com/watch?v=ZuLUp94wki4)
1677 | * [Hedging bets](https://www.youtube.com/watch?v=vlYtd-eUjY8)
1678 | * [REA 2021](https://youtu.be/OwdMgEf2MsA)
1679 | * [Adobe Summit Talk: Why do reliable systems fail?](https://www.youtube.com/watch?v=fbwDnpuys7w)
1680 |
1681 | ### Online courses
1682 |
1683 | * [Cognitive Systems Engineering Laboratory's (CSEL) Resilience Engineering 101 Series](https://resiliencefoundations.github.io/video-1-introduction-pt-1-it's-all-about-viability.html)
1684 | * [Resilience Engineering: An Introductory Short Course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt)
1685 |
1686 | ## John Wreathall
1687 |
1688 | Wreathall is an expert in human performance in safety. He works at the
1689 | [WreathWood Group](http://www.wreathall.com/), a risk and safety studies
1690 | consultancy.
1691 | Wreathall tweets as [@wreathall](https://twitter.com/wreathall).
1692 |
1693 | ### Selected publications
1694 | * [Resilience engineering in practice: a guidebook](https://www.crcpress.com/Resilience-Engineering-in-Practice-A-Guidebook/Paries-Wreathall-Hollnagel/p/book/9781472420749)
1695 |
-------------------------------------------------------------------------------- /STAMP.md: --------------------------------------------------------------------------------
1 | # STAMP
2 |
3 | ## Introduction
4 |
5 | STAMP (Systems-Theoretic Accident Model and Processes) is an accident model
6 | developed by Prof. Nancy Leveson of MIT. It is intended for use in designing
7 | safety-critical systems.
8 |
9 | STAMP views safety as a control problem. Safety is managed by a control
10 | structure embedded in an adaptive socio-technical system. **The goal of the
11 | control structure is to enforce constraints on system development and operation
12 | that result in safe behavior**.
13 |
14 | In STAMP, systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control.
15 |
16 | Safety is an emergent property that is achieved when appropriate constraints on behavior of the system and its components are satisfied.
17 |
18 | **In STAMP, accidents and losses result from not enforcing safety constraints on behavior.**
19 |
20 | Basic concepts in STAMP:
21 |
22 | 1. Safety constraints
23 | 1. Hierarchical safety control structures
24 | 1. Process models
25 |
26 |
27 | ## Main concepts
28 | ### Safety constraints
29 | A constraint is the most basic concept in STAMP.
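For software readers, a system-level safety constraint can be read as an invariant that some controller in the system is responsible for keeping true. A minimal sketch (the deployment example, names, and thresholds are invented, not from Leveson's work):

```python
# Hypothetical system-level safety constraints for a deployment pipeline,
# expressed as predicates that some controller must keep true at all times.

def no_deploys_during_active_incident(active_incidents: int) -> bool:
    """Constraint: deploys must not proceed while an incident is open."""
    return active_incidents == 0

def capacity_headroom_maintained(fleet_utilization: float) -> bool:
    """Constraint: the fleet must retain 30% headroom (threshold invented)."""
    return fleet_utilization <= 0.7

# A controller enforces constraints by blocking actions that would violate them.
def may_deploy(active_incidents: int, fleet_utilization: float) -> bool:
    return (no_deploys_during_active_incident(active_incidents)
            and capacity_headroom_maintained(fleet_utilization))
```

The rest of the model is about who is responsible for enforcing such constraints, and what control loop lets them do so.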
30 |
31 | The cause of an accident is viewed as:
32 | - the result of a lack of constraints imposed on the system design and on operations
33 | - inadequate enforcement of constraints on behavior at each level of a socio-technical system
34 |
35 | System-level constraints must be identified.
36 |
37 | Responsibility for enforcing constraints must be divided up and allocated to appropriate groups.
38 |
39 | ### Hierarchical safety control structure
40 | Systems are viewed as hierarchical structures.
41 |
42 | Each level imposes constraints on the activity beneath it.
43 |
44 | Control processes operate between levels to control processes at lower levels in the hierarchy.
45 |
46 | Control processes enforce safety constraints.
47 |
48 | Accidents occur when processes provide inadequate control & safety constraints are violated in the behavior of the lower-level components.
49 |
50 | By describing accidents in terms of a hierarchy of control based on adaptive feedback mechanisms, adaptation plays a central role in the understanding and prevention of accidents.
51 |
52 | Inadequate control may result from:
53 | - Missing constraints
54 | - Inadequate safety control commands
55 | - Commands that were not executed correctly at a lower level
56 | - Inadequately communicated or processed feedback about constraint enforcement
57 |
58 | Between hierarchical levels, need:
59 | - Downward *reference channel* providing info necessary to impose safety constraints on the level below
60 | - Upward *measuring channel* to provide feedback about how effectively constraints are being satisfied
61 |
62 | *Time lags* may affect flow of control actions and feedback and may impact effectiveness of the control loop in enforcing safety constraints.
63 |
64 | ### Process models
65 |
66 | Four conditions to control a process:
67 |
68 | 1. Goal: safety constraints that must be enforced by each controller in the hierarchical safety control structure
69 | 1. Action condition: implemented in the downward control channels
70 | 1. Observability condition: embodied in the upward feedback or measuring channels
71 | 1. Model condition: any controller needs a model of the process being controlled to control it effectively
72 |
73 | Component interaction accidents can usually be explained in terms of incorrect process models.
74 |
75 | In general, accidents often occur when the process model used by the controller does not match the process and, as a result:
76 |
77 | 1. Incorrect or unsafe control commands are given
78 | 1. Required control actions (for safety) are not provided
79 | 1. Potentially correct control commands are provided at the wrong time (too early or too late), or
80 | 1. Control is stopped too soon or applied too long
81 |
82 | Process models play an important role:
83 |
84 | 1. In understanding why accidents occur and why humans provide inadequate control over safety-critical systems
85 | 1. In designing safer systems.
86 |
87 | ## Accidents
88 | Accidents in STAMP are the result of a complex process that results in the system behavior violating the safety constraints. The safety constraints are enforced by the control loops between the various levels of the hierarchical control structure that are in place during design, development, manufacturing, and operations.
89 |
90 | Using the STAMP causality model, if there is an accident, one or more of the following must have occurred:
91 |
92 | 1. The safety constraints were not enforced by the controller.
93 | a.
The control actions necessary to enforce the associated safety constraint at each level of the sociotechnical control structure for the system were not provided.
94 | b. The necessary control actions were provided but at the wrong time (too early or too late) or stopped too soon
95 | c. Unsafe control actions were provided that caused a violation of the safety constraints.
96 | 2. Appropriate control actions were provided but not followed.
97 |
98 | Classification of accident causal factors starts by examining each of the basic components of a control loop and determining how their improper operation may contribute to the general types of inadequate control.
99 |
100 | Causal factors in accidents can be divided into three general categories:
101 |
102 | 1. The controller operation
103 | 1. The behavior of actuators and controlled processes
104 | 1. Communication and coordination among controllers and decision makers
105 |
106 | When humans are involved in the control structure, context and behavior-shaping mechanisms also play an important role in causality.
107 |
108 | ### Controller operation
109 | Three primary parts:
110 |
111 | 1. Control inputs and other relevant external information sources
112 | 1. Control algorithms
113 | 1. Process model
114 |
115 | Inadequate, ineffective or missing control actions necessary to enforce safety constraints and ensure safety can stem from flaws in each of these parts.
116 |
117 | For human controllers and actuators, context is also an important factor.
118 |
119 | #### Unsafe inputs
120 | Control actions and other info required for safe behavior may be missing or wrong.
121 |
122 | #### Unsafe control algorithms
123 | Control algorithms may not enforce safety constraints because:
124 |
125 | - Algorithms are inadequately designed originally
126 | - Process may change and algorithms become unsafe
127 | - Control algorithms may be inadequately modified by maintainers if the algorithms are automated or through various types of natural adaptation if they are implemented by humans
128 |
129 | **Time delays** are an important consideration in designing control algorithms.
130 | Feedback delays generate requirements to predict when a prior control action
131 | has taken effect and when resources will be available again. When time delays
132 | are not adequately considered in the control algorithm, accidents can result.
133 |
134 | Many accidents relate to *asynchronous evolution*, where one part of the system
135 | changes without the related necessary changes in the other parts.
136 |
137 | Communication is a critical factor here, as well as monitoring for changes that may occur and feeding back this information to the higher-level control. For example, the safety analysis process that generates constraints always involves some basic assumptions about the operating environment of the process. When the environment changes such that those assumptions are no longer true, the constraints may no longer be adequate to ensure safe behavior.
138 |
139 | #### Inconsistent, incomplete or incorrect process models
140 |
141 | Accidents, particularly component interaction accidents, most often result from inconsistencies between the models of the process used by the controllers (both human and automated) and the actual process state. When the controller's model of the process (either the human mental model or the software or hardware model) diverges from the process state, erroneous control commands (based on the incorrect model) can lead to an accident.
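To make model divergence concrete for software systems, here is a minimal sketch (the tank, its numbers, and all names are hypothetical, my own illustration rather than Leveson's) of a controller whose internal model drifts from the true process state and issues an unsafe command:

```python
from dataclasses import dataclass

@dataclass
class Tank:
    """The actual controlled process: a tank being filled."""
    level: float            # litres actually in the tank
    capacity: float = 100.0

@dataclass
class Controller:
    """A controller that decides based on its *process model*
    (believed_level), not on the true process state.

    Safety constraint to enforce: never command filling once the
    tank is at or above 90% of capacity.
    """
    believed_level: float

    def control_action(self, tank: Tank) -> str:
        # The decision uses the model, because the real level is
        # only observable through the (possibly broken) feedback channel.
        if self.believed_level < 0.9 * tank.capacity:
            return "OPEN_VALVE"
        return "CLOSE_VALVE"

# Missing feedback: the level sensor stopped updating at 40 litres,
# so the process model and the process have diverged.
tank = Tank(level=95.0)
controller = Controller(believed_level=40.0)

# The command is correct relative to the model, unsafe in reality.
print(controller.control_action(tank))  # OPEN_VALVE - constraint violated
```

The command is locally rational given the controller's model; in STAMP terms, the causal explanation lives in the missing feedback that allowed the model and the process to diverge.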
142 |
143 | The most common form of inconsistency occurs when one or more process models is incomplete in terms of not defining appropriate behavior for all possible process states or all possible disturbances, including unhandled or incorrectly handled component failures.
144 |
145 | Inconsistency happens when:
146 | - the process model designed into the system is wrong from the beginning
147 | - feedback for updating the process model as the controlled process changes state is missing or incorrect
148 | - the process model is updated incorrectly
149 | - time lags are not accounted for
150 |
151 | No control system will perform better than its measuring channel.
152 |
153 | Feedback is missing or inadequate when:
154 | - It is not included in the system design
155 | - Flaws exist in the monitoring or feedback communication channel
156 | - Feedback is not timely
157 | - The measuring instrument operates inadequately
158 |
159 | ### Actuators and controlled processes
160 | Problem: the control commands maintain the safety constraints, but the controlled process does not implement the commands.
161 |
162 | Possible reasons:
163 | - Failure/flaw in the reference channel (transmission of control commands)
164 | - Actuator or controlled component fault or failure
165 | - Safety depends on inputs from other system components (e.g., power) for execution of controlled actions, where these inputs are missing or inadequate
166 | - External disturbances not handled by the controller
167 |
168 | ### Coordination and communication among controllers and decision makers
169 | When there are multiple controllers (human and/or automated), control actions may be inadequately coordinated, including unexpected side effects of decisions or actions or conflicting control actions. Communication flaws play an important role here.
170 |
171 | Accidents are most likely in overlap or boundary areas, where two or more controllers (human or automated) control the same process or processes with common boundaries.
172 |
173 | #### Context and environment
174 | Human behavior is greatly impacted by the context and environment in which the human is working. These factors have been called "behavior shaping mechanisms".
175 |
176 |
177 | ## Definitions
178 | ### Accident
179 | An undesired or unplanned event that results in a loss, including loss of human life or human injury, property damage, environmental pollution, mission loss, etc.
180 |
181 | ### Hazard
182 | A system state or set of conditions that, together with a particular set of worst-case environmental conditions, will lead to an accident (loss).
183 |
184 | Hazards may be defined in terms of conditions, as here, or in terms of events, as long as one of these choices is used consistently.
185 |
186 | Hazards are not identical to failures: failures can occur without resulting in a hazard, and a hazard may occur without any precipitating failures.
187 |
188 | - Draw the system boundaries
189 | - Identify high-level system hazards
190 | - Specify system-level safety requirements and design constraints necessary to prevent hazards from occurring
191 |
192 | ## STPA - hazard analysis
193 | STPA (System-Theoretic Process Analysis) is a *hazard analysis* technique. The goal of hazard analysis is to identify potential causes of accidents so they can be eliminated or controlled before damage occurs.
194 |
195 | Goals of STPA:
196 | - Identify accident scenarios that encompass the entire accident process
197 | - Provide guidance to the users in getting good results
198 |
199 | Two main steps:
200 |
201 | 1. Identify the potential for inadequate control of the system that could lead to a hazardous state.
202 |
203 |    Hazardous states result from inadequate control or enforcement of the safety constraints, which can occur because:
204 |
205 |    1. A control action required for safety is not provided or not followed.
206 |    1. An unsafe control action is provided.
207 |    1. A potentially safe control action is provided too early or too late, that is, at the wrong time or in the wrong sequence.
208 |    1. A control action required for safety is stopped too soon or applied too long.
209 |
210 |
211 | 2. Determine how each potentially hazardous control action identified in step 1 could occur.
212 |
213 |    a. For each unsafe control action, examine the parts of the control loop to see if they could cause it.
214 |
215 |    Design controls and mitigation measures if they do not already exist, or evaluate existing measures if the analysis is being performed on an existing design.
216 |
217 |    For multiple controllers of the same component or safety constraint, identify conflicts and potential coordination problems.
218 |
219 |    b. Consider how the designed controls could degrade over time and build in protection, including:
220 |
221 |    1. Management of change procedures to ensure safety constraints are enforced in planned changes.
222 |    1. Performance audits, where the assumptions underlying the hazard analysis are the preconditions for the operational audits and controls, so that unplanned changes that violate the safety constraints can be detected.
223 |    1. Accident and incident analysis to trace anomalies to the hazards and to the system design.
224 |
225 |
226 | ## CAST - accident/incident analysis
227 | CAST - causal analysis based on STAMP
228 |
229 | 1. Identify the system(s) and hazard(s) involved in the loss
230 | 2. Identify the system safety constraints and system requirements associated with that hazard.
231 | 3. Document the safety control structure in place to control the hazard and enforce the safety constraints. This structure includes the roles and responsibilities of each component in the structure, as well as the controls provided or created to execute their responsibilities and the relevant feedback provided to them to help them do this. This structure may be completed in parallel with the later steps.
232 | 4. Determine the proximate events leading to the loss.
233 | 5. Analyze the loss at the physical system level. Identify the contribution of each of the following to the events: physical and operational controls, physical failures, dysfunctional interactions, communication and coordination flaws, and unhandled disturbances. Determine why the physical controls in place were ineffective in preventing the hazard.
234 | 6. Moving up the levels of the safety control structure, determine how and why each successive higher level allowed or contributed to the inadequate control at the current level.
235 |
236 |    For each system safety constraint, either the responsibility for enforcing it was never assigned to a component in the safety control structure, or a component or components did not exercise adequate control to ensure their assigned responsibilities (safety constraints) were enforced in the components below them.
237 |
238 | Any human decisions or flawed control actions need to be understood in terms of (at least):
239 |
240 | 1. the information available to the decision maker as well as any required information that was not available
241 | 1. the behavior-shaping mechanisms (the context and influences on the decision-making process)
242 | 1. the value structures underlying the decision
243 | 1. any flaws in the process models of those making the decisions and why those flaws existed.
244 | 7. Examine overall coordination and communication contributors to the loss.
245 | 8. Determine the dynamics and changes in the system and the safety control structure relating to the loss and any weakening of the safety control structure over time.
246 | 9. Generate recommendations.
247 |
248 |
249 |
-------------------------------------------------------------------------------- /boundary.graffle: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/boundary.graffle
-------------------------------------------------------------------------------- /boundary.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/boundary.png
-------------------------------------------------------------------------------- /graceful-extensibility.md: --------------------------------------------------------------------------------
1 | # Theorems of graceful extensibility
2 |
3 | Source: [The theory of graceful extensibility: basic rules that govern adaptive systems]
4 |
5 |
6 | ## Managing risk of saturation
7 | ### Adaptive capacity is finite / boundaries are universal
8 | The location of boundaries to the ability to meet demands is uncertain.
9 |
10 | Given a finite range, there is a general parameter, capacity for maneuver (CfM), which specifies how much of the range the unit has used and what capacity remains to handle upcoming demands.
11 |
12 | ### Events will produce demands that challenge boundaries on the adaptive capacity of any UAB / Surprise occurs, continuously
13 |
14 | There are recurring patterns that model surprise - how events challenge boundaries:
15 |
16 | * Events will occur at some rate and of some size and of some kind that increase the risk of saturation - exhausting the remaining CfM
17 | * Brittleness is how rapidly a unit's performance declines when it nears and reaches its boundaries.
18 | * The range of adaptive behavior of a UAB is a model of fitness.
19 | * Events that occur near or outside a UAB's boundary increase the risk of saturation, and this occurs independent of how well that UAB matches responses to demands.
20 |
21 | ### Adaptive capacities are regulated to manage the risk of saturating CfM / Risk of saturation is monitored and regulated
22 | * The work required to adapt and handle changing demands increases as CfM decreases.
23 | * As risk of saturation increases and CfM approaches exhaustion, UABs need to adapt to stretch or extend their base range of adaptive behavior to accommodate surprises.
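As a software analogy for monitoring the risk of saturation, here is a toy sketch (my own illustration, not from the paper; the on-call example and the thresholds are made up):

```python
class AdaptiveUnit:
    """Toy unit of adaptive behavior (UAB) that tracks its capacity
    for maneuver (CfM): how much of its finite range remains.

    Hypothetical example: an on-call team whose range is the number
    of concurrent incidents it can effectively work.
    """

    def __init__(self, total_range: int):
        self.total_range = total_range
        self.in_use = 0

    @property
    def cfm(self) -> int:
        # Capacity remaining to handle upcoming demands.
        return self.total_range - self.in_use

    def take_on(self, demands: int) -> None:
        self.in_use = min(self.total_range, self.in_use + demands)

    def saturation_risk(self) -> str:
        # Monitoring how hard the unit is working to stay in control.
        used = self.in_use / self.total_range
        if used >= 1.0:
            return "saturated"      # performance declines sharply here
        if used >= 0.8:             # threshold is arbitrary, for illustration
            return "near boundary"  # time to stretch: shed load, recruit help
        return "far from boundary"


unit = AdaptiveUnit(total_range=5)
unit.take_on(4)
print(unit.cfm, unit.saturation_risk())  # 1 near boundary
```

The theorems' point is that some demand will eventually arrive that exhausts this remaining range, no matter how well the unit has matched its responses to past demands.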
24 |
25 | ## Network of adaptive units
26 | ### No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone / Synchronization across multiple units of adaptive behavior in a network is necessary
27 |
28 | UABs exist in and are defined relative to a network of interacting and interdependent UABs at multiple scales → networks with multiple roles, multiple echelons
29 |
30 | ### Some UABs monitor and regulate the CfM of other UABs in response to changes in the risk of saturation / Risk of saturation can be shared
31 |
32 | Misalignment and mis-coordination across UABs increases the risk of saturating control as demands grow and cascade.
33 |
34 | ### Adaptive capacity is the potential for adjusting patterns of action to handle future situations, events, opportunities and disruptions / Pressure changes what is sacrificed when
35 | What architectural properties of the network influence the way units in a network respond to varying pressures on trade-offs?
36 |
37 | ## Outmaneuvering constraints
38 | ### Performance of a UAB as it approaches saturation is different from the performance of that UAB when it operates far from saturation / Pressure for optimality undermines graceful extensibility
39 |
40 | ### All UABs are local / All adaptive units are local
41 |
42 | ### There are bounds on the perspective of any UAB, but these limits are overcome by shifts and contrasts over multiple perspectives / Perspective contrast overcomes bounds
43 |
44 | ### Reflective systems risk mis-calibration / Mis-calibration is the norm
45 |
46 |
47 | [The theory of graceful extensibility: basic rules that govern adaptive systems]: https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems
-------------------------------------------------------------------------------- /intro.md: --------------------------------------------------------------------------------
1 | # Resilience engineering: Where do I start?
2 |
3 | This is an introductory guide to readings in *resilience engineering*, aimed at software engineers.
4 |
5 | Key papers are organized into themes:
6 |
7 |
8 | * [What is resilience?](#what-is-resilience)
9 | * [Changing perspectives on accidents and safety](#changing-perspectives-on-accidents-and-safety)
10 | * [Complex systems](#complex-systems)
11 | * [Coordination](#coordination)
12 | * [Automation](#automation)
13 | * [Boundary as a model (Rasmussen)](#boundary-as-a-model-rasmussen)
14 | * [David Woods](#david-woods)
15 |
16 | The papers linked here should all be accessible to casual readers.
17 |
18 | When you're ready for more, check out [resilience engineering notes](README.md).
19 |
20 | ## What is resilience?
21 |
22 | A *resilient* organization **adapts effectively to surprise**.
23 |
24 | Here I'm using the definition proposed by [David Woods](https://u.osu.edu/csel/member-directory/david-woods/).
25 | Before going into more detail about *resilience*, it's important to distinguish it from
26 | a different concept that Woods calls *robustness*.
27 |
28 | ### Robustness vs. resilience
29 |
30 | ![Resilience vs robustness](resilience-doodle.jpg)
31 |
32 | When we talk about designing highly available systems, we usually cover
33 | techniques such as redundancy, retries, fallbacks, and failovers. We think about
34 | what might go wrong (e.g., server failure, network partition), and design our
35 | system to gracefully handle these situations.
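For example, a typical mechanism of this kind enumerates the anticipated failure modes explicitly (a minimal sketch; the function names and retry policy are hypothetical):

```python
import time

def fetch_with_fallback(fetch_primary, fetch_replica, retries: int = 2):
    """Handle two anticipated failure modes: transient connection
    errors (retry with backoff) and a primary outage (fail over)."""
    for attempt in range(retries):
        try:
            return fetch_primary()
        except ConnectionError:
            time.sleep(0.1 * (attempt + 1))  # simple backoff
    # Primary presumed down: fail over to the replica.
    return fetch_replica()
```

Every branch in a design like this corresponds to a failure the designer foresaw.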
36 | 37 | Woods uses the term **robustness** to refer to systems that are designed to 38 | effectively handle known failure modes. 39 | 40 | **Resilience**, on the other hand, describes how well the system can handle 41 | troubles that were not foreseeable by the designer. You can think of robustness 42 | as being able to deal well with *known unknowns*, and resilience as being able 43 | to deal well with *unknown unknowns*. 44 | 45 | * [Four concepts for resilience and the implications for 46 | the future of resilience 47 | engineering] 48 | by Woods discusses four different common usages of the term *resilience*. 49 | In particular, he describes why he considers *robustness* to be a different concept. 50 | * [Resilience is a verb] is another very readable paper on how Woods defines resilience. 51 | 52 | 53 | [Four concepts for resilience and the implications for the future of resilience engineering]: https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering 54 | [Resilience is a verb]: https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb 55 | 56 | ## Changing perspectives on accidents and safety 57 | 58 | Resilience engineering as a field emerged from the safety science community. 59 | That's why you'll often see examples from aviation and medicine, as well as 60 | other safety critical areas like maritime, space flight, nuclear power, and rail. 61 | 62 | Because of this history, the earlier papers that we associate with resilience 63 | engineering are reactions to previous ways of thinking about accidents in 64 | particular and safety in general. 65 | 66 | Note that traditional approaches to safety often focus on minimizing variance 67 | associated with humans doing work, using techniques such as documented 68 | procedures and enforcement mechanisms for deviating from them. 69 | 70 | ### New look / new view 71 | 72 | The "new look" or "new view" refers to a change in perspective on how accidents 73 | happen, which focuses on understanding how actions taken 74 | by actors involved in the incident were rational, given what information those 75 | actors had at the time that events were unfolding. 76 | 77 | Johan Bergström of Lund University has three excellent short (<10 minute) videos: 78 | 79 | * [Was it technical failure or human error?](https://www.youtube.com/watch?v=Ygx2AI2RtkI) 80 | * [Three analytical traps in accident investigation](https://www.youtube.com/watch?v=TqaFT-0cY7U) 81 | * [Two views on human error](https://www.youtube.com/watch?v=rHeukoWWtQ8) 82 | 83 | Two great introductory papers (alas, 2nd one is paywalled) are: 84 | 85 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://sidneydekker.stackedsite.com/wp-content/uploads/sites/899/2013/01/SafetyResearch.pdf) 86 | by Dekker 87 | * [The error of counting errors](https://doi.org/10.1016/j.annemergmed.2008.03.015) by Robert Wears 88 | 89 | A great book on putting these ideas into practice in incident investigations is: 90 | 91 | * [The Field Guide to Understanding "Human Error"](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058/) by Dekker 92 | 93 | 94 | ### Safety-II 95 | 96 | Safety-II is a perspective on the role that humans play in safety-critical 97 | systems, proposed by Erik Hollnagel. 
In the Safety-II perspective,
98 | it is the everyday, normal work of the humans in the system that creates safety,
99 | as opposed to the errors of humans that erode it.
100 |
101 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf) by Hollnagel is a very readable
102 | introduction to Safety-II concepts.
103 | * [Why do things go right?](http://www.safetydifferently.com/why-do-things-go-right/) by Dekker on the [Safety Differently](http://www.safetydifferently.com) website is another good article.
104 |
105 | ## Complex systems
106 |
107 | Ever wonder why resilience engineering advocates natter on about "no root cause?"
108 |
109 | A recurring theme in resilience engineering is about reasoning holistically
110 | about *systems*, as opposed to breaking things up into components and reasoning
111 | about components separately. This perspective is known as *systems thinking*,
112 | which is a school of thought that has been influential in the resilience
113 | engineering community.
114 |
115 | When you view the world as a system, the idea of *cause* becomes meaningless,
116 | because there's no way to isolate an individual cause. Instead, the world is
117 | a tangled web of influences.
118 |
119 | You'll often hear the phrase *socio-technical system*. This language emphasizes that
120 | systems should be thought of as encompassing both humans and technologies, as opposed to
121 | thinking about technological aspects in isolation.
122 |
123 |
124 | * [How complex systems fail](https://www.adaptivecapacitylabs.com/HowComplexSystemsFail.pdf) by Richard I. Cook is a great starting point. It's a short paper and very easy to read.
125 | * [Drift into failure](https://www.goodreads.com/book/show/10258783) by Sidney Dekker is a book written for a lay audience, so it is also very readable. Dekker draws heavily from systems thinking to propose a theory about how complex systems can evolve into unsafe states.
126 |
127 |
128 | ## Coordination
129 |
130 | The systems we are interested in often involve a collection of people working together
131 | in some way to achieve a task. One particularly relevant example involves a collection of engineers
132 | working together to troubleshoot and repair a system during an ongoing
133 | incident.
134 |
135 | * [Common Ground and Coordination in Joint Activity] is an oft-cited paper on what is required for people
136 | to effectively coordinate when working on tasks together.
137 |
138 | [Common Ground and Coordination in Joint Activity]: http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf
139 |
140 | ## Automation
141 |
142 | One thing we software folk do have in common with the safety-critical world is
143 | the increased adoption of automation. Automation introduces challenges, and
144 | the nature of these challenges is a topic of many resilience engineering papers.
145 |
146 | You might hear the phrase *joint cognitive system* in the context of automation. This term refers to
147 | systems that do cognitive work and that are made up of a combination of humans and software.
148 | There is an entire research discipline that studies joint cognitive systems, called *cognitive systems engineering*, initially
149 | developed by David Woods and Erik Hollnagel, both of whom would later go on to play a significant role in
150 | developing the field of resilience engineering.
151 |
152 | Because resilience engineering researchers like Woods and Hollnagel have their roots in cognitive
153 | systems engineering, and because of the ever-increasing use of software automation in society,
154 | this community is very concerned about the potential *brittleness* associated with poor
155 | use of automation.
156 |
157 |
158 | * [Ironies of automation](https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf) by Lisanne
159 | Bainbridge is a classic paper on the problems that automation can introduce.
160 | The paper was originally written in 1983, and continues to be widely cited.
161 |
162 | * [How to make automated systems team players](https://researchgate.net/publication/2483863)
163 | by Christoffersen and Woods discusses how previous automated systems have been problematic and proposes strategies
164 | for improving automation.
165 |
166 | * [Ten challenges for making automation a team player](https://ieeexplore.ieee.org/abstract/document/1363742)
167 | by Klein et al. is a more recent paper that outlines the requirements for automation to be genuinely effective in
168 | socio-technical systems. This work draws heavily from the theme of *coordination* discussed earlier.
169 |
170 | ## Boundary as a model (Rasmussen)
171 |
172 |
173 | The late Jens Rasmussen is an enormously influential figure in the resilience engineering community.
174 |
175 | * [Risk management in a dynamic society: a modelling problem](https://doi.org/10.1016/S0925-7535(97)00052-0), published in 1997,
176 | is one of Rasmussen's most famous papers, which introduces Rasmussen's *dynamic safety model*.
177 |
178 | In this widely cited paper, Rasmussen advocates for a cross-disciplinary,
179 | systems-based approach to thinking about how accidents occur. He argues that
180 | accidents occur because the system migrates across a dangerous boundary, and
181 | this migration occurs during the course of normal work.
182 |
183 | Here is a depiction of the model from that paper:
184 |
185 | ![boundary](boundary.png)
186 |
187 |
188 |
189 | ## David Woods
190 |
191 | We've already referenced several papers authored or co-authored by
192 | David Woods. Woods is a force of nature in the field of resilience engineering, having
193 | played a key role in creating the field itself. Woods is incredibly prolific,
194 | and has introduced a wide variety of concepts related to resilience
195 | engineering.
196 |
197 | Woods is interested in resilience engineering principles that apply across an
198 | enormous range of different types of systems, from the organs in a biological
199 | organism up to organizations like NASA.
200 |
201 | Because he's interested in general principles, many of his papers are written at
202 | a very abstract level, where he discusses generic concepts such as *units of adaptive
203 | behavior* or *saturation*.
204 |
205 | ### Dragons at the boundary
206 |
207 | David Woods uses the metaphor of a system moving within a boundary in his writings on resilience engineering, but in
208 | a slightly different way than Rasmussen.
209 |
210 | Woods sees the boundary as a *competence envelope*. There are two different regimes of system behavior: far from the boundary and near the boundary.
211 |
212 | When a system is far from the boundary, the system (and its environment) behave as expected. By contrast, when a system
213 | grows near to the boundary, surprises happen.
Woods uses the metaphor of *dragons* to capture the surprises that occur when a system moves near the boundary, and how the system's model of the world is violated when it enters this regime.
214 |
215 | How units within a system adapt when the system moves near the boundary, and how these units deal with the dragons,
216 | is one of Woods's prime concerns.
217 |
218 | Woods's [Essentials of Resilience, revisited](https://www.researchgate.net/profile/David_Woods11/publication/330116587_4_Essentials_of_resilience_revisited/links/5c2e448ba6fdccd6b58f871e/4-Essentials-of-resilience-revisited.pdf?origin=publication_detail) discusses behavior at the boundary, although it doesn't use the *dragon* metaphor.
219 |
220 | ### The adaptive universe
221 |
222 | Woods's idea of the *adaptive universe* is characterized by three properties:
223 |
224 | * Resources are finite
225 | * Surprise is fundamental
226 | * Change never stops
227 |
228 | I haven't found a good introductory paper for the adaptive universe, as it
229 | encompasses an enormous number of topics, including the topic of *dragons at the boundaries*
230 | that we discussed earlier.
231 |
232 | I recommend watching Woods's [Resilience Engineering short
233 | course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt), which
234 | covers this topic. I've written my own [notes on the short
235 | course](https://github.com/lorin/res-eng-short-course-notes), which you might
236 | find useful. In particular, you might be interested in my [summary
237 | notes](https://github.com/lorin/res-eng-short-course-notes/blob/master/summary.md).
238 |
239 | ### Graceful extensibility
240 |
241 | Woods introduced the theory of *graceful extensibility* to capture how successful
242 | systems adapt effectively to surprise. The most relevant paper here is:
243 |
244 | * [The theory of graceful extensibility: basic rules that govern adaptive systems](https://link.springer.com/article/10.1007%2Fs10669-018-9708-3).
245 |
-------------------------------------------------------------------------------- /laws.md: --------------------------------------------------------------------------------
1 | # Laws, tradeoffs and theorems
2 |
3 | Many of these are documented in [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
4 | by Hoffman and Woods.
5 |
6 | [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]: https://www.researchgate.net/publication/220628177_Beyond_Simon%27s_Slice_Five_Fundamental_Trade-Offs_that_Bound_the_Performance_of_Macrocognitive_Work_Systems
7 |
8 | * Laws
9 |   - Law of fluency
10 |   - Law of stretched systems
11 |   - Law of requisite variety
12 |   - Laws of the adaptive universe
13 |   - Law of coordinative entropy
14 |   - Mr. Weasley's Law
15 |   - The Law of the Kludge
16 |   - First law of cooperative systems
17 |   - (Robin) Murphy's Law
18 | * Tradeoffs
19 |   - Efficiency-thoroughness tradeoff
20 |   - Optimality-brittleness tradeoff
21 | * Theorems
22 |   - Theorems of graceful extensibility
23 |
24 | ## Laws
25 |
26 | ### Law of fluency
27 |
28 | Well-adapted cognitive work occurs with a facility that belies the difficulty
29 | of resolving demands and balancing dilemmas.
30 |
31 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
32 |
33 | ### Law of stretched systems
34 |
35 | Every system is stretched to operate at its capacity.
36 |
37 | Sources:
38 |
39 | * [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
40 |
41 | This law is attributed to Lawrence Hirschhorn, and has been popularized by David Woods and Richard Cook.
42 |
43 | ### Law of requisite variety
44 |
45 | The larger the variety of actions available to a control system, the larger the
46 | variety of perturbations it is able to compensate.
47 |
48 | This is also called the first law of cybernetics or Ashby's law.
49 |
50 | Source: W. Ross Ashby, *An Introduction to Cybernetics* (1956)
51 |
52 | ### Laws of the adaptive universe
53 |
54 | * Resources are finite
55 | * Surprise is fundamental
56 | * Change never stops
57 |
58 | Source: Woods's [Resilience Engineering short course](https://www.youtube.com/playlist?list=PLvlZBj1NU_ikTy1ot30EbEbYMAoBf9eAt)
59 |
60 | ### Law of coordinative entropy
61 |
62 | Coordination costs, continuously.
63 |
64 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
65 |
66 | ### Law of systems as surrogates
67 |
68 | Technology reflects the stances, agendas, and goals of those who design and deploy the technology.
69 |
70 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
71 |
72 | ### Mr. Weasley's Law
73 |
74 | Never trust anything that can think for itself if you can’t see where it keeps its brain.
75 |
76 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
77 |
78 | ### The Law of the Kludge
79 |
80 | Work systems always require workarounds, with resultant kludges that attempt
81 | to bridge the gap between the original design objectives and current realities
82 | or to reconcile conflicting goals among workers.
83 |
84 | Source: [Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems]
85 |
86 | ### First law of cooperative systems
87 |
88 | It's not cooperation, if either you do it all or I do it all.
89 |
90 | Source: David Woods. Not sure where he first wrote this, but it's referenced in *Cognitive Systems Engineering: The Future for a Changing World*
91 |
92 | ### (Robin) Murphy's Law
93 |
94 | Any deployment of robotic systems will fall short of the target level of autonomy, creating or exacerbating a shortfall
95 | in mechanisms for coordination with human stakeholders.
96 |
97 | Source: This is mentioned in [Joint Cognitive Systems: Patterns in Cognitive Systems Engineering](https://www.amazon.com/Joint-Cognitive-Systems-Patterns-Engineering-ebook/dp/B00918NQOE/ref=sr_1_1?keywords=joint+cognitive+systems&qid=1557092907&s=gateway&sr=8-1), Chapter 10 (Automation Surprises).
98 |
99 | ## Tradeoffs
100 |
101 | ### Optimality vs. resilience
102 |
103 | The pursuit of increases in optimality with respect to some criteria
104 | guarantees an increase in brittleness with respect to changes or variations
105 | that fall outside of those criteria.
106 |
107 | Described in *Beyond Simon's Slice* as:
108 | * bounded ecology
109 | * *optimality-resilience of adaptive capacity trade-off*
110 |
111 |
112 | ### Efficiency vs. thoroughness
113 |
114 | People (and organisations) as part of their activities frequently – or always –
115 | have to make a trade-off between the resources (primarily time and effort) they
116 | spend on preparing to do something and the resources (primarily time and
117 | effort) they spend on doing it.
128 | 129 | ### Revelation vs. reflection 130 | 131 | Because every perspective reveals some details and hides others, we 132 | gain an advantage from reflecting on different perspectives. But this 133 | reflection has a cost: it takes effort. 134 | 135 | (The text itself doesn't describe "revelation", but my sense is that this is an explore/exploit 136 | style tradeoff, where we have to trade off going broader across perspectives against going deeper 137 | into a specific perspective.) 138 | 139 | Described in *Beyond Simon's Slice* as: 140 | * bounded perspectives 141 | * *revelation-reflection on perspectives trade-off* 142 | 143 | ### Acute goal vs. chronic goal 144 | 145 | There are ongoing (chronic) goals that we are always responsible for (e.g., safety), but we often 146 | face some shorter-term deadline (acute) that demands more of our attention. 147 | 148 | Described in *Beyond Simon's Slice* as: 149 | * bounded responsibility 150 | * *acute-chronic goal responsibility trade-off* 151 | 152 | ### Concentrated action vs. distributed action 153 | 154 | Distributing autonomy allows systems to act more quickly, but it makes synchronization across 155 | actions more difficult. 156 | 157 | Described in *Beyond Simon's Slice* as: 158 | * bounded effectiveness 159 | * *concentrated-distributed action trade-off* 160 | 161 | ## Theorems 162 | 163 | ### Theorems of graceful extensibility 164 | 165 | * *UAB* stands for unit of adaptive behavior 166 | * *CfM* stands for capacity for manoeuvre 167 | 168 | 1. Adaptive capacity is finite 169 | 2. Events will produce demands that challenge boundaries on the adaptive 170 | capacity of any UAB 171 | 3. Adaptive capacities are regulated to manage the risk of saturating CfM 172 | 4. No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone 173 | 5. Some UABs monitor and regulate the CfM of other UABs in response to changes 174 | in the risk of saturation 175 | 6. Adaptive capacity is the potential for adjusting patterns of action to 176 | handle future situations, events, opportunities and disruptions 177 | 7. Performance of a UAB as it approaches saturation is different from the 178 | performance of that UAB when it operates far from saturation 179 | 8. All UABs are local 180 | 9. There are bounds on the perspective of any UAB, but these limits are overcome 181 | by shifts and contrasts over multiple perspectives. 182 | 10. Reflective systems risk mis-calibration 183 | 184 | Source: [The Theory of Graceful Extensibility: Basic rules that govern adaptive systems](https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems) 185 |
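Theorem 7 is the one I find easiest to observe in software systems, so here's a minimal queueing sketch of my own (the M/M/1-style model and all the numbers are my assumptions, not from the paper). A unit serving a base load absorbs a small surprise easily when it's far from saturation; near saturation, the same surprise exhausts its remaining capacity for manoeuvre:

```python
# Toy illustration of theorems 1 and 7 (my construction): a unit with
# finite capacity absorbs the same surprise gracefully at low load and
# catastrophically near saturation.
CAPACITY = 100.0  # requests/sec the unit can serve
SURPRISE = 5.0    # extra demand that shows up during an incident

def response_time(load: float) -> float:
    """M/M/1-style mean response time; unbounded once load >= capacity."""
    return float("inf") if load >= CAPACITY else 1.0 / (CAPACITY - load)

for base_load in (50, 80, 90, 95, 99):
    before = response_time(base_load)
    after = response_time(base_load + SURPRISE)
    print(f"load {base_load:3d}: {before:.3f}s -> {after:.3f}s with surprise")
```

The same five units of surprise that barely register at half load saturate the unit completely at high load, which is also why theorem 3 says adaptive capacities have to be regulated with the risk of saturating CfM in mind.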
-------------------------------------------------------------------------------- /paries-keynote-2015.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/paries-keynote-2015.pptx -------------------------------------------------------------------------------- /resilience-doodle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/resilience-doodle.jpg -------------------------------------------------------------------------------- /risk-management-framework.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/risk-management-framework.graffle -------------------------------------------------------------------------------- /risk-management-framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorin/resilience-engineering/1ace2a9fbe9d45d86cb9d6f740bb68ed28b4300e/risk-management-framework.png -------------------------------------------------------------------------------- /topics.md: -------------------------------------------------------------------------------- 1 | # Papers by topic 2 | 3 | These pages cluster notable resilience engineering [papers](README.md) by topic. 4 | 5 | - [The nature of cognitive work during an incident](topics/the-nature-of-cognitive-work-during-an-incident.md) 6 | - [Human-human interaction](topics/human-human-interaction.md) 7 | - [Nature of complex systems](topics/nature-of-complex-systems.md) 8 | - [Changing perspective on safety](topics/changing-perspective-on-safety.md) 9 | - [Common misconceptions](topics/common-misconceptions.md) 10 | - [Human-machine interaction](topics/human-machine-interaction.md) 11 | - [What can go badly during an incident](topics/what-can-go-badly-during-an-incident.md) 12 | - [What we mean by "resilience"](topics/what-we-mean-by-resilience.md) 13 | - [Incident analysis pragmatics](topics/incident-analysis-pragmatics.md) 14 | -------------------------------------------------------------------------------- /topics/changing-perspective-on-safety.md: -------------------------------------------------------------------------------- 1 | # Changing perspective on safety 2 | 3 | ## Concepts 4 | * old view vs.
new view 5 | * safety-I vs safety-II 6 | * safety as "the capacity of people and systems to provide good outcomes" rather than "preventing things from going wrong" 7 | * "work as imagined" vs "work as done" 8 | 9 | ## Readings 10 | 11 | * [Reconstructing human contributions to accidents: the new view on error and performance](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.411.4985&rep=rep1&type=pdf) 12 | * [From Safety-I to Safety-II: A White Paper](https://www.skybrary.aero/bookshelf/books/2437.pdf) 13 | * [I want to believe: some myths about the management of industrial safety](http://dx.doi.org/10.1007/s10111-012-0237-4) 14 | * [Do Safety Differently](https://www.amazon.com/Do-Safety-Differently-Sidney-Dekker/dp/B09RM3Z17V) 15 | 16 | -------------------------------------------------------------------------------- /topics/common-misconceptions.md: -------------------------------------------------------------------------------- 1 | # Common misconceptions 2 | 3 | ## Concepts 4 | 5 | * hindsight 6 | * human error 7 | * root cause 8 | * systems thinking 9 | 10 | ## Readings 11 | 12 | * [Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures](https://www.semanticscholar.org/paper/Replacing-hindsight-with-insight%3A-toward-better-of-Wears-Nemeth/1bef45cae7375eddc8ee584dff100d200d812a8d) 13 | * [Applying systems thinking to analyze and learn from events](https://dspace.mit.edu/handle/1721.1/108102) 14 | * [The error of counting errors](https://doi.org/10.1016/j.annemergmed.2008.03.015) by Robert Wears 15 | 16 | -------------------------------------------------------------------------------- /topics/human-human-interaction.md: -------------------------------------------------------------------------------- 1 | # Human-human interaction 2 | 3 | ## Concepts 4 | 5 | * Common ground and coordination 6 | * Being bumpable 7 | * Polycentric governance 8 | 9 | ## Readings 10 | 11 | * [Governing the Commons: The Evolution of Institutions for Collective Action](https://www.amazon.com/Governing-Commons-Evolution-Institutions-Collective/dp/1107569788) 12 | * [Common Ground and Coordination in Joint Activity](http://jeffreymbradshaw.net/publications/Common_Ground_Single.pdf) 13 | * [Patterns in Cooperative Cognition](https://www.researchgate.net/publication/262449980_Patterns_in_Cooperative_Cognition) 14 | 15 | -------------------------------------------------------------------------------- /topics/human-machine-interaction.md: -------------------------------------------------------------------------------- 1 | # Human-machine interaction 2 | 3 | ## Concepts 4 | 5 | * ironies of automation 6 | * team player 7 | 8 | ## Readings 9 | 10 | * [Ironies of automation](https://www.ise.ncsu.edu/wp-content/uploads/2017/02/Bainbridge_1983_Automatica.pdf) 11 | * [How to Make Automated Systems Team Players](https://www.researchgate.net/profile/David_Woods11/publication/2483863_How_to_Make_Automated_Systems_Team_Players/links/5a4f829eaca272940bf8202c/How-to-Make-Automated-Systems-Team-Players.pdf) 12 | * [Ten challenges for making automation a team player](https://ieeexplore.ieee.org/abstract/document/1363742) 13 | 14 | -------------------------------------------------------------------------------- /topics/incident-analysis-pragmatics.md: -------------------------------------------------------------------------------- 1 | # Incident analysis pragmatics 2 | 3 | "Nuts and bolts" of incident analysis work.
4 | 5 | * [Etsy Debrief Facilitation Guide](http://extfiles.etsy.com/DebriefingFacilitationGuide.pdf) 6 | * [The field guide to understanding 'human error'](https://www.amazon.com/Field-Guide-Understanding-Human-Error/dp/1472439058) 7 | * [Pre-accident investigations: an introduction to organizational safety](https://www.amazon.com/Pre-Accident-Investigations-Todd-Conklin/dp/1409447820) 8 | -------------------------------------------------------------------------------- /topics/nature-of-complex-systems.md: -------------------------------------------------------------------------------- 1 | # Nature of complex systems 2 | 3 | ## Concepts 4 | 5 | * sharp-end vs. blunt-end 6 | * practitioner actions as gambles 7 | * coping with complexity 8 | * robust yet fragile 9 | * drift 10 | * strange loops 11 | * dark debt 12 | * well-adapted, under-adapted, over-adapted 13 | * decompensation, working at cross-purposes, getting stuck in outdated behaviors 14 | 15 | ## Readings 16 | 17 | * [How complex systems fail](http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) 18 | * [Basic Patterns in How Adaptive Systems Fail](https://www.researchgate.net/publication/284324002_Basic_patterns_in_how_adaptive_systems_fail) 19 | * [STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity](https://snafucatchers.github.io/) 20 | * [Highly Optimized Tolerance: Robustness and Design in Complex Systems](http://dx.doi.org/10.1103/physrevlett.84.2529) 21 | * [Drift into failure](https://www.amazon.com/Drift-into-Failure-Sidney-Dekker/dp/1409422216) 22 | 23 | -------------------------------------------------------------------------------- /topics/the-nature-of-cognitive-work-during-an-incident.md: -------------------------------------------------------------------------------- 1 | # The nature of cognitive work during an incident 2 | 3 | ## Concepts 4 | * problem detection 5 | * anomaly response 6 | 7 | ## Readings 8 | 9 | * [Anomaly Response](https://docs.wixstatic.com/ugd/3ad081_f46dda684154447583c8a5b282b60cc2.pdf) 10 | * [Problem detection](https://www.researchgate.net/publication/220579480_Problem_detection) 11 | * [The strengths and limitations of teams for detecting problems](https://link.springer.com/article/10.1007/s10111-005-0024-6) 12 | 13 | -------------------------------------------------------------------------------- /topics/what-can-go-badly-during-an-incident.md: -------------------------------------------------------------------------------- 1 | # What can go badly during an incident 2 | 3 | ## Concepts 4 | 5 | * Going solid 6 | * Going sour 7 | * Fixation 8 | * Vagabonding 9 | 10 | ## Readings 11 | 12 | * [“Going solid”: a model of system dynamics and consequences for patient safety](https://qualitysafety.bmj.com/content/14/2/130) 13 | * [Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980016965.pdf) 14 | 15 | -------------------------------------------------------------------------------- /topics/what-we-mean-by-resilience.md: -------------------------------------------------------------------------------- 1 | # What we mean by "resilience" 2 | 3 | ## Concepts 4 | 5 | * resilience 6 | * robustness 7 | 8 | ## Readings 9 | 10 | * [Resilience is a verb](https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb) 11 | * [Four concepts for resilience and the
implications for the future of resilience engineering](https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering) 12 | 13 | --------------------------------------------------------------------------------