├── README.md
└── credits.md


/README.md:
--------------------------------------------------------------------------------
# Architectural patterns of resilient distributed systems

Accompanying repository for the "Architectural patterns of resilient distributed systems" talk given at [Strangeloop 2015](http://www.thestrangeloop.com/2015/sessions.html). Feel free to open any issues for questions and/or to say hi :)

## Talk Outline
See the [image credits](credits.md), link to [slides](https://speakerdeck.com/randommood/architectural-patterns-of-resilient-distributed-systems), and [video](https://www.youtube.com/watch?v=ohvPnJYUW1E).

* Why Resilience
  * Motivation & Definitions
* Resilience Literature
  * Harvest/Yield thinking
  * Cook's Model
  * Borrill's Model
* Resilience in industry
  * Netflix
  * Google
  * Fastly
* Conclusions
  * Back to the start
  * Parting thoughts and rantifestos

## References

### Resilience literature
* [Baller checklist on things to remember](http://monkey.org/~marius/checklist.pdf)
* [Harvest, Yield, and Scalable Tolerant Systems](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&rep=rep1&type=pdf)
* [Computer Immunology - Burgess](http://people.scs.carleton.ca/~soma/biosec/readings/burgess-immunology.pdf)
* [Building Robust Systems an essay - Sussman](http://groups.csail.mit.edu/mac/users/gjs/6.945/readings/robust-systems.pdf)
* [How Complex Systems Fail - Cook](http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf)
* [Optimal Design, Robustness, and Risk Aversion](http://tuvalu.santafe.edu/~jdf/papers/optimal.pdf)
* [Part Count and Design of Robust Systems](http://meche.mit.edu/documents/danfrey/danfrey_partcount.pdf)
* [Highly Optimized Tolerance: A Mechanism for Power Laws in Designed Systems](http://snap.stanford.edu/class/cs224w-readings/carlson99tolerance.pdf)
* [Fault Tolerance and the Five-Second Rule](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-chen_ang.pdf)
* [Scale free Networks - computerworld](http://www.computerworld.com/article/2579374/networking/scale-free-networks.html)
* [The Scale-free property - Barabási](http://barabasilab.neu.edu/networksciencebook/download/network_science_december_ch4_2013.pdf)
* [Scale-free network](https://en.wikipedia.org/wiki/Scale-free_network)
* [Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing](https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
* [Failure Sketches: A Better Way to Debug](https://www.usenix.org/conference/hotos15/workshop-program/presentation/kasikci)
* [Virtual Network Diagnosis as a Service](https://research.facebook.com/publications/616093585136896/virtual-network-diagnosis-as-a-service/)
* [‘Going solid’: a model of system dynamics and consequences for patient safety](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/pdf/v014p00130.pdf)
* [Building on Quicksand](http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf)
* [Immutability Changes Everything](http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf)
* [You can't sacrifice partition tolerance](http://codahale.com/you-cant-sacrifice-partition-tolerance/)
* [Complex adaptive system](https://en.wikipedia.org/wiki/Complex_adaptive_system)
* [Robustness principle](https://en.wikipedia.org/wiki/Robustness_principle)
* [Small-world experiment](https://en.wikipedia.org/wiki/Small-world_experiment)

### Resilience in industry
* [Fault tolerance in a high-volume distributed system](http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html)
* [From Chaos to Control - Testing the resiliency of Netflix’s Content Discovery Platform](http://techblog.netflix.com/2015/08/from-chaos-to-control-testing.html)
* [Making the Netflix API More Resilient](http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html)
* [Google Finds: Centralized Control, Distributed Data Architectures Work Better Than Fully Decentralized Architectures](http://highscalability.com/blog/2014/4/7/google-finds-centralized-control-distributed-data-architectu.html)
* [Clients are Jerks: aka How Halo 4 DoSed the Services at Launch & How We Survived](http://caitiem.com/2015/06/23/clients-are-jerks-aka-how-halo-4-dosed-the-services-at-launch-how-we-survived/)
* [Game Day Exercises at Stripe: Learning from kill -9](https://stripe.com/blog/game-day-exercises-at-stripe)
* [How we ended up with microservices](http://philcalcado.com/2015/09/08/how_we_ended_up_with_microservices.html)
* [Postmortem for July 27 outage of the Manta service](https://www.joyent.com/blog/manta-postmortem-7-27-2015)
* [Hashicorp Yamux](https://github.com/hashicorp/yamux)
* [The Chubby lock service for loosely-coupled distributed systems](http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)
* [Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region](https://aws.amazon.com/message/5467D2/)

### Media
* [Velocity NY 2013: Richard Cook, "Resilience In Complex Adaptive Systems"](https://www.youtube.com/watch?v=PGLYEDpNu60&feature=youtu.be)
* [Developing a Globally Distributed Purging System](https://www.youtube.com/watch?v=HfO_6bKsy_g) and [slides](https://speakerdeck.com/brucespang/papers-prototypes-and-production-developing-a-globally-distributed-purging-system)
* [Complex Adaptive Systems: 13 Robustness & Resilience](https://www.youtube.com/watch?v=HOTWIPmkdzo)
* [Network Theory: 16 Robustness & Resilience](https://www.youtube.com/watch?v=_ztNkmDg0mw)
* [Design of Resilient Systems - Innovations in Thinking Differently](https://www.youtube.com/watch?v=nV52yh6GDMg)
* [Camille Fournier's Papers We Love Talk on The Chubby lock service for loosely-coupled distributed systems](https://www.youtube.com/watch?v=PqItueBaiRg) and [slides](https://speakerdeck.com/hakka_labs/the-chubby-lock-service-for-loosely-coupled-distributed-systems)
* [Scaling Networks through Software](https://www.usenix.org/conference/srecon15/program/presentation/taveira)

# Thank you!
Thank you to everyone who helped with feedback/resources and advice for this talk. Special thanks to: Paul Borrill, Jordan West, Caitie McCaffrey, Camille Fournier, Mike O'Neill, Neha Narula, Matt Whiteley, Joao Taveira, Tyler McMullen, Zac Duncan, Nathan Taylor, Ian Fung, Armon Dadgard, Peter Alvaro, Peter Bailis, Alex Rasmussen, Bruce Spang, Aysulu Greenberg, Elaine Greenberg, and Greg Bako.

--------------------------------------------------------------------------------
/credits.md:
--------------------------------------------------------------------------------
## Image Credits

* https://www.flickr.com/photos/twodolla/3140766030/
* https://www.flickr.com/photos/squishy/4283384
* http://static.awkwardfamilyphotos.com/wp-content/uploads/cache/2015/03/01-3lDBXXU/3735362130.jpg
* https://www.flickr.com/photos/30378582@N04/4885282554/
* https://www.flickr.com/photos/lauripiper/7595345258
* https://www.flickr.com/photos/16724360@N04/10290674743
* https://www.flickr.com/photos/16724360@N04/10290374914
* https://www.flickr.com/photos/16724360@N04/10290276724
* https://www.flickr.com/photos/16724360@N04/10290486753
* https://www.flickr.com/photos/lauripiper/7588675192
* https://www.flickr.com/photos/pennstatelive/21061551969
* https://www.flickr.com/photos/habologique/16973950316
* https://www.flickr.com/photos/uplatecakes/272266151
* http://cdn.phuntube.com/images/2015/04/everyone_involved_in_the_taking_of_this_wedding_photo-23478.jpg
* http://pixel.brit.co/wp-content/uploads/2014/08/cake3.jpg
* http://sh-www-prod.s3.amazonaws.com/originals/711.jpg
* https://www.flickr.com/photos/jaumepi/5551619510

### Icons
* [help by Luis Prado](https://thenounproject.com/search/?q=lifeguard&i=28806)
* [Mannequin by José Manuel de Laá](https://thenounproject.com/search/?q=pattern&i=150984)
* [Compass by iconsmind.com](https://thenounproject.com/search/?q=pattern&i=71521)
* [graduates by MikaDo Nguyen](https://thenounproject.com/search/?q=professor&i=194351)
* [graduate by Creative Stall](https://thenounproject.com/search/?q=professor&i=144348)
* [Fruit by Creative Stall](https://thenounproject.com/search/?q=fruits&i=130493)
* [Warning by Golden Roof](https://thenounproject.com/search/?q=availability&i=169393)
* [Layers by Mike Ashley](https://thenounproject.com/search/?q=composition&i=74310)
* [Weight Lifting by Luis Prado](https://thenounproject.com/search/?q=robust&i=50882)
* [marriage by Jason Dilworth](https://thenounproject.com/search/?q=marriage&i=154885)
* [cupid by Yazmin Alanis](https://thenounproject.com/search/?q=cupid&i=88187)
* [Crosshair by Ates Evren Aydinel](https://thenounproject.com/search/?q=cupid&i=191516)
* [Love Letter by Yazmin Alanis](https://thenounproject.com/search/?q=cupid&i=88186)
* [Locked Heart by Gregory Sujkowski](https://thenounproject.com/search/?q=cupid&i=100184)
* [Love potion by Matt Brooks](https://thenounproject.com/search/?q=love&i=33258)
* [Love Letter by Riccardo Rossetti](https://thenounproject.com/search/?q=love&i=7681)
* [crowd by MikaDo Nguyen](https://thenounproject.com/search/?q=crowd&i=178659)
* [Bride by Yazmin Alanis](https://thenounproject.com/search/?q=wedding+dress&i=181177)
* [Hearts by Hayley Parke](https://thenounproject.com/search/?q=love+dove&i=91755)
* [Birds In Love by Hayley Parke](https://thenounproject.com/search/?q=love+dove&i=80456)
* [Balloon by khaleel](https://thenounproject.com/khaleel/collection/love/?oq=love%20bug&cidx=0&i=159862)
* [Cloud by khaleel](https://thenounproject.com/khaleel/collection/love/?oq=love%20bug&cidx=0&i=159866)
* [Calendar by khaleel](https://thenounproject.com/khaleel/collection/love/?oq=love%20bug&cidx=0&i=159864)
* [Skull and Crossbones by Luis Prado](https://thenounproject.com/search/?q=human+safety&i=24836)
* [Castle by Trevor Tarczynski](https://thenounproject.com/search/?q=stronghold&i=8172)
* [Question by Brennan Novak](https://thenounproject.com/search/?q=love+question&i=20518)
* [Artificial Heart by Jakub Ukrop](https://thenounproject.com/search/?q=hearts&i=4153)
* [Heart by Oleg Frolov](https://thenounproject.com/search/?q=hearts&i=31341)
* [Open Heart by José Manuel de Laá](https://thenounproject.com/search/?q=hearts&i=38600)
* [Heart by Michael-Andre Joda](https://thenounproject.com/search/?q=hearts&i=56597)
* [Apple by Gabor Fulop](https://thenounproject.com/search/?q=hearts&i=157012)
* [Ribbon by misirlou from the Noun Project](https://thenounproject.com/search/?q=hearts&i=15185)
* [embrace by Egon Låstad from the Noun Project](https://thenounproject.com/search/?q=hearts&i=113894)
* [Heart Keyhole and Key by Konstantin Bulygin from the Noun Project](https://thenounproject.com/search/?q=hearts&i=101080)
* [Heart Repair by Aha-Soft from the Noun Project](https://thenounproject.com/search/?q=hearts&i=183543)

--------------------------------------------------------------------------------