├── .gitignore ├── LICENSE ├── README.md ├── chapters ├── chapter-01.md ├── chapter-02.md ├── chapter-03.md ├── chapter-04.md ├── chapter-05.md ├── chapter-06.md ├── chapter-07.md ├── chapter-08.md ├── chapter-09.md ├── chapter-10.md ├── chapter-11.md └── chapter-12.md ├── package-lock.json └── package.json /.gitignore: -------------------------------------------------------------------------------- 1 | node_modules/* 2 | *.txt -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Andrew Davis 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Building Microservices by Sam Newman (Personal Notes) 2 | 3 | My personal notes for the book: Building Microservices by Sam Newman. 4 | 5 | Book available for purchase [here](https://www.amazon.com/-/es/Sam-Newman/dp/1491950358). 6 | 7 | # Table of Contents 8 | 9 | - [Chapter 1: Microservices](/chapters/chapter-01.md) 10 | - [Chapter 2: The Evolutionary Architect](/chapters/chapter-02.md) 11 | - [Chapter 3: How to Model Services](/chapters/chapter-03.md) 12 | - [Chapter 4: Integration](/chapters/chapter-04.md) 13 | - [Chapter 5: Splitting the Monolith](/chapters/chapter-05.md) 14 | - [Chapter 6: Deployment](/chapters/chapter-06.md) 15 | - [Chapter 7: Testing](/chapters/chapter-07.md) 16 | - [Chapter 8: Monitoring](/chapters/chapter-08.md) 17 | - [Chapter 9: Security](/chapters/chapter-09.md) 18 | - [Chapter 10: Conway’s Law and System Design](/chapters/chapter-10.md) 19 | - [Chapter 11: Microservices at Scale](/chapters/chapter-11.md) 20 | - [Chapter 12: Bringing It All Together](/chapters/chapter-12.md) 21 | -------------------------------------------------------------------------------- /chapters/chapter-01.md: -------------------------------------------------------------------------------- 1 | # Chapter 1: Microservices 2 | 3 | --- 4 | ## What are Microservices? 5 | Microservices are small, autonomous services that work together. 6 | 7 | --- 8 | ## What does it mean “Small, and Focused on Doing One Thing Well”? How do we set service boundaries? 9 | Microservices take this same approach (cohesion, the Single Responsibility Principle) to independent services. We focus our service boundaries on business boundaries, making it obvious where code lives for a given piece of functionality.
And by keeping this service focused on an explicit boundary, we avoid the temptation for it to grow too large, with all the associated difficulties that this can introduce. 10 | 11 | --- 12 | ## Which is one of the aspects that helps us answer the “how small?” question? 13 | A strong factor in helping us answer “how small?” is how well the service aligns to team structures. If the codebase is too big to be managed by a small team, looking to break it down is very sensible. 14 | 15 | --- 16 | ## What does it mean for a microservice to be autonomous? 17 | Our microservice is a separate entity. It might be deployed as an isolated service on a platform as a service (PaaS), or it might be its own operating system process. 18 | 19 | --- 20 | ## Three properties of an Autonomous microservice 21 | - These services need to be able to change independently of each other, and be deployed by themselves without requiring consumers to change. 22 | - We need to think about what our services should expose, and what they should allow to be hidden. 23 | - If there is too much sharing, our consuming services become coupled to our internal representations. This decreases our autonomy, as it requires additional coordination with consumers when making changes. 24 | 25 | --- 26 | ## Key Benefit: Technology Heterogeneity? 27 | With a system composed of multiple, collaborating services, we can decide to use different technologies inside each one. This allows us to pick the right tool for each job, rather than having to select a more standardized, one-size-fits-all approach that often ends up being the lowest common denominator. 28 | 29 | --- 30 | ## How can we use the key benefit of “technology heterogeneity” to try new technologies? 31 | With a system consisting of multiple services, I have multiple new places in which to try out a new piece of technology. I can pick a service that is perhaps lowest risk and use the technology there, knowing that I can limit any potential negative impact.
32 | 33 | --- 34 | ## Key benefit: Resilience? 35 | If one component of a system fails, but that failure doesn’t cascade, you can isolate the problem and the rest of the system can carry on working. Service boundaries become your obvious bulkheads (walls). In a monolithic service, if the service fails, everything stops working. 36 | 37 | --- 38 | ## When it comes to resilience, what should we be careful of? Any new sources of failure? 39 | We do need to be careful, however. To ensure our microservice systems can properly embrace this improved resilience, we need to understand the new sources of failure that distributed systems have to deal with. Networks can and will fail, as will machines. 40 | 41 | --- 42 | ## Key benefit: Scaling? 43 | With a large, monolithic service, we have to scale everything together. One small part of our overall system is constrained in performance, but if that behavior is locked up in a giant monolithic application, we have to handle scaling everything as a piece. With smaller services, we can just scale those services that need scaling, allowing us to run other parts of the system on smaller, less powerful hardware. 44 | 45 | --- 46 | ## What is one of the issues of deploying small changes with monolithic systems? (Key benefit: Ease of deployment) 47 | A one-line change to a million-line-long monolithic application requires the whole application to be deployed in order to release the change. That could be a large-impact, high-risk deployment. In practice, large-impact, high-risk deployments end up happening infrequently due to understandable fear. Unfortunately, this means that our changes build up and build up between releases, until the new version of our application hitting production has masses of changes. And the bigger the delta between releases, the higher the risk that we’ll get something wrong! 48 | 49 | --- 50 | ## Which advantages do we have when using microservices for deployment?
51 | With microservices, we can make a change to a single service and deploy it independently of the rest of the system. This allows us to get our code deployed faster. If a problem does occur, it can be isolated quickly to an individual service, making fast rollback easy to achieve. It also means we can get our new functionality out to customers faster. 52 | 53 | --- 54 | ## How do microservices help? (Key benefit: Organizational Alignment) 55 | We know that smaller teams working on smaller codebases tend to be more productive. 56 | 57 | --- 58 | ## Which opportunities do we have when using microservices? (Key benefit: Composability) 59 | One of the key promises of distributed systems and service-oriented architectures is that we open up opportunities for reuse of functionality. With microservices, we allow for our functionality to be consumed in different ways for different purposes. 60 | 61 | --- 62 | ## List of all microservice key benefits 63 | - Technology Heterogeneity 64 | - Resilience 65 | - Scaling 66 | - Ease of Deployment 67 | - Organizational Alignment 68 | - Composability 69 | - Optimizing for Replaceability 70 | 71 | --- 72 | ## So is the microservice strategy a silver bullet? 73 | Of course it's not. If you’re coming from a monolithic system point of view, you’ll have to get much better at handling deployment, testing, and monitoring to unlock the benefits we’ve covered so far. You’ll also need to think differently about how you scale your systems and ensure that they are resilient. Don’t be surprised if things like distributed transactions start giving you headaches, either! 74 | 75 | --- 76 | 77 | -------------------------------------------------------------------------------- /chapters/chapter-02.md: -------------------------------------------------------------------------------- 1 | # Chapter 2: The Evolutionary Architect 2 | 3 | --- 4 | 5 | ## What should architects shift their focus away from? What should they focus on instead?
6 | 7 | Thus, our architects need to shift their thinking away from creating the perfect end product, and instead focus on helping create a **framework** in which the right systems can emerge, and continue to grow as we learn more. 8 | 9 | --- 10 | 11 | ## Why is "town planner" a more suitable role when referring to software architects? 12 | 13 | Erik Doernenburg first shared with me the idea that we should think of our role more as town planners than architects for the built environment. The role of the town planner should be familiar to any of you who have played SimCity before. 14 | The way he influences how the city evolves, though, is interesting. He does not say, “build this specific building there”; instead, he zones a city. 15 | 16 | --- 17 | 18 | ## What should a "town planner" be more worried about? (from the software architecture perspective) 19 | 20 | Rather than worrying too much about what happens in one zone (service), the town planner will instead spend far more time working out how people and utilities (data) move from one zone to another. 21 | 22 | --- 23 | 24 | ## In software, why are "cities" better representations than buildings? 25 | 26 | The comparison with software should be obvious. As our users use our software, we need to react and change. We cannot foresee everything that will happen, and so rather than plan for any eventuality, we should plan to allow for change by avoiding the urge to overspecify every last thing. 27 | 28 | --- 29 | 30 | ## Communication between services can get messy. Why does this happen? 31 | 32 | Between services is where things can get messy, however. If one service decides to expose REST over HTTP, another makes use of protocol buffers, and a third uses Java RMI, then integration can become a nightmare as consuming services have to understand and support multiple styles of interchange.
This is why I try to stick to the guideline that we should “be worried about what happens between the boxes, and be liberal in what happens inside.” 33 | 34 | --- 35 | 36 | ## What should we define when making decisions related to microservice architectures? 37 | 38 | In order to frame our decisions, we need to define: 39 | 40 | - Strategic Goals 41 | - Principles 42 | - Practices 43 | 44 | --- 45 | 46 | ## What are Strategic Goals? 47 | 48 | The role of the architect is already daunting enough, so luckily we usually don’t have to also define strategic goals! Strategic goals should speak to where your company is going, and how it sees itself as best making its customers happy. These will be high-level goals, and may not include technology at all. They could be defined at a company level or a division level. 49 | 50 | --- 51 | 52 | ## What are Principles? 53 | 54 | Principles are rules you have made in order to align what you are doing to some larger goal, and will sometimes change. 55 | 56 | --- 57 | 58 | ## What's the difference between Principles and Constraints? 59 | 60 | A constraint is really something that is very hard (or virtually impossible) to change, whereas principles are things we decide to choose (and can change over time, but not so frequently). 61 | 62 | --- 63 | 64 | ## What are Practices? 65 | 66 | Our practices are how we ensure our principles are being carried out. They are a set of detailed, practical guidance for performing tasks. They will often be technology-specific, and should be low-level enough that any developer can understand them. Practices could include coding guidelines, the fact that all log data needs to be captured centrally, or that HTTP/REST is the standard integration style. Due to their technical nature, practices will often change more often than principles. 67 | 68 | --- 69 | 70 | ## Is it OK to combine Principles and Practices? 71 | 72 | One person’s principles are another’s practices.
You might decide to call the use of HTTP/REST a principle rather than a practice, for example. And that would be fine. For a small enough group, perhaps a single team, combining principles and practices might be OK. However, for larger organizations, where the technology and working practices may differ, you may want a different set of practices in different places, as long as they all map to a common set of principles. 73 | 74 | --- 75 | 76 | ## Which "clear attributes" should we define for each service? 77 | 78 | - Monitoring 79 | - Interfaces 80 | - Architectural Safety 81 | 82 | --- 83 | 84 | ## When it comes to Monitoring, what is important to keep in mind? 85 | 86 | Whatever you pick, try to keep it standardized. Make the technology inside the box opaque, and don’t require that your monitoring systems change in order to support it. Logging falls into the same category here: we need it in one place. 87 | 88 | --- 89 | 90 | ## When it comes to Interfaces, what is important to keep in mind? 91 | 92 | Picking a small number of defined interface technologies helps integrate new consumers. Having one standard is a good number. Two isn’t too bad, either. Having 20 different styles of integration is bad. This isn’t just about picking the technology and the protocol. If you pick HTTP/REST, for example, will you use verbs or nouns? How will you handle pagination of resources? How will you handle versioning of endpoints? 93 | 94 | --- 95 | 96 | ## When it comes to Architectural Safety, what is important to keep in mind? 97 | 98 | Playing by the rules is important when it comes to response codes, too. If your circuit breakers rely on HTTP codes, and one service decides to send back 2XX codes for errors, or confuses 4XX codes with 5XX codes, then these safety measures can fall apart. 99 | 100 | --- 101 | 102 | ## Why are "exemplars" important to apply Principles and Practices? 103 | 104 | But developers also like code, and code they can run and explore.
If you have a set of standards or best practices you would like to encourage, then having exemplars that you can point people to is useful. The idea is that people can’t go far wrong just by imitating some of the better parts of your system. Ideally, these should be real-world services you have that get things right, rather than isolated services that are just implemented to be perfect examples. 105 | 106 | --- 107 | 108 | ## What's a Tailored Service Template? And why is it helpful? 109 | 110 | It's a group of libraries/templates/frameworks we implement and encourage teams to use, in order to make sure that the principles and practices we defined are being met. By using a "Tailored Service Template", developers can have most of the code in place, which will allow them to implement the core attributes that each service needs. This also ensures that teams can get going faster, and also that developers have to go out of their way to make their services badly behaved. 111 | 112 | --- 113 | 114 | ## When it comes to designing a service template, should it be the task of one developer/team? 115 | 116 | Ideally, it shouldn't. You do have to be careful that creating the service template doesn’t become the job of a central tools or architecture team who dictates how things should be done, albeit via 117 | code. Defining the practices you use should be a collective activity, so ideally your team(s) should take joint responsibility for updating this template (an internal open source approach works well here). 118 | 119 | --- 120 | 121 | ## Should a service template be optional or mandatory? 122 | 123 | Ideally, its use should be purely optional, but if you are going to be more forceful in its adoption you need to understand that ease of use for the developers has to be a prime guiding force. 124 | 125 | --- 126 | 127 | ## What does "governance through code" mean, and which ways do we have to ensure it?
128 | 129 | It's a way to facilitate the correct fulfillment of our principles and practices by providing tangible examples in code. This code can either come in the form of "exemplars" or "tailored service templates". 130 | 131 | --- 132 | 133 | ## What's the job of governance? 134 | 135 | If one of the architect’s jobs is ensuring there is a technical vision, then governance is about ensuring what we are building matches this vision, and evolving the vision if needed. 136 | 137 | --- 138 | 139 | ## How can we organize Governance? 140 | 141 | Normally, governance is a group activity. It could be an informal chat with a small enough team, or a more structured regular meeting with formal group membership for a larger scope. This is where I think the principles we covered earlier should be discussed and changed as required. This group needs to be led by a technologist, and to consist predominantly of people who are executing the work being governed. This group should also be responsible for tracking and managing technical risks. 142 | -------------------------------------------------------------------------------- /chapters/chapter-03.md: -------------------------------------------------------------------------------- 1 | # Chapter 3: How to Model Services 2 | 3 | --- 4 | 5 | ## What is Loose Coupling? 6 | 7 | When services are loosely coupled, a change to one service should not require a change to another. The whole point of a microservice is being able to make a change to one service and deploy it, without needing to change any other part of the system. This is really quite important. 8 | 9 | --- 10 | 11 | ## What sort of things cause tight coupling? 12 | 13 | A classic mistake is to pick an integration style that tightly binds one service to another, causing changes inside the service to require a change to consumers. 14 | 15 | --- 16 | 17 | ## What is desirable about the number of calls between microservices?
18 | 19 | This also means we probably want to limit the number of different types 20 | of calls from one service to another, because beyond the potential performance 21 | problem, chatty communication can lead to tight coupling. 22 | 23 | --- 24 | 25 | ## (High Cohesion) We want related behavior to sit together, and unrelated behavior to sit elsewhere. Why? 26 | 27 | Well, if we want to change behavior, we want to be able to change it in one place, and release that change as soon as possible. If we have to change that behavior in lots of different places, we’ll have to release lots of different services (perhaps at the same time) to deliver that change. Making changes in lots of different places is slower, and deploying lots of services at once is risky—both of which we want to avoid. 28 | 29 | --- 30 | 31 | ## In the MusicCorp example, which is the domain? And which are the bounded contexts? 32 | 33 | Let’s return for a moment to the MusicCorp business. Our domain is the whole business in which we are operating. It covers everything from the warehouse to the reception desk, from finance to ordering. We may or may not model all of that in our 34 | software, but that is nonetheless the domain in which we are operating. Let’s think 35 | about parts of that domain that look like the bounded contexts that Evans refers to. 36 | At MusicCorp, our warehouse is a hive of activity—managing orders being shipped 37 | out (and the odd return), taking delivery of new stock, having forklift truck races, and 38 | so on. Elsewhere, the finance department is perhaps less fun-loving, but still has a 39 | very important function inside our organization. These employees manage payroll, 40 | keep the company accounts, and produce important reports. Lots of reports. They 41 | probably also have interesting desk toys. 42 | 43 | --- 44 | 45 | ## Why are shared models important?
(Example: Warehouse and Finance departments) 46 | 47 | To be able to work out the valuation of the company, though, the finance employees 48 | need information about the stock we hold. The stock item then becomes a shared 49 | model between the two contexts. However, note that we don’t need to blindly expose 50 | everything about the stock item from the warehouse context. For example, although 51 | internally we keep a record on a stock item as to where it should live within the warehouse, that doesn’t need to be exposed in the shared model. So there is the internal-only representation, and the external representation we expose. 52 | 53 | --- 54 | 55 | ## We need to think clearly about which models should be shared. Why? 56 | 57 | By thinking clearly about what models should be shared, and not sharing our internal 58 | representations, we avoid one of the potential pitfalls that can result in tight coupling 59 | (the opposite of what we want). We have also identified a boundary within our 60 | domain where all like-minded business capabilities should live, giving us the high 61 | cohesion we want. 62 | 63 | --- 64 | 65 | ## Why is it not a good idea to decompose your system into microservices from the beginning? 66 | 67 | Prematurely decomposing a system into microservices can be costly, especially if you are new to the domain. In many ways, having an existing codebase you want to decompose into microservices is much easier than trying to go to microservices from the beginning. 68 | 69 | --- 70 | 71 | ## What questions do we have to ask when deciding which models we should share? 72 | 73 | So ask first “What does this context do?”, and then “So what data 74 | does it need to do that?” When modeled as services, these capabilities become the key operations that will be exposed over the wire to other collaborators. 75 | 76 | --- 77 | 78 | ## What do we have to do when deciding the boundaries of our microservices?
79 | 80 | When considering the boundaries of your microservices, first think in terms of the larger, coarser-grained contexts, and then subdivide along these nested contexts when you’re looking for the benefits of splitting out these seams. 81 | 82 | --- 83 | 84 | ## Nested approach or full separation approach? What should we consider to decide which one of those two? 85 | 86 | In general, there isn’t a hard-and-fast rule as to what approach makes the most sense. 87 | However, whether you choose the nested approach over the full separation approach 88 | should be based on your organizational structure. If order fulfillment, inventory 89 | management, and goods receiving are managed by different teams, they probably 90 | deserve their status as top-level microservices. If, on the other hand, all of them are 91 | managed by one team, then the nested model makes more sense. 92 | 93 | --- 94 | 95 | ## When modeling microservices, what is important to keep in mind when it comes to communication and how it relates to the business concepts? 96 | 97 | The same terms and ideas that are shared between parts of your organization should be reflected in your interfaces. It can be useful to think of forms being sent between these microservices, much as forms are sent around an organization. 98 | 99 | --- 100 | 101 | ## Is it correct to model service boundaries along technical seams? 102 | 103 | Making decisions to model service boundaries along technical seams isn’t always 104 | wrong. I have certainly seen this make lots of sense when an organization is looking 105 | to achieve certain performance objectives, for example. However, it should be your 106 | secondary driver for finding these seams, not your primary one. 
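The warehouse/finance shared-model idea from earlier in this chapter can be sketched in code. This is a minimal, illustrative JavaScript sketch (the field names, prices, and the `toSharedStockItem` helper are my own assumptions, not from the book): the warehouse keeps an internal-only representation, and only a deliberately limited shared model crosses the context boundary.

```javascript
// Internal representation inside the warehouse bounded context.
// shelfLocation is a warehouse-only concern and must never leak out.
const internalStock = [
  { sku: 'CD-001', title: 'Give Blood', quantity: 120, unitPrice: 9.99, shelfLocation: 'A-17' },
  { sku: 'CD-002', title: 'Best Of', quantity: 45, unitPrice: 7.5, shelfLocation: 'B-03' },
];

// The shared model exposed to the finance context: only what finance
// needs to value the stock, nothing about warehouse internals.
function toSharedStockItem(item) {
  return { sku: item.sku, quantity: item.quantity, unitPrice: item.unitPrice };
}

const sharedModel = internalStock.map(toSharedStockItem);

// Finance can work out a stock valuation from the shared model alone,
// so warehouse internals can change without coordinating with finance.
const valuation = sharedModel.reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);
```

Because consumers only ever see `sharedModel`, the warehouse is free to rename or restructure its internal record (the loose coupling this chapter argues for) without breaking finance.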
107 | -------------------------------------------------------------------------------- /chapters/chapter-04.md: -------------------------------------------------------------------------------- 1 | # Chapter 4: Integration 2 | 3 | --- 4 | 5 | ## How should we keep the APIs used for communication between microservices? 6 | 7 | It is also why I think it is very important to ensure that you keep the APIs used 8 | for communication between microservices technology-agnostic. This means avoiding 9 | integration technology that dictates what technology stacks we can use to implement 10 | our microservices. 11 | 12 | --- 13 | 14 | ## What are the problems of "Database Integration" when using microservices? 15 | 16 | Remember when we talked about the core principles behind good microservices? 17 | Strong cohesion and loose coupling—with database integration, we lose both things. 18 | Database integration makes it easy for services to share data, but does nothing about 19 | sharing behavior. Our internal representation is exposed over the wire to our consumers, and it can be very difficult to avoid making breaking changes, which inevitably 20 | leads to a fear of any change at all. Avoid at (nearly) all costs. 21 | 22 | --- 23 | 24 | ## How can we classify the types of communication we use with microservices? 25 | 26 | - Synchronous communication 27 | - Asynchronous communication 28 | - Including: event-based collaboration 29 | 30 | --- 31 | 32 | ## Explain Orchestration and Choreography of microservices 33 | 34 | With orchestration, we rely on a central brain to guide and drive the 35 | process, much like the conductor in an orchestra. With choreography, we inform 36 | each part of the system of its job, and let it work out the details, like dancers all finding their way and reacting to others around them in a ballet. 37 | 38 | --- 39 | 40 | ## Which is the downside of the orchestration approach? 
41 | 42 | The downside to this orchestration approach is that the customer service can become 43 | too much of a central governing authority. It can become the hub in the middle of a 44 | web, and a central point where logic starts to live. I have seen this approach result in a 45 | small number of smart “god” services telling anemic CRUD-based services what to 46 | do. 47 | 48 | --- 49 | 50 | ## How can we apply the choreography approach? 51 | 52 | With a choreographed approach, we could instead just have the customer service 53 | emit an event in an asynchronous manner, saying Customer created. The email service, postal service, and loyalty points bank then just subscribe to these events and 54 | react accordingly. This approach is significantly more decoupled. If 55 | some other service needed to react to the creation of a customer, it just needs to subscribe to the events and do its job when needed. 56 | 57 | --- 58 | 59 | ## Which is the downside of the choreography approach? 60 | 61 | The downside is that the explicit 62 | view of the business process we saw before is now only implicitly reflected in 63 | our system. This means additional work is needed to ensure that you can monitor and track that 64 | the right things have happened. 65 | 66 | --- 67 | 68 | ## How can we tackle the downsides of the choreography approach? 69 | 70 | One approach I like for dealing with this is to build a monitoring system that explicitly matches the view of the business process, but then tracks what each of the services does as independent entities, letting you see odd exceptions mapped onto the more 71 | explicit process flow. 72 | 73 | --- 74 | 75 | ## When using microservices, should we trust networks? 76 | 77 | You need to think about the network itself. Famously, the first of the fallacies of distributed computing is “The network is reliable”. Networks aren’t reliable. They can 78 | and will fail, even if your client and the server you are speaking to are fine.
They can 79 | fail fast, they can fail slow, and they can even malform your packets. You should 80 | assume that your networks are plagued with malevolent entities ready to unleash 81 | their ire on a whim. 82 | 83 | --- 84 | 85 | ## Which is the key challenge when using RPC mechanisms? 86 | 87 | This is a key challenge with any RPC mechanism that promotes the 88 | use of binary stub generation: you don’t get to separate client and server deployments. 89 | If you use this technology, lock-step releases may be in your future. (This means: tight coupling.) 90 | 91 | --- 92 | 93 | ## How should we design our remote calls when applying RPC? 94 | 95 | Don’t abstract your remote calls to the point where the network is 96 | completely hidden, and ensure that you can evolve the server interface without having to insist on lock-step upgrades for clients (tight coupling). Finding the right balance for your client code is important, for example. Make sure your clients aren’t oblivious to the fact 97 | that a network call is going to be made. 98 | 99 | --- 100 | 101 | ## In REST, what does a "Resource" mean? 102 | 103 | Most important is the concept of resources. You can think of a resource as a thing 104 | that the service itself knows about, like a `Customer`. The server creates different representations of this `Customer` on request. How a resource is shown externally is completely decoupled from how it is stored internally. A client might ask for a JSON 105 | representation of a `Customer`, for example, even if it is stored in a completely different 106 | format. Once a client has a representation of this `Customer`, it can then make requests 107 | to change it, and the server may or may not comply with them. 108 | 109 | --- 110 | 111 | ## What's the idea behind HATEOAS? 112 | 113 | The idea behind HATEOAS is that clients should perform interactions (potentially leading to state transitions) with the server via these links (hypermedia controls) to other resources.
The client doesn’t need to know exactly where customers live on the server, or which URI to hit; instead, it looks for and navigates links to find what it needs. 114 | 115 | ## Explain this HATEOAS example... 116 | 117 | ```xml 118 | <album> 119 | <name>Give Blood</name> 120 | <link rel="/artist" href="/artist/theBrakes" /> 121 | <description> 122 | Awesome, short, brutish, funny and loud. Must buy! 123 | </description> 124 | <link rel="/instantpurchase" href="/instantPurchase/1234" /> 125 | </album> 126 | ``` 127 | 128 | - This hypermedia control shows us where to find information about the artist. 129 | - And if we want to purchase the album, we now know where to go. 130 | 131 | --- 132 | 133 | ## What's one of the benefits of using HATEOAS? 134 | 135 | Using these controls to decouple the client and server yields significant benefits over 136 | time that greatly offset the small increase in the time it takes to get these protocols up 137 | and running. By following the links, the client gets to progressively discover the API, 138 | which can be a really handy capability when we are implementing new clients. 139 | 140 | --- 141 | 142 | ## What's one of the downsides of using HATEOAS? 143 | 144 | One of the downsides is that this navigation of controls can be quite chatty, as the 145 | client needs to follow links to find the operation it wants to perform. 146 | 147 | --- 148 | 149 | ## What pattern can we use when deciding how to store our data? 150 | 151 | There is a more general problem at play here. How we decide to store our data, and 152 | how we expose it to our consumers, can easily dominate our thinking. One pattern I 153 | saw used effectively by one of our teams was to delay the implementation of proper 154 | persistence for the microservice, until the interface had stabilized enough. 155 | 156 | --- 157 | 158 | ## Book for learning more about REST 159 | 160 | Despite these disadvantages, REST over HTTP is a sensible default choice for service-to-service interactions. If you want to know more, I recommend REST in Practice 161 | (O’Reilly), which covers the topic of REST over HTTP in depth.
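The link-following behind HATEOAS can be sketched from the client's side. This is an illustrative JavaScript sketch, not from the book: the album object mirrors the shape of the chapter's hypermedia example, and the `follow` helper shows a client that knows only the semantics of each `rel`, never the URI layout.

```javascript
// A representation of the album resource, with hypermedia controls.
// The server is free to change these hrefs later; clients don't hard-code them.
const album = {
  name: 'Give Blood',
  links: [
    { rel: '/artist', href: '/artist/theBrakes' },
    { rel: '/instantpurchase', href: '/instantPurchase/1234' },
  ],
};

// The client navigates by rel, progressively discovering the API.
function follow(resource, rel) {
  const link = resource.links.find((l) => l.rel === rel);
  return link ? link.href : null;
}

const artistUri = follow(album, '/artist'); // where to learn about the artist
const purchaseUri = follow(album, '/instantpurchase'); // where to buy the album
```

This also makes the downside visible: every operation starts with a lookup through the controls, which is why HATEOAS navigation can get chatty.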
---

## How should we keep our middleware? (message brokers, queues)

However, vendors tend to want to package lots of software with them, which can lead to more and more smarts being pushed into the middleware, as evidenced by things like the Enterprise Service Bus. Make sure you know what you’re getting: keep your middleware dumb, and keep the smarts in the endpoints.

---

## Which strategy did Sam Newman use to view bad messages in the pricing system he worked on?

Aside from the bug itself, we’d failed to specify a maximum retry limit for the job on the queue. We fixed the bug itself, and also configured a maximum retry. But we also realized we needed a way to view, and potentially replay, these bad messages. We ended up having to implement a message hospital (or dead letter queue), where messages got sent if they failed. We also created a UI to view those messages and retry them if needed. These sorts of problems aren’t immediately obvious if you are only familiar with synchronous point-to-point communication.

---

## What should we ensure to have if we end up adopting event-driven architectures?

The complexity associated with event-driven architectures and asynchronous programming in general leads me to believe that you should be cautious in how eagerly you start adopting these ideas. Ensure you have good monitoring in place, and strongly consider the use of correlation IDs, which allow you to trace requests across process boundaries.

---

## What happens when you introduce shared code outside your service boundary, and how does RealEstate.com.au deal with it?

If your use of shared code ever leaks outside your service boundary, you have introduced a potential form of coupling.
Using common code like logging libraries is fine, as they are internal concepts that are invisible to the outside world. RealEstate.com.au makes use of a tailored service template to help bootstrap new service creation. Rather than make this code shared, the company copies it for every new service to ensure that coupling doesn’t leak in.

---

## What's the general rule of thumb when sharing code in microservices? (DRY)

My general rule of thumb: don’t violate DRY within a microservice, but be relaxed about violating DRY across all services.

---

## What's the problem with logic creeping into a client library?

The more logic that creeps into the client library, the more cohesion starts to break down, and you find yourself having to change multiple clients to roll out fixes to your server.

---

## What model does AWS use for client libraries?

A model for client libraries I like is the one for Amazon Web Services (AWS). The underlying SOAP or REST web service calls can be made directly, but everyone ends up using just one of the various software development kits (SDKs) that exist, which provide abstractions over the underlying API. These SDKs, though, are written by the community or AWS people other than those who work on the API itself. This degree of separation seems to work, and avoids some of the pitfalls of client libraries. Part of the reason this works so well is that the client is in charge of when the upgrade happens. If you go down the path of client libraries yourself, make sure this is the case.

---

## What could happen after we retrieve a resource from a specific service? (e.g., a Customer resource)

When we retrieve a given Customer resource from the customer service, we get to see what that resource looked like when we made the request.
It is possible that after we requested that Customer resource, something else has changed it. What we have in effect is a memory of what the Customer resource once looked like. The longer we hold on to this memory, the higher the chance that this memory will be false.

---

## In event-based collaboration, what could be valuable to have if we want to know what happened to a specific resource? (e.g., a Customer resource)

With events, we’re saying this happened, but we need to know what happened. If we’re receiving updates due to a Customer resource changing, for example, it could be valuable to us to know what the Customer looked like when the event occurred. As long as we also get a reference to the entity itself so we can look up its current state, then we can get the best of both worlds.

---

## Which mechanisms can we use to reduce load on our services when we need to retrieve resource information?

If we provide additional information when the resource is retrieved, letting us know at what time the resource was in the given state and perhaps how long we can consider this information to be fresh, then we can do a lot with caching to reduce load.

---

## What does Postel's Law consist of?

The example of a client trying to be as flexible as possible in consuming a service demonstrates Postel’s Law (otherwise known as the robustness principle), which states: “Be conservative in what you do, be liberal in what you accept from others.”

---

## Explain semantic versioning. What is it about?

With semantic versioning, each version number is in the form MAJOR.MINOR.PATCH. When the MAJOR number increments, it means that backward incompatible changes have been made. When MINOR increments, new functionality has been added that should be backward compatible.
Finally, a change to PATCH states that bug fixes have been made to existing functionality.

---

## Explain a simple use case when working with semantic versioning

Our helpdesk application is built to work against version 1.2.0 of the customer service. If a new feature is added, causing the customer service to change to 1.3.0, our helpdesk application should see no change in behavior and shouldn’t be expected to make any changes. We couldn’t guarantee that we could work against version 1.1.0 of the customer service, though, as we may rely on functionality added in the 1.2.0 release. We could also expect to have to make changes to our application if a new 2.0.0 release of the customer service comes out.

---

## When introducing a breaking interface change, how can we use our endpoints to handle this issue?

One approach I have used successfully to handle this is to have both the old and new interfaces coexist in the same running service. So if we want to release a breaking change, we deploy a new version of the service that exposes both the old and new versions of the endpoint.

---

## When coexisting different endpoint versions, things can get messy, with a lot of duplicated code, additional tests, etc. How can we work around this?

To make this more manageable, we internally transformed all requests to the V1 endpoint to a V2 request, and then V2 requests to the V3 endpoint. This meant we could clearly delineate what code was going to be retired when the old endpoint(s) died.

---

## When introducing a breaking interface change, how can we use our services to handle this issue?

Another versioning solution often cited is to have different versions of the service live at once, and for older consumers to route their traffic to the older version, with newer versions seeing the new one.
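The endpoint-coexistence idea from the answers above (internally turning V1 requests into V2 requests, and V2 into V3) can be sketched as a chain of transformations. The field names and defaulting rule here are invented for illustration:

```javascript
// Sketch of coexisting V1/V2/V3 endpoints in one service. Only the V3
// handler holds real logic; older endpoints transform and delegate, so
// it is clear which code dies when an old endpoint is retired.
// All field names below are hypothetical.

// V1 sent a single "name" field; V2 split it into first and last name.
function v1ToV2(request) {
  const [firstName, ...rest] = request.name.split(' ');
  return { firstName, lastName: rest.join(' ') };
}

// V3 added an explicit country code; legacy requests get a default.
function v2ToV3(request) {
  return { ...request, countryCode: request.countryCode || 'GB' };
}

function handleV3(request) {
  return `Created customer ${request.firstName} ${request.lastName} (${request.countryCode})`;
}

// Old endpoints are thin adapters in front of the newest handler.
const handleV2 = (request) => handleV3(v2ToV3(request));
const handleV1 = (request) => handleV2(v1ToV2(request));
```

Retiring the V1 endpoint then means deleting `handleV1` and `v1ToV2` and nothing else.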
---

## What problems could we have when using multiple concurrent service versions?

First, if I need to fix an internal bug in my service, I now have to fix and deploy two different sets of services. This would probably mean I have to branch the codebase for my service, and this is always problematic. Second, it means I need smarts to handle directing consumers to the right microservice.

---

## What could be a good rule of thumb when deciding which approach to take for breaking interface changes?

The longer it takes for you to get consumers upgraded to the newer version and released, the more you should look to coexist different endpoints in the same microservice rather than coexist entirely different versions.

---

## When it comes to our user interfaces, how should we adapt our core services?

So, although our core services —our core offering— might be the same, we need a way to adapt them for the different constraints that exist for each type of interface.

---

## Explain UI Fragment Composition

Rather than having our UI make API calls and map everything back to UI controls, we could have our services provide parts of the UI directly, and then just pull these fragments in to create a UI. Imagine, for example, that the recommendation service provides a recommendation widget that is combined with other controls or UI fragments to create an overall UI. It might get rendered as a box on a web page along with other content.

---

## What's an API gateway?

A common solution to the problem of chatty interfaces with backend services, or the need to vary content for different types of devices, is to have a server-side aggregation endpoint, or API gateway.
This can marshal multiple backend calls, vary and aggregate content if needed for different devices, and serve it up.

---

## What problems could we have when using a single API gateway, and which pattern can we use to resolve them?

The problem that can occur is that normally we’ll have one giant layer for all our services. This leads to everything being thrown in together, and suddenly we start to lose isolation of our various user interfaces, limiting our ability to release them independently. A model I prefer and that I’ve seen work well is to restrict the use of these backends to one specific user interface or application, as we see in Figure 4-10.

![image](https://user-images.githubusercontent.com/1868409/85929062-b11c4a00-b87f-11ea-9d0e-9019a71740da.png)

---

## Which dangers do we have when using BFFs?

The danger with this approach is the same as with any aggregating layer; it can take on logic it shouldn’t. The business logic for the various capabilities these backends use should stay in the services themselves. These BFFs should only contain behavior specific to delivering a particular user experience.

---

## When deciding whether to include new software... “Should I build, or should I buy?”

My clients often struggle with the question “Should I build, or should I buy?” In general, the advice I and my colleagues give when having this conversation with the average enterprise organization boils down to “Build if it is unique to what you do, and can be considered a strategic asset; buy if your use of the tool isn’t that special.”

---

## How can we work around CMSes in our microservice systems?

The answer? Front the CMS with your own service that provides the website to the outside world, as shown in Figure 4-11.
Treat the CMS as a service whose role is to allow for the creation and retrieval of content. In your own service, you write the code and integrate with services how you want. You have control over scaling the website (many commercial CMSes provide their own proprietary add-ons to handle load), and you can pick the templating system that makes sense.

![image](https://user-images.githubusercontent.com/1868409/85929253-5388fd00-b881-11ea-99c7-5614f6c4e879.png)

---

## How can we work around CRMs in our microservice systems?

The first thing we did was identify the core concepts of our domain that the CRM system currently owned. One of these was the concept of a project — that is, something to which a member of staff could be assigned. Multiple other systems needed project information. What we did was instead create a project service. This service exposed projects as RESTful resources, and the external systems could move their integration points over to the new, easier-to-work-with service. Internally, the project service was just a façade, hiding the detail of the underlying integration (the CRM itself). You can see this in Figure 4-12.

![image](https://user-images.githubusercontent.com/1868409/85929360-1bce8500-b882-11ea-966d-c705e266fbf1.png)

---

## What is the Strangler Application Pattern?

A useful pattern here is the Strangler Application Pattern. Much like with our example of fronting the CMS system with our own code, with a strangler you capture and intercept calls to the old system. This allows you to decide whether you route these calls to existing legacy code or direct them to new code you may have written. This allows you to replace functionality over time without requiring a big-bang rewrite.
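The interception step of the Strangler Application Pattern can be sketched as a routing decision sitting in front of both systems (the paths and handlers here are hypothetical):

```javascript
// Sketch of a strangler facade: intercept each call and decide whether
// it is served by new code or falls through to the legacy system.
// The migrated paths and handlers are hypothetical.

const migratedPaths = new Set(['/projects', '/customers']);

function routeRequest(path, newSystem, legacySystem) {
  // Functionality moves over path by path; anything not yet migrated
  // keeps hitting the old system, so there is no big-bang cutover.
  return migratedPaths.has(path) ? newSystem(path) : legacySystem(path);
}

// Stand-ins for the two systems, just for demonstration.
const newSystem = (path) => `new:${path}`;
const legacySystem = (path) => `legacy:${path}`;
```

As more functionality is rewritten, entries are added to `migratedPaths` until the legacy system receives no traffic and can be retired.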
---

-------------------------------------------------------------------------------- /chapters/chapter-05.md: --------------------------------------------------------------------------------

# Chapter 5: Splitting the Monolith

---

## What are seams and why are they important to identify?

In his book Working Effectively with Legacy Code (Prentice-Hall), Michael Feathers defines the concept of a seam — that is, a portion of the code that can be treated in isolation and worked on without impacting the rest of the codebase. We also want to identify seams. But rather than finding them for the purpose of cleaning up our codebase, we want to identify seams that can become service boundaries.

---

## What are the two main steps we need to take when breaking apart a monolith?

Imagine we have a large backend monolithic service that represents a substantial amount of the behavior of MusicCorp’s online systems. To start, we should identify the high-level bounded contexts that we think exist in our organization, as we discussed in Chapter 3. Then we want to try to understand what bounded contexts the monolith maps to.

---

## What tool can we use to identify some database-level constraints?

This doesn’t give us the whole story, however. For example, we may be able to tell that the finance code uses the ledger table, and that the catalog code uses the line item table, but it might not be clear that the database enforces a foreign key relationship from the ledger table to the line item table. To see these database-level constraints, which may be a stumbling block, we need to use another tool to visualize the data. A great place to start is to use a tool like the freely available SchemaSpy, which can generate graphical representations of the relationships between tables.
---

## We have this situation: the finance context wants to get some data from the "Line items" table, which belongs to a different bounded context. The monolith can access that table directly, but how can we work around this issue using microservices?

![image](https://user-images.githubusercontent.com/1868409/86134257-64bb4f00-bab7-11ea-983a-9565f67f51d1.png)

So how do we fix things here? Well, we need to make a change in two places. First, we need to stop the finance code from reaching into the line item table, as this table really belongs to the catalog code, and we don’t want database integration happening once catalog and finance are services in their own rights. The quickest way to address this is, rather than having the code in finance reach into the line item table, to expose the data via an API call in the catalog package that the finance code can call. This API call will be the forerunner of a call we will make over the wire, as we see in Figure 5-3.

![image](https://user-images.githubusercontent.com/1868409/86134441-9fbd8280-bab7-11ea-8a0f-5d49571d913a.png)

---

## How can we share static data (e.g., country codes) using microservices? (Hint: There are three ways)

- Well, we have a few options. One is to duplicate this table for each of our packages, with the long-term view that it will be duplicated within each service also. (Downside: updating multiple tables when new static data is available.)
- A second option is to instead treat this shared, static data as code. Perhaps it could be in a property file deployed as part of the service, or perhaps just as an enumeration. (Downside: same as before, but at least updating code has proven to be easier than updating a DB table/collection.)
- A third option, which may well be extreme, is to push this static data into a service in its own right.
(Downside: an overkill solution when the amount of static data is not that big and not that complex.)

---

## We have the following situation: the finance and the warehouse code are writing to, and probably occasionally reading from, the same table. How can we tease this apart?

![image](https://user-images.githubusercontent.com/1868409/86135819-4d7d6100-bab9-11ea-831b-96113067b387.png)

We need to make the current abstract concept of the customer concrete. As a transient step, we create a new package called Customer. We can then use an API to expose Customer code to other packages, such as finance or warehouse. Rolling this all the way forward, we may now end up with a distinct customer service (Figure 5-6).

![image](https://user-images.githubusercontent.com/1868409/86136008-8a495800-bab9-11ea-9ec7-f3ba7125c2c0.png)

---

## We have the following situation: our catalog needs to store the name and price of the records we sell, and the warehouse needs to keep an electronic record of inventory. We decide to keep these two things in the same place in a generic line item table. How can we break this down?

![image](https://user-images.githubusercontent.com/1868409/86195684-7df7e600-bb1f-11ea-87b5-8847e4ac01f5.png)

The answer here is to split the table in two as we have in Figure 5-8, perhaps creating a stock list table for the warehouse, and a catalog entry table for the catalog details.

![image](https://user-images.githubusercontent.com/1868409/86195790-ac75c100-bb1f-11ea-92ef-2b52e7d8593a.png)

---

## What's the best way to split our schemas so that we can avoid doing a "big-bang release"?

What next? Do you do a big-bang release, going from one monolithic service with a single schema to two services, each with its own schema?
I would actually recommend that you split out the schema but keep the service together before splitting the application code out into separate microservices, as shown in Figure 5-9.

![image](https://user-images.githubusercontent.com/1868409/86196115-856bbf00-bb20-11ea-939a-aa8a6b935646.png)

---

## Which benefits do we get by splitting our schemas first (staging the break)?

By splitting the schemas out but keeping the application code together, we give ourselves the ability to revert our changes or continue to tweak things without impacting any consumers of our service. Once we are satisfied that the DB separation makes sense, we can then think about splitting out the application code into two services.

---

## What's eventual consistency?

In many ways, this is another form of what is called eventual consistency. Rather than using a transactional boundary to ensure that the system is in a consistent state when the transaction completes, instead we accept that the system will get itself into a consistent state at some point in the future. This approach is especially useful with business operations that might be long-lived.

---

## What's the "two-phase commit" algorithm about?

The most common algorithm for handling distributed transactions — especially short-lived transactions, as in the case of handling our customer order — is to use a two-phase commit. With a two-phase commit, first comes the voting phase. This is where each participant (also called a cohort in this context) in the distributed transaction tells the transaction manager whether it thinks its local transaction can go ahead. If the transaction manager gets a yes vote from all participants, then it tells them all to go ahead and perform their commits. A single no vote is enough for the transaction manager to send out a rollback to all parties.
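The voting-then-commit flow described above can be sketched as a toy, in-process illustration (the participant shape is hypothetical; real implementations must also deal with timeouts, crashes, and recovery):

```javascript
// Toy sketch of two-phase commit, as described above.

function twoPhaseCommit(participants) {
  // Phase 1 (voting): each participant says whether its local
  // transaction can go ahead.
  const allYes = participants.every((p) => p.canCommit());
  if (allYes) {
    // Phase 2: everyone voted yes, so tell them all to commit.
    participants.forEach((p) => p.commit());
    return 'committed';
  }
  // A single no vote triggers a rollback for all parties.
  participants.forEach((p) => p.rollback());
  return 'rolled back';
}

// A fake participant (cohort) that records the outcome it was given.
function makeParticipant(votesYes) {
  return {
    state: 'pending',
    canCommit: () => votesYes,
    commit() { this.state = 'committed'; },
    rollback() { this.state = 'rolled back'; },
  };
}
```

Note that `twoPhaseCommit` blocks on every participant answering its vote, which is exactly the outage vulnerability the book goes on to discuss.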
---

## What's one of the downsides of using the "two-phase commit" algorithm?

This approach relies on all parties halting until the central coordinating process tells them to proceed. This means we are vulnerable to outages. If the transaction manager goes down, the pending transactions never complete. If a cohort fails to respond during voting, everything blocks. And there is also the case of what happens if a commit fails after voting.

---

## Should we implement our own algorithm for distributed transactions?

I’d suggest you avoid trying to create your own. Instead, do lots of research on this topic if this seems like the route you want to take, and see if you can use an existing implementation.

---

## What do we do when we encounter state that we really want to keep consistent?

If you do encounter state that really, really wants to be kept consistent, do everything you can to avoid splitting it up in the first place. Try really hard. If you really need to go ahead with the split, think about moving from a purely technical view of the process (e.g., a database transaction) and actually create a concrete concept to represent the transaction itself.

---

## (Reporting) What is the "Data Retrieval via Service Calls" strategy about?

There are many variants of this model, but they all rely on pulling the required data from the source systems via API calls. For a very simple reporting system, like a dashboard that might just want to show the number of orders placed in the last 15 minutes, this might be fine. To report across data from two or more systems, you need to make multiple calls to assemble this data.

---

## When processing big volumes of data, which approach can we use to avoid returning a huge data response?
For example, the customer service might expose something like a `BatchCustomerExport` resource endpoint. The calling system would POST a `BatchRequest`, perhaps passing in a location where a file can be placed with all the data. The customer service would return an HTTP 202 response code, indicating that the request was accepted but has not yet been processed. The calling system could then poll the resource waiting until it retrieves a 201 Created status, indicating that the request has been fulfilled, and then the calling system could go and fetch the data. This would allow potentially large data files to be exported without the overhead of being sent over HTTP; instead, the system could simply save a CSV file to a shared location.

---

## (Reporting) What is the "Data Pump" strategy about?

An alternative option is to have a standalone program that directly accesses the database of the service that is the source of data, and pumps it into a reporting database, as shown in Figure 5-13.

![image](https://user-images.githubusercontent.com/1868409/86518917-58aaf680-be03-11ea-8a98-9cade67ad130.png)

---

## How can we mitigate the tight-coupling and low-cohesion downsides that come with using the "Data Pump" strategy?

To start with, the data pump should be built and managed by the same team that manages the service. We try to reduce the problems with coupling to the service’s schema by having the same team that manages the service also manage the pump. I would suggest, in fact, that you version-control these together, and have builds of the data pump created as an additional artifact as part of the build of the service itself, with the assumption that whenever you deploy one of them, you deploy them both.
---

## (Reporting) What is the "Event Data Pump" strategy about?

In Chapter 4, we touched on the idea of microservices emitting events based on the state change of entities that they manage. For example, our customer service may emit an event when a given customer is created, or updated, or deleted. For those microservices that expose such event feeds, we have the option of writing our own event subscriber that pumps data into the reporting database, as shown in Figure 5-15.

![image](https://user-images.githubusercontent.com/1868409/86519011-79278080-be04-11ea-8ac8-00d32ed53347.png)

---

## What's the main downside of using the "Event Data Pump" strategy?

The main downsides to this approach are that all the required information must be broadcast as events, and it may not scale as well for larger volumes of data as a data pump, which has the benefit of operating directly at the database level. Nonetheless, the looser coupling and fresher data available via such an approach make it strongly worth considering if you are already exposing the appropriate events.

---

## What solution does Netflix implement using "Backup Data Pumps"?

In the end, Netflix ended up implementing a pipeline capable of processing large amounts of data using this approach, which it then open sourced as the **Aegisthus project**.

---

## Which technique can we use to mitigate the cost of change associated with splitting a monolith?

A great technique here is to adapt an approach more typically taught for the design of object-oriented systems: class-responsibility-collaboration (CRC) cards. With CRC cards, you write on one index card the name of the class, what its responsibilities are, and who it collaborates with.
When working through a proposed design, for each service I list its responsibilities in terms of the capabilities it provides, with the collaborators specified in the diagram. As you work through more use cases, you start to get a sense as to whether all of this hangs together properly.

---

## Part of the problem is knowing where to start, and I’m hoping this chapter has helped. But another challenge is the cost associated with splitting out services. Finding somewhere to run the service, spinning up a new service stack, and so on, are non-trivial tasks. So how do we address this?

Well, if doing something is right but difficult, we should strive to make things easier. Investment in libraries and lightweight service frameworks can reduce the cost associated with creating the new service. Giving people access to self-service provisioning of virtual machines or even making a platform as a service (PaaS) available will make it easier to provision systems and test them.

-------------------------------------------------------------------------------- /chapters/chapter-06.md: --------------------------------------------------------------------------------

# Chapter 6: Deployment

---

## What's the core goal of CI?

With CI, the core goal is to keep everyone in sync with each other, which we achieve by making sure that newly checked-in code properly integrates with existing code. To do this, a CI server detects that the code has been committed, checks it out, and carries out some verification like making sure the code compiles and that tests pass.

---

## What benefits do we get with CI? (Hint: there are 3)

CI has a number of benefits:

- We get some level of fast feedback as to the quality of our code.
- It allows us to automate the creation of our binary artifacts.
All the code required to build the artifact is itself version controlled, so we can re-create the artifact if needed.
- We also get some level of traceability from a deployed artifact back to the code, and depending on the capabilities of the CI tool itself, can see what tests were run on the code and artifact too.

---

## What questions do we have to ask ourselves to see if we understand CI?

- Do you check in to mainline once per day?
- Do you have a suite of tests to validate your changes?
- When the build is broken, is it the #1 priority of the team to fix it?

---

## According to Jez Humble and Dave Farley’s book, what's Continuous Delivery (CD)?

Continuous delivery (CD) builds on this concept, and then some. As outlined in Jez Humble and Dave Farley’s book of the same name, continuous delivery is the approach whereby we get constant feedback on the production readiness of each and every check-in, and furthermore treat each and every check-in as a release candidate.

---

## What's "configuration drift"?

By storing all our configuration in source control, we are trying to ensure that we can automatically reproduce services and hopefully entire environments at will. But once we run our deployment process, what happens if someone comes along, logs into the box, and changes things independently of what is in source control? This problem is often called configuration drift — the code in source control no longer reflects the configuration of the running host.

---

## How does PaaS work?

When using a platform as a service (PaaS), you are working at a higher-level abstraction than at a single host. Most of these platforms rely on taking a technology-specific artifact, such as a Java WAR file or Ruby gem, and automatically provisioning and running it for you.
Some of these platforms will transparently attempt to handle scaling the system up and down for you, although a more common (and in my experience less error-prone) way will allow you some control over how many nodes your service might run on, but it handles the rest.

---

## Disadvantages of using PaaS solutions

When PaaS solutions work well, they work very well indeed. However, when they don’t quite work for you, you often don’t have much control in terms of getting under the hood to fix things. This is part of the trade-off you make. I would say that in my experience the smarter the PaaS solutions try to be, the more they go wrong. I’ve used more than one PaaS that attempts to autoscale based on application use, but does it badly.

---

## Why might slicing up a physical machine into multiple VMs not be a good idea?

Well, for some people, you can. However, slicing up the machine into ever-increasing numbers of VMs isn’t free. Think of our physical machine as a sock drawer. If we put lots of wooden dividers into our drawer, can we store more socks or fewer? The answer is fewer: the dividers themselves take up room too! Our drawer might be easier to deal with and organize, and perhaps we could decide to put T-shirts in one of the spaces now rather than just socks, but more dividers means less overall space.

---

## Which tool can we use to define the infrastructure of a system?

Building a system like this required a significant amount of work. The effort is often front-loaded, but can be essential to manage the deployment complexity you have. I hope in the future you won’t have to do this yourself. **Terraform** is a very new tool from HashiCorp, which works in this space.

---
-------------------------------------------------------------------------------- /chapters/chapter-07.md: --------------------------------------------------------------------------------

# Chapter 7: Testing

---

## How has the trend been when it comes to testing (manual or automated)?

The trend recently has been away from any large-scale manual testing, in favor of automating as much as possible, and I certainly agree with this approach. If you currently carry out large amounts of manual testing, I would suggest you address that before proceeding too far down the path of microservices, as you won't get many of their benefits if you are unable to validate your software quickly and efficiently.

---

## When it comes to the testing pyramid, what happens as we go up?

When you're reading the pyramid, the key thing to take away is that as you go up the pyramid, the test scope increases, as does our confidence that the functionality being tested works. On the other hand, the feedback cycle time increases as the tests take longer to run, and when a test fails it can be harder to determine which functionality has broken.

---

## When it comes to the testing pyramid, what happens as we go down?

As you go down the pyramid, in general the tests become much faster, so we get much faster feedback cycles. We find broken functionality faster, our continuous integration builds are faster, and we are less likely to move on to a new task before finding out we have broken something. When those smaller-scoped tests fail, we also tend to know what broke, often exactly what line of code. On the flipside, we don't get a lot of confidence that our system as a whole works if we've only tested one line of code!

---

## Explain stubbing downstream collaborators.

When I talk about stubbing downstream collaborators, I mean that we create a stub service that responds with canned responses to known requests from the service under test. For example, I might tell my stub points bank that when asked for the balance of customer 123, it should return 15,000. The test doesn't care if the stub is called 0, 1, or 100 times.

---

## Explain mocking in comparison to stubbing

When using a mock, I actually go further and make sure the call was made. If the expected call is not made, the test fails. Implementing this approach requires more smarts in the fake collaborators that we create, and if overused can cause tests to become brittle. As noted, however, a stub doesn't care if it is called 0, 1, or many times.

---

## What's Mountebank?

You can think of Mountebank as a small software appliance that is programmable via HTTP. The fact that it happens to be written in NodeJS is completely opaque to any calling service. When it launches, you send it commands telling it what port to stub on, what protocol to handle (currently TCP, HTTP, and HTTPS are supported, with more planned), and what responses it should send when requests are sent. It also supports setting expectations if you want to use it as a mock. You can add or remove these stub endpoints at will, making it possible for a single Mountebank instance to stub more than one downstream dependency.

---

## What happens when you have more moving parts in your tests?

The more moving parts, the more brittle our tests may be, and the less deterministic they are. If you have tests that sometimes fail, but everyone just re-runs them because they may pass again later, then you have flaky tests. It isn't only tests covering lots of different processes that are the culprit here.
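Looking back at the stub-versus-mock distinction, it can be sketched with Python's standard `unittest.mock`. The points bank is the book's example; the method name `get_balance` is my own invention for illustration:

```python
from unittest.mock import Mock

# A stub: returns a canned balance for customer 123. The test does not
# care how many times (if at all) the stub gets called.
stub_points_bank = Mock()
stub_points_bank.get_balance.return_value = 15_000

assert stub_points_bank.get_balance(customer_id=123) == 15_000

# A mock: we go further and verify that the expected call was made.
mock_points_bank = Mock()
mock_points_bank.get_balance.return_value = 15_000

mock_points_bank.get_balance(customer_id=123)

# This line fails the test if the call was never made with these
# exact arguments — the extra "smarts" that can make tests brittle
# when overused.
mock_points_bank.get_balance.assert_called_once_with(customer_id=123)
```

The only difference between the two objects is the final verification step, which is exactly the distinction drawn above.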

---

## According to Martin Fowler, what should we do when we have flaky tests?

In "Eradicating Non-Determinism in Tests", Martin Fowler advocates the approach that if you have flaky tests, you should track them down and, if you can't immediately fix them, remove them from the suite so you can treat them.

---

## When is it good to remove an e2e test?

When it comes to the larger-scoped test suites, however, this is exactly what we need to be able to do. If the same feature is covered in 20 different tests, perhaps we can get rid of half of them, as those 20 tests take 10 minutes to run!

---

## Instead of including an e2e test for each new piece of functionality, which other approach can we take?

The best way to counter this is to focus on a small number of core journeys to test for the whole system. Any functionality not covered in these core journeys needs to be covered in tests that analyze services in isolation from each other. These journeys need to be mutually agreed upon, and jointly owned. For our music shop, we might focus on actions like ordering a CD, returning a product, or perhaps creating a new customer: high-value interactions and very few in number.

---

## Explain CDC (Consumer-Driven Contracts)

With CDCs, we are defining the expectations of a consumer on a service (or producer). The expectations of the consumers are captured in code form as tests, which are then run against the producer. If done right, these CDCs should be run as part of the CI build of the producer, ensuring that it never gets deployed if it breaks one of these contracts.

---

## So Should You Use End-to-End Tests?

You can view running end-to-end tests prior to production deployment as training wheels. While you are learning how CDCs work, and improving your production monitoring and deployment techniques, these end-to-end tests may form a useful safety net, where you are trading off cycle time for decreased risk. But as you improve those other areas, you can start to reduce your reliance on end-to-end tests to the point where they are no longer needed.

---

## What's a smoke test suite?

A common example of this is the smoke test suite, a collection of tests designed to be run against newly deployed software to confirm that the deployment worked. These tests help you pick up any local environmental issues.

---

## What's a blue/green deployment?

Another example of this is what is called blue/green deployment. With blue/green, we have two copies of our software deployed at a time, but only one version of it is receiving real requests.

Let's consider a simple example, seen in Figure 7-12. In production, we have v123 of the customer service live. We want to deploy a new version, v456. We deploy this alongside v123, but do not direct any traffic to it. Instead, we perform some testing in situ against the newly deployed version. Once the tests have worked, we direct the production load to the new v456 version of the customer service. It is common to keep the old version around for a short period of time, allowing for a fast fallback if you detect any errors.

![image](https://user-images.githubusercontent.com/1868409/89364152-da5ea000-d69f-11ea-85c3-bb6b400676ae.png)

---

## What's canary releasing?

With canary releasing, we are verifying our newly deployed software by directing small amounts of production traffic against the system to see if it performs as expected.
“Performing as expected” can cover a number of things, both functional and nonfunctional. For example, we could check that a newly deployed service is responding to requests within 500ms, or that we see the same proportional error rates from the new and the old service.

---

## What do you need to decide when considering canary releasing (about the requests)?

When considering canary releasing, you need to decide if you are going to divert a portion of production requests to the canary or just copy production load. Some teams are able to shadow production traffic and direct it to their canary. In this way, the existing production and canary versions can see exactly the same requests, but only the results of the production requests are seen externally. This allows you to do a side-by-side comparison while eliminating the chance that a failure in the canary can be seen by a customer request.

---

## Explain the trade-off between MTBF and MTTR

Sometimes expending the same effort into getting better at remediation of a release can be significantly more beneficial than adding more automated functional tests. In the web operations world, this is often referred to as the trade-off between optimizing for mean time between failures (MTBF) and mean time to repair (MTTR).

---

## What are nonfunctional requirements?

Nonfunctional requirements is an umbrella term used to describe those characteristics your system exhibits that cannot simply be implemented like a normal feature. They include aspects like the acceptable latency of a web page, the number of users a system should support, how accessible your user interface should be to people with disabilities, or how secure your customer data should be.

---

## What should you ensure to measure in your performance tests?

To generate worthwhile results, you'll often need to run given scenarios with gradually increasing numbers of simulated customers. This allows you to see how the latency of calls varies with increasing load. This means that performance tests can take a while to run. In addition, you'll want the system to match production as closely as possible, to ensure that the results you see will be indicative of the performance you can expect on the production systems.

---

## When it comes to our performance test results, what's important to consider?

And make sure you also look at the results! I've been very surprised by the number of teams I have encountered who have spent a lot of work implementing tests and running them, and never check the numbers. Often this is because people don't know what a good result looks like. You really need to have targets. This way, you can make the build go red or green based on the results, with a red (failing) build being a clear call to action.

---

## Two points about CDC and E2E tests...

- Avoid the need for end-to-end tests wherever possible by using consumer-driven contracts.
- Use consumer-driven contracts to provide focus points for conversations between teams.
-------------------------------------------------------------------------------- /chapters/chapter-08.md: --------------------------------------------------------------------------------

# Chapter 8: Monitoring

---

## What two things do you need to monitor in your services?

- First, we'll want to monitor the host itself: CPU, memory, etc.
- Secondly, we might want to monitor the application itself. At a bare minimum, monitoring the response time of the service is a good idea.
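As a sketch of that bare-minimum application metric, here is one hypothetical way a Python service might record per-endpoint response times in process. A real setup would push these numbers to a monitoring system rather than keep them in memory; the endpoint name and metric store here are invented for illustration:

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metric store, keyed by endpoint name.
response_times = defaultdict(list)

def timed(name):
    """Record the wall-clock response time of each call to the handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                response_times[name].append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("catalogue.search")
def search(term):
    # Stand-in for a real request handler.
    return [item for item in ("cd", "vinyl") if term in item]

search("cd")
search("vinyl")
print(len(response_times["catalogue.search"]))  # 2
```

With timings captured like this, alerting on response time (and later error rates) becomes a question of where you ship the numbers, not of instrumenting the code.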

---

## What tool can we use to keep our logs from taking up all our disk space?

We may even get advanced and use logrotate to move old logs out of the way and avoid them taking up all our disk space.

---

## When it comes to monitoring our service resources (CPU, memory), what tool can we use?

We'll want to know what they should be when things are healthy, so we can alert when they go out of bounds. If we want to run our own monitoring software, we could use something like Nagios to do so, or else use a hosted service like New Relic.

---

## When monitoring a single service running on multiple hosts, what strategy and tool should we use?

So at this point, we still want to track the host-level metrics, and alert on them. But now we want to see what they are across all hosts, as well as individual hosts. In other words, we want to aggregate them up, and still be able to drill down. Nagios lets us group our hosts like this; so far, so good. A similar approach will probably suffice for our application.

---

## What tool can we use to view aggregated logs?

Kibana is an ElasticSearch-backed system for viewing logs, illustrated in Figure 8-4. You can use a query syntax to search through logs, allowing you to do things like restrict time and date ranges or use regular expressions to find matching strings. Kibana can even generate graphs from the logs you send it, allowing you to see at a glance how many errors have been generated over time, for example.

---

## What's the secret to knowing when to panic and when to relax?

Our website is seeing nearly 50 4XX HTTP error codes per second. Is that bad? The CPU load on the catalog service has increased by 20% since lunch; has something gone wrong?
The secret to knowing when to panic and when to relax is to gather metrics about how your system behaves over a long-enough period of time that clear patterns emerge.

---

## What's a good way to collect metrics across multiple services, and what would be a good tool for that?

We'll want to be able to look at a metric aggregated for the whole system (for example, the average CPU load), but we'll also want to aggregate that metric for all the instances of a given service, or even for a single instance of that service. That means we'll need to be able to associate metadata with the metric to allow us to infer this structure. Graphite is one such system that makes this very easy.

---

## What metrics are good to expose for our services?

I would strongly suggest having your services expose basic metrics themselves. At a bare minimum, for a web service you should probably expose metrics like response times and error rates, vital if your server isn't fronted by a web server that is doing this for you. But you should really go further. For example, our accounts service may want to expose the number of times customers view their past orders, or your web shop might want to capture how much money has been made during the last day.

---

## Why do we care about knowing which features the final user is using? (2 reasons)

Why do we care about this? Well, for a number of reasons.

- First, there is an old adage that 80% of software features are never used. Now I can't comment on how accurate that figure is, but as someone who has been developing software for nearly 20 years, I know that I have spent a lot of time on features that never actually get used. Wouldn't it be nice to know what they are?
- Second, we are getting better than ever at reacting to how our users are using our system to work out how to improve it. Metrics that inform us of how our systems behave can only help us here. We push out a new version of the website, and find that the number of searches by genre has gone up significantly on the catalog service. Is that a problem, or expected?

---

## What do we call the fake events that we generate periodically to verify that a service is working correctly, and which technique is associated with them?

This fake event we created is an example of a synthetic transaction. We used this synthetic transaction to ensure the system was behaving semantically, which is why this technique is often called semantic monitoring.

---

## What do we need to be careful about when applying semantic monitoring?

Likewise, we have to make sure we don't accidentally trigger unforeseen side effects. A friend told me a story about an ecommerce company that accidentally ran its tests against its production ordering systems. It didn't realize its mistake until a large number of washing machines arrived at the head office.

---

## How can we trace a transaction or request that goes across multiple services?

One approach that can be useful here is to use correlation IDs. When the first call is made, you generate a GUID for the call. This is then passed along to all subsequent calls, and can be put into your logs in a structured way, much as you'll already do with components like the log level or date. With the right log aggregation tooling, you'll then be able to trace that event all the way through your system.

---

## When talking about our development cycles, when is a good moment to start including correlation IDs?

Although it might seem like additional work up front, I would strongly suggest you consider putting them in as soon as you can, especially if your system will make use of event-driven architecture patterns, which can lead to some odd emergent behavior.

---

## When talking about monitoring, why is it important to have some level of standardization?

You should try to write your logs out in a standard format. You definitely want to have all your metrics in one place, and you may want to have a list of standard names for your metrics too; it would be very annoying for one service to have a metric called ResponseTime, and another to have one called RspTimeSecs, when they mean the same thing.

---

## What questions do we have to ask when thinking about our monitoring metrics?

What our people want to see and react to right now is different than what they need when drilling down. So, for the type of person who will be looking at this data, consider the following:

- What they need to know right now
- What they might want later
- How they like to consume data

---

## Good book for the graphical display of quantitative information

A discussion about all the nuances involved in the graphical display of quantitative information is certainly outside the scope of this book, but a great place to start is Stephen Few's excellent book Information Dashboard Design: Displaying Data for At-a-Glance Monitoring (Analytics Press).

---

## Two things to consider when monitoring one service

- Track inbound response time at a bare minimum. Once you've done that, follow with error rates and then start working on application-level metrics.
- Track the health of all downstream responses, at a bare minimum including the response time of downstream calls, and at best tracking error rates.

---

## Four things to consider when monitoring the whole system

- Ensure your metric storage tool allows for aggregation at a system or service level, and for drilling down to individual hosts.
- Ensure your metric storage tool allows you to maintain data long enough to understand trends in your system.
- Understand what requires a call to action, and structure alerting and dashboards accordingly.
- Investigate the possibility of unifying how you aggregate all of your various metrics by seeing if a tool like Suro or Riemann makes sense for you.
-------------------------------------------------------------------------------- /chapters/chapter-09.md: --------------------------------------------------------------------------------

# Chapter 9: Security

---

## How do we refer to who or what is being authenticated? (abstractly speaking)

Generally, when we're talking abstractly about who or what is being authenticated, we refer to that party as the principal.

---

## What's authorization?

Authorization is the mechanism by which we map from a principal to the action we are allowing her to do. Often, when a principal is authenticated, we will be given information about her that will help us decide what we should let her do. We might, for example, be told what department or office she works in, pieces of information that our systems can use to decide what she can and cannot do.

---

## How does an SSO solution work?

When a principal tries to access a resource (like a web-based interface), she is directed to authenticate with an identity provider.
This may ask her to provide a username and password, or might use something more advanced like two-factor authentication. Once the identity provider is satisfied that the principal has been authenticated, it gives information to the service provider, allowing it to decide whether to grant her access to the resource.

---

## How can we use API gateways with SSO solutions?

Rather than having each service manage handshaking with your identity provider, you can use a gateway to act as a proxy, sitting between your services and the outside world (as shown in Figure 9-1). The idea is that we can centralize the behavior for redirecting the user and perform the handshake in only one place.

![image](https://user-images.githubusercontent.com/1868409/90997068-72102980-e58e-11ea-9166-9b20a3c7c28d.png)

---

## What's the downside of using API gateways with SSOs?

I have seen some people put all their eggs in one basket, relying on the gateway to handle every step for them. And we all know what happens when we have a single point of failure.

---

## When defining principal roles, why should we keep those roles local to the microservice in question?

These decisions need to be local to the microservice in question. I have seen people use the various attributes supplied by identity providers in horrible ways, using really fine-grained roles like CALL_CENTER_50_DOLLAR_REFUND, where they end up putting information specific to one part of one of our system's behavior into their directory services. This is a nightmare to maintain and gives very little scope for our services to have their own independent lifecycle, as suddenly a chunk of information about how a service behaves lives elsewhere, perhaps in a system managed by a different part of the organization.
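One way to picture keeping those decisions local: the identity provider supplies only coarse-grained attributes (like group membership), and the fine-grained policy, such as the $50 refund limit, lives in the microservice's own code. A hypothetical Python sketch (all names and values here are invented for illustration):

```python
# Coarse-grained attributes as an identity provider might supply them.
COARSE_GROUPS_FROM_IDP = {
    "alice": {"CALL_CENTER"},
    "bob": {"WAREHOUSE"},
}

# Fine-grained policy, owned and versioned by the refunds microservice
# itself — not pushed into the directory service as roles like
# CALL_CENTER_50_DOLLAR_REFUND.
REFUND_LIMITS = {
    "CALL_CENTER": 50,
    "CALL_CENTER_LEAD": 200,
}

def max_refund(principal: str) -> int:
    """Map coarse groups to a refund limit, locally to this service."""
    groups = COARSE_GROUPS_FROM_IDP.get(principal, set())
    return max((REFUND_LIMITS.get(g, 0) for g in groups), default=0)

print(max_refund("alice"))  # 50
print(max_refund("bob"))    # 0
```

Changing the refund limit is now a change to this one service, with its own lifecycle, rather than a change to a shared directory managed elsewhere.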

---

## Why should HTTP Basic Authentication be used over HTTPS?

When using HTTPS, the client gains strong guarantees that the server it is talking to is who the client thinks it is. It also gives us additional protection against people eavesdropping on the traffic between the client and server or messing with the payload.

---

## What's the downside of using HTTPS, and how can you work around it?

Another downside is that traffic sent via SSL cannot be cached by reverse proxies like Varnish or Squid. This means that if you need to cache traffic, it will have to be done either inside the server or inside the client. You can fix this by having a load balancer terminate the SSL traffic, and having the cache sit behind the load balancer.

---

## Explain TLS client certificates

Another approach to confirm the identity of a client is to make use of capabilities in Transport Layer Security (TLS), the successor to SSL, in the form of client certificates. Here, each client has an X.509 certificate installed that is used to establish a link between client and server. The server can verify the authenticity of the client certificate, providing strong guarantees that the client is valid.

---

## Using TLS client certificates has its complications, so when should we use them?

Using wildcard certificates can help, but won't solve all problems. This additional burden means you'll be looking to use this technique when you are especially concerned about the sensitivity of the data being sent, or if you are sending data via networks you don't fully control. So you might decide to secure communication of very important data between parties that is sent over the Internet, for example.

---

## What important consideration should we bear in mind when using strategies like HMAC request signing?

Finally, understand that this approach ensures only that no third party has manipulated the request and that the private key itself remains private. The rest of the data in the request will still be visible to parties snooping on the network.

---

## What two approaches do we have when using API keys, and how do we manage them?

Some systems use a single API key that is shared, and use an approach similar to HMAC as just described. A more common approach is to use a public and private key pair. Typically, you'll manage keys centrally, just as we would manage identities of people centrally. The gateway model is very popular in this space.

---

## Explain the type of vulnerability called the "confused deputy problem"

There is a type of vulnerability called the confused deputy problem, which in the context of service-to-service communication refers to a situation where a malicious party can trick a deputy service into making calls to a downstream service on his behalf that he shouldn't be able to. For example, as a customer, when I log in to the online shopping system, I can see my account details. What if I could trick the online shopping UI into making a request for someone else's details, maybe by making a call with my logged-in credentials?

---

## Which encryption algorithms should you use?

For encryption at rest, unless you have a very good reason for picking something else, pick a well-known implementation of AES-128 or AES-256 for your platform.

---

## What technique should you use for securing your stored passwords?

For passwords, you should consider using a technique called salted password hashing.

---

## How should we store our keys to access our databases and services?

One solution is to use a separate security appliance to encrypt and decrypt data. Another is to use a separate key vault that your service can access when it needs a key. The lifecycle management of the keys (and access to change them) can be a vital operation, and these systems can handle this for you.

---

## Explain "Decrypt on Demand"

Encrypt data when you first see it. Only decrypt on demand, and ensure that data is never stored anywhere.

---

## What do we have to do in order to keep our backups safe?

So it may seem like an obvious point, but we need to make sure that our backups are also encrypted. This also means that we need to know which keys are needed to handle which version of data, especially if the keys change. Having clear key management becomes fairly important.

---

## When it comes to security, how can our logs help us out?

Good logging, and specifically the ability to aggregate logs from multiple systems, is not about prevention, but can help with detecting and recovering from bad things happening. For example, after applying security patches you can often see in logs if people have been exploiting certain vulnerabilities. Patching makes sure it won't happen again, but if it already has happened, you may need to go into recovery mode. Having logs available allows you to see if something bad happened after the fact.

---

## In security, what are IDS and IPS?

Intrusion detection systems (IDS) can monitor networks or hosts for suspicious behavior, reporting problems when it sees them. Intrusion prevention systems (IPS), as well as monitoring for suspicious activity, can step in to stop it from happening.
Unlike a firewall, which is primarily looking outward to stop bad things from getting in, IDS and IPS are actively looking inside the perimeter for suspect behavior.

---

## In AWS, which technique can we use to segregate our networks?

AWS, for example, provides the ability to automatically provision a virtual private cloud (VPC), which allows hosts to live in separate subnets. You can then specify which VPCs can see each other by defining peering rules, and even route traffic through gateways to proxy access, giving you in effect multiple perimeters at which additional security measures can be put into place.

---

## Which two steps should you take when securing your OS environment?

- Here, basic advice can get you a long way. Start with only running services as OS users that have as few permissions as possible, to ensure that if such an account is compromised it will do minimal damage.
- Next, patch your software. Regularly. This needs to be automated, and you need to know if your machines are out of sync with the latest patch levels.

---

## How can a microservice architecture give us much more freedom when implementing our security?

For those parts that deal with the most sensitive information or expose the most valuable capabilities, we can adopt the strictest security provisions. But for other parts of the system, we can afford to be much more lax in what we worry about.

---

## When it comes to storing private data, what do you have to take into account? (hint: the German phrase Datensparsamkeit)

The German phrase Datensparsamkeit represents this concept. Originating from German privacy legislation, it encapsulates the concept of only storing as much information as is absolutely required to fulfill business operations or satisfy local laws.

---

## How can we help to educate developers about security concerns?

Getting people familiar with the OWASP Top Ten list and OWASP's Security Testing Framework can be a great place to start. Specialists absolutely have their place, though, and if you have access to them, use them to help you.

---

## How can we use external parties to assess the security of our system?

With security, I think there is great value in having an external assessment done. Exercises like penetration testing, when done by an outside party, really do mimic real-world attempts. They also sidestep the issue that teams aren't always able to see the mistakes they have made themselves, as they are too close to the problem.
-------------------------------------------------------------------------------- /chapters/chapter-10.md: --------------------------------------------------------------------------------

# Chapter 10: Conway's Law and System Design

---

## How does Eric S. Raymond summarize Conway's law?

This statement is often quoted, in various forms, as Conway's law. Eric S. Raymond summarized this phenomenon in The New Hacker's Dictionary (MIT Press) by stating "If you have four groups working on a compiler, you'll get a 4-pass compiler."

---

## What rule does Amazon use to manage the size of its teams?

... It wanted teams to own and operate the systems they looked after, managing the entire lifecycle. But Amazon also knew that small teams can work faster than large teams. This led famously to its two-pizza teams, where no team should be so big that it could not be fed with two pizzas.

---

## What happens when the cost of coordinating change increases?

When the cost of coordinating change increases, one of two things happens.
Either people find ways to reduce the coordination/communication costs, or they stop making changes. The latter is exactly how we end up with large, hard-to-maintain codebases.

---

## When it comes to defining teams based on their geographic location, what's a good piece of advice?

So where does this leave us when considering evolving our own service design? Well, I would suggest that geographical boundaries between people involved with the development of a system can be a great way to drive when services should be decomposed, and that in general, you should look to assign ownership of a service to a single, colocated team who can keep the cost of change low.

---

## Explain having a core service as an internal open source project. What would the responsibility of the core committers (the ones with ownership) be?

With normal open source, a small group of people are considered core committers. They are the custodians of the code. If you want a change to an open source project, you either ask one of the committers to make the change for you, or else you make the change yourself and send them a pull request. The core committers are still in charge of the codebase; they are the owners.

---

## What would the process of vetting and approving changes be for the core team?

The core team needs to have some way of vetting and approving the changes. It needs to make sure the changes are idiomatically consistent—that is, that they follow the general coding guidelines of the rest of the codebase. The people doing the vetting are therefore going to have to spend time working with the submitters to make sure the change is of sufficient quality.

---

## When working with open source projects, when is it good to take submissions, and when is it not?
Most open source projects tend to not take submissions from a wider group of untrusted committers until the core of the first version is done. Following a similar model for your own organization makes sense. If a service is pretty mature, and is rarely changed—for example, our cart service—then perhaps that is the time to open it up for other contributions.

---

## What benefits do we get by drawing our service boundaries around our bounded contexts? (3)

This has multiple benefits. First, a team will find it easier to grasp domain concepts within a bounded context, as they are interrelated. Second, services within a bounded context are more likely to be services that talk to each other, making system design and release coordination easier. Finally, in terms of how the delivery team interacts with the business stakeholders, it becomes easier for the team to create good relationships with the one or two experts in that area.

---

## What about those services that are not changed frequently? How can we work around those when our team structures are aligned along the bounded contexts?

If your team structures are aligned along the bounded contexts of your organization, then even services that are not changed frequently still have a de facto owner. Imagine a team that is aligned with the consumer web sales context. It might handle the website, cart, and recommendation services. Even if the cart service hasn't been changed in months, it would naturally fall to this team to make the change.

---

## Explain the "Line of Business" structure implemented by realestate.com.au

Each squad inside a line of business is expected to own the entire lifecycle of the services it creates, including building, testing and releasing, supporting, and even decommissioning. A core delivery services team provides advice and guidance to these teams, as well as tooling to help them get the job done.

---

## Coming up with a vision for how things should be done without considering how your current staff will feel about it, or what capabilities they have, is likely to lead to a bad place. How can we address this issue?

Each organization has its own set of dynamics around this topic. Understand your staff's appetite to change. Don't push them too fast! Maybe you still have a separate team handle frontline support or deployment for a short period of time, giving your developers time to adjust to other new practices.

---

## In summary, how should we align our teams?

This leads us to trying to align service ownership to colocated teams, which themselves are aligned around the same bounded contexts of the organization.
--------------------------------------------------------------------------------
/chapters/chapter-11.md:
--------------------------------------------------------------------------------
# Chapter 11: Microservices at Scale

---

## Prevent failure or deal with it gracefully?

We can also spend a bit less of our time trying to stop the inevitable, and a bit more of our time dealing with it gracefully. I'm amazed at how many organizations put processes and controls in place to try to stop failure from occurring, but put little to no thought into actually making it easier to recover from failure in the first place.

---

## Which 3 requirements do you need to understand to handle failure?

- Response time/latency
- Availability
- Durability of data

---

## What question should you ask about response time or latency, and why is it important to ask it?

How long should various operations take?
It can be useful here to measure this with different numbers of users to understand how increasing load will impact the response time.

---

## When it comes to "Availability", what's important to consider? (3 questions)

Can you expect a service to be down? Is this considered a 24/7 service? Some people like to look at periods of acceptable downtime when measuring availability, but how useful is this to someone calling your service? I should either be able to rely on your service responding or not. Measuring periods of downtime is really more useful from a historical reporting angle.

---

## When talking about durability of data, what 2 questions are important to answer, and what's important to consider?

How much data loss is acceptable? How long should data be kept for? This is highly likely to change on a case-by-case basis. For example, you might choose to keep user session logs for a year or less to save space, but your financial transaction records might need to be kept for many years.

---

## When and how should you "degrade functionality" in your microservices?

What we need to do is understand the impact of each outage, and work out how to properly degrade functionality. If the shopping cart service is unavailable, we're probably in a lot of trouble, but we could still show the web page with the listing. Perhaps we just hide the shopping cart or replace it with an icon saying "Be Back Soon!"

---

## What do you have to ask yourself when dealing with multiple microservices that depend on multiple downstream collaborators? (Resilience) (2)

But for every customer-facing interface that uses multiple microservices, or every microservice that depends on multiple downstream collaborators, you need to ask yourself, "What happens if this is down?" and know what to do.
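As a toy illustration of answering "what happens if this is down?", a page can catch a failed cart call and degrade to the "Be Back Soon!" placeholder mentioned above. The function names and the failure simulation are invented for this sketch; a real system would be making network calls.

```python
# Hypothetical sketch of graceful degradation: the page still renders even
# when the cart service is unreachable - the widget degrades to a placeholder.

def fetch_cart(available: bool) -> dict:
    """Stand-in for a call to the cart service; flag simulates an outage."""
    if not available:
        raise ConnectionError("cart service unreachable")
    return {"items": 3}

def render_cart_widget(cart_available: bool) -> str:
    try:
        cart = fetch_cart(cart_available)
        return f"Cart ({cart['items']} items)"
    except ConnectionError:
        # Degraded mode: hide the cart rather than failing the whole page.
        return "Be Back Soon!"
```

The key design decision is that the fallback is chosen per dependency, ahead of time, rather than letting one outage cascade into a failed page.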

---

## Systems that act slow or systems that fail fast? Which is worse?

When you get down to it, we discovered the hard way that systems that just act slow are much harder to deal with than systems that just fail fast. In a distributed system, latency kills.

---

## What three (3) fixes did Sam's team implement in the ads website project where he was the technical lead?

We ended up implementing three fixes to avoid this happening again: getting our timeouts right, implementing bulkheads to separate out different connection pools, and implementing a circuit breaker to avoid sending calls to an unhealthy system in the first place.

---

## Explain Chaos Monkey

The most famous of these programs is the Chaos Monkey, which during certain hours of the day will turn off random machines. Knowing that this can and will happen in production means that the developers who create the systems really have to be prepared for it.

---

## Explain the trade-offs when deciding on timeouts (3)

Wait too long to decide that a call has failed, and you can slow the whole system down. Time out too quickly, and you'll consider a call that might have worked as failed. Have no timeouts at all, and a downstream system being down could hang your whole system.

---

## Where do you have to put timeouts, and what do you have to do with them?

Put timeouts on all out-of-process calls, and pick a default timeout for everything. Log when timeouts occur, look at what happens, and change them accordingly.

---

## Explain "Circuit Breakers"

With a circuit breaker, after a certain number of requests to the downstream resource have failed, the circuit breaker is blown. All further requests fail fast while the circuit breaker is in its blown state.
After a certain period of time, the client sends a few requests through to see if the downstream service has recovered, and if it gets enough healthy responses it resets the circuit breaker. You can see an overview of this process in Figure 11-2.

![image](https://user-images.githubusercontent.com/1868409/92310004-12dfeb00-ef78-11ea-8a72-e2350a00e4d6.png)

---

## What do you have to do with a request when a circuit breaker is blown (open)? (synchronous and asynchronous operations)

While the circuit breaker is blown, you have some options. One is to queue up the requests and retry them later on. For some use cases, this might be appropriate, especially if you're carrying out some work as part of an asynchronous job. If this call is being made as part of a synchronous call chain, however, it is probably better to fail fast. This could mean propagating an error up the call chain, or a more subtle degrading of functionality.

---

## Explain Bulkheads

We should have used different connection pools for each downstream connection. That way, if one connection pool gets exhausted, the other connections aren't impacted, as we see in Figure 11-3. This would ensure that if a downstream service started behaving slowly in the future, only that one connection pool would be impacted, allowing other calls to proceed as normal.

![image](https://user-images.githubusercontent.com/1868409/92310219-30ae4f80-ef7a-11ea-9284-10c24c66b99a.png)

---

## How can we combine circuit breakers with bulkheads?

We can think of our circuit breakers as an automatic mechanism to seal a bulkhead, to not only protect the consumer from the downstream problem, but also to potentially protect the downstream service from more calls that may be having an adverse impact.
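The blown/fail-fast/reset cycle described above can be sketched as a small in-process circuit breaker. Thresholds, names, and the injectable clock are illustrative, not a production implementation (real ones would also treat a timed-out call as a failure and handle concurrency):

```python
import time

class CircuitBreaker:
    """Minimal closed -> open (blown) -> half-open circuit breaker sketch."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Blown: fail fast without touching the downstream service.
                raise RuntimeError("circuit open: failing fast")
            # Reset window elapsed: half-open, let a probe request through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()  # blow (or re-blow) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # healthy response resets the breaker
        return result
```

Usage is just `breaker.call(lambda: make_downstream_request())`; once enough calls fail, further callers get the fast `RuntimeError` instead of piling up on an unhealthy dependency.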

---

## Why is it important to keep isolation in our services?

The more one service depends on another being up, the more the health of one impacts the ability of the other to do its job. If we can use integration techniques that allow a downstream server to be offline, upstream services are less likely to be affected by outages, planned or unplanned.

---

## What's an idempotent operation?

In idempotent operations, the outcome doesn't change after the first application, even if the operation is subsequently applied multiple times. If operations are idempotent, we can repeat the call multiple times without adverse impact. This is very useful when we want to replay messages that we aren't sure have been processed, a common way of recovering from error.

---

## When does using idempotent operations work well?

This mechanism works just as well with event-based collaboration, and can be especially useful if you have multiple instances of the same type of service subscribing to events. Even if we store which events have been processed, with some forms of asynchronous message delivery there may be small windows where two workers can see the same message. By processing the events in an idempotent manner, we ensure this won't cause us any issues.

---

## Why is it good to move microservices onto their own hosts?

As the microservices are independent processes that communicate over the network, it should be an easy task to then move them onto their own hosts to improve throughput and scaling. This can also increase the resiliency of the system, as a single host outage will impact a reduced number of microservices.

---

## What's important to consider when looking at an SLA? (SLA stands for...?)

If you're using an underlying service provider, it is important to know if a service-level agreement (SLA) is offered and plan accordingly. If you need to ensure your services are down for no more than four hours every quarter, but your hosting provider can only guarantee a downtime of eight hours per quarter, you have to either change the SLA, or come up with an alternative solution.

---

## Explain VLAN

One mitigation is to have all the instances of the microservice inside a single VLAN, as we see in Figure 11-5. A VLAN is a virtual local area network that is isolated in such a way that requests from outside it can come only via a router, and in this case our router is also our SSL-terminating load balancer. The only communication to the microservice from outside the VLAN comes over HTTPS, but internally everything is HTTP.

![image](https://user-images.githubusercontent.com/1868409/92339525-05685500-f08d-11ea-8fca-f72cdcc28bfb.png)

---

## Explain what Jeff Dean said in his presentation "Challenges in Building Large-Scale Information Retrieval Systems" (2)

The architecture that gets you started may not be the architecture that keeps you going when your system has to handle very different volumes of load. As Jeff Dean said in his presentation "Challenges in Building Large-Scale Information Retrieval Systems" (WSDM 2009 conference), you should "design for ~10× growth, but plan to rewrite before ~100×." At certain points, you need to do something pretty radical to support the next level of growth.

---

## Why is preparing our systems for massive usage from the very beginning a bad idea?

We need to be able to rapidly experiment, and understand what capabilities we need to build.
If we tried building for massive scale up front, we'd end up front-loading a huge amount of work to prepare for load that may never come, while diverting effort away from more important activities, like understanding if anyone will want to actually use our product.

---

## How can we scale reads in relational databases? What is such a setup called?

In a relational database management system (RDBMS) like MySQL or Postgres, data can be copied from a primary node to one or more replicas. This is often done to ensure that a copy of our data is kept safe, but we can also use it to distribute our reads. A service could direct all writes to the single primary node, but distribute reads to one or more read replicas, as we see in Figure 11-6. The replication from the primary database to the replicas happens at some point after the write. This means that with this technique reads may sometimes see stale data until the replication has completed. Eventually the reads will see the consistent data. Such a setup is called eventually consistent, and if you can handle the temporary inconsistency it is a fairly easy and common way to help scale systems.

![image](https://user-images.githubusercontent.com/1868409/92339830-69d7e400-f08e-11ea-9961-0b324caad525.png)

---

## What approach do we have when scaling for writes?

One approach is to use sharding. With sharding, you have multiple database nodes. You take a piece of data to be written, apply some hashing function to the key of the data, and based on the result of the function learn where to send the data. To pick a very simplistic (and actually bad) example, imagine that customer records A–M go to one database instance, and N–Z another.
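The hash-the-key-to-pick-a-node idea can be sketched as follows. The node names are invented, and a real system would likely use consistent hashing so that adding a node doesn't remap most keys:

```python
import hashlib

# Illustrative shard map; in practice these would be real database endpoints.
NODES = ["db-node-0", "db-node-1", "db-node-2"]

def shard_for(key: str, nodes=NODES) -> str:
    """Pick a database node by hashing the record key."""
    # Use a stable hash: Python's builtin hash() is seeded per process,
    # so it would route the same key differently across restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Note the naive modulo has exactly the weakness the text hints at: growing `NODES` from 3 to 4 changes the result for most keys, forcing a large data migration.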

---

## In short words, explain CQRS

The Command-Query Responsibility Segregation (CQRS) pattern refers to an alternate model for storing and querying information. With normal databases, we use one system for performing modifications to data and querying the data. With CQRS, part of the system deals with commands, which capture requests to modify state, while another part of the system deals with queries.

---

## Explain client-side, proxy, and server-side caching

- In client-side caching, the client stores the cached result. The client gets to decide when (and if) it goes and retrieves a fresh copy. Ideally, the downstream service will provide hints to help the client understand what to do with the response, so it knows when and if to make a new request.
- With proxy caching, a proxy is placed between the client and the server. A great example of this is using a reverse proxy or content delivery network (CDN).
- With server-side caching, the server handles caching responsibility, perhaps making use of a system like Redis or Memcache, or even a simple in-memory cache.

---

## Benefits of client-side caching

Client-side caching can help reduce network calls drastically, and can be one of the fastest ways of reducing load on a downstream service. In this case, the client is in charge of the caching behavior.

---

## Benefits of proxy caching

With proxy caching, everything is opaque to both the client and server. This is often a very simple way to add caching to an existing system. If the proxy is designed to cache generic traffic, it can also cache more than one service; a common example is a reverse proxy like Squid or Varnish, which can cache any HTTP traffic.
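A minimal sketch of the client-side caching described above, where the client honors a max-age hint (in seconds) from the downstream service, much like an HTTP cache-control directive. The API and the injectable clock are invented for illustration:

```python
import time

class ClientCache:
    """Tiny client-side cache: serve a stored value until its max-age expires."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key, fetch, max_age=60.0):
        entry = self.entries.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]  # still fresh: no network call needed
        value = fetch()      # stale or missing: go to the origin
        self.entries[key] = (value, self.clock() + max_age)
        return value
```

The client stays in charge of the caching behavior, but the freshness window (`max_age`) is exactly the kind of hint the downstream service should supply.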

---

## Benefits of server-side caching

With a cache near or inside a service boundary, it can be easier to reason about things like invalidation of data, or track and optimize cache hits. In a situation where you have multiple types of clients, a server-side cache could be the fastest way to improve performance.

---

## How can we use HTTP headers in our caching systems?

First, with HTTP, we can use cache-control directives in our responses to clients. These tell clients if they should cache the resource at all, and if so how long they should cache it for in seconds. We also have the option of setting an Expires header, where instead of saying how long a piece of content can be cached for, we specify a time and date at which a resource should be considered stale and fetched again.

---

## Explain the Guardian technique

A technique I saw used at the Guardian, and subsequently elsewhere, was to crawl the existing live site periodically to generate a static version of the website that could be served in the event of an outage. Although this crawled version wasn't as fresh as the cached content served from the live system, in a pinch it could ensure that a version of the site would get displayed.

---

## How can we avoid our services getting flooded with requests if all the cache vanishes?

One way to protect the origin in such a situation is never to allow requests to go to the origin in the first place. Instead, the origin itself populates the cache asynchronously when needed, as shown in Figure 11-7. If a cache miss is caused, this triggers an event that the origin can pick up on, alerting it that it needs to repopulate the cache. So if an entire shard has vanished, we can rebuild the cache in the background.

![image](https://user-images.githubusercontent.com/1868409/92422506-0617ee80-f154-11ea-9c45-38e787733dec.png)

---

## When applying caching, what do we have to be careful about?

Be careful about caching in too many places! The more caches between you and the source of fresh data, the more stale the data can be, and the harder it can be to determine the freshness of the data that a client eventually sees. This can be especially problematic with a microservice architecture where you have multiple services involved in a call chain.

---

## What's a good piece of advice when using autoscaling?

Both reactive and predictive scaling are very useful, and can help you be much more cost effective if you're using a platform that allows you to pay only for the computing resources you use. But they also require careful observation of the data available to you. I'd suggest using autoscaling for failure conditions first while you collect the data. Once you want to start scaling for load, make sure you are very cautious about scaling down too quickly. In most situations, having more computing power at your hands than you need is much better than not having enough!

---

## Explain the CAP theorem in short words

At its heart it tells us that in a distributed system, we have three things we can trade off against each other: consistency, availability, and partition tolerance. Specifically, the theorem tells us that we get to keep two in a failure mode.

---

## How can we sacrifice consistency? What do we get by doing so?

Let's assume that we don't shut the inventory service down entirely. If I make a change now to the data in DC1, the database in DC2 doesn't see it. This means any requests made to our inventory node in DC2 see potentially stale data.
In other words, our system is still available in that both nodes are able to serve requests, and we have kept the system running despite the partition, but we have lost consistency. This is often called an AP system. We don't get to keep all three.

---

## How can we sacrifice availability? What do we get by doing so?

Now in the partition, if the database nodes can't talk to each other, they cannot coordinate to ensure consistency. We are unable to guarantee consistency, so our only option is to refuse to respond to the request. In other words, we have sacrificed availability. Our system is consistent and partition tolerant, or CP. In this mode our service would have to work out how to degrade functionality until the partition is healed and the database nodes can be resynchronized.

---

## What's a good piece of advice when we want to achieve multinode consistency? What tool can we use?

Getting multinode consistency right is so hard that I would strongly, strongly suggest that if you need it, don't try to invent it yourself. Instead, pick a data store or lock service that offers these characteristics. Consul, for example, which we'll discuss shortly, implements a strongly consistent key/value store designed to share configuration between multiple nodes.

---

## So, AP or CP?

Without knowing the context in which the operation is being used, we can't know the right thing to do. Knowing about the CAP theorem just helps you understand that this trade-off exists and what questions to ask.

---

## What about those posts claiming they have beaten the CAP theorem? Are they true?

You'll often see posts about people beating the CAP theorem. They haven't. What they have done is create a system where some capabilities are CP, and some are AP.

---

## Why is it sometimes more convenient to go AP rather than CP?

We have to recognize that no matter how consistent our systems might be in and of themselves, they cannot know everything that happens, especially when we're keeping records of the real world. This is one of the main reasons why AP systems end up being the right call in many situations. Aside from the complexity of building CP systems, they can't fix all our problems anyway.

---

## Good tool for generating documentation of our microservices

Swagger lets you describe your API in order to generate a very nice web UI that allows you to view the documentation and interact with the API via a web browser. The ability to execute requests is very nice: you can define POST templates, for example, making it clear what sort of content the server expects.

---

## When to use HAL, and when to use Swagger?

If you're using hypermedia, my recommendation is to go with HAL over Swagger. But if you're not using hypermedia and can't justify the switch, I'd definitely suggest giving Swagger a go.
--------------------------------------------------------------------------------
/chapters/chapter-12.md:
--------------------------------------------------------------------------------
[//]: <> (Chapter 12: Bringing It All Together)

## List the Principles of Microservices (7)

---

![image](https://user-images.githubusercontent.com/1868409/94346320-32f84c80-0002-11eb-8888-10b919c74c91.png)

===

## What can we do to maximize the autonomy that microservices make possible?

---

To maximize the autonomy that microservices make possible, we need to constantly be looking for the chance to delegate decision making and control to the teams that own the services themselves.
This process starts with embracing self-service wherever possible, allowing people to deploy software on demand, making development and testing as easy as possible, and avoiding the need for separate teams to perform these activities.

===

## If we don't account for the fact that a downstream call can and will fail, what can happen?

---

If we don't account for the fact that a downstream call can and will fail, our systems might suffer catastrophic cascading failure, and we could find ourselves with a system that is much more fragile than before.

===

## When making decisions about our microservice architecture, we need to accept that we're going to get some things wrong. What are our options?

---

So, knowing we are going to get some things wrong, what are our options? Well, I would suggest finding ways to make each decision small in scope; that way, if you get it wrong, you only impact a small part of your system. Learn to embrace the concept of evolutionary architecture, where your system bends and flexes and changes over time as you learn new things.
43 | -------------------------------------------------------------------------------- /package-lock.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "building-microservices-notes", 3 | "version": "1.0.0", 4 | "lockfileVersion": 1, 5 | "requires": true, 6 | "dependencies": { 7 | "markdown-to-anki": { 8 | "version": "0.2.3", 9 | "resolved": "https://registry.npmjs.org/markdown-to-anki/-/markdown-to-anki-0.2.3.tgz", 10 | "integrity": "sha512-IPRqVtFfVrN7f+my60XxMwRJyxtKoRV7XO1pVj8qiO4F7xWIumi29qziN8jvGApeoHd4RGG5qZ6xaTXwFYtfIw==", 11 | "requires": { 12 | "marked": "^0.3.6" 13 | } 14 | }, 15 | "marked": { 16 | "version": "0.3.19", 17 | "resolved": "https://registry.npmjs.org/marked/-/marked-0.3.19.tgz", 18 | "integrity": "sha512-ea2eGWOqNxPcXv8dyERdSr/6FmzvWwzjMxpfGB/sbMccXoct+xY+YukPD+QTUZwyvK7BZwcr4m21WBOW41pAkg==" 19 | } 20 | } 21 | } 22 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "building-microservices-notes", 3 | "version": "1.0.0", 4 | "description": "My personal notes for the book: Building Microservices by Sam Newman.", 5 | "main": "none", 6 | "scripts": { 7 | "build-12": "markdown-to-anki chapters/chapter-12.md > chapter-12.txt" 8 | }, 9 | "repository": { 10 | "type": "git", 11 | "url": "git+https://github.com/Andrew4d3/building-microservices-notes.git" 12 | }, 13 | "keywords": [ 14 | "anki", 15 | "microservices" 16 | ], 17 | "author": "Andrew4d3", 18 | "license": "MIT", 19 | "bugs": { 20 | "url": "https://github.com/Andrew4d3/building-microservices-notes/issues" 21 | }, 22 | "homepage": "https://github.com/Andrew4d3/building-microservices-notes#readme", 23 | "dependencies": { 24 | "markdown-to-anki": "^0.2.3" 25 | } 26 | } --------------------------------------------------------------------------------