├── .gitattributes
├── .gitignore
├── 100-go-mistakes-and-how-to-avoid-them
├── README.md
├── assets
│ ├── append-slice-problem.png
│ ├── at-most-two-goroutines-running.png
│ ├── bad-data-alignment.png
│ ├── cache-line-copy-multi-core.png
│ ├── concurrency.png
│ ├── cpu-striding.png
│ ├── error-wrapper-is-not-nil.png
│ ├── false-sharing-same-block.png
│ ├── go-scheduler.png
│ ├── goroutine-switch-enhanced.png
│ ├── goroutine-switch.png
│ ├── hazards-between-instructions.png
│ ├── hazards-between-instructions_improved.png
│ ├── http-client-timeout.png
│ ├── http-server-timeout.png
│ ├── iface-producer-consumer.png
│ ├── ilp-v1.png
│ ├── ilp-v2.png
│ ├── k8s-cpu-throttle.png
│ ├── linked-list-vs-slice.png
│ ├── maps-internals.png
│ ├── mutex-vs-channels.png
│ ├── one-goroutine-running.png
│ ├── optimal-goroutines-gomaxproc.png
│ ├── padding-and-spacial-locality.png
│ ├── parallelism.png
│ ├── range-slice-copy.png
│ ├── slice-of-structs-vs-struct-of-slices.png
│ ├── sql-open-connections.png
│ ├── store-client-dependency.png
│ ├── tests-pyramid.png
│ └── wrap-errors.png
├── chapter-01-go-simple-to-learn-hard-to-master.md
├── chapter-02-code-and-project-organization.md
├── chapter-03-data-types.md
├── chapter-04-control-structures.md
├── chapter-05-strings.md
├── chapter-06-functions-and-methods.md
├── chapter-07-error-management.md
├── chapter-08-concurrency-foundations.md
├── chapter-09-concurrency-practice.md
├── chapter-10-standard-library.md
├── chapter-11-testing.md
└── chapter-12-optimizations.md
├── LICENSE
├── README.md
├── a-common-sense-guide-to-data-structures-and-algorithms
├── README.md
├── assets
│ ├── comparaison-of-different-insertion-sort-scenarios.png
│ ├── different-case-scenarios.png
│ ├── linear-vs-binary-search.png
│ ├── plotted-on-vs-o1-vs-olog.png
│ ├── plotted-on-vs-o1.png
│ ├── plotted-on2-vs-on.png
│ ├── selection-vs-insertion-sort.png
│ └── tree.png
├── chapter-01-why-data-structures-matter.md
├── chapter-02-why-algorithms-matter.md
├── chapter-03-o-yes-big-o-notation.md
├── chapter-04-speeding-up-your-code-with-big-o.md
├── chapter-05-optimizing-code-with-and-without-big-o.md
├── chapter-06-optimizing-for-optimistic-scenarios.md
└── chapter-15-speeding-up-all-the-things-with-binary-search-trees.md
├── a-tour-of-cpp
├── 01-basics.md
├── 02-user-defined-types.md
├── 03-modularity.md
└── assets
│ └── modules-export.png
├── clean-architecture
├── Clean-Architecture.md
└── assets
│ ├── boundary-crossing-against-control-flow.png
│ ├── boundary-gui-business-rules.png
│ ├── boundary-line.png
│ ├── business-rules-and-database-components.png
│ ├── class-diagram-low-high-level-policy.png
│ ├── clean-architecture-hunt-the-wumpus.png
│ ├── clean-architecture.png
│ ├── cohesion-principles-tension.png
│ ├── component-based-taxi-arch.png
│ ├── components-relationships-uni.png
│ ├── cross-cutting-concerns.png
│ ├── db-behind-interface.png
│ ├── facade-pattern.png
│ ├── factories.png
│ ├── hardware-is-a-detail.png
│ ├── isp-problematic-architecture.png
│ ├── isp-segregated-ops.png
│ ├── isp.png
│ ├── loan-entity.png
│ ├── low-high-level-boundary-crossing.png
│ ├── lsp-inheritance.png
│ ├── lsp-square-rectangle.png
│ ├── memory-layout-old-days.png
│ ├── mix-soft-firm-anti-pattern.png
│ ├── ocp.png
│ ├── one-dim-boundary.png
│ ├── os-abstraction-layer.png
│ ├── package-by-component.png
│ ├── package-by-feature.png
│ ├── package-by-layer.png
│ ├── plugging-in-to-business-rules.png
│ ├── ports-adapters.png
│ ├── segregation-of-mutability.png
│ ├── spr.png
│ ├── strategy-pattern.png
│ ├── taxi-object-oriented-arch.png
│ ├── taxi-service-arch.png
│ ├── three-layers.png
│ ├── typical-components-diagram.png
│ ├── video-sales-arch.png
│ ├── video-sales-uml.png
│ ├── violating-sdp.png
│ └── zone-of-exclusion.png
├── cpp-data-structures-and-algorithms
├── .gitignore
├── msvc.ps1
└── sorting
│ ├── bubble-sort.cpp
│ ├── insertion-sort.cpp
│ └── selection-sort.cpp
├── database-internals
├── README.md
├── assets
│ ├── conceptual-storage-webtable.png
│ ├── dbms-architecture.png
│ ├── primary-index-indirection.png
│ └── tree-balancing.png
├── chapter-01-introduction-and-overview.md
└── chapter-02-b-tree-basics.md
├── designing-data-intensive-applications
├── README.md
├── assets
│ ├── b-trees-structure.png
│ ├── graph-structured-data.png
│ ├── hash-indexes.png
│ └── response-time.png
├── chapter-01-reliable-scalable-and-maintainable-applications.md
├── chapter-02-the-battle-of-the-data-models.md
└── chapter-03-storage-and-retrieval.md
├── effective-cpp
├── README.md
└── assets
│ ├── deadly-mi-diamond.png
│ └── virtual-inheritance.png
├── effective-python
└── README.md
├── go-concurrency-patterns
├── .gitignore
├── 01-generator
│ └── main.go
├── 02-fan-in
│ └── main.go
├── 03-fan-out
│ └── main.go
├── 04-daisy-chain
│ └── main.go
├── 05-worker-pool
│ └── main.go
├── README.md
├── examples
│ ├── 01-google-search
│ │ └── main.go
│ └── 02-ping-pong
│ │ └── main.go
└── go.mod
└── systems-performance-enterprise-and-the-cloud
├── README.md
├── assets
├── analysis-perspectives.png
├── counters-statistics-metrics.png
├── cpu-flame-graph.png
└── full-stack.png
└── chapter-01-introduction.md
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Handle line endings automatically for files detected as text
2 | # and leave all files detected as binary untouched.
3 | * text=auto
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # OS generated files #
2 | ######################
3 | .DS_Store
4 | .DS_Store?
5 | ._*
6 | .Spotlight-V100
7 | .Trashes
8 | ehthumbs.db
9 | Thumbs.db
10 | .vscode
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/README.md:
--------------------------------------------------------------------------------
1 | # 100 Go Mistakes and How to Avoid Them
2 |
3 | Notes taken from reading *100 Go Mistakes and How to Avoid Them* by Teiva Harsanyi.
4 |
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/append-slice-problem.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/append-slice-problem.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/at-most-two-goroutines-running.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/at-most-two-goroutines-running.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/bad-data-alignment.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/bad-data-alignment.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/cache-line-copy-multi-core.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/cache-line-copy-multi-core.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/concurrency.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/concurrency.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/cpu-striding.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/cpu-striding.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/error-wrapper-is-not-nil.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/error-wrapper-is-not-nil.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/false-sharing-same-block.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/false-sharing-same-block.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/go-scheduler.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/go-scheduler.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/goroutine-switch-enhanced.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/goroutine-switch-enhanced.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/goroutine-switch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/goroutine-switch.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/hazards-between-instructions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/hazards-between-instructions.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/hazards-between-instructions_improved.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/hazards-between-instructions_improved.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/http-client-timeout.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/http-client-timeout.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/http-server-timeout.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/http-server-timeout.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/iface-producer-consumer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/iface-producer-consumer.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/ilp-v1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/ilp-v1.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/ilp-v2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/ilp-v2.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/k8s-cpu-throttle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/k8s-cpu-throttle.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/linked-list-vs-slice.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/linked-list-vs-slice.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/maps-internals.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/maps-internals.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/mutex-vs-channels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/mutex-vs-channels.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/one-goroutine-running.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/one-goroutine-running.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/optimal-goroutines-gomaxproc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/optimal-goroutines-gomaxproc.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/padding-and-spacial-locality.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/padding-and-spacial-locality.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/parallelism.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/parallelism.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/range-slice-copy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/range-slice-copy.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/slice-of-structs-vs-struct-of-slices.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/slice-of-structs-vs-struct-of-slices.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/sql-open-connections.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/sql-open-connections.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/store-client-dependency.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/store-client-dependency.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/tests-pyramid.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/tests-pyramid.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/assets/wrap-errors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/100-go-mistakes-and-how-to-avoid-them/assets/wrap-errors.png
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/chapter-01-go-simple-to-learn-hard-to-master.md:
--------------------------------------------------------------------------------
1 | # Chapter 1: Go: Simple to learn but hard to master
2 |
3 | - Google created the Go programming language in 2007 in response to these challenges:
4 | - Maximizing **agility** and reducing **time to market** is critical for most organizations.
5 | - Ensuring that software engineers are as **productive** as possible when reading, writing, and maintaining code.
6 | - Why does Go not have feature *X*?
7 | - Because it **doesn’t fit**, because it affects **compilation** speed or clarity of **design**, or because it would make the fundamental system model too **difficult**.
8 | - Judging the quality of a programming language via its number of features is probably not an accurate metric 💯.
9 | - Go's essential characteristics: **Stability**, **Expressivity**, **Compilation**, **Safety**.
10 | - Go is simple but not easy:
11 | - Example: a study showed that popular repos such as Docker, gRPC, and Kubernetes contain bugs caused by inaccurate use of the message-passing paradigm via channels.
12 | - Although concepts such as channels and goroutines can be simple to learn, they aren’t easy topics in practice.
13 | - The cost of software bugs in the U.S. alone is estimated to be over $2 trillion 😢.
14 |
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/chapter-04-control-structures.md:
--------------------------------------------------------------------------------
1 | # Chapter 4: Control structures
2 |
3 | ## #30: Ignoring the fact that elements are copied in range loops
4 |
5 | - In Go, everything we **assign** is a **copy**:
6 | - If we assign the result of a function returning a **struct**, it performs a **copy** of that struct.
7 | - If we assign the result of a function returning a **pointer**, it performs a **copy** of the memory address.
8 | - It’s crucial ⚠️ to keep this in mind to avoid common mistakes, including those related to `range` loops. Indeed, when a `range` loop iterates over a data structure, it performs a
9 | **copy** of each element to the value variable.
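- For example (a minimal sketch, assuming an `account` struct with a `balance` field), mutating the value variable has no effect on the slice:
```go
accounts := []account{{balance: 100.}, {balance: 200.}, {balance: 300.}}
for _, a := range accounts {
	a.balance += 1000 // mutates the copy only, not the slice element
}
fmt.Println(accounts) // [{100} {200} {300}] - the slice is unchanged
```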
10 | - So, what if we want to update the slice elements? There are two main options:
11 | ```go
12 | for i := range accounts {
13 | accounts[i].balance += 1000
14 | }
15 | for i := 0; i < len(accounts); i++ {
16 | accounts[i].balance += 1000
17 | }
18 | ```
19 | - Another option is to keep using the `range` loop and access the value but modify the slice type to a slice of account **pointers**:
20 | ```go
21 | accounts := []*account{
22 | {balance: 100.},
23 | {balance: 200.},
24 | {balance: 300.},
25 | }
26 | for _, a := range accounts {
27 | a.balance += 1000
28 | }
29 | ```
30 | - 👎 Iterating over a slice of pointers may be **less efficient** for a CPU because of the lack of **predictability** (CPU caches).
31 |
32 | ## #31: Ignoring how arguments are evaluated in range loops
33 |
34 | - Consider the example below:
35 | ```go
36 | s := []int{0, 1, 2}
37 | for range s {
38 | s = append(s, 10)
39 | }
40 | ```
41 | - When using a `range` loop, the provided expression is evaluated only once, **before** the **beginning** of the loop.
42 | - In this context, *evaluated* means the provided expression is copied to a **temporary** variable, and then `range` iterates over this variable. In this example, when the `s` expression is evaluated, the result is a **slice copy**:
43 | 
44 |
45 | - The behavior is **different** with a classic `for` loop: `len(s)` is re-evaluated at each iteration, so a loop like `for i := 0; i < len(s); i++ { s = append(s, 10) }` would never terminate. With `range`, the expression is evaluated only once, so the loop runs exactly three times and `s` ends up as `[0 1 2 10 10 10]`.
46 | - The same logic applies to **channels** regarding how the `range` expression is evaluated.
47 | ```go
48 | ch1 := make(chan int, 3)
49 | go func() {
50 | ch1 <- 0
51 | ch1 <- 1
52 | ch1 <- 2
53 | close(ch1)
54 | }()
55 |
56 | ch2 := make(chan int, 3)
57 | go func() {
58 | ch2 <- 10
59 | ch2 <- 11
60 | ch2 <- 12
61 | close(ch2)
62 | }()
63 |
64 | ch := ch1
65 | for v := range ch {
66 | fmt.Println(v)
67 | ch = ch2
68 | }
69 | ```
70 | - The expression provided to `range` is a `ch` channel pointing to `ch1`. Hence, `range` evaluates `ch`, performs a **copy to a temporary** variable, and iterates over elements from this channel. Despite the `ch = ch2` statement, `range` keeps iterating over `ch1`, **not** `ch2`.
71 | - With **arrays**, the `range` expression is also evaluated **before** the **beginning** of the loop; what is assigned to the temporary loop variable is a **copy** of the array.
72 | - Let’s see this principle in action with the following example that updates a specific array index during the iteration:
73 | ```go
74 | a := [3]int{0, 1, 2}
75 | for i, v := range a {
76 | a[2] = 10
77 | if i == 2 {
78 | fmt.Println(v)
79 | }
80 | }
81 | ```
82 | - This code updates the last index to `10`. However, if we run this code, it does not print `10`; it prints `2`, instead.
83 | - The loop doesn’t update the copy; it updates the **original** array ‼️
84 | - If we want to print the actual value of the last element, we can do so in two ways:
85 | - By accessing the element from its **index**: `fmt.Println(a[2])`.
86 | - Using an array pointer: `for i, v := range &a`.
87 | - We assign a copy of the array pointer to the temporary variable used by `range`; because both pointers **reference** the **same array**, the loop reads the up-to-date values.
88 | - Using the pointer also doesn’t lead to copying the whole array, which may be worth keeping in mind if the array is **significantly large** 💡.
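- A quick sketch of the array-pointer variant (same array as above):
```go
a := [3]int{0, 1, 2}
for i, v := range &a { // range over the array pointer: no copy is made
	a[2] = 10
	if i == 2 {
		fmt.Println(v) // 10: the loop now sees the update
	}
}
```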
89 |
90 | ## #32: Ignoring the impact of using pointer elements in range loops
91 |
92 | - 💡 If we store **large** structs, and these structs are **frequently mutated**, we can use pointers instead to **avoid a copy** and a re-insertion for each mutation.
93 | - We will consider the following two structs:
94 | ```go
95 | // A Store that holds a map of Customer pointers
96 | type Store struct {
97 | m map[string]*Customer
98 | }
99 |
100 | // A Customer struct representing a customer
101 | type Customer struct {
102 | ID string
103 | Balance float64
104 | }
105 | ```
106 | - The following method iterates over a slice of `Customer` elements and stores them in the `m` map:
107 | ```go
108 | func (s *Store) storeCustomers(customers []Customer) {
109 | for _, customer := range customers {
110 | s.m[customer.ID] = &customer
111 | }
112 | }
113 | ```
114 | - Iterating over the customers slice using the `range` loop, regardless of the number of elements, creates a **single** customer variable with a **fixed** address ⚠️. We can verify this by printing the pointer address during each iteration:
115 | ```go
116 | func (s *Store) storeCustomers(customers []Customer) {
117 | for _, customer := range customers {
118 | fmt.Printf("%p\n", &customer)
119 | s.m[customer.ID] = &customer
120 | }
121 | }
122 | >>>
123 | 0xc000096020
124 | 0xc000096020
125 | 0xc000096020
126 | ```
127 | - We can overcome this issue by: forcing the creation of a **local variable** in the loop’s scope (`current := customer`) or **creating a pointer** referencing a slice element via its **index** (`customer := &customers[i]`).
128 | - Both solutions are fine. Also note that we took a slice data structure as an input, but the problem
129 | would be similar with a map.
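- A minimal sketch of the index-based fix:
```go
func (s *Store) storeCustomers(customers []Customer) {
	for i := range customers {
		customer := &customers[i] // pointer to the slice element itself
		s.m[customer.ID] = customer
	}
}
```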
130 |
131 | ## #33: Making wrong assumptions during map iterations
132 |
133 | ### Ordering
134 |
135 | - Regarding ordering, we need to understand a few fundamental behaviors of the map data structure:
136 | - It doesn’t keep the data **sorted by key** (a map isn’t based on a binary tree).
137 | - It doesn’t **preserve the order** in which the data was added.
138 | - But can we at least expect the code to print the keys in the order in which they are currently stored in the map? No, not even this 😮💨.
139 | - However, let’s note that using packages from the **standard library** or **external libraries** can lead to different behaviors. For example, when the `encoding/json` package **marshals** a map into `JSON`, it reorders the data **alphabetically** by keys, regardless of the insertion order.
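- If a deterministic order is required, one common workaround (a sketch, not from the book's listings) is to collect and sort the keys first:
```go
keys := make([]string, 0, len(m))
for k := range m {
	keys = append(keys, k)
}
sort.Strings(keys) // requires the sort package
for _, k := range keys {
	fmt.Println(k, m[k]) // iterates in sorted key order
}
```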
140 |
141 | ### Map insert during iteration
142 |
143 | - Consider the following example:
144 | ```go
145 | m := map[int]bool{
146 | 0: true,
147 | 1: false,
148 | 2: true,
149 | }
150 | for k, v := range m {
151 | if v {
152 | m[10+k] = true
153 | }
154 | }
155 | fmt.Println(m) // The result of this code is unpredictable
156 | ```
157 | To understand the reason, we have to read what the Go specification says about a new map entry during an iteration:
158 |
159 | > If a map entry is created during iteration, it may be produced during the iteration or skipped. The choice may vary for each entry created and from one iteration to the next.
160 |
161 | Hence, when an element is added to a map during an iteration, it may be produced during a follow-up iteration, or it may not ⚠️.
162 |
163 | 👍 One solution is to create a copy of the map, like so: `m2 := copyMap(m)` and update `m2` instead.
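`copyMap` isn't shown in these notes; an assumed implementation could look like:
```go
func copyMap(m map[int]bool) map[int]bool {
	res := make(map[int]bool, len(m))
	for k, v := range m {
		res[k] = v
	}
	return res
}
```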
164 |
165 | ## #34: Ignoring how the break statement works
166 |
167 | - One essential rule to keep in mind is that a `break` statement terminates the execution of the **innermost** `for`, `switch`, or `select` statement.
168 | - So how can we write code that breaks the loop instead of the `switch` statement? The most idiomatic way is to use a label:
169 | ```go
170 | loop:
171 | for i := 0; i < 5; i++ {
172 | fmt.Printf("%d ", i)
173 | switch i {
174 | default:
175 | case 2:
176 | break loop // Not a fancy goto statement!
177 | }
178 | }
179 | ```
180 | - 📔 We can also use `continue` with a label to go to the next iteration of the labeled loop.
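- For example (an assumed sketch, mirroring the loop above):
```go
loop:
	for i := 0; i < 5; i++ {
		switch i {
		case 2:
			continue loop // jumps to the next iteration of the labeled loop
		default:
			fmt.Printf("%d ", i) // prints 0 1 3 4
		}
	}
```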
181 |
182 | ## #35: Using defer inside a loop
183 |
184 | - Consider the following example:
185 | ```go
186 | func readFiles(ch <-chan string) error {
187 | for path := range ch {
188 | file, err := os.Open(path)
189 | if err != nil {
190 | return err
191 | }
192 | defer file.Close()
193 | // Do something with file
194 | }
195 | return nil
}
196 | ```
197 | - The `defer` calls are executed not during each loop iteration but when the `readFiles` function returns. If `readFiles` doesn’t return, the file descriptors will be kept open forever, causing **leaks**.
198 | - So, what are the options if we want to keep using `defer`?
199 | 1. We can **wrap the `defer` call in a surrounding function** that is called during each iteration. For example, we can implement a `readFile` function holding the logic for each new file path received:
200 | ```go
201 | func readFile(path string) error {
202 | file, err := os.Open(path)
203 | if err != nil {
204 | return err
205 | }
206 | defer file.Close()
207 | // Do something with file
208 | return nil
209 | }
210 | ```
211 | 2. Another approach could be to make the `readFile` function a **closure**:
212 | ```go
213 | func readFiles(ch <-chan string) error {
214 | for path := range ch {
215 | err := func() error {
216 | // ...
217 | defer file.Close()
218 | // ...
219 | }()
220 | if err != nil {
221 | return err
222 | }
223 | }
224 | return nil
225 | }
226 | ```
227 |
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/chapter-05-strings.md:
--------------------------------------------------------------------------------
1 | # Chapter 5: Strings
2 |
3 | ## #36: Not understanding the concept of a rune
4 |
5 | - 🎗️ In Go, a `rune` is a **Unicode code point**.
6 | - `UTF-8` encodes characters into 1 to 4 bytes, hence, up to 32 bits. This is why in Go, a `rune` is an **alias** of `int32`: `type rune = int32`.
7 | - 📌 In Go, source code is encoded in `UTF-8`. So, all string literals are encoded into a sequence of bytes using `UTF-8`. However, a string is a **sequence of arbitrary bytes**; it’s not necessarily based on UTF-8.
8 | - A character isn’t always encoded into a **single byte**:
9 | ```go
10 | s := "汉"
11 | fmt.Println(len(s)) // 3 - len built-in function applied on a string doesn’t return the number of characters; it returns the number of bytes.
12 | ```
13 | - Conversely, we can create a string from a list of bytes. We mentioned that the `汉` character was encoded using three bytes, `0xE6`, `0xB1`, and `0x89`:
14 | ```go
15 | s := string([]byte{0xE6, 0xB1, 0x89})
16 | fmt.Printf("%s\n", s) // 汉
17 | ```
18 |
19 | ## #37: Inaccurate string iteration
20 |
21 | - Let’s look at a concrete example. Here, we want to print the different `runes` in a string and their corresponding positions:
22 | ```go
23 | s := "hêllo"
24 | for i := range s {
25 | fmt.Printf("position %d: %c\n", i, s[i])
26 | }
27 | fmt.Printf("len=%d\n", len(s))
28 | ```
29 | - We have to recognize that in this example, we don’t iterate over each `rune`; instead, we iterate over each **starting index** of a `rune`.
30 | - Printing `s[i]` doesn’t print the *ith* `rune`; it prints the `UTF-8` representation of the byte at index `i`. To fix this, we have to use the value element of the range operator:
31 | ```go
32 | s := "hêllo"
33 | for i, r := range s {
34 | fmt.Printf("position %d: %c\n", i, r)
35 | }
36 | ```
37 | - The other approach is to convert the string into a slice of `runes` and iterate over it:
38 | ```go
39 | s := "hêllo"
40 | runes := []rune(s)
41 | for i, r := range runes {
42 | fmt.Printf("position %d: %c\n", i, r)
43 | }
44 | ```
45 |
46 | ## #38: Misusing trim functions
47 |
48 | - One common mistake made by Go developers when using the `strings` package is to **mix up** `TrimRight` and `TrimSuffix`.
49 | - `TrimRight` iterates backward over each `rune`. If a rune is part of the provided set, the function removes it. If not, the function stops its iteration and returns the remaining string.
50 | - On the other hand, `TrimSuffix` returns a string without a provided trailing suffix.
51 | - Also, removing the trailing suffix **isn’t a repeating** operation, so `TrimSuffix("123xoxo", "xo")` returns `123xo`.
52 | - The principle is the same for the left-hand side of a string with `TrimLeft` and `TrimPrefix`.
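- A quick comparison of the two:
```go
fmt.Println(strings.TrimRight("123xoxo", "xo"))  // 123: removes trailing runes in the set {x, o}
fmt.Println(strings.TrimSuffix("123xoxo", "xo")) // 123xo: removes the suffix once
```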
53 |
54 |
55 | ## #39: Under-optimized string concatenation
56 |
57 | - Concatenating strings using `+=` does not perform well when we need to concatenate many strings. 🎯 Don't forget one of the core characteristics of a string: its **immutability**. Therefore, each iteration doesn’t update the string; it reallocates a new string in memory, which significantly impacts performance.
58 | - Solution is to use `strings.Builder`. Using this struct, we can also append:
59 | - A byte slice using `Write`.
60 | - A single byte using `WriteByte`.
61 | - A single rune using `WriteRune`.
62 | - **Internally**, `strings.Builder` holds a **byte slice**. Each call to `WriteString` results in a call to `append` on this slice.
63 | - There are two impacts:
64 | - First, this struct shouldn’t be used **concurrently**, as the calls to `append` would lead to **race conditions**.
65 | - The second impact is something that we saw in mistake #21, “Inefficient slice initialization”: if the future length of a slice is already known, we should **preallocate** it. For that purpose, `strings.Builder` exposes a method `Grow(n int)` to guarantee space for another `n` bytes.
66 | ```go
67 | func concat(values []string) string {
68 | total := 0
69 | for i := 0; i < len(values); i++ {
70 | total += len(values[i])
71 | }
72 | sb := strings.Builder{}
73 | sb.Grow(total)
74 | for _, value := range values {
75 | _, _ = sb.WriteString(value)
76 | }
77 | return sb.String()
78 | }
79 | ```
80 | - 👍 `strings.Builder` is the recommended solution to concatenate a list of strings. Usually, this solution should be used within a **loop**.
81 |
82 | ## #40: Useless string conversions
83 |
84 | - When choosing to work with a `string` or a `[]byte`, most programmers tend to favor strings for convenience. But most I/O is actually done with `[]byte`.
85 | - There is a price to pay when converting a `[]byte` into a `string` and then converting a `string` into a `[]byte`. Memory-wise, each of these conversions requires an extra **allocation**. Indeed, even though a string is backed by a `[]byte`, converting a `[]byte` into a `string` requires a **copy** of the byte slice. It means a new memory allocation and a copy of all the bytes.
86 | - Indeed, all the **exported functions** of the `strings` package also have alternatives in the `bytes` package: `Split`, `Count`, `Contains`, `Index`, and so on. Hence, whether we’re doing I/O or not, we should first check whether we could implement a whole workflow using bytes instead of strings and avoid the price of additional conversions.
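- For example, a function that checks a payload can operate on `[]byte` end to end (a minimal sketch with an illustrative prefix):
```go
func hasIDPrefix(payload []byte) bool {
	// bytes.HasPrefix mirrors strings.HasPrefix: no []byte <-> string conversion,
	// hence no extra allocation and copy
	return bytes.HasPrefix(payload, []byte("id:"))
}
```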
87 |
88 | ## #41: Substrings and memory leaks
89 |
90 | - To extract a subset of a string, we can use the following syntax:
91 | ```go
92 | s1 := "Hello, World!"
93 | s2 := s1[:5] // Hello
94 | ```
95 | - `s2` is constructed as a substring of `s1`. This example creates a string from the **first five bytes**, not the **first five runes**. Hence, we shouldn’t use this syntax in the case of runes encoded with multiple bytes. Instead, we should convert the input string into a `[]rune` type first:
96 | ```go
97 | s1 := "Hêllo, World!"
98 | s2 := string([]rune(s1)[:5]) // Hêllo
99 | ```
100 | - When doing a substring operation, the Go specification doesn’t specify whether the resulting string and the one involved in the substring operation should share the
101 | same data. However, the standard Go compiler does let them **share the same backing array**, which is probably the best solution **memory-wise** and **performance-wise** as it prevents a new allocation and a copy.
102 | - We mentioned that log messages can be quite heavy; suppose we extract a 36-byte UUID from each message. `log[:36]` will create a new string referencing the same backing array. Therefore, each uuid string that we store in
memory will contain not just 36 bytes but the number of bytes in the initial log string: potentially, thousands of bytes.
104 | - How can we fix this? By making a **deep copy** of the substring so that the internal byte slice of uuid references a new backing array of only 36 bytes:
105 | ```go
106 | func (s store) handleLog(log string) error {
107 | if len(log) < 36 {
108 | return errors.New("log is not correctly formatted")
109 | }
110 | uuid := string([]byte(log[:36])) // The copy is performed by converting the substring into a []byte first and then into a string again.
111 | s.store(uuid)
112 | // Do something
113 | }
114 | ```
115 | - As of Go 1.18, the standard library also includes a solution with `strings.Clone` that returns a fresh copy of a string: `uuid := strings.Clone(log[:36])`.
116 |
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/chapter-06-functions-and-methods.md:
--------------------------------------------------------------------------------
1 | # Chapter 6: Functions and methods
2 |
3 | ## #42: Not knowing which type of receiver to use
4 |
5 | - Choosing between value and pointer receivers isn’t always straightforward. Let’s discuss some of the conditions to help us choose.
6 | - **A receiver must be a pointer:**
7 | - If the method needs to **mutate** the receiver. This rule is also valid if the receiver is a **slice** and a method needs to append elements:
8 | ```go
9 | type slice []int
10 | func (s *slice) add(element int) {
11 | *s = append(*s, element)
12 | }
13 | ```
14 |
15 | - If the method receiver contains a field that cannot be copied: for example, a type part of the `sync` package.
16 | - **A receiver should be a pointer**:
17 | - If the receiver is a **large** object. Using a pointer can make the call more efficient, as doing so prevents making an **extensive copy**. When in doubt
18 | about what counts as large, benchmarking can be the solution; it’s pretty much impossible to state a specific size, because it depends on many factors.
19 | - **A receiver must be a value**:
20 | - If we have to enforce a receiver’s **immutability**.
21 | - If the receiver is a `map`, `function`, or `channel`. Otherwise, a compilation error occurs.
22 | - **A receiver should be a value**:
23 | - If the receiver is a slice that doesn’t have to be mutated.
24 | - If the receiver is a **small array** or **struct** that is naturally a value type without mutable fields, such as `time.Time`.
25 | - If the receiver is a basic type such as `int`, `float64`, or `string`.
26 |
27 | > ⚠️ Mixing receiver types should be avoided in general but is not forbidden in 100% of cases.
28 |
29 | ## #43: Never using named result parameters
30 |
31 | - What are the rules regarding named result parameters?
32 | - In most cases, using named result parameters in the context of an **interface definition** can increase **readability** without leading to any side effects. But there’s no strict rule in the context of a **method implementation**.
33 | - In some cases, named result parameters can also increase readability: for example, if two parameters have the same type:
34 | ```go
35 | type locator interface {
36 | // Just by reading this code, can you guess what these two float32 results are?
37 | // Perhaps they are a latitude and a longitude, but in which order? using named
38 | // result parameters makes it clear.
39 | getCoordinates(address string) (lat, lng float32, err error)
40 | }
41 | ```
42 | - In other cases, they can also be used for **convenience**:
43 | ```go
44 | func ReadFull(r io.Reader, buf []byte) (n int, err error) {
45 | // Because both n and err are initialized to their zero value, the implementation is shorter.
46 | for len(buf) > 0 && err == nil {
47 | var nr int
48 | nr, err = r.Read(buf)
49 | n += nr
50 | buf = buf[nr:]
51 | }
52 | return
53 | }
54 | ```
55 | - 👍 Therefore, we should use named result parameters sparingly when there’s a **clear benefit**.
56 |
57 | > 🎯 One note regarding naked returns (returns without arguments): they are considered acceptable in **short functions**; otherwise, they can harm readability because the reader must remember the outputs throughout the entire function. We should also be consistent within the scope of a function, using either only naked returns or only returns with arguments.
58 |
59 | ## #44: Unintended side effects with named result parameters
60 |
61 | - Here’s the new implementation of the `getCoordinates` method. Can you spot what’s wrong with this code?
62 | ```go
63 | func (l loc) getCoordinates(ctx context.Context, address string) (
64 | lat, lng float32, err error) {
65 | isValid := l.validateAddress(address)
66 | if !isValid {
67 | return 0, 0, errors.New("invalid address")
68 | }
69 | if ctx.Err() != nil {
70 | return 0, 0, err
71 | }
72 | // Get and return coordinates
73 | }
74 | ```
75 | - The error might not be obvious at first glance. Here, the error returned in the `if ctx.Err() != nil` block is `err`. But we haven’t assigned any value to the `err` variable; it still holds the zero value of an error type: `nil`. Hence, this branch will always return a nil error ‼️
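- A minimal fix is to return the context error explicitly:
```go
if err := ctx.Err(); err != nil {
	return 0, 0, err // don't rely on the named result parameter here
}
```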
76 | - ⚠️ Remain cautious when using named result parameters, to avoid potential side effects.
77 |
78 | ## #45: Returning a nil receiver
79 |
80 | - Consider the example below:
81 | ```go
82 | func (c Customer) Validate() error {
83 | var m *MultiError
84 | if c.Age < 0 {
85 | m = &MultiError{}
86 | m.Add(errors.New("age is negative"))
87 | }
88 | if c.Name == "" {
89 | if m == nil {
90 | m = &MultiError{}
91 | }
92 | m.Add(errors.New("name is nil"))
93 | }
94 | return m
95 | }
96 | ```
97 | - Now, let’s test this implementation by running a case with a valid `Customer`:
98 | ```go
99 | customer := Customer{Age: 33, Name: "John"}
100 | if err := customer.Validate(); err != nil {
101 | log.Fatalf("customer is invalid: %v", err)
102 | }
103 | // Output:
104 | // > 2021/05/08 13:47:28 customer is invalid:
105 | ```
106 | - 🎯 In Go, we have to know that a pointer receiver can be `nil`. In Go, a method is just **syntactic sugar** for a function whose **first parameter** is the receiver.
107 | - `m` is initialized to the zero value of a pointer: `nil`. Then, if all the checks are valid, the argument provided to the return statement isn’t `nil` **directly** but a **nil pointer** ⚠️.
108 | - Because a `nil` pointer is a **valid receiver**, **converting the result into an interface** won’t **yield** a `nil` value. In other words, the caller of `Validate` will always get a **non-nil** error.
109 | - To make this point clear, let’s remember that in Go, an interface is a **dispatch wrapper**. Here, the *wrappee* is `nil` (the `MultiError` pointer), whereas the *wrapper* isn’t (the error interface).
110 | - Therefore, regardless of the `Customer` provided, the caller of this function will always receive a non-nil error.
111 | 
112 |
113 | - Remember: An interface converted from a `nil` pointer isn’t a `nil` interface ‼️ For that reason, when we have to return an **interface**, we should return not a `nil` **pointer** but a `nil` **value** directly.
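- A sketch of the corrected ending of `Validate`:
```go
if m != nil {
	return m // m is guaranteed non-nil here
}
return nil // a true nil interface value: err != nil is now false for valid customers
```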
114 |
115 | ## #46: Using a filename as a function input
116 |
117 | - When creating a new function that needs to read a file, passing a filename isn’t considered a best practice and can have negative effects, such as making **unit tests harder to write**.
118 | - In Go, the idiomatic way is to start from the **reader’s abstraction** 👍.
119 | - What are the benefits of this approach? First, this function **abstracts the data source**. Is it a file? An HTTP request? A socket input? It’s not important for the function. Because `*os.File` and the `Body` field of `http.Request` implement `io.Reader`, we can reuse the same function regardless of the input type.
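- A sketch of what `countEmptyLines` might look like (an assumed implementation based on `bufio.Scanner`):
```go
func countEmptyLines(reader io.Reader) (int, error) {
	scanner := bufio.NewScanner(reader)
	count := 0
	for scanner.Scan() { // reads the input line by line
		if scanner.Text() == "" {
			count++
		}
	}
	return count, scanner.Err()
}
```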
120 | - Another benefit is related to **testing**. We mentioned that creating one file per test case could quickly become **cumbersome** 👎. Now that `countEmptyLines` accepts an `io.Reader`, we can implement unit tests by creating an `io.Reader` from a string:
121 | ```go
122 | func TestCountEmptyLines(t *testing.T) {
123 | emptyLines, err := countEmptyLines(strings.NewReader(
124 | `foo
125 | bar
126 | baz
127 | `))
128 | // Test logic
129 | }
130 | ```
131 | - In this test, we create an `io.Reader` using `strings.NewReader` from a **string literal directly**. Therefore, we don’t have to create one file per test case.
132 | - Each test case can be **self-contained**, improving the test **readability** and **maintainability** as we don’t have to open another file to see the content.
133 |
134 | ## #47: Ignoring how defer arguments and receivers are evaluated
135 |
136 | ### Argument evaluation
137 |
138 | - Consider the example below:
139 | ```go
140 | func f() error {
141 | var status string
142 | defer notify(status)
143 | defer incrementCounter(status)
144 |
145 | if err := foo(); err != nil {
146 | status = StatusErrorFoo
147 | return err
148 | }
149 |
150 | if err := bar(); err != nil {
151 | status = StatusErrorBar
152 | return err
153 | }
154 | status = StatusSuccess
155 | return nil
156 | }
157 | ```
158 | - We need to understand something crucial about argument evaluation in a `defer` function:
159 | - The arguments are **evaluated right away**, not once the surrounding function returns ⚠️. Here, `status` is still an empty string when both `defer` statements are registered, so `notify` and `incrementCounter` are called with an empty status.
160 | - How can we solve this problem if we want to keep using `defer`? There are two leading solutions.
161 | - The first solution is to pass a string **pointer** to the `defer` functions: `defer notify(&status)`.
162 | - There’s another solution: calling a **closure** as a `defer` statement:
163 | ```go
164 | func f() error {
165 | var status string
166 | defer func() {
167 | notify(status)
168 | incrementCounter(status)
169 | }()
170 |
171 | // The rest of the function is unchanged
172 | }
173 | ```
174 | - `status` is evaluated once the **closure is executed**, not when we call `defer`.
175 | - This solution also works and doesn’t require `notify` and `incrementCounter` to change their **signature**.
176 |
177 |
178 | ### Pointer and value receivers
179 |
180 | - Consider the example below:
181 | ```go
182 | func main() {
183 | s := Struct{id: "foo"}
184 | defer s.print()
185 | s.id = "bar"
186 | }
187 | type Struct struct {
188 | id string
189 | }
190 | func (s Struct) print() {
191 | fmt.Println(s.id)
192 | }
193 | ```
194 | - As with arguments, calling `defer` causes the receiver to be evaluated **immediately**. Hence, `defer` delays the method’s execution with a struct that contains an `id` field equal to `foo`, so this program prints `foo`.
195 | - Conversely, if the receiver is a **pointer**, the potential changes to the receiver **after the call** to `defer` are visible.
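- A sketch of the pointer-receiver variant, which prints `bar`:
```go
func main() {
	s := &Struct{id: "foo"}
	defer s.print() // the pointer is evaluated now, but the struct is read at execution time
	s.id = "bar"
}

func (s *Struct) print() {
	fmt.Println(s.id) // bar
}
```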
196 |
--------------------------------------------------------------------------------
/100-go-mistakes-and-how-to-avoid-them/chapter-07-error-management.md:
--------------------------------------------------------------------------------
1 | # Chapter 7: Error management
2 |
3 | ## #48: Panicking
4 |
5 | - Example:
6 | ```go
7 | func main() {
8 | defer func() {
9 | if r := recover(); r != nil {
10 | fmt.Println("recover", r)
11 | }
12 | }()
13 |
14 | f()
15 | }
16 |
17 | func f() {
18 | fmt.Println("a")
19 | panic("foo")
20 | fmt.Println("b")
21 | }
22 | ```
23 | - Once a panic is triggered, it continues up the call stack until either the current goroutine has returned or the panic is caught with `recover`.
24 | - Panicking in Go should be used **sparingly**. We have seen two prominent cases:
25 | - 👍 One to signal a **programmer error**:
26 | - Invalid HTTP status code: `code < 100 || code > 999 `
27 | - SQL driver is `nil` (`driver.Driver` is an interface) or has already been registered: `driver == nil`
28 | - 👍 And another where our app fails to create a **mandatory dependency**. Hence, there are exceptional conditions that lead us to stop the app.
29 | - For example, our app depends on validating the provided email address with a regular expression compiled via `regexp.MustCompile` (see the sketch after this list).
30 | - In most other cases, error management should be done with a function that returns a **proper error** type as the last return argument.
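- For example (a sketch with an illustrative pattern):
```go
// regexp.MustCompile panics if the pattern is invalid: failing fast at startup
// is preferable to running without a mandatory dependency
var emailRegex = regexp.MustCompile(`^[^@\s]+@[^@\s]+$`)
```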
31 |
32 | ## #49: Ignoring when to wrap an error
33 |
34 | - Error wrapping is about wrapping or packing an error inside a wrapper container that also makes the source error available.
35 | - In general, the two main use cases for error wrapping are the following:
36 | - Adding additional context to an error
37 | - Marking an error as a specific error
38 | - Before Go 1.13, to wrap an error, the only option without using an external library was to create a custom error type:
39 | ```go
40 | type BarError struct {
41 | Err error
42 | }
43 | func (b BarError) Error() string {
44 | return "bar failed: " + b.Err.Error()
45 | }
46 | ```
47 | - To overcome this situation, Go 1.13 introduced the `%w` directive:
48 | ```go
49 | if err != nil {
50 | return fmt.Errorf("bar failed: %w", err)
51 | }
52 | ```
53 | - The last option we will discuss is to use the `%v` directive, instead:
54 | ```go
55 | if err != nil {
56 | return fmt.Errorf("bar failed: %v", err)
57 | }
58 | ```
59 | - The difference is that the error itself isn’t wrapped. We transform it into another error to add context, and the source error is no longer available.
60 | - Let’s review all the different options we tackled:
61 | | Option | Extra Context | Marking an error | Source error available |
62 | | ------------------------ | ----------------------------------------------------------------- | ---------------- | --------------------------------------------------------------------- |
63 | | Returning error directly | No | No | Yes |
64 | | Custom error type | Possible (if the error type contains a string field, for example) | Yes | Possible (if the source error is exported or accessible via a method) |
65 | | fmt.Errorf with %w | Yes | No | Yes |
66 | | fmt.Errorf with %v | Yes | No | No |
67 | - To summarize, when handling an error, we can decide to wrap it. Wrapping is about adding additional context to an error and/or marking an error as a specific type.
68 | - If we need to mark an error, we should create a custom error type.
69 | - However, if we just want to add extra context, we should use `fmt.Errorf` with the `%w` directive as it doesn’t require creating a new error type.
70 | - Yet, error wrapping creates potential **coupling** as it makes the source error available for the caller.
71 | - If we want to prevent it, we shouldn’t use error wrapping but error transformation, for example, using `fmt.Errorf` with the `%v` directive.
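- A small demo of the difference:
```go
var ErrFoo = errors.New("foo")

wrapped := fmt.Errorf("bar failed: %w", ErrFoo)
transformed := fmt.Errorf("bar failed: %v", ErrFoo)

fmt.Println(errors.Is(wrapped, ErrFoo))     // true: the source error is still reachable
fmt.Println(errors.Is(transformed, ErrFoo)) // false: only the message remains
```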
72 |
73 | ## #50: Checking an error type inaccurately
74 |
75 | 
76 |
77 | - Go 1.13 came with a directive to wrap an error and a way to check whether the **wrapped error** is of a certain type with `errors.As`.
78 | - This function **recursively** unwraps an error and returns true if an error in the chain matches the expected type.
79 | ```go
80 | // Get transaction ID
81 | amount, err := getTransactionAmount(transactionID)
82 | if err != nil {
83 | if errors.As(err, &transientError{}) {
84 | http.Error(w, err.Error(), http.StatusServiceUnavailable)
85 | } else {
86 | http.Error(w, err.Error(), http.StatusBadRequest)
87 | }
88 | return
89 | }
90 | ```
91 | - ▶️ Regardless of whether the error is returned directly by the function we call or wrapped inside an error, `errors.As` will be able to recursively unwrap our main error and see if one of the errors is a specific type.
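- The `transientError` type isn't defined in these notes; an assumed definition could look like:
```go
type transientError struct {
	err error
}

func (t transientError) Error() string {
	return fmt.Sprintf("transient error: %v", t.err)
}
```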
92 |
93 | ## #51: Checking an error value inaccurately
94 |
95 | - A **sentinel error** is an error defined as a global variable:
96 | ```go
97 | import "errors"
98 | var ErrFoo = errors.New("foo") // the convention is to start with Err followed by the error type
99 | ```
100 | - The general principle behind sentinel errors is to convey an **expected** error that clients will check against. Therefore, as general guidelines:
101 | - 👍 **Expected** errors should be designed as error **values** (sentinel errors): `var ErrFoo = errors.New("foo")`.
102 | - 👍 **Unexpected** errors should be designed as error **types**: `type BarError struct { … }`, with `BarError` implementing the error interface.
103 | - We have seen how `errors.As` is used to check an error against a **type**. With error **values**, we can use its counterpart: `errors.Is`:
104 | ```go
105 | err := query()
106 | if err != nil {
107 | if errors.Is(err, sql.ErrNoRows) {
108 | // ...
109 | } else {
110 | // ...
111 | }
112 | }
113 | ```
114 | - ▶️ If we use error wrapping in our app with the `%w` directive and `fmt.Errorf`, checking an error against a specific value should be done using `errors.Is` instead of `==`. Thus, even if the sentinel error is **wrapped**, `errors.Is` can recursively unwrap it and compare each error in the chain against the provided value.
115 |
116 | ## #52: Handling an error twice
117 |
118 | - Consider the log below:
119 | ```
120 | 2021/06/01 20:35:12 invalid latitude: 200.000000
121 | 2021/06/01 20:35:12 failed to validate source coordinates
122 | ```
123 | - Having **two log lines** for a **single error** is a problem. Why?
124 | - Because it makes **debugging harder**. For example, if this function is called multiple times concurrently, the two messages may not be one after the other in the logs, making the debugging process more complex.
125 | - As a rule of thumb, an error should be handled **only once**. Logging an error is handling an error, and so is returning an error. Hence, we should **either log or return** an error, **never both** ❗.
126 | - Let’s rewrite our implementation to handle errors only once:
127 | ```go
128 | func GetRoute(srcLat, srcLng, dstLat, dstLng float32) (Route, error) {
129 | err := validateCoordinates(srcLat, srcLng)
130 | if err != nil {
131 | return Route{}, err
132 | }
133 | err = validateCoordinates(dstLat, dstLng)
134 | if err != nil {
135 | return Route{}, err
136 | }
137 | return getRoute(srcLat, srcLng, dstLat, dstLng)
138 | }
139 | ```
140 | - The issue with this implementation is that we lost the origin of the error, so we need to **add additional context**:
141 | - Let’s rewrite the latest version of our code using Go 1.13 **error wrapping**:
142 | ```go
143 | func GetRoute(srcLat, srcLng, dstLat, dstLng float32) (Route, error) {
144 | err := validateCoordinates(srcLat, srcLng)
145 | if err != nil {
146 | return Route{}, fmt.Errorf("failed to validate source coordinates: %w", err)
147 | }
148 | err = validateCoordinates(dstLat, dstLng)
149 | if err != nil {
150 | return Route{}, fmt.Errorf("failed to validate target coordinates: %w", err)
151 | }
152 | return getRoute(srcLat, srcLng, dstLat, dstLng)
153 | }
154 | ```
155 |
156 | ## #53: Not handling an error
157 |
158 | - When we want to ignore an error in Go, there’s only one way to write it:
159 | ```go
160 | _ = notify() // good
161 | notify() // bad
162 | ```
163 | - 👍 It may be a good idea to write a comment that indicates **why** the error is **ignored**.
164 | - Even if we are sure that an error can and should be ignored, we must do so **explicitly** by assigning it to the blank identifier. This way, a future reader will understand that we ignored the error intentionally.
165 |
166 | ## #54: Not handling defer errors
167 |
168 | - As discussed in the previous section, if we don’t want to handle the error, we should ignore it explicitly using the blank identifier:
169 | ```go
170 | defer func() {
171 | _ = rows.Close()
172 | }()
173 | ```
174 | - In this case, calling `Close()` returns an error when it fails to free a DB connection from the pool. Hence, ignoring this error is probably not what we want to do.
175 | - Most likely, a better option would be to log a message, or to propagate it to the caller of `getBalance` so that they can decide how to handle it:
176 | ```go
177 | defer func() {
178 |     err := rows.Close()
179 |     if err != nil {
180 |         return err
181 |     }
182 | }()
183 | ```
184 | - This implementation doesn’t compile. Indeed, the `return` statement is associated with the **anonymous** `func()` function, not `getBalance`. If we want to tie the error returned by `getBalance` to the error caught in the `defer` call, we must use **named result parameters**. Let’s write the first version:
185 | ```go
186 | func getBalance(db *sql.DB, clientID string) (balance float32, err error) {
187 |     rows, err := db.Query(query, clientID) // query: a package-level SQL string defined elsewhere
188 |     if err != nil {
189 |         return 0, err
190 |     }
191 |     defer func() {
192 |         err = rows.Close()
193 |     }()
194 |     if rows.Next() {
195 |         err := rows.Scan(&balance)
196 |         if err != nil {
197 |             return 0, err
198 |         }
199 |         return balance, nil
200 |     }
201 |     return 0, rows.Err() // no row matched; surface any iteration error
202 | }
202 | ```
203 | - This code may look okay, but there’s a problem with it. If `rows.Scan` returns an error, `rows.Close` is executed anyway; but because this call overrides the error returned by `getBalance`, instead of returning an error, we may return a `nil` error if `rows.Close` returns successfully.
204 | - Here’s our final implementation of the anonymous function:
205 | ```go
206 | defer func() {
207 |     closeErr := rows.Close()
208 |     if err != nil {
209 |         if closeErr != nil {
210 |             log.Printf("failed to close rows: %v", closeErr)
211 |         }
212 |         return
213 |     }
214 |     err = closeErr
215 | }()
216 | ```
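217 | - This way, if the query or scan already failed, we keep that primary error and only log the `Close` failure; otherwise, the `Close` error (which may be `nil`) is propagated through the named result parameter `err`.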
217 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Noteworthy
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # software-engineering-notes
2 |
3 | My software engineering notes.
4 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/README.md:
--------------------------------------------------------------------------------
1 | # A Common-Sense Guide to Data Structures and Algorithms
2 |
3 | Study notes taken from reading the second edition of "A Common-Sense Guide to Data Structures and Algorithms, 2nd edition" by *Jay Wengrow*.
4 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/comparaison-of-different-insertion-sort-scenarios.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/comparaison-of-different-insertion-sort-scenarios.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/different-case-scenarios.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/different-case-scenarios.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/linear-vs-binary-search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/linear-vs-binary-search.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on-vs-o1-vs-olog.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on-vs-o1-vs-olog.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on-vs-o1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on-vs-o1.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on2-vs-on.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/plotted-on2-vs-on.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/selection-vs-insertion-sort.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/selection-vs-insertion-sort.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/assets/tree.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-common-sense-guide-to-data-structures-and-algorithms/assets/tree.png
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-01-why-data-structures-matter.md:
--------------------------------------------------------------------------------
1 | # Why Data Structures Matter
2 |
3 | - Depending on how you choose to **organize** your data, your program may run faster or slower by orders of magnitude.
4 |
5 | ## The Array: The Foundational Data Structure
6 |
7 | - One of the most basic data structures in computer science.
8 | - Many data structures are used in four basic ways, which we refer to as **operations**. These operations are:
9 | - Read
10 | - Insert
11 | - Delete
12 | - Search
13 |
14 | ## Measuring Speed
15 |
16 | - When we measure how “fast” an operation is, we do **not** refer to how long the operation takes in terms of **pure time**, but instead to how many **steps** it takes 🎯.
17 | - Measuring the speed of an operation in terms of time is undependable, since the time will always change depending on the **hardware** it is run on.
18 | - Measuring the speed of an operation is also known as measuring its **time complexity**.
19 |
20 | ## Reading
21 |
22 | - Reading from an array is an efficient operation, since the computer can read any index by jumping to any memory address in **one step**.
23 |
24 | ## Searching
25 |
26 | - Searching, though, is tedious, since the computer has no way to jump to a particular value.
27 | - For N cells in an array, linear search would take a maximum of **N steps**.
28 |
29 | ## Insertion
30 |
31 | - The efficiency of inserting a new piece of data into an array depends on **where within the array** you’re inserting it.
32 | - Inserting at the end of the array takes just one step. But there’s one hitch, we need an extra **memory allocation**.
33 | - Inserting into the middle of the array requires shifting pieces of data to make room for what we’re inserting, leading to additional steps.
34 | - The **worst-case** scenario for insertion into an array - that is, the scenario in which insertion takes the most steps - is when we insert data at the **beginning** of the array.
35 | - We can say that insertion in a worst-case scenario can take **N + 1** steps for an array containing `N` elements.
36 |
37 | ## Deletion
38 |
39 | - Like insertion, the **worst-case** scenario of deleting an element is deleting the **very first** element of the array. This is because index 0 would become empty, and we’d have to shift all the remaining elements to the left to fill the gap.
40 | - We can say then, that for an array containing `N` elements, the maximum number of steps that deletion would take is **N steps**.
41 |
42 | ## Sets: How a Single Rule Can Affect Efficiency
43 |
44 | - A set is a data structure that does **not allow duplicate** values to be contained within it.
45 | - Reading / searching a set is exactly the same as reading / searching an array.
46 | - Insertion, however, is where arrays and sets diverge.
47 | - Every insertion into a set first requires a search.
48 | - In the worst-case scenario, where we’re inserting a value at the beginning of a set, the computer needs to search N cells to ensure that the set doesn’t already contain that value, another N steps to shift all the data to the right, and another final step to insert the new value. ▶️ That’s a total of 2N + 1 steps.
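49 | - A sketch of that worst case (Python; the function name is mine, using a plain list as the array-based set):
50 | ```py
51 | def set_insert_at_start(array_set, value):
52 |     if value in array_set:       # search every cell: up to N steps
53 |         return array_set
54 |     array_set.insert(0, value)   # shift N elements right + 1 insertion step
55 |     return array_set
56 | ```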
49 |
50 | ## Exercises:
51 |
52 | > 1. For an array containing 100 elements, provide the number of steps the following operations would take:
53 |
54 | a. Reading -> 1
55 | b. Searching for a value not contained within the array -> 100
56 | c. Insertion at the beginning of the array -> 101
57 | d. Insertion at the end of the array -> 1
58 | e. Deletion at the beginning of the array -> 100
59 | f. Deletion at the end of the array -> 1
60 |
61 | > 2. For an array-based set containing 100 elements, provide the number of steps the following operations would take:
62 |
63 | a. Reading -> 1
64 | b. Searching for a value not contained within the array -> 100
65 | c. Insertion of a new value at the beginning of the set -> 201
66 | d. Insertion of a new value at the end of the set -> 101
67 | e. Deletion at the beginning of the set -> 100
68 | f. Deletion at the end of the set -> 1
69 |
70 | > 3. Normally the search operation in an array looks for the first instance of a given value. But sometimes we may want to look for every instance of a given value. For example, say we want to count how many times the value “apple” is found inside an array. How many steps would it take to find all the “apples”? Give your answer in terms of N
71 |
72 | N
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-02-why-algorithms-matter.md:
--------------------------------------------------------------------------------
1 | # Why Algorithms Matter
2 |
3 | - Another major factor can affect the efficiency of our code: the proper selection of which **algorithm** to use.
4 |
5 | ## Ordered Arrays
6 |
7 | - When inserting into an ordered array, we need to always conduct a search before the actual insertion to determine the correct spot for the insertion.
8 | - This is one difference in performance between a classic array and an ordered array.
9 | - Interestingly, the number of steps for insertion **remains similar** no matter where in the ordered array our new value ends up 🤒.
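10 | - A sketch of such an insertion (Python; the helper name is mine). Fewer search steps always mean more shift steps, so the total stays about the same:
11 | ```py
12 | def ordered_insert(array, value):
13 |     i = 0
14 |     while i < len(array) and array[i] < value:  # search for the correct spot
15 |         i += 1
16 |     array.insert(i, value)  # shifts the larger values right, then inserts
17 |     return array
18 | ```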
10 |
11 | ## Searching an Ordered Array
12 |
13 | - With an ordered array, we can **stop a search early** even if the value isn’t contained within the array.
14 | - In this light, linear search can take fewer steps in an ordered array than in a classic array in certain situations. That being said, if we’re searching for a value that happens to be the **final value** or **not within the array** at all, we will still end up searching each and every cell 🤕.
15 |
16 | ## Binary Search
17 |
18 | - Is a searching algorithm used in a sorted array by repeatedly dividing the search interval in **half**.
19 | - Here is an iterative implementation of binary search in C++:
20 | ```c++
21 | #include <iostream>
22 | 
23 | int binary_search(int arr[], int low, int high, int x) {
24 |     while (low <= high) {
25 |         int mid = low + (high - low) / 2;
26 |         if (arr[mid] == x) {
27 |             return mid;
28 |         }
29 |         if (arr[mid] < x) {
30 |             low = mid + 1;
31 |         } else {
32 |             high = mid - 1;
33 |         }
34 |     }
35 |     return -1;
36 | }
37 | 
38 | int main() {
39 |     int arr[] = {2, 3, 4, 10, 40};
40 |     int x = 10;
41 |     int n = sizeof(arr) / sizeof(arr[0]);
42 |     int result = binary_search(arr, 0, n - 1, x);
43 |     if (result == -1) {
44 |         std::cout << "Element is not present in array" << std::endl;
45 |     } else {
46 |         std::cout << "Element is present in array at index: " << result << std::endl;
47 |     }
48 |     return 0;
49 | }
50 | ```
51 |
52 | ## Binary Search vs. Linear Search
53 |
54 | - With ordered arrays of a **small size**, the algorithm of binary search **doesn’t** have much of an **advantage** over linear search.
55 | - When we use binary search, however, each guess we make eliminates half of the possible cells we’d have to search.
56 | - The pattern that emerges is that for each time we **double** the **size** of the ordered array, the number of steps needed for binary search **increases by one**. This makes sense, as each lookup eliminates half of the elements from the search.
57 | 
58 |
59 | ## Exercises
60 |
61 | > 1. How many steps would it take to perform a linear search for the number 8 in the ordered array, [2, 4, 6, 8, 10, 12, 13]?
62 |
63 | 4
64 |
65 | > 2. How many steps would binary search take for the previous example?
66 |
67 | 1 (the middle element of the seven-element array is 8, so the first comparison finds it)
68 |
69 | > 3. What is the maximum number of steps it would take to perform a binary search on an array of size 100,000?
70 |
71 | Keep halving until we get down to 1 element: at most **17** steps, since ⌈log₂ 100,000⌉ = 17.
72 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-03-o-yes-big-o-notation.md:
--------------------------------------------------------------------------------
1 | # O Yes! Big O Notation
2 |
3 | - To help ease communication regarding time complexity, computer scientists have borrowed a concept from the world of mathematics to describe a concise and consistent language around the efficiency of data structures and algorithms. Known as **Big O** Notation, this formalized expression of these concepts allows us to easily categorize the efficiency of a given algorithm and convey it to others.
4 |
5 | ## Big O: How Many Steps Relative to N Elements?
6 |
7 | - As we’ve previously phrased it: for N elements in the array, linear search can take up to N steps ➡️ `O(N)`.
8 | - Some pronounce this as “*Big Oh of N.*” Others call it “*Order of N.*” My personal preference, however, is “*Oh of N*.”
9 | - An algorithm that is `O(N)` is also known as having **linear time**.
10 | - Reading from an array takes just one step. So, we express this as `O(1)`, which I pronounce “*Oh of 1.*”
11 | - `O(1)` algorithm can also be referred to as having **constant time**.
12 |
13 | ## The Soul of Big O
14 |
15 | - The key question Big O answers is: *if there are `N` data elements, how many steps will the algorithm take?* While that question is indeed the strict definition of Big O, there’s actually more to Big O than meets the eye 🤔.
16 | - Let’s say we have an algorithm that always takes **three** steps no matter how much data there is. That is, for `N` elements, the algorithm always takes three steps. How would you express that in terms of Big O? Based on everything you’ve learned up to this point, you’d probably say that
17 | it’s `O(3)`. However, it’s actually `O(1)` 🙃.
18 | - The soul of Big O is what Big O is truly concerned about: *how will an algorithm’s performance change as the data increases ❓*
19 | 
20 |
21 | - Notice that `O(N)` makes a perfect **diagonal line**. This is because for every additional piece of data, the algorithm takes one additional step. Accordingly, the more data, the more steps the algorithm will take.
22 | - Contrast this with `O(1)`, which is a perfect **horizontal line**. No matter how much data there is, the number of steps remain **constant**.
23 | - If we were to describe the **efficiency** of **linear search** in its **totality**, we’d say that linear search is:
24 | - `O(1)` in a **best-case** scenario,
25 | - `O(N)` in a **worst-case** scenario.
26 | - While Big O effectively describes both the best- and worst-case scenarios of a given algorithm, Big O Notation **generally** refers to the **worst-case** scenario unless specified otherwise.
27 |
28 | ## An Algorithm of the Third Kind
29 |
30 | - `O(log N)` is the Big O way of describing an algorithm that **increases** one step each time the data is **doubled**. As you learned in the previous chapter, **binary search** does just that.
31 | 
32 |
33 | ## O(log N) Explained
34 |
35 | - In computer science, whenever we say `O(log N)`, it’s actually shorthand for saying `O(log₂ N)`. We just omit that small 2 for convenience.
36 | - The following table demonstrates a striking difference between the efficiencies of `O(N)` and `O(log N)`:
37 |
38 | | N Elements | O(N) | O(log N) |
39 | | ---------- | ---- | -------- |
40 | | 8 | 8 | 3 |
41 | | 16 | 16 | 4 |
42 | | 32 | 32 | 5 |
43 | | 64 | 64 | 6 |
44 | | 128 | 128 | 7 |
45 | | 256 | 256 | 8 |
46 | | 512 | 512 | 9 |
47 | | 1024 | 1024 | 10 |
48 |
49 | ## Exercises
50 |
51 | > 1. Use Big O Notation to describe the time complexity of the following function that determines whether a given year is a leap year:
52 | > ```js
53 | > function isLeapYear(year) {
54 | >   return (year % 100 === 0) ? (year % 400 === 0) : (year % 4 === 0);
55 | > }
56 | > ```
57 |
58 | O(1)
59 |
60 |
61 | > 2. Use Big O Notation to describe the time complexity of the following function that sums up all the numbers from a given array:
62 | ```js
63 | function arraySum(array) {
64 |   let sum = 0;
65 |   for(let i = 0; i < array.length; i++) {
66 |     sum += array[i];
67 |   }
68 |   return sum;
69 | }
70 | ```
71 |
72 | O(N)
73 |
74 | > 3. Imagine you have a chessboard and put a single grain of rice on one square. On the second square, you put 2 grains of rice, since that is double the amount of rice on the previous square. On the third square, you put 4 grains. On the fourth square, you put 8 grains, and on the fifth square, you put 16 grains, and so on.
75 |
76 | > Use Big O Notation to describe the time complexity of this function, which is below:
77 | ```js
78 | function chessboardSpace(numberOfGrains) {
79 |   let chessboardSpaces = 1;
80 |   let placedGrains = 1;
81 |   while (placedGrains < numberOfGrains) {
82 |     placedGrains *= 2;
83 |     chessboardSpaces += 1;
84 |   }
85 |   return chessboardSpaces;
86 | }
87 | ```
88 |
89 | O(log N)
90 |
91 | > 4. The following function accepts an array of strings and returns a new array that only contains the strings that start with the character "a". Use Big O Notation to describe the time complexity of the function:
92 | ```js
93 | function selectAStrings(array) {
94 |   let newArray = [];
95 |   for(let i = 0; i < array.length; i++) {
96 |     if (array[i].startsWith("a")) {
97 |       newArray.push(array[i]);
98 |     }
99 |   }
100 |   return newArray;
101 | }
102 | ```
103 |
104 | O(N)
105 |
106 | > 5. The following function calculates the median from an ordered array. Describe its time complexity in terms of Big O Notation:
107 | ```js
108 | function median(array) {
109 |   const middle = Math.floor(array.length / 2);
110 |   // If array has even amount of numbers:
111 |   if (array.length % 2 === 0) {
112 |     return (array[middle - 1] + array[middle]) / 2;
113 |   } else { // If array has odd amount of numbers:
114 |     return array[middle];
115 |   }
116 | }
117 | ```
118 |
119 | O(1)
120 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-04-speeding-up-your-code-with-big-o.md:
--------------------------------------------------------------------------------
1 | # Speeding Up Your Code with Big O
2 |
3 | ## Bubble Sort
4 |
5 | - Bubble Sort is a **basic** sorting algorithm.
6 | - It is called *Bubble Sort* because in each **pass-through**, the highest unsorted value “bubbles” up to its correct position.
7 | - Because we made at **least one swap** during this pass-through, we need to conduct another pass-through ❗
8 | - Here’s an implementation of Bubble Sort in Python:
9 |
10 | ```py
11 | def bubble_sort(list):
12 |     unsorted_until_index = len(list) - 1
13 |     sorted = False
14 |     while not sorted:
15 |         sorted = True
16 |         for i in range(unsorted_until_index):
17 |             if list[i] > list[i+1]:
18 |                 list[i], list[i+1] = list[i+1], list[i]
19 |                 sorted = False
20 |         unsorted_until_index -= 1
21 |     return list
22 | ```
23 |
24 | ## The Efficiency of Bubble Sort
25 |
26 | - For N elements, we make `(N - 1) + (N - 2) + (N - 3) … + 1` **comparisons**.
27 | - In a **worst-case** scenario, where the array is sorted in **descending** order (the exact opposite of what we want), we’d actually need a swap for each comparison. For an array of 5 elements, we’d have **10 comparisons** and **10 swaps**, for a grand total of 20 steps.
28 | - If you look at the growth of steps as N increases, you’ll see that it’s growing by approximately `N^2`. Take a look at the following table:
29 | | N Data Elements | # of Bubble Sort Steps | N^2 |
30 | | --------------- | ---------------------- | ---- |
31 | | 5 | 20 | 25 |
32 | | 10 | 90 | 100 |
33 | | 20 | 380 | 400 |
34 | | 40 | 1560 | 1600 |
35 | | 80 | 6320 | 6400 |
36 |
37 | Because Bubble Sort takes `N^2` steps for `N` values, in Big O we say that Bubble Sort has an efficiency of `O(N^2)`. An `O(N^2)` algorithm is considered relatively **inefficient**, since as the data increases, the steps increase dramatically. Look at this graph, which compares `O(N^2)` against the faster `O(N)`:
38 |
39 | 
40 |
41 | - One last note: `O(N^2)` is also referred to as **quadratic time**.
42 |
43 | ## A Quadratic Problem
44 |
45 | Consider the example below, which checks for duplicate values in an array:
46 | ```js
47 | function hasDuplicateValue(array) {
48 |   for(let i = 0; i < array.length; i++) {
49 |     for(let j = 0; j < array.length; j++) {
50 |       if(i !== j && array[i] === array[j]) {
51 |         return true;
52 |       }
53 |     }
54 |   }
55 |   return false;
56 | }
57 | ```
58 |
59 | Very often (but not always), when an algorithm **nests one loop inside another**, the algorithm is `O(N^2)`. So, whenever you see a nested loop, `O(N^2)` alarm bells should go off in your head ⏰.
60 |
61 | ## A Linear Solution
62 |
63 | Another way to solve the duplicate-value problem is to track every value encountered in a "hash map":
64 | ```js
65 | function hasDuplicateValue(array) {
66 |   let existingNumbers = [];
67 |   for(let i = 0; i < array.length; i++) {
68 |     if(existingNumbers[array[i]] === 1) {
69 |       return true;
70 |     } else {
71 |       existingNumbers[array[i]] = 1;
72 |     }
73 |   }
74 |   return false;
75 | }
76 | ```
77 |
78 | This approach is `O(N)`, and `O(N)` is much faster than `O(N^2)`, so this second approach optimizes our `hasDuplicateValue` function significantly. This is a huge speed boost (neglecting the time complexity of insertion and lookup in the map or the array ⚠️).
79 |
80 | ## Exercises
81 |
82 | > 1. Replace the question marks in the following table to describe how many steps occur for a given number of data elements across various types of Big O:
83 |
84 | | N Elements | O(N) | O(log N) | O(N^2) |
85 | | ---------- | -------- | -------- | ----------- |
86 | | 100 | 100 | **7** | **10000** |
87 | | 2000 | **2000** | **11** | **4000000** |
88 |
89 |
90 | > 2. If we have an O(N^2) algorithm that processes an array and find that it takes 256 steps, what is the size of the array?
91 |
92 | 16
93 |
94 | > 3. Use Big O Notation to describe the time complexity of the following function. It finds the greatest product of any pair of two numbers within a given array:
95 |
96 | ```py
97 | def greatestProduct(array):
98 |     greatestProductSoFar = array[0] * array[1]
99 |     for i, iVal in enumerate(array):
100 |         for j, jVal in enumerate(array):
101 |             if i != j and iVal * jVal > greatestProductSoFar:
102 |                 greatestProductSoFar = iVal * jVal
103 |     return greatestProductSoFar
104 | ```
105 |
106 | O(N^2)
107 |
108 | > 4. The following function finds the greatest single number within an array, but has an efficiency of O(N^2). Rewrite the function so that it becomes a speedy O(N):
109 |
110 |
111 | ```cpp
112 | // Assuming a fixed-size std::array of int (the element type was lost in these notes):
113 | #include <array>
114 | #include <cstddef>
115 | 
116 | template <std::size_t N>
117 | int find_greatest(const std::array<int, N> &arr) {
118 |     int greatest = arr[0];
119 |     for (std::size_t i = 1; i < arr.size(); i++) {
120 |         if (arr[i] > greatest) {
121 |             greatest = arr[i];
122 |         }
123 |     }
124 |     return greatest;
125 | }
122 | ```
123 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-05-optimizing-code-with-and-without-big-o.md:
--------------------------------------------------------------------------------
1 | # Optimizing Code with and Without Big O
2 |
3 | ## Selection Sort
4 |
5 | Selection Sort is a comparison-based sorting algorithm. It sorts an array by repeatedly selecting the **smallest** (or largest) element from the unsorted portion and **swapping** it with the first unsorted element. This process continues until the entire array is sorted.
6 |
7 | Here’s a JavaScript implementation of Selection Sort:
8 |
9 | ```js
10 | function selectionSort(array) {
11 |   for(let i = 0; i < array.length - 1; i++) {
12 |     let lowestNumberIndex = i;
13 |     for(let j = i + 1; j < array.length; j++) {
14 |       if(array[j] < array[lowestNumberIndex]) {
15 |         lowestNumberIndex = j;
16 |       }
17 |     }
18 |     if(lowestNumberIndex != i) {
19 |       let temp = array[i];
20 |       array[i] = array[lowestNumberIndex];
21 |       array[lowestNumberIndex] = temp;
22 |     }
23 |   }
24 |   return array;
25 | }
26 | ```
27 |
28 | ## The Efficiency of Selection Sort
29 |
30 | To put it in a way that works for arrays of all sizes, we’d say that for `N` elements, we make `(N - 1) + (N - 2) + (N - 3) … + 1` comparisons.
31 |
32 | As for swaps, though, we only need to make a maximum of one swap per passthrough.
33 |
34 | Here’s a side-by-side comparison of **Bubble Sort** and **Selection Sort**:
35 |
36 | | N Elements | Max # of Steps in Bubble Sort | Max # of Steps in Selection Sort |
37 | | ---------- | ----------------------------- | ---------------------------------- |
38 | | 5 | 20 | 14 (10 comparisons + 4 swaps) |
39 | | 10 | 90 | 54 (45 comparisons + 9 swaps) |
40 | | 20 | 380 | 199 (180 comparisons + 19 swaps) |
41 | | 40 | 1560 | 819 (780 comparisons + 39 swaps) |
42 | | 80 | 6320 | 3239 (3160 comparisons + 79 swaps) |
43 |
44 | Because Selection Sort takes **roughly half** of N^2 steps, it would seem reasonable that we’d describe the efficiency of Selection Sort as being `O(N^2 / 2)`. That is, for N data elements, there are `N^2 / 2` steps.
45 |
46 | In reality, however, Selection Sort is described in Big O as `O(N^2)` 🤒, just like **Bubble Sort**. This is because of a major rule of Big O that I’m now introducing for the first time: *Big O Notation ignores constants* 📌.
47 |
48 | This is simply a mathematical way of saying that Big O Notation never includes regular numbers that aren’t an **exponent**. We simply drop these regular numbers from the expression. In our case, then, even though the algorithm takes `N^2 / 2` steps, we drop the `“/ 2”` because it’s a regular number, and express the efficiency as `O(N^2)`.
49 |
50 | ## Big O Categories
51 |
52 | All the types of Big O we’ve encountered, whether it’s `O(1)`, `O(log N)`, `O(N)`, `O(N^2)`, or the types we’ll encounter later in this book, are **general categories** of Big O that are widely different from each other. **Multiplying** or **dividing** the number of steps by a regular number doesn’t make them change to another category 💁.
53 |
54 | However, when two algorithms fall under the **same classification** of Big O, it doesn’t necessarily mean that both algorithms have the **same speed**. After all, Bubble Sort is twice as slow as Selection Sort even though both are `O(N^2)`. So, while Big O is perfect for contrasting algorithms that fall under different classifications of Big O, when two algorithms fall under the same classification, **further analysis** is required to determine which algorithm is **faster**.
55 |
56 | ### Significant Steps
57 |
58 | In the previous chapters, I alluded to the fact that you’d learn how to determine which steps are significant enough to be counted when expressing the Big O of an algorithm. In our case, then, which of these steps are considered **significant** ❓ Do we care about the **comparisons**, the **printing**, or the **incrementing** of the number?
59 |
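60 | The loop being analyzed here is of this shape (a sketch in Python; it prints the even numbers from 1 up to `upper_limit`):
61 | ```py
62 | def print_even_numbers(upper_limit):
63 |     number = 1
64 |     while number <= upper_limit:
65 |         if number % 2 == 0:  # about N comparisons
66 |             print(number)    # about N / 2 printings
67 |         number += 1          # about N incrementings
68 | ```
69 | 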
60 | The answer is that **all steps are significant**. It’s just that when we express the steps in Big O terms, we drop the constants and thereby simplify the expression 💁.
61 |
62 | Let’s apply this here. If we count all the steps, we have **N comparisons**, **N incrementings**, and **N / 2 printings**. This adds up to **2.5N steps**. However, because we eliminate the constant of **2.5**, we express this as `O(N)`. So, which step was significant? They all were, but by dropping the constant, we effectively focus more on the number of times the loop runs, rather than the exact details of what happens within the loop.
63 |
64 | ## Exercises
65 |
66 | > 1. Use Big O Notation to describe the time complexity of an algorithm that takes 4N + 16 steps.
67 |
68 | O(N).
69 |
70 | > 2. Use Big O Notation to describe the time complexity of an algorithm that takes 2N^2 steps.
71 | 
72 | O(N^2)
73 |
74 | > 3. Use Big O Notation to describe the time complexity of the following function, which returns the sum of all numbers of an array after the numbers have been doubled:
75 |
76 | ```rb
77 | def double_then_sum(array)
78 |   doubled_array = []
79 |   array.each do |number|
80 |     doubled_array << number *= 2
81 |   end
82 |   sum = 0
83 |   doubled_array.each do |number|
84 |     sum += number
85 |   end
86 |   return sum
87 | end
88 | ```
89 |
90 | O(N).
91 |
92 | > 4. Use Big O Notation to describe the time complexity of the following function, which accepts an array of strings and prints each string in multiple cases:
93 |
94 | ```rb
95 | def multiple_cases(array)
96 |   array.each do |string|
97 |     puts string.upcase
98 |     puts string.downcase
99 |     puts string.capitalize
100 |   end
101 | end
102 | ```
103 |
104 | O(N).
105 |
106 | > 5. The next function iterates over an array of numbers, and for each number whose index is even, it prints the sum of that number plus every number in the array. What is this function’s efficiency in terms of Big O Notation?
107 |
108 | ```rb
109 | def every_other(array)
110 |   array.each_with_index do |number, index|
111 |     if index.even?
112 |       array.each do |other_number|
113 |         puts number + other_number
114 |       end
115 |     end
116 |   end
117 | end
118 | ```
119 |
120 | O(N^2).
121 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-06-optimizing-for-optimistic-scenarios.md:
--------------------------------------------------------------------------------
1 | # Chapter 6: Optimizing for Optimistic Scenarios
2 |
3 | ## Insertion Sort
4 |
5 | - Insertion sort is a simple sorting algorithm that works by iteratively inserting each element of an unsorted list into its correct position in a sorted portion of the list.
6 | - It is like sorting playing cards in your hands. You split the cards into two groups: the **sorted** cards and the **unsorted** cards. Then, you pick a card from the unsorted group and put it in the right place in the sorted group.
7 | - Here is a Python implementation of Insertion Sort:
8 | ```py
9 | def insertion_sort(array):
10 |     for index in range(1, len(array)):
11 |         temp_value = array[index]
12 |         position = index - 1
13 |         while position >= 0:
14 |             if array[position] > temp_value:
15 |                 array[position + 1] = array[position]
16 |                 position = position - 1
17 |             else:
18 |                 break
19 |         array[position + 1] = temp_value
20 |     return array
21 | ```
22 |
23 | ## The Efficiency of Insertion Sort
24 |
25 | - Four types of steps occur in Insertion Sort: **removals**, **comparisons**, **shifts**, and **insertions**.
26 | - In a worst-case scenario, where the array is sorted in reverse order:
27 | - `1 + 2 + 3 + … + (N - 1)` comparisons ~= `N^2 / 2` **comparisons**.
28 | - `N^2 / 2` **shifts**.
29 | - Removing and inserting the *temp_value* from the array happens once per pass-through. Since there are always `N - 1` pass-throughs, we can conclude that there are `N - 1` **removals** and `N - 1` **insertions**.
30 | - If we tally all of these up, we get `N^2 + 2N - 2` steps ➡️ which we can simplify to `O(N^2 + N)`.
31 | - However, there is another major rule of Big O that I’ll reveal now:
32 | - *Big O Notation only takes into account the highest order of N when we have multiple orders added together*.
33 | - That is, if we have an algorithm that takes `N^4 + N^3 + N^2 + N` steps, we only consider `N^4` to be significant, and just call it `O(N^4)` 💁.
34 | - It emerges that in a worst-case scenario, Insertion Sort has the same time complexity as Bubble Sort and Selection Sort. They’re all `O(N^2)` 😮💨.
35 |
36 | ## The Average Case
37 |
38 | - Indeed, in a **worst-case** scenario, **Selection** Sort is **faster** than **Insertion** Sort. However, it is critical we also take into account the **average-case** scenario.
39 | - By definition, the cases that occur most frequently are average scenarios 🤓. Take a look at this simple bell curve:
40 | 
41 |
42 | - In the **best-case** scenario, where the data is already sorted in ascending order, we end up making just **one comparison per pass-through** and **not a single shift**, since each value is already in its correct place.
43 | - For the **average** scenario, we can say that, in the aggregate, we probably compare and shift about **half** the data. Thus, if Insertion Sort
44 | takes `N^2` steps for the **worst-case** scenario, we’d say that it takes about `N^2 / 2` steps for the **average** scenario. (In terms of Big O, however, both scenarios are `O(N^2)` 💁.)
45 | - You can see these three types of performance in the following graph:
46 | 
47 |
48 | > 💡 Contrast this with Selection Sort. Selection Sort takes N^2 / 2 steps in all cases, from worst to average to best-case scenarios. This is because Selection Sort doesn’t have any mechanism for ending a pass-through early at any point. Each pass-through compares every value to the right of the chosen index no matter what.
49 |
50 | Here’s a table that compares Selection Sort and Insertion Sort:
51 | 
52 |
53 | - So, which is better ❓ Selection Sort or Insertion Sort? The answer is: well, it **depends**. In an **average** case—where an array is randomly sorted—they perform **similarly**.
54 | - If you have reason to assume you’ll be dealing with data that is **mostly sorted**, Insertion Sort will be a **better** choice.
55 | - If you have reason to assume you’ll be dealing with data that is **mostly sorted** in **reverse** order, **Selection** Sort will be **faster**.
56 | - If you have no idea what the data will be like, that’s essentially an average case, and both will be **equal**.
57 |
58 | ## A Practical Example
59 |
60 | Suppose we want to get the **intersection** between two arrays. Here’s one possible implementation:
61 | ```js
62 | function intersection(firstArray, secondArray){
63 |   let result = [];
64 |   for (let i = 0; i < firstArray.length; i++) {
65 |     for (let j = 0; j < secondArray.length; j++) {
66 |       if (firstArray[i] == secondArray[j]) {
67 |         result.push(firstArray[i]);
68 |       }
69 |     }
70 |   }
71 |   return result;
72 | }
73 | ```
74 | - If the two arrays are of **equal** size, and we say that `N` is the size of either array, the number of comparisons performed is `N^2`. So, this intersection algorithm has an efficiency of `O(N^2)`.
75 | - The **insertions**, at most, would take `N` steps (if the two arrays happened to be identical). This is a lower order compared to `N^2`, so we’d still consider the algorithm to be `O(N^2)`.
76 | - If the arrays are **different sizes** (say, `N` and `M`), we’d say that the efficiency of this function is `O(N * M)`.
77 | - Is there any way we can improve this algorithm ❓ This is where it’s important to consider scenarios beyond the **worst** case. In the current implementation of the intersection function, we make `N^2` comparisons **in all scenarios**, no matter whether the arrays are identical or the arrays do not share a single common value.
78 | - Here is an improved implementation:
79 | ```js
80 | function intersection(firstArray, secondArray){
81 |   let result = [];
82 |   for (let i = 0; i < firstArray.length; i++) {
83 |     for (let j = 0; j < secondArray.length; j++) {
84 |       if (firstArray[i] == secondArray[j]) {
85 |         result.push(firstArray[i]);
86 |         break;
87 |       }
88 |     }
89 |   }
90 |   return result;
91 | }
92 | ```
93 | - With the addition of the *break*, we can cut the inner loop short and save steps (and therefore time).
94 | - ➡️ In the best-case scenario, where the two arrays are **identical**, we only have to perform `N` comparisons. In an average case, where the two arrays are different **but share some** values, the performance will be somewhere **between** `N` and `N^2`.
95 |
96 | ## Exercises
97 |
98 | > 1. Use Big O Notation to describe the efficiency of an algorithm that takes 3N^2 + 2N + 1 steps.
99 |
100 | O(N^2).
101 |
102 | > 2. Use Big O Notation to describe the efficiency of an algorithm that takes N + log N steps.
103 |
104 | O(N).
105 |
106 | > 3. The following function checks whether an array of numbers contains a pair of two numbers that add up to 10. What are the best-, average-, and worst-case scenarios? Then, express the worst-case scenario in terms of Big O Notation.
107 |
108 | ```js
109 | function twoSum(array) {
110 |   for (let i = 0; i < array.length; i++) {
111 |     for (let j = 0; j < array.length; j++) {
112 |       if (i !== j && array[i] + array[j] === 10) {
113 |         return true;
114 |       }
115 |     }
116 |   }
117 |   return false;
118 | }
119 | ```
120 |
121 | - Best-case: the two numbers that sum to 10 are the first two in the array.
122 | - Average-case: the matching pair sits somewhere in the middle of the array.
123 | - Worst-case: O(N^2)
124 |
125 | > 4. The following function returns whether or not a capital “X” is present within a string.
126 | ```js
127 | function containsX(string) {
128 |   let foundX = false;
129 |   for(let i = 0; i < string.length; i++) {
130 |     if (string[i] === "X") {
131 |       foundX = true;
132 |     }
133 |   }
134 |   return foundX;
135 | }
136 | ```
137 | - What is this function’s time complexity in terms of Big O Notation? `O(N)`.
138 | - Then, modify the code to improve the algorithm’s efficiency for best- and average-case scenarios.
139 |
140 | ```js
141 | function containsX(string) {
142 |   for(let i = 0; i < string.length; i++) {
143 |     if (string[i] === "X") {
144 |       return true;
145 |     }
146 |   }
147 |   return false;
148 | }
149 | ```
150 |
--------------------------------------------------------------------------------
/a-common-sense-guide-to-data-structures-and-algorithms/chapter-15-speeding-up-all-the-things-with-binary-search-trees.md:
--------------------------------------------------------------------------------
1 | # Speeding Up All the Things with Binary Search Trees
2 |
3 | - An ordered array is a simple but effective tool for keeping data in order:
4 | - 👍 `O(1)` reads and `O(log N)` search (when using binary search).
5 | - 👎 When it comes to insertions and deletions, ordered arrays are relatively **slow** (O(N), due to shifting).
6 | - Now, if we were looking for a DS that delivers all-around amazing speed, a **hash table** is a great choice:
7 | - They are `O(1)` for *search*, insertion, and deletion. However, they do not maintain order 🤷.
8 |
9 | ## Trees
10 |
11 | - A 🌴 is a **node-based** DS, but within a tree (as opposed to linked lists), each node can have links to **multiple nodes**.
12 | 
13 |
14 | - The uppermost node (in our example, the “`j`”) is called the **root**.
15 | - “`j`” is a **parent** to “`m`” and “`b`.” Conversely, “`m`” and “`b`” are **children** of “`j`.”.
16 | - A node’s **descendants** are all the nodes that stem from a node, while a node’s **ancestors** are all the nodes that it stems from.
17 | - Trees are said to have **levels**. Each level is a **row** within the tree.
18 | - One property of a tree is how **balanced** it is. A tree is balanced when its nodes’ subtrees have the **same number of nodes** in them.
19 | - The following tree, on the other hand, is **imbalanced**:
20 | ```c
21 |         A
22 |        / \
23 |       B   C
24 |      /   / \
25 |     D   E   F
26 |          \
27 |           G
28 | ```
29 |
30 | ## Binary Search Trees
31 |
32 | - A **binary** tree is a 🌴 in which each node has **zero**, **one**, or **two** children.
33 | - A **binary search** tree is a binary tree that also abides by the following rules:
34 | - Each node can have at most one **“left”** child and one **“right”** child.
35 | - A node’s **“left”** descendants can only contain values that are **less** than the node itself. Likewise, a node’s **“right”** descendants can only contain values that are **greater** than the node itself.
36 | ```c
37 |       8
38 |      / \
39 |     3   10
40 |    / \    \
41 |   1   6    14
42 |      / \   /
43 |     4   7 13
44 | ```
45 |
46 | ## Searching
47 |
48 | 1. Designate a node to be the “**current node**” (At the beginning of the algorithm, the **root** node is the first “current node”).
49 | 2. Inspect the value at the current node. If we’ve found the value we’re looking for, great!
50 | 3. If the value we’re looking for is **less** than the current node, search for it in its **left** subtree.
51 | 4. If the value we’re looking for is **greater** than the current node, search for it in its **right** subtree.
54 | 5. Repeat Steps 1 through 4 until we find the value we’re searching for, or until we hit the **bottom** of the tree, in which case our value must not be in the tree.
55 |
56 | ## The Efficiency of Searching a Binary Search Tree
57 |
58 | - Notice that each step **eliminates half** of the remaining nodes from our search.
59 | - We’d say, then, that searching in a binary search tree is `O(log N)` (though, that this is only for a **perfectly balanced** binary search tree, which is a best-case scenario 🤥).
60 | - Another way of describing why search in a binary search tree is `O(log N)`:
61 | - ➡️ If there are `N` nodes in a balanced binary tree, there will be about `log N` levels (rows).
62 | - Each time we **add a new full level** to the tree, we end up roughly doubling the number of nodes that the tree has (Really, we’re doubling the nodes and adding one 🤓).
64 | - In this regard, then, searching a binary search tree has the **same efficiency** as **binary search** within an **ordered array**. Where binary search trees really shine over ordered arrays, though, is with **insertion**.
65 |
66 | ## Code Implementation: Searching a Binary Search Tree
67 |
68 | Here’s how we can use recursion to implement search with Python:
69 | ```py
70 | def search(searchValue, node):
71 |     # Base case: If the node is nonexistent
72 |     # or we've found the value we're looking for:
73 |     if node is None or node.value == searchValue:
74 |         return node
75 |     # If the value is less than the current node, perform
76 |     # search on the left child:
77 |     elif searchValue < node.value:
78 |         return search(searchValue, node.leftChild)
79 |     # If the value is greater than the current node, perform
80 |     # search on the right child:
81 |     else: # searchValue > node.value
82 |         return search(searchValue, node.rightChild)
83 | ```
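84 | 
85 | The notes above point out that insertion is where BSTs shine. Here is a minimal insertion sketch in the same style (my own code, assuming the same node shape with `value`, `leftChild`, and `rightChild`):
86 | ```py
87 | class TreeNode:
88 |     def __init__(self, value, left=None, right=None):
89 |         self.value = value
90 |         self.leftChild = left
91 |         self.rightChild = right
92 | 
93 | def insert(value, node):
94 |     if value < node.value:
95 |         if node.leftChild is None:
96 |             node.leftChild = TreeNode(value)  # free left slot: attach here
97 |         else:
98 |             insert(value, node.leftChild)
99 |     else:
100 |         if node.rightChild is None:
101 |             node.rightChild = TreeNode(value)  # free right slot: attach here
102 |         else:
103 |             insert(value, node.rightChild)
104 | ```
105 | 
106 | On a balanced tree this descends one level per call, so insertion, like search, takes about `log N` steps.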
--------------------------------------------------------------------------------
/a-tour-of-cpp/01-basics.md:
--------------------------------------------------------------------------------
1 | # Basics
2 |
3 | ## Variables
4 |
5 | - The `=` form is traditional and dates back to C, but if in doubt, use the general `{}` list form.
6 | - If nothing else, it saves you from conversions that **lose information**:
7 | ```cpp
8 | int i1 = 7.8; // i1 becomes 7 (surprise?)
9 | int i2 {7.8}; // error: floating-point to integer conversion
10 | ```
11 |
12 | - We use `auto` where we don’t have a specific reason to mention the type explicitly. *Specific reasons* include:
13 | - The definition is in a large scope where we want to make the type **clearly visible to readers** of our code.
14 | - The type of the initializer **isn’t obvious**.
15 | - We want to be **explicit** about a variable’s **range** or **precision** (e.g., `double` rather than `float`).
16 |
17 | ## Scope
18 |
19 | A declaration introduces its name into a scope:
20 | - **Local scope**: A name declared in a **function** or **lambda** is called a local name. Its scope extends from its point of declaration to the end of the block in which its declaration occurs. A block is delimited by a `{}` pair. Function argument names are considered local names.
21 | - **Class scope**: A name is called a **member name** (or a class member name) if it is defined in a class, outside any function, lambda, or enum class. Its scope extends from the opening `{` of its enclosing declaration to the matching `}`.
22 | - **Namespace scope**: A name is called a namespace member name if it is defined in a namespace outside any function, lambda, class, or enum class. Its scope extends from the point of declaration to the end of its namespace.
23 | - A name not declared inside any other construct is called a global name and is said to be in the **global namespace**.
24 |
25 | ## Constants
26 |
27 | - **const**: primarily to specify interfaces so that data can be passed to functions using pointers and references without fear of it being modified.
28 | - The value of a `const` may be calculated at run time!
29 | ```cpp
30 | const double s1 = sum(v); // OK: sum(v) is evaluated at run time
31 | ```
32 | - **constexpr**: meaning roughly *to be evaluated at compile time*.
33 | - This is used primarily to specify constants, to allow placement of data in read-only memory (where it is unlikely to be corrupted), and for performance.
34 | - The value of a `constexpr` must be calculated by the compiler.
35 | ```cpp
36 | constexpr double s2 = sum(v); // error: sum(v) is not a constant expression
37 | ```
38 | - For a function to be usable in a **constant expression**, that is, in an expression that will be evaluated by the compiler, it must be defined `constexpr` or `consteval`. For example:
39 | ```cpp
40 | constexpr double square(double x) { return x*x; }
41 | constexpr double max1 = 1.4*square(17); // OK: 1.4*square(17) is a constant expression
42 | constexpr double max2 = 1.4*square(var); // error: var is not a constant, so square(var) is not a constant
43 | const double max3 = 1.4*square(var); // OK: may be evaluated at run time
44 | ```
45 | - **consteval**: When we want a function to be used only for evaluation at compile time, we declare it `consteval` rather than `constexpr`.
46 | - It fails if used in a runtime context.
47 | - Only for functions (not variables).
48 | - For example:
49 | ```cpp
50 | consteval double square2(double x) { return x*x; }
51 | 
52 | constexpr double max1 = 1.4*square2(17); // OK: 1.4*square2(17) is a constant expression
53 | const double max3 = 1.4*square2(var); // error: var is not a constant
54 | ```
55 | - `const` before and after a method:
56 | - Using `const` before means it will return a `const` reference to T (here data_)
57 | ```c
58 | Class c;
59 | T& t = c.get_data(); // Not allowed.
60 | const T& tc = c.get_data(); // OK.
61 | ```
62 | - Using `const` after means the method will not modify any member variables of the class (unless the member is **mutable**).
63 | - A const member function can be invoked for **both const and non-const objects**, but a non-const member function can only be invoked for non-const objects ⚠️.
64 | ```c
65 | const T& get_data() const { return data_; }
66 | ```
67 | - Pointer to const vs. const pointer:
68 | ```c
69 | int i = 10;
70 | int* p1 = &i;             // pointer
71 | const int* p2 = &i;       // pointer to const
72 | int* const p3 = &i;       // const pointer
73 | const int* const p4 = &i; // const pointer to const value
74 | ```
75 |
76 | ## Pointers, arrays and References
77 |
78 | - Passing a value by reference is just **syntactic sugar**.
79 | - There is nothing you can do with a reference that you can't do with a pointer 😏.
80 | - References are **cleaner** and simpler to read.
81 | - References have to be **initialized** when declared.
82 | - References are not **re-bindable** ⚠️: once a reference is bound to a variable, it cannot be re-seated to refer to another variable.
83 | - Pointers are **addressable**: you can take the address of a pointer itself, whereas a reference's own identity is hidden.
84 | - Pointers are **nullable**.
85 | - Using **nullptr** eliminates potential confusion between integers (such as 0 or NULL) and pointers.
87 |
88 |
89 |
90 | A name declared in a condition is in scope on both branches of the if-statement:
91 | ```cpp
92 | void do_something(vector<int>& v)
93 | {
94 |     if (auto n = v.size(); n != 0) {
95 |         // ... we get here if n != 0 ...
96 |     }
97 |     // ...
98 | }
99 | ```
100 |
101 | ## Reference lifetime extension
102 |
103 | - Prevents temporary objects from being **destroyed prematurely** when bound to const **lvalue** references or **rvalue** references.
104 | - It enhances safety and efficiency by avoiding dangling references and unnecessary copies.
105 | - It does not apply to **non-const references**, references stored in **containers**, or references returned from **functions**.
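106 | - A small illustration (my own sketch, not from the book):
107 | ```cpp
108 | #include <string>
109 | 
110 | std::string make_name() { return "Bjarne"; }
111 | 
112 | int main() {
113 |     const std::string& r = make_name(); // temporary's lifetime extended to r's scope
114 |     // std::string& bad = make_name();  // error: non-const lvalue reference cannot bind to a temporary
115 |     return static_cast<int>(r.size());
116 | }
117 | ```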
106 |
--------------------------------------------------------------------------------
/a-tour-of-cpp/02-user-defined-types.md:
--------------------------------------------------------------------------------
1 | # User Defined Types
2 |
3 | ## Classes
4 |
5 | - The `public` and `private` parts of a class declaration can appear in any order, but conventionally we place the public declarations first and the `private` declarations later.
6 | ```cpp
7 | class Vector {
8 | public:
9 |     Vector(int s) : elem{new double[s]}, sz{s} {} // initializes the Vector members using a member initializer list
10 |     double& operator[](int i) {
11 |         return elem[i];
12 |     }
13 |     int size() { return sz; }
14 | 
15 | private:
16 |     double* elem;
17 |     int sz;
18 | };
19 | ```
20 |
21 | - What is the difference between a `struct` and a `class` ❓
22 | - From the programmer's point-of-view, there is a very minor difference. Members of a `struct` have **public** visibility by **default**, whereas members of a class have **private** visibility by **default**.
23 | - `struct` **inherits publicly** by default while `class` **inherits privately** by default.
24 | - `struct` is typically used for plain old data (POD) types (simple data containers without complex logic). `class` is used for **encapsulated objects** with **behavior** (methods, private data, etc.).
25 | - The general rule to follow is that `structs` should be **small**, **simple** (one-level) collections of related properties, that are **immutable** once created; for anything else, use a `class`.
26 | - Structs were left in C++ for compatibility reasons with C.
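27 | - A sketch of this convention (illustrative types of my own):
28 | ```cpp
29 | struct Point {   // plain aggregate: public data, no invariant to protect
30 |     double x;
31 |     double y;
32 | };
33 | 
34 | class Circle {   // encapsulated: maintains the invariant radius_ >= 0
35 | public:
36 |     explicit Circle(double r) : radius_{r < 0 ? 0 : r} {}
37 |     double area() const { return 3.141592653589793 * radius_ * radius_; }
38 | private:
39 |     double radius_;
40 | };
41 | ```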
27 |
28 | ## Enumerations
29 |
30 | - Used to represent small sets of integer values.
31 | - They are used to make code more readable and less error-prone than it would have been had the symbolic (and mnemonic) enumerator names not been used.
32 | ```cpp
33 | enum class Color { red, blue, green };
34 | Color c = 2; // initialization error: 2 is not a Color
35 | Color x = Color{5}; // OK, but verbose
36 | Color y {6}; // also OK
37 | int n = int(Color::red); // explicitly convert an enum value to its underlying type
38 | ```
39 | - If you don’t ever want to explicitly qualify enumerator names and want enumerator values to be `ints` (without the need for an explicit conversion), you can remove the `class` from `enum` class to get a **plain** enum.
40 | - Advantages of using **class enums** compared to **traditional C** enums:
41 | - The enumerators are **scoped** inside the enum (avoid namespace pollution).
42 | - **No implicit conversion** to int (strongly typed), must use `static_cast` for explicit conversion.
43 | - C enums can lead to unintended comparisons between unrelated enums.
44 | - Allows explicit specification of the underlying type (e.g., int, char, short).
45 | - Useful for **memory optimization** or **serialization**.
46 | ```cpp
47 | enum Color { Red, Green, Blue };                  // underlying type could be `int`, `short`, etc.
48 | enum class CharColor : char { Red, Green, Blue }; // stored as `char`
49 | enum class BigEnum : uint64_t { Value1, Value2 }; // guaranteed 64-bit
50 | ```
51 | - **Class enums** can always be forward-declared (their underlying type defaults to `int` unless specified).
52 | - **Plain C enums** could not be forward-declared before C++11.
53 | - Since C++11, a plain enum can be forward-declared only if its underlying type is specified:
54 | ```cpp
55 | enum Color : int; // Forward declaration (C++11+)
56 | enum Color : int { Red, Green, Blue };
57 | ```
58 |
59 | ## Unions
60 |
61 | - `std::variant` (C++17) eliminates the need for manual tagging:
62 | ```cpp
63 | std::variant<int, double> data; // the set of alternatives (int, double) is illustrative
64 | data = 42; // Stores int
65 | 
66 | // Safe access
67 | if (std::holds_alternative<int>(data)) {
68 |     std::cout << std::get<int>(data);
69 | }
70 | ```
71 | - For many uses, a **variant** is simpler and safer to use than a `union`.
72 |
--------------------------------------------------------------------------------
/a-tour-of-cpp/03-modularity.md:
--------------------------------------------------------------------------------
1 | # Modularity
2 |
3 | C++ supports a notion of separate compilation where user code sees only declarations of the types and functions used. This can be done in two ways:
4 | - **Header files**: Place declarations in separate files, called header files, and textually `#include` a header file where its declarations are needed.
5 | - **Modules**: Define module files, compile them separately, and import them where needed. Only explicitly exported declarations are seen by code importing the module
6 | - A `.cpp` file that is compiled by itself (including the `h` files it `#includes`) is called a **translation unit**. A program can consist of thousands of translation units.
7 | - The use of header files and `#include` is a very old way of simulating modularity with significant **disadvantages**:
8 | - 👎 **Compilation time**:
9 | - If you `#include header.h` in **101** translation units, the text of `header.h` will be processed by the compiler **101** times.
10 | - If a header file changes, all source files that include it must be **recompiled**.
11 | - Template-heavy code: headers like `STL` or `Eigen` force full recompilation when modified.
12 | - 👎 **Order dependencies**:
13 | - `<windows.h>` includes `<winsock.h>` (a deprecated Winsock version), which conflicts with `<winsock2.h>`.
14 | - If included in the wrong order, you get "redefinition" errors.
15 | - 👎 **Inconsistencies**:
16 | - Defining an entity, such as a type or a function, in one file and then defining it slightly differently in another file, can lead to crashes or subtle errors.
17 | - 👎 **Transitivity**:
18 | - All code that is needed to express a declaration in a header file must be present in that header file. This leads to massive code bloat as header files `#include` other headers and this results in the user of a header file – accidentally or deliberately – becoming dependent on such implementation details.
19 | - 👎 **Poor Encapsulation**:
20 | - Implementation details exposed: Private members must be declared in headers, leaking internal details.
21 | - 👎 **One Definition Rule (ODR) Violations**:
22 | - Duplicate definitions: if a header defines (rather than just declares) functions/variables, multiple inclusions cause linker errors.
23 | - Requires `#pragma once` or include guards (manual maintenance).
24 | - 👎 **Macro Pollution** & **Global Namespace Issues**:
25 | - Macros in headers (`#define`) affect all files that include them, leading to:
26 | - Naming conflicts (e.g., `min`, `max` macros breaking `std::min`).
27 | - Unintended side effects (e.g., `Windows.h` macros).
28 | - 👎 **Circular Dependencies**:
29 | - Hard to resolve: if `A.h` includes `B.h` and `B.h` includes `A.h`, compilation fails without careful forward declarations.
30 | - The differences between headers and modules are not just **syntactic**.
31 | - A module is compiled **once only** (rather than in each translation unit in which it is used).
32 | - Two modules can be imported in either **order** without changing their meaning.
33 | - If you `import` or `#include` something into a module, users of your module do not implicitly gain access to that: `import` is not **transitive**.
34 | - `import std` compiles about 10 times faster than the version using `#include`, even though the former delivers more than 10 times as much information:
35 | - The reason is that modules only **export interfaces** whereas a header delivers all that it directly or indirectly contains to the compiler.
36 | - Mixing `#include` and `import` is allowed, as it's essential for gradually upgrading older code from `#include` to `import`.
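37 | - A minimal sketch of mixing the two styles during a migration (assuming a `Vector` module with a `Vector(int)` constructor, like the one defined later in this chapter):
38 | ```cpp
39 | // main.cpp: old-style headers and new-style imports side by side
40 | #include <iostream> // classic textual inclusion
41 | import Vector;      // module import; only exported names become visible
42 |
43 | int main() {
44 | Vector v(10);
45 | std::cout << "created a Vector of 10 elements\n";
46 | }
47 | ```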
37 |
38 | ## Namespaces
39 |
40 | - Namespaces are primarily used to organize larger program components, such as libraries. They simplify the composition of a program out of separately developed parts.
41 | - If repeatedly qualifying a name becomes tedious or distracting, we can bring the name into a scope with a `using`-declaration:
42 | ```cpp
43 | void my_code(vector<int>& x, vector<int>& y)
44 | {
45 | using std::swap; // make the standard-library swap available locally
46 | // ...
47 | swap(x,y); // std::swap()
48 | other::swap(x,y); // some other swap()
49 | // ...
50 | }
54 | ```
55 | - A `using`-declaration makes a name from a namespace usable as if it were declared in the **scope** in which it appears.
56 | - To gain access to all names in the standard-library namespace, we can use a using-directive: `using namespace std;`.
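57 | - Conversely, to keep our own names from clashing with other code, we can define our own namespace; a minimal sketch (names illustrative):
58 | ```cpp
59 | namespace My_code {
60 | class complex { /* ... */ };
61 | complex sqrt(complex);
62 | int main();
63 | }
64 |
65 | int My_code::main()
66 | {
67 | complex z; // names from My_code resolve here without qualification
68 | auto z2 = sqrt(z);
69 | // ...
70 | return 0;
71 | }
72 | ```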
57 | - When defining a module, we do **not** have to **separate declarations and definitions** into separate files; we can if that improves our source code organization, but we don’t have to. We could define the simple `Vector` module like this:
58 | ```cpp
59 | export module Vector; // defining the module called "Vector"
60 | export class Vector {
61 | // ...
62 | };
63 | export bool operator==(const Vector& v1, const Vector& v2)
64 | {
65 | // ...
66 | }
67 | ```
68 | - Graphically, the program fragments can be represented like this:
69 | 
70 |
71 | - Using modules, we don’t have to complicate our code to hide implementation details from users; a module will only **grant access** to **exported declarations**.
72 |
73 | ## Function Arguments and Return Values
74 |
75 | - We return **by reference** only when we want to grant a caller access to something that is not local to the function.
76 | - A local variable disappears when the function returns, so we should not return a pointer or reference to it.
77 | - **Copy elision** is an optimization technique in C++ that allows the compiler to avoid unnecessary object copies or moves, improving performance. It is particularly relevant in the context of returning objects from functions. With C++17, the standard mandates copy elision in certain situations, making it a more predictable and reliable optimization.
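78 | - Two minimal sketches of the points above (illustrative, assuming C++17):
79 | ```cpp
80 | #include <iostream>
81 | #include <vector>
82 |
83 | // (1) Never return a reference to a local: it dangles once the function returns.
84 | int& bad() {
85 | int local = 42;
86 | return local; // undefined behavior for any caller that uses the result
87 | }
88 |
89 | // (2) Returning a large object by value is fine: for a prvalue like this,
90 | // C++17 guarantees copy elision, so no copy and no move happens.
91 | std::vector<int> make_values() {
92 | return std::vector<int>(1'000'000, 42);
93 | }
94 |
95 | int main() {
96 | auto v = make_values(); // constructed directly in v
97 | std::cout << v.size() << '\n';
98 | }
99 | ```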
78 |
79 | ## Suffix Return Type
80 |
81 | - In C++11 and later, the **suffix return type** (also called a trailing return type) is a way to declare a function's return type after its parameter list, using the `->` syntax.
82 | - This feature was introduced primarily to support **lambda** expressions and **template meta-programming**, where the return type may depend on the function's arguments or complex expressions.
83 | - We add the return type after the argument list when we want to be **explicit** about the return type.
84 | - That makes `auto` mean *the return type will be mentioned later or be deduced*:
85 | ```cpp
86 | auto mul(int i, double d) -> double { return i*d; } // the return type is "double"
87 | ```
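88 | - A sketch of where the trailing form is genuinely needed, namely when the return type depends on the parameters (names illustrative):
89 | ```cpp
90 | #include <string>
91 |
92 | // The return type of `a + b` depends on T and U; the trailing form may
93 | // refer to the parameters, which a leading return type cannot.
94 | template<typename T, typename U>
95 | auto add(T a, U b) -> decltype(a + b) {
96 | return a + b;
97 | }
98 |
99 | int main() {
100 | auto x = add(1, 2.5);                // double
101 | auto s = add(std::string("a"), "b"); // std::string
102 | }
103 | ```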
88 |
89 | ## Structured Binding
90 |
91 | A function can return only a single value, but that value can be a class object with many members. This allows us to **elegantly** return **many values**.
92 |
93 | ```cpp
94 | struct Entry {
95 | string name;
96 | int value;
97 | };
98 |
99 | Entry read_entry(istream& is) // naive read function (for a better version, see §11.5)
100 | {
101 | string s;
102 | int i;
103 | is >> s >> i;
104 | return {s,i}; // {s,i} is used to construct the Entry return value.
105 | }
106 |
107 | auto e = read_entry(cin);
108 | cout << "{ " << e.name << " , " << e.value << " }\n";
109 |
110 | // Similarly, we can unpack an Entry’s members into local variables:
111 | auto [n,v] = read_entry(is); // This mechanism for giving local names to members of a class object is called structured binding.
112 |
113 | // Consider another example:
114 | map<string,int> m;
115 |
116 | // ... fill m ...
117 |
118 | for (const auto [key,value] : m)
119 | cout << "{" << key << "," << value << "}\n";
120 | ```
121 |
--------------------------------------------------------------------------------
/a-tour-of-cpp/assets/modules-export.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/a-tour-of-cpp/assets/modules-export.png
--------------------------------------------------------------------------------
/clean-architecture/assets/boundary-crossing-against-control-flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/boundary-crossing-against-control-flow.png
--------------------------------------------------------------------------------
/clean-architecture/assets/boundary-gui-business-rules.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/boundary-gui-business-rules.png
--------------------------------------------------------------------------------
/clean-architecture/assets/boundary-line.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/boundary-line.png
--------------------------------------------------------------------------------
/clean-architecture/assets/business-rules-and-database-components.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/business-rules-and-database-components.png
--------------------------------------------------------------------------------
/clean-architecture/assets/class-diagram-low-high-level-policy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/class-diagram-low-high-level-policy.png
--------------------------------------------------------------------------------
/clean-architecture/assets/clean-architecture-hunt-the-wumpus.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/clean-architecture-hunt-the-wumpus.png
--------------------------------------------------------------------------------
/clean-architecture/assets/clean-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/clean-architecture.png
--------------------------------------------------------------------------------
/clean-architecture/assets/cohesion-principles-tension.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/cohesion-principles-tension.png
--------------------------------------------------------------------------------
/clean-architecture/assets/component-based-taxi-arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/component-based-taxi-arch.png
--------------------------------------------------------------------------------
/clean-architecture/assets/components-relationships-uni.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/components-relationships-uni.png
--------------------------------------------------------------------------------
/clean-architecture/assets/cross-cutting-concerns.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/cross-cutting-concerns.png
--------------------------------------------------------------------------------
/clean-architecture/assets/db-behind-interface.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/db-behind-interface.png
--------------------------------------------------------------------------------
/clean-architecture/assets/facade-pattern.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/facade-pattern.png
--------------------------------------------------------------------------------
/clean-architecture/assets/factories.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/factories.png
--------------------------------------------------------------------------------
/clean-architecture/assets/hardware-is-a-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/hardware-is-a-detail.png
--------------------------------------------------------------------------------
/clean-architecture/assets/isp-problematic-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/isp-problematic-architecture.png
--------------------------------------------------------------------------------
/clean-architecture/assets/isp-segregated-ops.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/isp-segregated-ops.png
--------------------------------------------------------------------------------
/clean-architecture/assets/isp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/isp.png
--------------------------------------------------------------------------------
/clean-architecture/assets/loan-entity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/loan-entity.png
--------------------------------------------------------------------------------
/clean-architecture/assets/low-high-level-boundary-crossing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/low-high-level-boundary-crossing.png
--------------------------------------------------------------------------------
/clean-architecture/assets/lsp-inheritance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/lsp-inheritance.png
--------------------------------------------------------------------------------
/clean-architecture/assets/lsp-square-rectangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/lsp-square-rectangle.png
--------------------------------------------------------------------------------
/clean-architecture/assets/memory-layout-old-days.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/memory-layout-old-days.png
--------------------------------------------------------------------------------
/clean-architecture/assets/mix-soft-firm-anti-pattern.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/mix-soft-firm-anti-pattern.png
--------------------------------------------------------------------------------
/clean-architecture/assets/ocp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/ocp.png
--------------------------------------------------------------------------------
/clean-architecture/assets/one-dim-boundary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/one-dim-boundary.png
--------------------------------------------------------------------------------
/clean-architecture/assets/os-abstraction-layer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/os-abstraction-layer.png
--------------------------------------------------------------------------------
/clean-architecture/assets/package-by-component.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/package-by-component.png
--------------------------------------------------------------------------------
/clean-architecture/assets/package-by-feature.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/package-by-feature.png
--------------------------------------------------------------------------------
/clean-architecture/assets/package-by-layer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/package-by-layer.png
--------------------------------------------------------------------------------
/clean-architecture/assets/plugging-in-to-business-rules.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/plugging-in-to-business-rules.png
--------------------------------------------------------------------------------
/clean-architecture/assets/ports-adapters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/ports-adapters.png
--------------------------------------------------------------------------------
/clean-architecture/assets/segregation-of-mutability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/segregation-of-mutability.png
--------------------------------------------------------------------------------
/clean-architecture/assets/spr.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/spr.png
--------------------------------------------------------------------------------
/clean-architecture/assets/strategy-pattern.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/strategy-pattern.png
--------------------------------------------------------------------------------
/clean-architecture/assets/taxi-object-oriented-arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/taxi-object-oriented-arch.png
--------------------------------------------------------------------------------
/clean-architecture/assets/taxi-service-arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/taxi-service-arch.png
--------------------------------------------------------------------------------
/clean-architecture/assets/three-layers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/three-layers.png
--------------------------------------------------------------------------------
/clean-architecture/assets/typical-components-diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/typical-components-diagram.png
--------------------------------------------------------------------------------
/clean-architecture/assets/video-sales-arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/video-sales-arch.png
--------------------------------------------------------------------------------
/clean-architecture/assets/video-sales-uml.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/video-sales-uml.png
--------------------------------------------------------------------------------
/clean-architecture/assets/violating-sdp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/violating-sdp.png
--------------------------------------------------------------------------------
/clean-architecture/assets/zone-of-exclusion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/clean-architecture/assets/zone-of-exclusion.png
--------------------------------------------------------------------------------
/cpp-data-structures-and-algorithms/.gitignore:
--------------------------------------------------------------------------------
1 | # Prerequisites
2 | *.d
3 |
4 | # Compiled Object files
5 | *.slo
6 | *.lo
7 | *.o
8 | *.obj
9 |
10 | # Precompiled Headers
11 | *.gch
12 | *.pch
13 |
14 | # Compiled Dynamic libraries
15 | *.so
16 | *.dylib
17 | *.dll
18 |
19 | # Fortran module files
20 | *.mod
21 | *.smod
22 |
23 | # Compiled Static libraries
24 | *.lai
25 | *.la
26 | *.a
27 | *.lib
28 |
29 | # Executables
30 | *.exe
31 | *.out
32 | *.app
33 |
34 |
--------------------------------------------------------------------------------
/cpp-data-structures-and-algorithms/msvc.ps1:
--------------------------------------------------------------------------------
1 | # Save the current directory
2 | $originalDir = Get-Location
3 |
4 | Import-Module "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\Common7\Tools\Microsoft.VisualStudio.DevShell.dll"
5 | Enter-VsDevShell -VsInstallPath "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional" -DevCmdArguments '-arch=x64'
6 |
7 | # Restore the original directory at the end
8 | Set-Location $originalDir
9 |
10 |
--------------------------------------------------------------------------------
/cpp-data-structures-and-algorithms/sorting/bubble-sort.cpp:
--------------------------------------------------------------------------------
1 | #include <array>
2 | #include <iostream>
3 | #include <utility> // std::swap
4 |
5 | void print_array(const std::array<int, 11> &arr) {
5 | for (const int &value : arr) {
6 | std::cout << value << " ";
7 | }
8 |
9 | std::cout << "\n";
10 | }
11 |
12 | void bubble_sort(std::array<int, 11> &arr) {
13 |
14 | size_t end = arr.size() - 1;
15 | bool swapped = false;
16 | do {
17 | swapped = false;
18 | for (size_t i = 0; i < end; i++) {
19 | if (arr[i] > arr[i + 1]) {
20 | std::swap(arr[i], arr[i + 1]);
21 | swapped = true;
22 | }
23 | }
24 | end--;
25 | } while (swapped);
26 | }
27 |
28 | int main() {
29 |
30 | std::array<int, 11> arr = {1, 5, 99, 14, 56, 4, 78, 100, 45, 87, 1};
31 |
32 | std::cout << "Original array" << std::endl;
33 | print_array(arr);
34 | bubble_sort(arr);
35 | std::cout << "Sorted array" << std::endl;
36 | print_array(arr);
37 |
38 | return 0;
39 | }
40 |
--------------------------------------------------------------------------------
/cpp-data-structures-and-algorithms/sorting/insertion-sort.cpp:
--------------------------------------------------------------------------------
1 | #include <array>
2 | #include <iostream>
3 |
4 | void print_array(const std::array<int, 11> &arr) {
5 | for (const int &value : arr) {
6 | std::cout << value << " ";
7 | }
8 |
9 | std::cout << "\n";
10 | }
11 |
12 | void insertion_sort(std::array<int, 11> &arr) {
13 |
14 | for (size_t i = 0; i < arr.size() - 1; i++) {
15 | int temp = arr[i + 1];
16 | int j = static_cast<int>(i); // must be signed: the loop below decrements j past 0
17 | while (j >= 0 && arr[j] > temp) {
18 | arr[j + 1] = arr[j];
19 | j--;
20 | }
21 | arr[j + 1] = temp;
22 | }
23 | }
24 |
25 | int main() {
26 |
27 | std::array<int, 11> arr = {1, 5, 99, 14, 56, 4, 78, 100, 45, 87, 1};
28 |
29 | std::cout << "Original array" << std::endl;
30 | print_array(arr);
31 | insertion_sort(arr);
32 | std::cout << "Sorted array" << std::endl;
33 | print_array(arr);
34 |
35 | return 0;
36 | }
37 |
--------------------------------------------------------------------------------
/cpp-data-structures-and-algorithms/sorting/selection-sort.cpp:
--------------------------------------------------------------------------------
1 | #include <array>
2 | #include <iostream>
3 | #include <utility> // std::swap
4 |
5 | void print_array(const std::array<int, 11> &arr) {
6 |
7 | for (const int &value : arr) {
8 | std::cout << value << " ";
10 | }
11 |
12 | std::cout << "\n";
13 | }
14 |
15 | void selection_sort(std::array<int, 11> &arr) {
16 |
17 | for (size_t i = 0; i < arr.size(); i++) {
18 | size_t min = i; // index type matches i and j, avoiding signed/unsigned mixing
19 | for (size_t j = i + 1; j < arr.size(); j++) {
20 | if (arr[j] < arr[min]) {
21 | min = j;
22 | }
23 | }
24 |
25 | if (min != i) {
26 | std::swap(arr[i], arr[min]);
27 | }
28 | }
29 | }
30 |
31 | int main() {
32 |
33 | std::array<int, 11> arr = {1, 5, 99, 14, 56, 4, 78, 100, 45, 87, 1};
34 |
35 | std::cout << "Original array" << std::endl;
36 | print_array(arr);
37 | selection_sort(arr);
38 | std::cout << "Sorted array" << std::endl;
39 | print_array(arr);
40 |
41 | return 0;
42 | }
43 |
--------------------------------------------------------------------------------
/database-internals/README.md:
--------------------------------------------------------------------------------
1 | Study notes taken from reading *Database Internals: A Deep Dive into How Distributed Data Systems Work* by *Alex Petrov*.
2 |
3 | # Part I: Storage Engines
4 |
5 | Since the term **database management system (DBMS)** is quite bulky, throughout this book we use more compact terms, **database system** and **database**, to refer to the same concept.
6 | - **DBMS** are apps built on top of **storage engines**, offering a schema, a query language, indexing, transactions, and many other useful features.
7 | - This clear separation is 👍 because:
8 | - It enables database developers to bootstrap database systems using existing storage engines, and to concentrate on the other subsystems.
9 | - It opens up an opportunity to switch between different engines, potentially better suited for particular use cases.
10 |
11 | ## Comparing Databases
12 |
13 | - To reduce the risk of an **expensive migration**, you can invest some time before you decide on a specific database to build confidence in its ability to meet your application’s needs.
14 | - Even a superficial understanding of how each database works and what’s inside it can help you reach a more informed conclusion than just looking at DB [comparison](https://db-engines.com/en/ranking) websites.
15 | - If you’re searching for a database that would be a good fit for the workloads you have, the best thing you can do is to **simulate these workloads** against different DB systems, **measure the performance** metrics that are important for you, and compare the results.
16 | - Some issues, especially when it comes to performance and scalability, **start showing only after some time** or as the **capacity grows** 😼.
17 | - To compare databases, it’s helpful to understand the use case in great detail and define the current and anticipated **variables**, such as:
18 | - Schema and record sizes
19 | - Number of clients
20 | - Types of queries and access patterns
21 | - Rates of the read and write queries
22 | - Expected changes in any of these variables
23 | - Knowing these variables can help to answer the following questions:
24 | - Does the database support the required queries?
25 | - Is this database able to handle the amount of data we’re planning to store?
26 | - How many read and write operations can a single node handle?
27 | - How many nodes should the system have?
28 | - How do we expand the cluster given the expected growth rate?
29 | - What is the maintenance process?
30 | - One of the popular tools used for benchmarking, performance evaluation, and comparison is **Yahoo! Cloud Serving Benchmark (YCSB)**.
31 | - Also worth checking the **Transaction Processing Performance Council (TPC)**.
32 |
33 | ## Understanding Trade-Offs
34 |
35 | - There are many different approaches to storage engine design, and every implementation has its own upsides and downsides.
36 | - Some are optimized for **low read or write latency**, some try to **maximize density** (the amount of stored data per node), and some concentrate on **operational simplicity**.
--------------------------------------------------------------------------------
/database-internals/assets/conceptual-storage-webtable.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/database-internals/assets/conceptual-storage-webtable.png
--------------------------------------------------------------------------------
/database-internals/assets/dbms-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/database-internals/assets/dbms-architecture.png
--------------------------------------------------------------------------------
/database-internals/assets/primary-index-indirection.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/database-internals/assets/primary-index-indirection.png
--------------------------------------------------------------------------------
/database-internals/assets/tree-balancing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/database-internals/assets/tree-balancing.png
--------------------------------------------------------------------------------
/database-internals/chapter-01-introduction-and-overview.md:
--------------------------------------------------------------------------------
1 | # Chapter 1: Introduction and Overview
2 |
3 | - DBMS can serve different purposes: some are used primarily for **temporary hot data**, some serve as a **long-lived cold storage**, some allow **complex analytical queries**, some only allow accessing values by the **key**, some are optimized to store **time-series** data, and some store **large blobs** efficiently.
4 | - There are many ways DBMSs can be classified, for example in terms of **storage medium** (*Memory- vs. Disk-Based*) or **layout** (*Column- vs. Row-Oriented*). Some sources group them into three major categories:
5 | - *Online transaction processing (OLTP) databases* - These handle a large number of user-facing requests and transactions. Queries are often **predefined** and **short-lived**.
6 | - *Online analytical processing (OLAP) databases* - These handle **complex aggregations**. OLAP databases are often used for **analytics** and data **warehousing**, and are capable of handling complex, **long-running ad hoc queries**.
7 | - **Hybrid transactional and analytical processing (HTAP)** - These databases combine properties of both OLTP and OLAP stores.
8 | - There are many other terms and classifications: key-value stores, relational databases, document-oriented stores, and graph databases.
9 |
10 | ## DBMS Architecture
11 |
12 | - DBMSs use a **client/server** model, where database system instances (nodes) take the role of servers, and application instances take the role of clients.
13 | 
14 |
15 | - Client requests arrive through the **transport subsystem**. Requests come in the form of queries, most often expressed in some query language. The transport subsystem is also responsible for **communication with other nodes** in the database cluster.
16 | - Upon receipt, the transport subsystem hands the query over to a **query processor**, which parses, interprets, and validates it.
17 | - The parsed query is passed to the **query optimizer**, which first eliminates impossible and redundant parts of the query, and then attempts to find the most efficient way to execute it based on internal statistics and data placement.
18 | - The query is usually presented in the form of an **execution plan** (or **query plan**): a sequence of operations that have to be carried out for its results to be considered complete. Since the same query can be satisfied using different execution plans that
19 | can vary in efficiency, the optimizer picks the best available plan.
20 | - The execution plan is handled by the **execution engine**, which collects the results of the execution of local and remote operations. **Remote** execution can involve writing and reading data to and from **other nodes** in the cluster, and replication. **Local** queries (coming directly from clients or from other nodes) are executed by the storage engine. The storage engine has several components with dedicated responsibilities:
21 | - **Transaction manager** - This manager schedules transactions and ensures they cannot leave the database in a logically inconsistent state.
22 | - **Lock manager** - This manager locks database objects for the running transactions, ensuring that concurrent operations do not violate physical data integrity.
23 | - **Access methods** (storage structures) - These manage access to, and organization of, data on disk. Access methods include heap files and storage structures such as B-Trees or LSM Trees.
24 | - **Buffer manager** - This manager caches data pages in memory.
25 | - **Recovery manager** - This manager maintains the operation log and restores the system state in case of a failure.
26 |
27 | ## Memory-Versus Disk-Based DBMS
28 |
29 | In-memory database management systems (sometimes called **main memory** DBMS) store data primarily in memory and use the disk for **recovery** and **logging**. Disk-based DBMS hold most of the data on disk and use memory for **caching** disk contents or as a temporary storage.
30 |
31 | Databases using memory as a primary data store do this mainly because of **performance**, comparatively **low access costs**, and **access granularity**. Programming for main memory is also significantly **simpler** than doing so for the disk. OSs abstract memory management and allow us to think in terms of allocating and freeing arbitrarily sized memory chunks. On disk, we have to manage data references, serialization formats, freed memory, and fragmentation manually 🤷.
32 |
33 | ### Durability in Memory-Based Stores
34 |
35 | - In-memory database systems maintain backups on disk to provide durability and prevent loss of the **volatile** data.
36 | - Before the operation can be considered complete, its results have to be written to a **sequential log file**.
37 | - To avoid replaying complete log contents during startup or after a crash, in-memory stores maintain a **backup copy**. The backup copy is maintained as a sorted disk-based structure, and modifications to this structure are often **asynchronous** (decoupled from client requests) and applied in **batches** to reduce the number of I/O
38 | operations. During recovery, database contents can be restored from the backup and logs.
39 |
40 | ## Column-Versus Row-Oriented DBMS
41 |
42 | - Most database systems store a set of data **records**, consisting of **columns** and **rows** in **tables**.
43 | - A **field** is the **intersection** of a column and a row.
44 | - A collection of values that belong logically to the same record (usually identified by the key) constitutes a row.
45 | - One of the ways to classify databases is by how the data is **stored on disk**: row or column wise.
46 | - Tables can be partitioned either **horizontally** (storing values belonging to the same row together) ▶️ *MySQL*, *PostgreSQL*.
47 | - or **vertically** (storing values belonging to the same column together) ▶️ *MonetDB* and *C-Store*.
48 |
49 | ### Row-Oriented Data Layout
50 |
51 | - Their layout is quite close to the **tabular** data representation, where every row has the **same** set of **fields**.
52 | | ID | Name | Birth Date | Phone Number |
53 | | --- | ----- | ----------- | -------------- |
54 | | 10 | John | 01 Aug 1981 | +1 111 222 333 |
55 | | 20 | Sam | 14 Sep 1988 | +1 555 888 999 |
56 | | 30 | Keith | 07 Jan 1984 | +1 333 444 555 |
57 | - This approach works well for cases where several fields constitute the record (name, birth date, and a phone number) uniquely identified by the key (in this example, a
58 | monotonically incremented number).
59 | - All fields representing a single user record are often **read together**. When creating records, we write them together as well.
60 | - Since row-oriented stores are most useful in scenarios when we have to access data by row, storing entire rows together improves **spatial locality**.
61 | - 👍 When we’d like to access an entire user record.
62 | - 👎 Makes queries accessing individual fields of multiple user records more 🤑, since data for the other fields will be **paged in** as well.
63 |
64 | ### Column-Oriented Data Layout
65 |
66 | - Here, values for the same column are stored contiguously on disk.
67 | - Storing values for different columns in separate files or file segments allows efficient queries by column, since they can be **read in one pass** rather than consuming entire rows and discarding data for columns that weren’t queried 🤷.
68 | - Column-oriented stores are a good fit for analytical workloads that **compute aggregates**, such as *finding trends*, *computing average values*, etc.
69 | - This layout works well in cases where logical records have multiple fields, but some of them (in this case, price quotes) have different importance and are often consumed together.
70 | - Values belonging to the same column are stored closely together:
71 | | Columns |
72 | | ---------------------------------------------------------------- |
73 | | Symbol: 1:DOW; 2:DOW; 3:S&P; 4:S&P |
74 | | Date: 1:08 Aug 2018; 2:09 Aug 2018; 3:08 Aug 2018; 4:09 Aug 2018 |
75 | | Price: 1:24,314.65; 2:24,136.16; 3:2,414.45; 4:2,232.32 |
76 | - To reconstruct data tuples, which might be useful for joins, filtering, and multi-row aggregates, we need to preserve some **metadata** on the column level to identify which data points from other columns it is associated with. If you do this explicitly, each value will have to hold a **key**, which introduces duplication and increases the amount of stored data 🥶.
77 | - During the last several years, likely due to a rising demand to run complex analytical queries over growing datasets, we’ve seen many 🆕 column-oriented **file formats** such as *Apache Parquet*, *Apache ORC*, *RCFile*, as well as column-oriented stores, such as *Apache Kudu*, *ClickHouse*, and many others.
78 |
79 | ### Distinctions and Optimizations
80 |
81 | - Reading multiple values for the same column in one run significantly improves **cache utilization** and **computational efficiency**. On modern CPUs, vectorized instructions (SIMD) can be used to process multiple data points with a single CPU instruction.
82 | - Storing values that have the same data type together offers a better **compression** ratio. We can use different compression algorithms depending on the data type and pick the most effective compression method for each case.
83 | - To decide whether to use a column- or a row-oriented store, you need to understand your **access patterns**.
84 | - If the read data is consumed in records (i.e., most or all of the columns are requested) and the workload consists mostly of **point queries** and **range scans**, the row-oriented approach is likely to yield better results.
85 | - If scans span many rows, or **compute aggregate** over a subset of columns, it is worth considering a column-oriented approach.
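86 | - As a toy illustration (not from the book), the same stock-quote records laid out row-wise versus column-wise in C++:
87 | ```cpp
88 | #include <string>
89 | #include <vector>
90 |
91 | // Row-oriented: one struct per record; a whole record is read together.
92 | struct QuoteRow {
93 | std::string symbol;
94 | std::string date;
95 | double price;
96 | };
97 | std::vector<QuoteRow> rows;
98 |
99 | // Column-oriented: one vector per column; a single column (e.g., price)
100 | // can be scanned contiguously, which favors aggregates, caching, and SIMD.
101 | struct QuoteColumns {
102 | std::vector<std::string> symbol;
103 | std::vector<std::string> date;
104 | std::vector<double> price;
105 | };
106 | ```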
86 |
87 | ### Wide Column Stores
88 |
89 | - Column-oriented databases should not be mixed up with wide column stores, such as *BigTable* or *HBase*, where data is represented as a **multidimensional map**, columns
90 | are grouped into **column families** (usually storing data of the same type), and inside each column family, data is stored **row-wise**. This layout is best for storing data retrieved by a **key** or a **sequence of keys**.
91 | - A canonical example from the *Bigtable* paper is a *Webtable*. A *Webtable* stores snapshots of web page contents, their attributes, and the relations among them at a specific timestamp. 
92 | - Data is stored in a multidimensional sorted map with hierarchical indexes:
93 | - We can locate the data related to a specific web page by its reversed URL and its *contents* or *anchors* by the timestamp.
94 | - Each row is indexed by its row key. Related columns are grouped together in column families — *contents* and *anchor* — which are stored on disk separately.
95 | - Each column inside a column family is identified by the column key, which is a combination of the column family name and a qualifier (html, cnnsi.com, my.look.ca in this example).
96 | - Column families store multiple versions of data by timestamp. This layout allows us to quickly locate the higher-level entries (web pages, in this case) and their parameters (versions of content and links to the other pages).
97 |
98 | ## Data Files and Index Files
99 |
100 | ❓ Why do we need a database management system and **not just a bunch of files**? How does file organization improve efficiency? 🤔
101 |
102 | - Database systems do use files for storing the data, but instead of relying on **filesystem hierarchies** of directories and files for locating records, they compose files using **implementation-specific** formats. The main reasons to use specialized file organization over flat files are:
103 | - 👍 **Storage efficiency** - Files are organized in a way that minimizes storage overhead per stored data record.
104 | - 👍 **Access efficiency** - Records can be located in the smallest possible number of steps.
105 | - 👍 **Update efficiency** - Record updates are performed in a way that minimizes the number of changes on disk.
106 | - A database system usually separates **data files** and **index files**: data files store **data records**, while index files store record **metadata** and use it to locate records in data files.
107 | - Files are partitioned into **pages**, which typically have the size of a single or multiple disk blocks. Pages can be organized as sequences of records or as *slotted pages*.
108 |
109 | ### Data Files
110 |
111 | - Data files (sometimes called **primary files**) can be implemented as:
112 | - **heap-organized tables (heap files)**: Records in heap files are not required to follow any particular **order**, and most of the time they are placed in a write order. This way, no additional work or file reorganization is required when new pages are appended. Heap files require additional index structures, pointing to the locations where data records are stored, to make them searchable.
113 | - **hash-organized tables (hashed files)**: records are stored in **buckets**, and the hash value of the key determines which bucket a record belongs to. Records in the bucket can be stored in append order or sorted by key to improve lookup speed.
114 | - **index-organized tables (IOT)**: store data records in the **index itself**. Since records are stored in key order, range scans in IOTs can be implemented by sequentially scanning its contents.
115 |
116 | ### Index Files
117 |
118 | - An index on a primary (data) file is called the **primary index**. In most cases we can also assume that the primary index is built over a **primary key** or a **set of keys** identified as primary. All other indexes are called **secondary**.
119 | - Secondary indexes can point directly to the data record, or simply store its primary key.
120 | - If the order of data records follows the **search key order**, this index is called **clustered** (also known as clustering). Data records in the clustered case are usually stored in the same file or in a clustered file, where the key order is **preserved**.
121 | - If the data is stored in a **separate file**, and its order does not follow the key order, the index is called **non-clustered** (sometimes called **unclustered**).
122 | - 💁 Index-organized tables store information in index order and are clustered by definition. Primary indexes are most often clustered. Secondary indexes are non-clustered by definition, since they’re used to facilitate access by keys other than the primary one. Clustered indexes can be either index-organized or have separate index and data files.
123 |
124 | ### Primary Index as an Indirection
125 |
126 | - There are different opinions in the database community on whether data records should be referenced directly (through file offset) or via the primary key index.
127 | - By referencing **data directly**:
128 | - 👍 we can reduce the number of disk seeks,
129 | - 👎 but have to pay a **cost of updating** the pointers whenever the record is updated or relocated during a maintenance process.
130 | - Using **indirection** in the form of a primary index allows us to:
131 | - 👍 reduce the cost of **pointer updates**
132 | - 👎 but has a higher cost on a **read path**.
133 | 
134 |
135 | ### Buffering, Immutability, and Ordering
136 |
137 | Storage structures have three common variables: they use **buffering** (or avoid using it), use **immutable** (or mutable) files, and store values in **order** (or out of order).
138 | - **Buffering**: This defines whether or not the storage structure chooses to collect a certain amount of data in memory before putting it on disk. Of course, every on-disk
139 | structure has to use buffering to **some degree**, since the smallest unit of data transfer to and from the disk is a block, and it is desirable to write **full blocks**. Here, we’re talking about avoidable buffering, something storage engine implementers choose to do.
140 | - **Mutability** (or immutability): This defines whether or not the storage structure reads parts of the file, updates them, and writes the updated results at the **same location** in the file.
141 | - Immutable structures are **append-only**: once written, file contents are not modified.
142 | - There are other ways to implement immutability. One of them is **copy-on-write**.
143 | - **Ordering**: This defines whether or not the data records are stored in **key order** in the pages on disk; in other words, keys that sort closely are stored in contiguous segments on disk. Ordering often defines whether or not we can efficiently scan a range of records, not only locate individual data records. Storing data out of order (most often, in insertion order) opens up some write-time optimizations. For example, *Bitcask* and *WiscKey* store data records directly in append-only files.
--------------------------------------------------------------------------------
/database-internals/chapter-02-b-tree-basics.md:
--------------------------------------------------------------------------------
1 | # Chapter 2: B-Tree Basics
2 |
3 | ## Binary Search Tree
4 |
5 | - Insertion might lead to the situation where the tree is **unbalanced**. The worst-case scenario is where we end up with a **pathological** tree, which looks more like a linked list, and instead of desired logarithmic complexity `O(log2 N)`, we get linear `O(N)`.
6 | - One of the ways to keep the tree **balanced** is to perform a **rotation** step after nodes are added or removed.
7 | - If the insert operation leaves a branch unbalanced, we can rotate nodes around the middle one.
8 | - In the example below, during rotation the middle `node (3)`, known as a **rotation pivot**, is promoted one level higher, and its parent becomes its right child.
9 | 
.
10 |
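11 | - A minimal sketch of such a rotation in C++ (node layout assumed, not from the book):
12 | ```cpp
13 | struct Node {
14 | int key;
15 | Node* left;
16 | Node* right;
17 | };
18 |
19 | // Right rotation: promote the left child (the pivot) one level up;
20 | // its former parent becomes its right child.
21 | Node* rotate_right(Node* parent) {
22 | Node* pivot = parent->left;
23 | parent->left = pivot->right; // pivot's right subtree moves under the old parent
24 | pivot->right = parent;       // the old parent becomes pivot's right child
25 | return pivot;                // pivot is the new root of this subtree
26 | }
27 | ```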
11 | - At the same time, due to **low fanout** (fanout is the maximum allowed number of children per node), we have to perform balancing, relocate nodes, and update pointers rather frequently. Increased maintenance costs make BSTs impractical as on-disk data structures.
12 | - If we wanted to maintain a BST on disk, we’d face several problems:
13 | - locality: node child pointers may span across several disk pages, since elements are added in random order (we can improve the situation by modifying the tree layout and using **paged binary trees**).
14 | - tree height: since a BST has a fanout of just two, the height is the binary logarithm of the number of elements in the tree, and we have to perform O(log2 N) **seeks** to locate the searched element and, subsequently, perform the same number of **disk transfers**.
15 | - Considering these factors, a version of the tree that would be better suited for **disk implementation** has to exhibit the following properties:
16 | - High fanout to improve locality of the neighboring keys.
17 | - Low height to reduce the number of seeks during traversal.
18 |
19 | ## Disk-Based Structures
20 |
21 | On-disk data structures are often used when the amounts of data are so large that keeping an entire dataset in memory is impossible or not feasible. Only a fraction of the data can be cached in memory at any time, and the rest has to be stored on disk in a manner that allows efficiently accessing it.
22 |
23 | ### Hard Disk Drives
24 |
25 | On spinning disks, seeks increase costs of random reads because they require disk rotation and mechanical head movements to position the read/write head to the desired location. However, once the expensive part is done, reading or writing contiguous bytes (i.e., sequential operations) is relatively cheap.
26 |
27 | Head positioning is the most expensive part of an operation on the HDD. This is one of the reasons we often hear about the positive effects of sequential I/O: reading and writing contiguous memory segments from disk.
28 |
29 | ### Solid State Drives
30 |
31 | Since in both device types (HDDs and SSDs) we are addressing chunks of memory rather than individual bytes (i.e., accessing data block-wise), most operating systems have a block device abstraction. It hides an internal disk structure and buffers I/O operations internally, so when we’re reading a single word from a block device, the whole block containing it is read. This is a constraint we cannot ignore and should always take into account when working with disk-resident data structures.
32 |
33 | In SSDs, we don’t have a strong emphasis on random versus sequential I/O, as in HDDs, because the difference in latencies between random and sequential reads is not as large. There is still some difference caused by prefetching, reading contiguous
34 | pages, and internal parallelism.
35 |
36 | Even though garbage collection is usually a background operation, its effects may negatively impact write performance, especially in cases of random and unaligned write workloads. Writing only full blocks, and combining subsequent writes to the same block, can help to reduce the number of required I/O operations.
37 |
38 | ### On-Disk Structures
39 |
40 | Besides the cost of disk access itself, the main limitation and design condition for building efficient on-disk structures is the fact that the smallest unit of disk operation is a block. To follow a pointer to the specific location within the block, we have to fetch an entire block. Since we already have to do that, we can change the layout of the data structure to take advantage of it.
41 |
42 | In summary, on-disk structures are designed with their target storage specifics in mind and generally optimize for fewer disk accesses. We can do this by improving locality, optimizing the internal representation of the structure, and reducing the
43 | number of out-of-page pointers.
44 |
45 | We came earlier to the conclusion that high fanout and low height are desired properties for an optimal on-disk data structure. We’ve also just discussed the additional space overhead coming from pointers, and the maintenance overhead from remapping these pointers as a result of balancing. B-Trees combine these ideas: increase node fanout, and reduce tree height, the number of node pointers, and the frequency of balancing operations.
47 |
48 | ## Ubiquitous B-Trees
49 |
50 | B-Trees build upon the foundation of balanced search trees and are different in that they have higher fanout and smaller height.
--------------------------------------------------------------------------------
/designing-data-intensive-applications/README.md:
--------------------------------------------------------------------------------
1 | # Designing Data Intensive Applications
2 |
3 | notes taken from reading the _Designing Data Intensive Applications_ book by Martin Kleppmann.
4 |
--------------------------------------------------------------------------------
/designing-data-intensive-applications/assets/b-trees-structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/designing-data-intensive-applications/assets/b-trees-structure.png
--------------------------------------------------------------------------------
/designing-data-intensive-applications/assets/graph-structured-data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/designing-data-intensive-applications/assets/graph-structured-data.png
--------------------------------------------------------------------------------
/designing-data-intensive-applications/assets/hash-indexes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/designing-data-intensive-applications/assets/hash-indexes.png
--------------------------------------------------------------------------------
/designing-data-intensive-applications/assets/response-time.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/designing-data-intensive-applications/assets/response-time.png
--------------------------------------------------------------------------------
/designing-data-intensive-applications/chapter-01-reliable-scalable-and-maintainable-applications.md:
--------------------------------------------------------------------------------
1 | # Chapter 1. Reliable, Scalable and Maintainable Applications
2 |
3 | - A data-intensive application is typically built from standard building blocks which provide commonly needed functionality.
4 | - Store data so that they, or another application, can find it again later (**databases**),
5 | - Remember the result of an expensive operation, to speed up reads (**caches**),
6 | - Allow users to search data by keyword or filter it in various ways (**search indexes**),
7 | - Send a message to another process, to be handled asynchronously (**message queues**),
8 | - Observe what is happening, and act on events as they occur (**stream processing**),
9 | - Periodically crunch a large amount of accumulated data (**batch processing**).
10 |
11 | ## Thinking About Data Systems
12 |
13 | - Even though each category of these systems serves a specific purpose, many new tools for data storage and processing have emerged that are optimized for a variety of different use cases:
14 | - For example, there are data stores that are also used as **message queues (Redis)**,
15 | - and there are **message queues with database-like durability guarantees (Kafka)**,
16 | - ▶️ the boundaries between the categories are becoming **blurred**.
17 | - If you are designing a data system or service, a lot of tricky questions arise:
18 | - How do you ensure that the data remains correct and complete, even when things go wrong internally?
19 | - How do you provide consistently good performance to clients, even when parts of your system are degraded?
20 | - How do you scale to handle an increase in load? What does a good API for the service look like?
21 | - We focus on three concerns that are important in most software systems:
22 | - Reliability
23 | - Scalability
24 | - Maintainability
25 |
26 | ## Reliability
27 |
28 | - We understand reliability as meaning, roughly: _continuing to work correctly, even when things go wrong_.
29 | - The things that can go wrong are called **faults**, and systems that anticipate faults and can cope with them are called **fault-tolerant**.
30 | - Note that a **fault** is not the same as a **failure**. A **fault** is usually defined as **one** component of the system **deviating from its spec**, whereas a **failure** is when the system as a **whole** stops providing the required service to the user.
31 |
32 | ### Hardware faults
33 |
34 | - Hard disks crash, RAM becomes faulty, the power grid has a blackout, someone unplugs the wrong network cable. Anyone who has worked with **large data centers** can tell you that these things happen **all the time** when you have a lot of machines.
35 | - Hard disks are reported as having a **mean time to failure (MTTF)** of about 10 to 50 years. Thus, on a storage cluster with 10,000 disks, we should expect on average **one disk to die per day** :smiley:
36 | - There is a move towards systems that can tolerate the **loss of entire machines**, by using software fault-tolerance techniques in preference to hardware redundancy. Such systems also have operational advantages:
37 | - A single-server system requires planned downtime if you need to reboot the machine (to apply security patches, for example).
38 | - Whereas a system that can tolerate machine failure can be patched one node at a time, without downtime of the entire system.
39 |
40 | ### Software errors
41 |
42 | - Harder to anticipate because they are correlated across nodes, they tend to cause many more system failures than uncorrelated hardware faults. Examples include:
43 | - A software **bug** that causes every instance of an application server to crash when given a particular bad input.
44 | For example, consider the leap second on _June 30, 2012_ that caused many applications to hang simultaneously, due to a **bug in the Linux kernel**.
45 | - A **runaway process** uses up some shared resource—CPU time, memory, disk space or network bandwidth.
46 | - A **service** that the system depends on slows down, becomes **unresponsive** or starts returning corrupted responses.
47 | - **Cascading failures**, where a small fault in one component triggers a fault in another component, which in turn triggers further faults.
48 |
49 | ### Human errors
50 |
51 | - How do we make our system reliable, in spite of unreliable humans?
52 | - Design systems in a way that **minimizes opportunities for error**. For example, **well-designed abstractions, APIs and admin interfaces** make it easy to do “the right thing”, and discourage “the wrong thing”.
53 | - Provide fully-featured **non-production** sandbox environments where people can explore and experiment safely, using real data, without affecting real users.
54 | - Test thoroughly at all levels, from **unit tests** to whole-system **integration tests** and **manual tests**.
55 | - Allow quick and easy recovery from human errors, to minimize the impact in the case of a failure. For example, make it **fast to roll back** configuration changes, **roll out new code gradually** (so that any unexpected bugs affect only a small subset of users), and provide tools to recompute data (in case it turns out that the old computation was incorrect).
56 | - Set up **detailed and clear monitoring and telemetry**, such as performance metrics and error rates.
57 | - How important is reliability? Bugs in business applications cause **lost productivity** (and legal risks if figures are reported incorrectly), and outages of e-commerce sites can have **huge costs** in terms of lost revenue and reputation.
58 |
59 | ## Scalability
60 |
61 | - Scalability is the term we use to describe a system’s ability to **adapt to increased load**.
62 |
63 | ### Describing load
64 |
65 | - Load can be described with a few numbers which we call **load parameters**. Examples:
66 | - requests per second to a web server;
67 | - the ratio of reads to writes in a database;
68 | - the number of simultaneously active users;
69 | - the number of messages in a queue, etc.
70 |
71 | ### Describing performance
72 |
73 | - In a batch-processing system such as _Hadoop_, we usually care about **throughput**: the number of records we can process per second, or the total time it takes to run a job on a dataset of a certain size.
74 | - In online systems, **latency** is usually more important—the time it takes to serve a request, also known as _response time_.
75 | - In practice, in a system handling a variety of requests, the latency per request can **vary a lot**. We therefore need to think of latency not as a **single number**, but as a **probability distribution**.
76 | - Even in a scenario where you’d think all requests should take the same time, you get variation: random additional latency could be introduced by:
77 | - a context switch to a background process;
78 | - the loss of a network packet and TCP retransmission;
79 | - a garbage collection pause, a page fault forcing a read from disk, or many other things.
80 | - It’s common to see the **average** response time of a service reported (arithmetic mean).
81 | - However, the mean is not a very good metric if you want to know your “typical” response time, because it is easily **biased by outliers**.
82 | - Usually it is better to use **percentiles**. If you take your list of response times and sort it, from fastest to slowest, then the median is the half-way point: for example, if your median response time is 200 ms, that means half your requests return in less than 200 ms, and half your requests take longer than that.
83 | - This makes the median a **good metric** if you want to know how long users typically have to wait. The median is also known as the **50th percentile**, sometimes abbreviated as **p50** (a minimal computation is sketched below).
84 | 
85 |
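- A minimal sketch of computing percentiles from raw response times (nearest-rank method; the sample values are made up, and real monitoring systems typically use streaming histograms rather than sorting every sample):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of a sorted sample,
// using the simple nearest-rank method.
func percentile(sorted []time.Duration, p float64) time.Duration {
	idx := int(float64(len(sorted))*p/100+0.5) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	return sorted[idx]
}

func main() {
	samples := []time.Duration{ // made-up response times
		30 * time.Millisecond, 200 * time.Millisecond, 45 * time.Millisecond,
		60 * time.Millisecond, 1500 * time.Millisecond, 55 * time.Millisecond,
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })

	// The 1.5s outlier skews the mean, but barely moves the median.
	fmt.Println("p50:", percentile(samples, 50)) // typical user experience
	fmt.Println("p99:", percentile(samples, 99)) // tail latency
}
```
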
86 | ### Approaches for coping with load
87 |
88 | - good architectures usually involve a pragmatic mixture of approaches (vertical scaling and horizontal scaling).
89 | - there is no such a thing as **magic scaling sauce**. The problem may be:
90 | - the volume of reads, the volume of writes, the volume of data to store;
91 | - the complexity of the data, the latency requirements, the access patterns;
92 | - or (usually) some mixture of all of these plus many more issues.
93 | - In an **early-stage startup** or an unproven product it’s usually more important to be able to **iterate quickly** on product features, than it is to scale to some **hypothetical future load**.
94 |
95 | ## Maintainability
96 |
97 | - It is well-known that the majority of the cost of software is not in its initial development, but in its ongoing maintenance.
98 | - We will pay particular attention to three design principles for software systems:
99 | - **Operability**: Make it easy for operations teams to keep the system running smoothly
100 | - **Simplicity**: Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system.
101 | - **Plasticity**: Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or malleability.
102 |
103 | ### Operability: making life easy for operations
104 |
105 | - _Good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations._
106 |
107 | ### Simplicity: managing complexity
108 |
109 | - *Moseley and Marks* define complexity as _accidental_ if it is not inherent in the problem that the software solves (as seen by the users), but arises only from the implementation.
110 | - One of the best tools we have for removing accidental complexity is **abstraction**. A good abstraction can hide a great deal of implementation detail behind a clean, simple-to-understand facade.
111 | - High-level programming languages are abstractions that hide machine code, CPU registers and syscalls.
112 | - SQL is an abstraction that hides complex on-disk and in-memory data structures, concurrent requests from other clients, and inconsistencies after crashes.
113 |
114 | ### Plasticity: making change easy
115 |
116 | - In terms of organizational processes, _agile_ working patterns provide a framework for adapting to change. The agile community has also developed technical tools and patterns that are helpful when developing software in a frequently-changing environment, such as **test-driven development (TDD)** and **refactoring**.
117 |
--------------------------------------------------------------------------------
/designing-data-intensive-applications/chapter-03-storage-and-retrieval.md:
--------------------------------------------------------------------------------
1 | # Chapter 3. Storage and Retrieval
2 |
3 | ## Data Structures That Power Your Database
4 |
5 | - Many databases internally use a _log_, an **append-only** data file, which has pretty good performance and is generally very efficient.
6 | - Real databases have more issues to deal with (such as **concurrency** control, **reclaiming disk** space so that the log doesn’t grow forever, handling **errors**, **partially written** records, and so on) but the basic principle is the same. Logs are incredibly useful.
7 | - In order to efficiently find the value for a particular key in the database, we need a different data structure: an **index**.
8 | - The idea behind them is simple: keep some additional metadata on the side, which acts as a signpost and helps you to locate the data you want.
9 | - If you want to search the same data in several different ways, you may need several different indexes on different parts of the data.
10 | - An index is an additional structure that is derived from the primary data—many databases allow you to add and remove indexes, and this doesn’t affect the contents of the database, it only affects the performance of queries. Maintaining additional structures is **overhead**, especially on **writes**. For writes, it’s hard to beat the performance of simply appending to a file, so any kind of index usually slows down writes.
11 | - This is an important trade-off in storage systems: well-chosen indexes can **speed up read** queries, but every index **slows down writes**. For this reason, databases don’t usually index everything **by default**, but require you to choose indexes manually, using your knowledge of the application’s typical query patterns.
12 |
13 | ### Hash indexes
14 |
15 | - Key-value stores are quite similar to the dictionary type that you can find in most programming languages, and which is usually implemented as a **hash map** (hash table).
16 | - Whenever you append a new key-value pair to the file, you also update the hash map to reflect the offset of the data you just wrote (this works both for inserting new keys and for updating existing keys). When you want to look up a value, use the hash map to find the offset in the data file, seek to that location, and read the value. 
17 | - How do we avoid eventually running out of disk space? A good solution is to break the log into **segments** of a certain size, and to periodically run a background process for **compaction and merging** of segments.
18 | - **Compaction** means throwing away all key-value pairs in the log except for the most recent update for each key. This makes the segments **much smaller** (assuming that every key is updated multiple times on average), so we can also **merge** several segments into one.
19 | - Each segment now has its **own in-memory** hash table, mapping keys to file offsets. In order to find the value for a key, we first check the **most recent** segment’s hash map; if the key is not present we check the second-most-recent segment, and so on.
20 | - Pros :+1::
21 | - Appending and segment merging are **sequential write operations**, which are generally much faster than **random writes**. This performance difference applies both to traditional **spinning-disk** hard drives and to **flash-based** solid state drives (SSDs).
22 | - **Concurrency** and **crash recovery** are much simpler if files are immutable. For example, you don’t have to worry about the case where a crash happened while a value was being overwritten, leaving you with a file containing part of the old and part of the new value spliced together.
23 | - Merging old segments avoids problems of data files getting **fragmented** over time.
24 | - Cons :-1::
25 | - The hash table must **fit in memory**, so if you have a very large number of keys, you’re out of luck. In principle, you could maintain a hash map on disk, but unfortunately it is difficult to make an on-disk hash map perform well. It requires a lot of random access I/O, it is expensive to grow when it becomes full, and hash collisions require fiddly logic.
26 | - **Range queries** are not efficient. For example, you cannot easily fetch the values for all keys between `kitty00000` and `kitty99999`, you’d have to look up each key individually in the hash maps.
27 | - 👁️ Example of storage engines using a log-structured hash table: [Bitcask](https://en.wikipedia.org/wiki/Bitcask).
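- A minimal sketch of such a store in the spirit of Bitcask: a single segment, no compaction and no crash recovery; the `[keyLen][valLen][key][value]` record framing is an illustrative choice:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// Store appends every write to a data file and keeps an in-memory hash
// index mapping each key to the byte offset of its latest record.
type Store struct {
	f     *os.File
	index map[string]int64
}

func Open(path string) (*Store, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &Store{f: f, index: make(map[string]int64)}, nil
}

// Set appends a [keyLen][valLen][key][value] record and updates the index.
func (s *Store) Set(key, value string) error {
	off, err := s.f.Seek(0, io.SeekEnd) // offset where this record will land
	if err != nil {
		return err
	}
	rec := make([]byte, 8, 8+len(key)+len(value))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(rec[4:8], uint32(len(value)))
	rec = append(append(rec, key...), value...)
	if _, err := s.f.Write(rec); err != nil {
		return err
	}
	s.index[key] = off
	return nil
}

// Get seeks straight to the recorded offset: one disk read per lookup.
func (s *Store) Get(key string) (string, bool) {
	off, ok := s.index[key]
	if !ok {
		return "", false
	}
	var hdr [8]byte
	if _, err := s.f.ReadAt(hdr[:], off); err != nil {
		return "", false
	}
	klen := binary.LittleEndian.Uint32(hdr[0:4])
	vlen := binary.LittleEndian.Uint32(hdr[4:8])
	val := make([]byte, vlen)
	if _, err := s.f.ReadAt(val, off+8+int64(klen)); err != nil {
		return "", false
	}
	return string(val), true
}

func main() {
	s, err := Open("data.log")
	if err != nil {
		panic(err)
	}
	s.Set("k1", "v1")
	s.Set("k1", "v2") // the old record stays on disk; the index now points at v2
	fmt.Println(s.Get("k1"))
}
```
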
28 |
29 | ## SSTables and LSM-trees
30 |
31 | - In **Sorted String Table**, or SSTable for short, we require that the sequence of key-value pairs is **sorted by key**. We also require that each key only **appears once** within each merged segment file (the compaction process already ensures that).
32 | - SSTables have several big advantages over log segments with hash indexes 👍:
33 | - **Merging segments is simple and efficient**, even if the files are bigger than the available memory. You start reading the input files side-by-side, look at the first key in each file, copy the lowest key (in sort order) to the output file, and repeat. If the same key appears in several input files, keep the one from the most recent input file, and discard the values in older segments. This produces a new merged segment which is also sorted by key, and which also has exactly one value per key (a minimal merge is sketched after this list).
34 | - In order to find a particular key in the file, you **no longer need to keep an index** of all the keys in memory. You still need an in-memory index to tell you the **offsets for some of the keys**, but it can be sparse: one key for every few kilobytes of segment file is sufficient, because a few kilobytes can be scanned very quickly.
35 | - Since read requests need to scan over several key-value pairs in the requested range anyway, it is possible to **group those records into a block and compress it** before writing it to disk. Each entry of the sparse in-memory index then points at the start of a compressed block. Nowadays, **disk bandwidth is usually a worse bottleneck than CPU**, so it is worth spending a few additional CPU cycles to reduce the amount of data you need to write to and read from disk.
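- A minimal sketch of that merge step, assuming segments are modeled as in-memory sorted slices (real segments are files read sequentially):

```go
package main

import "fmt"

type KV struct{ Key, Val string }

// mergeSegments merges sorted segments (ordered oldest to newest) into one
// sorted segment, keeping only the most recent value for each key.
func mergeSegments(segments [][]KV) []KV {
	heads := make([]int, len(segments)) // read cursor per segment
	var merged []KV
	for {
		// Find the smallest key among the current segment heads.
		minKey, found := "", false
		for i, seg := range segments {
			if heads[i] < len(seg) && (!found || seg[heads[i]].Key < minKey) {
				minKey, found = seg[heads[i]].Key, true
			}
		}
		if !found {
			return merged // all segments exhausted
		}
		// Take the value from the newest segment holding minKey and advance
		// every segment whose head is minKey (older values are discarded).
		var val string
		for i, seg := range segments {
			if heads[i] < len(seg) && seg[heads[i]].Key == minKey {
				val = seg[heads[i]].Val // later iterations are newer segments
				heads[i]++
			}
		}
		merged = append(merged, KV{minKey, val})
	}
}

func main() {
	older := []KV{{"apple", "1"}, {"banana", "2"}, {"date", "4"}}
	newer := []KV{{"banana", "20"}, {"cherry", "30"}}
	fmt.Println(mergeSegments([][]KV{older, newer}))
	// Output: [{apple 1} {banana 20} {cherry 30} {date 4}]
}
```
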
36 |
37 | ### Constructing and maintaining SSTables
38 |
39 | - Maintaining a sorted structure on disk is possible, but maintaining it in memory is much easier. There are plenty of well-known tree data structures that you can use, such as _Red-Black trees_ or _AVL_ trees. With these data structures, you can insert keys in any order, and read them back in **sorted** order.
40 | - When a write comes in, add it to an in-memory balanced tree data structure. This in-memory tree is sometimes called a **memtable** (a toy version is sketched after this list).
41 | - When the memtable gets bigger than some threshold — typically a few megabytes — write it out to disk as an **SSTable** file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. When the new SSTable is ready, the memtable can be emptied.
42 | - The basic idea - keeping a cascade of SSTables that are merged in the **background** - is simple and effective. Even when the dataset is much bigger than memory it continues to work well. Since data is stored in sorted order, you can efficiently perform range queries (scanning all keys above some minimum and up to some maximum). And because the disk writes are sequential, the LSM-tree can support remarkably **high write throughput**.
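- A toy memtable sketch, assuming a sorted slice with binary search as a stand-in for the balanced tree (real engines use Red-Black/AVL trees or skip lists):

```go
package main

import (
	"fmt"
	"sort"
)

type KV struct{ Key, Val string }

// Memtable keeps writes sorted by key, so a flush can emit an SSTable
// with a single sequential scan.
type Memtable struct{ kvs []KV }

func (m *Memtable) Put(key, val string) {
	i := sort.Search(len(m.kvs), func(i int) bool { return m.kvs[i].Key >= key })
	if i < len(m.kvs) && m.kvs[i].Key == key {
		m.kvs[i].Val = val // overwrite in place
		return
	}
	m.kvs = append(m.kvs, KV{}) // grow by one, then shift to make room
	copy(m.kvs[i+1:], m.kvs[i:])
	m.kvs[i] = KV{key, val}
}

// Flush returns the sorted contents, ready to be written out as an SSTable,
// and empties the memtable.
func (m *Memtable) Flush() []KV {
	out := m.kvs
	m.kvs = nil
	return out
}

func main() {
	var m Memtable
	m.Put("banana", "2")
	m.Put("apple", "1")
	m.Put("cherry", "3")
	fmt.Println(m.Flush()) // [{apple 1} {banana 2} {cherry 3}]
}
```
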
43 |
44 | ### Making an LSM-tree out of SSTables
45 |
46 | - ⭐The algorithm described here is essentially what is used in **LevelDB** and **RocksDB**, key-value storage engine libraries.
47 | - Originally this indexing structure was described by *Patrick O’Neil* et al. under the name **Log-Structured Merge-Tree**, building on earlier work on log-structured filesystems. Storage engines that are based on this principle of merging and compacting sorted files are often called **LSM storage engines**.
48 | - ⭐ **Lucene**, an indexing engine for FTS used by **Elasticsearch** and **Solr**, uses a similar method for storing its term dictionary.
49 | A full-text index is much more complex than a KV index but is based on a similar idea:
50 | - Given a word in a search query, find all the documents that mention the word. This is implemented with a KV structure where the key is a word (a **term**) and the value is the list of IDs of all the documents that contain the word (the **postings list**). In Lucene, this mapping from term to postings list is kept in SSTable-like sorted files, which are merged in the background as needed.
51 |
52 | ### Performance optimizations
53 |
54 | - LSM-tree algorithm can be **slow** when looking up keys that do **not exist** in the database:
55 | 1. You have to check the memtable,
56 | 2. Then the segments all the way back to the **oldest** (possibly having to read from disk for each one) before you can be sure that the key does not exist.
57 | - In order to optimize this kind of access, storage engines often use additional **Bloom filters**: a memory-efficient data structure for approximating the contents of a set (a sketch follows this list).
58 | - There are also different strategies to determine the **order** and **timing** of how SSTables are compacted and merged. The most common options are **size-tiered** and **leveled compaction**.
59 | - ▶️ **LevelDB** and **RocksDB** use leveled compaction (hence the name of LevelDB), **HBase** uses size-tiered, and **Cassandra** supports both.
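- A tiny Bloom filter sketch (the sizes and the double-hashing scheme below are illustrative choices, not any particular engine's implementation). If `MightContain` returns `false`, the key is definitely absent and the SSTable read can be skipped; a `true` may be a false positive, which only costs a wasted lookup:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a minimal Bloom filter: a bit array plus k hash probes per key.
type Bloom struct {
	bits []bool
	k    int
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// probes derives k array indexes from one 64-bit FNV hash (double hashing).
func (b *Bloom) probes(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)|1 // make h2 odd so probes differ
	idx := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		idx[i] = int((h1 + uint32(i)*h2) % uint32(len(b.bits)))
	}
	return idx
}

func (b *Bloom) Add(key string) {
	for _, i := range b.probes(key) {
		b.bits[i] = true
	}
}

// MightContain returns false only if the key was never added; true means
// "possibly present" (false positives happen, false negatives do not).
func (b *Bloom) MightContain(key string) bool {
	for _, i := range b.probes(key) {
		if !b.bits[i] {
			return false // definitely absent: skip reading this SSTable
		}
	}
	return true
}

func main() {
	f := NewBloom(1024, 3)
	f.Add("kitty00042")
	fmt.Println(f.MightContain("kitty00042")) // true
	fmt.Println(f.MightContain("absent-key")) // almost certainly false
}
```
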
60 |
61 | ## B-trees
62 |
63 | - Are the most widely-used indexing structure.
64 | - Remain the standard index implementation in almost all relational databases, and many non-relational databases use them too.
65 | - The **log-structured indexes** we saw earlier break the database down into **variable-size segments**, typically several megabytes or more in size, and always write a segment sequentially. By contrast, **B-trees** break the database down into **fixed-size blocks** or pages, traditionally `4 kB` in size, and read or write one page at a time. This corresponds more closely to the underlying hardware, as disks are also arranged in fixed-size blocks.
66 | - Each page can be identified using an address or location, which allows one page to refer to another—similar to a **pointer**, but on disk instead of in memory. We can use these page references to construct a tree of pages: 
67 | - The number of references to child pages in one page of the B-tree is called the **branching factor**.
68 | - A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB 🆗.
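- Worked out, assuming four full levels of 500-way branching: 500⁴ = 6.25 × 10¹⁰ pages × 4 KB/page ≈ 2.56 × 10¹⁴ bytes = 256 TB.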
69 |
70 | ### Making B-trees reliable
71 |
72 | - Some operations require several **different pages** to be overwritten. For example, if you split a page because an insertion caused it to be overfull, you need to write the two pages that were split, and also overwrite their parent page to update the references to the two child pages.
73 | - ⚠️ This is a dangerous operation, because if the DB crashes after only some of the pages have been written, you end up with a corrupted index.
74 | - In order to make the DB resilient to crashes, it is common for B-tree implementations to include an additional data structure on disk: a **write-ahead log** (WAL, also known as a *redo log*). This is an append-only file to which every B-tree modification must be written before it can be applied to the pages of the tree itself. When the DB comes back up after a crash, this log is used to restore the B-tree back to a consistent state (a minimal WAL-append sketch follows this list).
75 | - An additional complication of updating pages in place is that careful **concurrency** control is required if multiple threads are going to access the B-tree at the same time - otherwise a thread may see the tree in an inconsistent state.
76 | - This is typically done by protecting the tree’s data structures with **latches** (lightweight locks). Log-structured approaches are simpler in this regard, because they do all the merging in the background **without interfering** with incoming queries and atomically swap old segments for new segments from time to time.
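- A minimal WAL-append sketch; the record text and the `applyToTree` step are hypothetical, and real WALs write page-level redo records with checksums and sequence numbers:

```go
package main

import (
	"fmt"
	"os"
)

// WAL is an append-only redo log: every B-tree modification is written
// (and synced) here *before* the tree pages themselves are overwritten.
type WAL struct{ f *os.File }

func OpenWAL(path string) (*WAL, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &WAL{f: f}, nil
}

// Append writes a record and forces it to disk before returning, so the
// modification can be redone if the process crashes mid-update.
func (w *WAL) Append(record string) error {
	if _, err := fmt.Fprintln(w.f, record); err != nil {
		return err
	}
	return w.f.Sync()
}

func main() {
	wal, err := OpenWAL("btree.wal")
	if err != nil {
		panic(err)
	}
	// Log first, then apply. If we crash between the two steps, recovery
	// replays the log to restore the tree to a consistent state.
	if err := wal.Append("split page 7 -> pages 7,12; update parent page 3"); err != nil {
		panic(err)
	}
	// applyToTree(...) is the hypothetical step that would overwrite the pages.
}
```
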
77 |
78 | ### B-tree optimizations
79 |
80 | - Instead of overwriting pages and maintaining a WAL for crash recovery, some databases (like **LMDB**) use a **copy-on-write (CoW)** scheme: a modified page is written to a different location, and new versions of the parent pages are created, pointing at the new location.
81 | - We can save space in pages by not storing the entire key, but **abbreviating** it: keys only need to provide enough information to act as boundaries between key ranges.
82 | - ▶️ Packing more keys into a page allows the tree to have a **higher branching factor**, and thus **fewer levels**.
83 |
84 | ### Comparing B-Trees and LSM-Trees
85 |
86 | - As a rule of thumb, LSM-trees are typically faster for **writes**, whereas B-trees are thought to be faster for **reads**.
87 | - Reads are typically slower on LSM-trees because they have to check several different data structures and SSTables at different stages of compaction.
88 | - 👍 Advantages of LSM-trees:
89 | - A B-tree index must write every piece of data at least **twice**: once to the write-ahead log, and once to the tree page itself (and perhaps again as pages are split).
90 | - Log-structured indexes also rewrite data **multiple** times due to repeated compaction and merging of SSTables. This effect —one write to the database resulting in multiple writes to the disk over the course of the database’s lifetime— is known as **write amplification**.
91 | - LSM-trees are typically able to sustain **higher write throughput** than B-trees, partly because they sometimes have **lower write amplification** (although this depends on the storage engine configuration and workload), and partly because they **sequentially** write compact SSTable files rather than having to overwrite several pages in the tree. This difference is particularly important on magnetic hard drives, where sequential writes are much faster than random writes.
92 | - LSM-trees can be **compressed** better, and thus often produce smaller files on disk than B-trees.
93 |
94 |
95 | ## Multi-column indexes
96 |
97 | - The most common type of multi-column index is called a **concatenated index**, which simply combines several fields into one key by appending one column to another (the index definition specifies in which order the fields are concatenated).
98 | - Multi-dimensional indexes are a more general way of querying several columns at once, which is particularly important for **geospatial** data:
99 |
100 | ```sql
101 | SELECT * FROM restaurants
102 | WHERE latitude > 51.4946 AND latitude < 51.5079 AND longitude > -0.1162 AND longitude < -0.1004;
103 | ```
104 |
105 | - A standard B-tree or LSM-tree index is not able to answer that kind of query efficiently: it can give you either all the restaurants in a range of latitudes (but at any longitude), or all the restaurants in a range of longitudes (but anywhere between north and south pole), but not both simultaneously.
106 | - One option is to translate a two-dimensional location into a single number using a space-filling curve, and then to use a regular B-tree index. More commonly, specialized spatial indexes such as R-trees are used. For example, PostGIS implements geospatial indexes as R-trees using PostgreSQL’s Generalized Search Tree indexing facility.
107 |
--------------------------------------------------------------------------------
/effective-cpp/assets/deadly-mi-diamond.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/effective-cpp/assets/deadly-mi-diamond.png
--------------------------------------------------------------------------------
/effective-cpp/assets/virtual-inheritance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/effective-cpp/assets/virtual-inheritance.png
--------------------------------------------------------------------------------
/go-concurrency-patterns/.gitignore:
--------------------------------------------------------------------------------
1 | # If you prefer the allow list template instead of the deny list, see community template:
2 | # https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
3 | #
4 | # Binaries for programs and plugins
5 | *.exe
6 | *.exe~
7 | *.dll
8 | *.so
9 | *.dylib
10 |
11 | # Test binary, built with `go test -c`
12 | *.test
13 |
14 | # Output of the go coverage tool, specifically when used with LiteIDE
15 | *.out
16 |
17 | # Dependency directories (remove the comment below to include it)
18 | # vendor/
19 |
20 | # Go workspace file
21 | go.work
22 | go.work.sum
23 |
24 | # env file
25 | .env
26 |
27 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/01-generator/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "math/rand"
6 | "time"
7 | )
8 |
9 | func boring(msg string) <-chan string { // Return receive-only channel of strings
10 |
11 | c := make(chan string)
12 |
13 | go func() {
14 | for i := 0; ; i++ {
15 | c <- fmt.Sprintf("%s %d", msg, i)
16 | time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
17 | }
18 | }()
19 |
20 | return c
21 |
22 | }
23 |
24 | func main() {
25 | joe := boring("Joe")
26 | ann := boring("Ann")
27 | for i := 0; i < 5; i++ {
28 | // Because of the synchronous nature of channels, the blocking receive
29 | // from joe stalls the loop, so ann cannot be read even though `ann`
30 | // might already be ready to send a value!
31 | // We can get around that by using the `fan-in` (multiplexing) pattern.
32 | fmt.Println(<-joe)
33 | fmt.Println(<-ann)
34 | }
35 | }
36 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/02-fan-in/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "math/rand"
6 | "time"
7 | )
8 |
9 | type Msg struct {
10 | str string
11 | wait chan bool
12 | }
13 |
14 | func boring(msg string) <-chan string { // Return receive-only channel of strings
15 |
16 | c := make(chan string)
17 |
18 | go func() {
19 | for i := 0; ; i++ {
20 | c <- fmt.Sprintf("%s %d", msg, i)
21 | time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
22 | }
23 | }()
24 |
25 | return c
26 |
27 | }
28 |
29 | func boringWithOrder(msg string) <-chan Msg {
30 | c := make(chan Msg)
31 | waitForIt := make(chan bool) // shared between all messages
32 | go func() {
33 | for i := 0; ; i++ {
34 | c <- Msg{
35 | str: fmt.Sprintf("%s %d", msg, i),
36 | wait: waitForIt,
37 | }
38 | time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
39 |
40 | // Block until the consumer signals that this message has been read.
41 | <-waitForIt
42 | }
43 |
44 | }()
45 | return c
46 | }
47 |
48 | func fanIn(input1, input2 <-chan string) <-chan string {
49 | c := make(chan string)
50 |
51 | go func() {
52 | for {
53 | c <- <-input1
54 | }
55 | }()
56 |
57 | go func() {
58 | for {
59 | c <- <-input2
60 | }
61 |
62 | }()
63 | return c
64 | }
65 |
66 | func fanInWithOrder(inputs ...<-chan Msg) <-chan Msg {
67 |
68 | c := make(chan Msg)
69 |
70 | for i := range inputs {
71 | input := inputs[i]
72 | go func() {
73 | for {
74 | c <- <-input
75 | }
76 | }()
77 | }
78 | return c
79 | }
80 |
81 | // Rewrite our original fanIn function. Only one goroutine is needed.
82 | func fanInWithSelect(input1, input2 <-chan string) <-chan string {
83 |
84 | c := make(chan string)
85 |
86 | go func() {
87 | for {
88 | select {
89 | case s := <-input1:
90 | c <- s
91 | case s := <-input2:
92 | c <- s
93 |
94 | }
95 | }
96 |
97 | }()
98 |
99 | return c
100 | }
101 |
102 | func main() {
103 | // No order
104 | c := fanIn(boring("Joe"), boring("Ann"))
105 | for range 10 {
106 | fmt.Println(<-c)
107 | }
108 | fmt.Println("You're both boring. I'm leaving.")
109 |
110 | // Force order.
111 | c2 := fanInWithOrder(boringWithOrder("Joe"), boringWithOrder("Ann"))
112 | for range 5 {
113 | msg1 := <-c2 // block until the message is read
114 | fmt.Println(msg1.str)
115 | msg2 := <-c2
116 | fmt.Println(msg2.str)
117 |
118 | // Signal each goroutine that its message has been read so it can proceed.
119 | msg1.wait <- true
120 | msg2.wait <- true
121 | }
122 | fmt.Println("You're both boring. I'm leaving.")
123 |
124 | // Using select and timeouts.
125 | c3 := fanInWithSelect(boring("Joe"), boring("Ann"))
126 | timeout := time.After(3 * time.Second)
127 | for {
128 | select {
129 | case s := <-c3:
130 | fmt.Println(s)
131 | case <-timeout:
132 | fmt.Println("global timeout reached.")
133 | return
134 | case <-time.After(1 * time.Second): // returns a channel after specified interval, this is a timeout for each iteration in the loop.
135 | fmt.Println("You're too slow.")
136 | return
137 | }
138 | }
139 | }
140 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/03-fan-out/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "sync"
6 | )
7 |
8 | func worker(id int, jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
9 | defer wg.Done()
10 | for job := range jobs {
11 | results <- job * 2
12 | }
13 | }
14 |
15 | func main() {
16 | jobs := make(chan int, 5)
17 | results := make(chan int, 5)
18 |
19 | var wg sync.WaitGroup
20 |
21 | // Fan-out: 3 workers
22 | for w := 1; w <= 3; w++ {
23 | wg.Add(1)
24 | go worker(w, jobs, results, &wg)
25 | }
26 |
27 | // Send jobs
28 | for i := 1; i <= 5; i++ {
29 | jobs <- i
30 | }
31 | close(jobs)
32 |
33 | wg.Wait()
34 | close(results)
35 |
36 | // Fan-in: collect results
37 | for res := range results {
38 | fmt.Println("Result:", res)
39 | }
40 | }
41 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/04-daisy-chain/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import "fmt"
4 |
5 | const (
6 | N = 1000
7 | )
8 |
9 | func f(left, right chan int) {
10 | left <- 1 + <-right
11 | }
12 |
13 | func main() {
14 | leftmost := make(chan int)
15 | left := leftmost
16 | right := leftmost
17 |
18 | for i := 0; i < N; i++ {
19 | right = make(chan int) // create a new channel
20 | go f(left, right) // create a new goroutine
21 | left = right
22 | }
23 |
24 | go func(c chan int) {
25 | c <- 1
26 | }(right)
27 | fmt.Println(<-leftmost)
28 | }
29 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/05-worker-pool/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "sync"
6 | "time"
7 | )
8 |
9 | func worker(id int, jobs <-chan int, results chan<- int) {
10 | for j := range jobs {
11 | fmt.Println("worker", id, "started job", j)
12 | time.Sleep(time.Second)
13 | fmt.Println("worker", id, "finished job", j)
14 | results <- j * 2
15 | }
16 | }
17 |
18 | func workerEfficient(id int, jobs <-chan int, results chan<- int) {
19 | var wg sync.WaitGroup
20 |
21 | for j := range jobs {
22 | wg.Add(1)
23 |
24 | go func(job int) {
25 | fmt.Println("worker", id, "started job", job)
26 | time.Sleep(time.Second)
27 | fmt.Println("worker", id, "finished job", job)
28 | results <- job * 2
29 | wg.Done()
30 | }(j)
31 | }
32 | }
33 |
34 | func main() {
35 | const numJobs = 8
36 | jobs := make(chan int, numJobs)
37 | results := make(chan int, numJobs)
38 |
39 | // In this example, we start a fixed pool of 3 workers.
40 | // They receive jobs from the `jobs` channel;
41 | // the `w` variable serves as each worker's id.
42 | for w := 1; w <= 3; w++ {
43 | go workerEfficient(w, jobs, results)
44 | }
45 |
46 | // Push the jobs.
47 | for j := 1; j <= numJobs; j++ {
48 | jobs <- j
49 | }
50 | close(jobs)
51 | fmt.Println("closed jobs")
52 |
53 | for a := 1; a <= numJobs; a++ {
54 | <-results
55 | }
56 | close(results)
57 |
58 | }
59 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/README.md:
--------------------------------------------------------------------------------
1 | # Go Concurrency Patterns
2 |
3 | - Go is the latest on the _Newsqueak-Alef-Limbo_ branch, distinguished by **first-class** channels.
4 | - The Go approach _Don't communicate by sharing memory, share memory by communicating._
5 |
6 | ## Generator
7 |
8 | - Generator: function that **returns a channel**.
9 | - Channels are **first-class values**, just like strings or integers.
10 | - Channels as a handle on a service.
11 | > Our boring function returns a channel that lets us communicate with the boring service it provides.
12 |
13 | ## Fan-In (Multiplexing)
14 |
15 | - The generator pattern makes _Joe_ and _Ann_ count in lockstep.
16 | - We can instead use a fan-in function to let whoever is ready talk.
17 | - We stitch the two channels into a **single one**, and the fan-in function forwards the messages to the output channel.
18 | - Fan In is used when a **single function** reads from **multiple inputs** and proceeds until all are closed. This is made possible by multiplexing the input into a single channel.
19 | - What: Combine results from multiple goroutines into a single channel.
20 | - Why: Aggregate results or wait for all goroutines to complete.
21 |
22 | ## Fan-Out
23 |
24 | - What: Distribute work across multiple goroutines to run in parallel.
25 | - Why: Improve throughput by utilizing multiple CPU cores.
26 | - Example: Multiple workers pulling tasks from the same job queue.
27 |
28 | ## Daisy Chain
29 |
30 | - Goroutines and channels are chained together to pass data along a series of steps.
31 | - It’s often used to illustrate the power and simplicity of Go’s concurrency model and how cheap a goroutine is compared to a thread!
32 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/examples/01-google-search/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "math/rand"
6 | "time"
7 | )
8 |
9 | type Result string
10 | type Search func(query string) Result
11 |
12 | var (
13 | Web1 = fakeSearch("web1")
14 | Web2 = fakeSearch("web2")
15 | Image1 = fakeSearch("image1")
16 | Image2 = fakeSearch("image2")
17 | Video1 = fakeSearch("video1")
18 | Video2 = fakeSearch("video2")
19 | )
20 |
21 | func fakeSearch(kind string) Search {
22 | return func(query string) Result {
23 | time.Sleep((time.Duration(rand.Intn(100)) * time.Millisecond))
24 | return Result(fmt.Sprintf("%s result for %q\n", kind, query))
25 | }
26 | }
27 |
28 | // How do we avoid discarding results from the slow server?
29 | // We replicate the service across many instances and perform parallel requests.
30 | func First(query string, replicas ...Search) Result {
31 |
32 | c := make(chan Result)
33 |
34 | for i := range replicas {
35 | go func(idx int) {
36 | c <- replicas[idx](query)
37 | }(i)
38 | }
39 |
40 | // Return as soon as the first replica responds; the slower replicas stay
41 | // blocked sending on the unbuffered channel (they leak in this demo).
41 | return <-c
42 |
43 | }
44 |
45 | // Don't wait for the slowest server.
46 | func Google(query string) []Result {
47 |
48 | c := make(chan Result)
49 | var results []Result
50 |
51 | // each search is performed in a goroutine.
52 | go func() {
53 | c <- First(query, Web1, Web2)
54 | }()
55 | go func() {
56 | c <- First(query, Image1, Image2)
57 | }()
58 | go func() {
59 | c <- First(query, Video1, Video2)
60 | }()
61 |
62 | timeout := time.After(100 * time.Millisecond)
63 |
64 | for range 3 {
65 | select {
66 | case r := <-c:
67 | results = append(results, r)
68 | case <-timeout:
69 | fmt.Println("timeout")
70 | return results
71 | }
72 | }
73 | return results
74 |
75 | }
76 |
77 | func main() {
78 | // Since Go 1.20, the global math/rand source is seeded automatically (rand.Seed is deprecated).
79 | start := time.Now()
80 | results := Google("golang")
81 | elapsed := time.Since(start)
82 | fmt.Println(results)
83 | fmt.Println(elapsed)
84 | }
85 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/examples/02-ping-pong/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "time"
6 | )
7 |
8 | type Ball struct {
9 | hits int
10 | }
11 |
12 | func player(name string, table chan *Ball) {
13 | for {
14 | ball := <-table // Player grabs the ball.
15 | ball.hits++
16 | fmt.Printf("%s %d\n", name, ball.hits)
17 | time.Sleep(100 * time.Millisecond)
18 | table <- ball // pass the ball.
19 | }
20 | }
21 |
22 | func main() {
23 | table := make(chan *Ball)
24 | go player("ping", table)
25 | go player("pong", table)
26 |
27 | table <- new(Ball) // game on: toss the ball.
28 | time.Sleep(1 * time.Second)
29 | <-table // game over, snatch the ball.
30 | panic("show me the stack")
31 | }
32 |
--------------------------------------------------------------------------------
/go-concurrency-patterns/go.mod:
--------------------------------------------------------------------------------
1 | module concurrency-patterns
2 |
3 | go 1.23.4
4 |
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/README.md:
--------------------------------------------------------------------------------
1 | # Systems Performance: Enterprise and the Cloud 2nd Edition
2 |
3 | Study notes taken from reading the second edition of "Systems Performance: Enterprise and the Cloud" by *Brendan Gregg*.
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/assets/analysis-perspectives.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/systems-performance-enterprise-and-the-cloud/assets/analysis-perspectives.png
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/assets/counters-statistics-metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/systems-performance-enterprise-and-the-cloud/assets/counters-statistics-metrics.png
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/assets/cpu-flame-graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/systems-performance-enterprise-and-the-cloud/assets/cpu-flame-graph.png
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/assets/full-stack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayoubfaouzi/software-engineering-notes/9615fc566e82a630a0dd441d3aafc2385ad9f393/systems-performance-enterprise-and-the-cloud/assets/full-stack.png
--------------------------------------------------------------------------------
/systems-performance-enterprise-and-the-cloud/chapter-01-introduction.md:
--------------------------------------------------------------------------------
1 | # Chapter 01: Introduction
2 |
3 | ## 1.1 Systems Performance
4 |
5 | - Systems performance studies the performance of an entire computer system, including all major **software** and **hardware** components.
6 | - Anything in the **data path**, from storage devices to application software, is included, because it can affect performance.
7 | - Systems performance studies the full stack:
8 |
9 | 
10 |
11 | ## 1.2 Roles
12 |
13 | - Systems performance is done by a variety of job roles, including system administrators, SREs, app developers, network engineers, DB admins, web admins, and other support staff.
14 | - For some performance issues, finding the root cause or contributing factors requires a cooperative effort from **more than one team**.
15 | - Some companies employ **performance engineers**, whose primary activity is performance 🤸♀️.
16 |
17 | ## 1.3 Activities
18 |
19 | - The first step should be to set objectives and create a performance model. However,
20 | products are often developed without this step, **deferring** performance engineering work to a later time, after a **problem arises**.
21 | - With each step of the development process it can become progressively harder to fix performance issues that arise due to architectural decisions made earlier.
22 | - The term **capacity planning** can refer to a number of the preceding activities. During design, it includes studying the resource footprint of development software to see how well the design can **meet the target needs**. After deployment, it includes monitoring resource usage to predict problems before they occur.
23 |
24 | ## 1.4 Perspectives
25 |
26 | - Apart from a focus on different activities, performance roles can be viewed from different perspectives. Two perspectives for performance analysis are possible: **workload analysis** and **resource analysis**, which approach the software stack from different directions.
27 |
28 | 
29 |
30 |
31 | ## 1.5 Performance Is Challenging
32 |
33 | - **Subjectivity**: Performance is often subjective. With performance issues, it can be unclear whether there is an issue to begin with, and if so, when it has been fixed. Example: *The average disk I/O response time is 1 ms.* 🤷
34 | - Is this “good” or “bad”? To some degree, whether a given metric is “good” or “bad” may depend on the performance expectations of the **app developers** and **end users**.
35 | - **Complexity**: can be a challenging discipline due to the complexity of systems and the lack of an obvious **starting point** for analysis.
36 | - Issues may originate from complex **interactions** between subsystems that **perform well when analyzed in isolation**. This can occur due to a **cascading failure**, when one failed component causes performance issues in others.
37 | - Issues may also be caused by a complex characteristic of the **production workload**. These cases may never be **reproducible** in a lab environment, or only **intermittently** so.
38 | - **Multiple causes**: Imagine a scenario where three normal events occur **simultaneously** and combine to cause a performance issue: each is a normal event that in isolation is not the root cause.
39 | - **Multiple Performance Issues**: Finding a performance issue is usually not the problem; in complex software there are often many.
40 | - The real task isn’t finding an issue; it’s identifying which issue or issues matter the most 👍.
41 |
42 | ## 1.6 Latency
43 |
44 | - Latency is a measure of time spent **waiting**, and is an essential performance metric.
45 | - Used broadly, it can mean the time for any operation to complete, such as an app request, a database query, a file system operation, and so forth.
46 | - As a metric, latency allows the **maximum** speedup to be **estimated** (see the worked example after this list), e.g., 5x faster.
47 | - Such a calculation is not possible when using other metrics such as **IOPS**: there might be 5x fewer IOPS, but what if each of those I/Os increased in size (bytes) by 10x ❓
48 | - Throughout this book, the author uses **connection latency** (the time for a connection to be established but not the data transfer time) and **request latency** (the total duration of a connection, including the data transfer) for clarification.
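- A worked example of that speedup bound (the numbers are illustrative): if a request takes 100 ms in total and 80 ms of it is spent waiting on disk I/O, eliminating the wait entirely leaves 20 ms, so the maximum estimated speedup is 100 / 20 = 5x.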
49 |
50 | ## 1.7 Observability
51 |
52 | - Observability refers to understanding a system through observation, and classifies the tools that accomplish this. This includes tools that use **counters**, **profiling**, and **tracing**.
53 | - **Counters, Statistics, and Metrics**:
54 | - Apps and the kernel typically provide data on their state and activity: operation counts, byte counts, latency measurements, resource utilization, and error rates. They are typically implemented as integer variables called **counters** that are hard-coded in the software, some of which are cumulative and always increment.
55 | - These cumulative counters can be read at different times by performance tools for calculating **statistics**.
56 | - A **metric** is a statistic that has been selected to evaluate or monitor a target.
57 | 
58 | - **Profiling** usually refers to the use of tools that perform **sampling**: taking a subset (a sample) of measurements to paint a coarse picture of the target.
59 | - An effective visualization of CPU profiles is **flame graphs**. CPU flame graphs can help you find more performance wins than any other tool, after **metrics**. They reveal not only CPU issues, but other types of issues as well, found by the CPU footprints they leave behind. Issues of **lock contention** can be found by looking for CPU time in spin paths; memory issues can be analyzed by finding excessive CPU time in **memory allocation** functions (`malloc()`), along with the code paths that led to them; performance issues involving misconfigured networking may be discovered by seeing CPU time in slow or legacy codepaths; and so on.
60 | 
61 | - **Tracing**: is event-based recording, where event data is captured and saved for later analysis or consumed on-the-fly for custom summaries and other actions:
62 | - Special-purpose tracing tools for system calls (Linux `strace` ) and network packets (Linux `tcpdump`).
63 | - General-purpose tracing tools that can analyze the execution of all software and hardware events (Linux `Ftrace`, `BCC`, and `bpftrace`).
64 | - **Static instrumentation** describes **hard-coded** software instrumentation points added to the **source code**. There are hundreds of these points in the Linux kernel that instrument disk I/O, scheduler events, system calls, and more.
65 | - The Linux technology for kernel static instrumentation is called **tracepoints**.
66 | - There is also a static instrumentation technology for user-space software called **user statically defined tracing** (USDT). Example: `execsnoop`.
67 | - **Dynamic instrumentation** creates instrumentation points after the software is running, by modifying in-memory instructions to insert instrumentation routines.
68 | - Dynamic instrumentation is so different from traditional observation that it can be difficult, at first, to grasp its role. Consider an OS kernel: analyzing kernel internals can be like venturing into a dark room, with candles (system counters) placed where the kernel engineers thought they were needed. Dynamic instrumentation is like having a flashlight that you can point anywhere. Example: `DTrace`.
69 | - **BPF** which originally stood for *Berkeley Packet Filter*, is powering the latest dynamic tracing tools for Linux. BPF originated as a mini in-kernel virtual machine for speeding up the execution of `tcpdump` expressions.
70 | - Since 2013 it has been extended (hence it is sometimes called eBPF) to become a generic in-kernel execution environment, one that provides safety and fast access to resources.
71 |
72 |
73 | ## 1.8 Experimentation
74 |
75 | - Experimentation tools - most of which are benchmarking tools - perform an experiment by applying a synthetic workload to the system and measuring its performance.
76 | - There are **macro-benchmark** tools that simulate a **real-world workload** such as clients making app requests;
77 | and there are **micro-benchmark** tools that test a specific component, such as CPUs, disks, or networks.
78 | - :star: On production systems you should first try **observability** tools. However, there are so many observability tools that you might spend hours working through them when an experimental tool would lead to **quicker** results.
79 |
80 |
81 | ## 1.9 Cloud Computing
82 |
83 | - Cloud decreased the need for rigorous capacity planning, as more capacity can be added from the cloud at short notice.
84 | - In some cases it has also increased the desire for performance analysis, because using **fewer resources** can mean **fewer systems** ▶️ immediate **cost savings**.
85 | - New difficulties caused by cloud computing and virtualization include the management of performance effects from **other tenants** (sometimes called performance isolation) and physical system observability from each tenant.
86 |
87 | ## 1.10 Methodologies
88 |
89 | - Methodologies are a way to document the recommended steps for performing various tasks in systems performance.
90 | - This is a Linux tool-based **checklist** that can be executed in the first **60 seconds** of a performance issue investigation 🧐, using traditional tools that should be available for most Linux distributions:
91 | | # | Tool | Check|
92 | |---|------|------|
93 | |1 | uptime | Load averages to identify if load is increasing or decreasing (compare 1-, 5-, and 15-minute averages).|
|2 | dmesg -T \| tail | Kernel errors including OOM events.|
95 | |3 | vmstat -SM 1 | System-wide statistics: run queue length, swapping, overall CPU usage.|
96 | |4 | mpstat -P ALL 1 | Per-CPU balance: a single busy CPU can indicate poor thread scaling.|
97 | |5 | pidstat 1 | Per-process CPU usage: identify unexpected CPU consumers, and user/system CPU time for each process.|
98 | |6 | iostat -sxz 1 | Disk I/O statistics: IOPS and throughput, average wait time, percent busy.|
99 | |7 | free -m | Memory usage including the file system cache.|
100 | |8 | sar -n DEV 1 | Network device I/O: packets and throughput.|
101 | |9 | sar -n TCP,ETCP 1 | TCP statistics: connection rates, retransmits.|
102 | |10 | top | Check overview.|
103 |
104 |
105 | ## 1.11 Case Studies
106 |
107 | ### 1.11.1 Slow Disks
108 |
109 | - *Sumit* is a system administrator at a medium-size company. The database team has filed a support ticket complaining of “slow disks” on one of their database servers.
110 | > We have a log for queries slower than 1,000 milliseconds. These usually don’t happen, but during the past week they have been growing to dozens per hour. AcmeMon showed that the disks were busy.
111 | - This confirms that there is a **real database issue**, but it also shows that the **disk hypothesis** is likely a guess.
112 | - The historical data shows that disk utilization has been steadily increasing during the past week, while CPU utilization has been steady.
113 | `AcmeMon` doesn’t provide **saturation** or **error statistics** for the disks, so to complete the USE method *Sumit* must log in to the server and run some commands.
114 | - `AcmeMon` reported 80% utilization but uses a one-minute interval. At **one-second** granularity, *Sumit* can see that disk utilization fluctuates, often hitting **100%** and causing levels of saturation and increased disk I/O latency.
115 | > Sumit replied: the disks appear to be acting normally for the load.
116 | - He asks if there is a simple explanation: did the database load increase?
117 | - The database team responds that it did not, and that the rate of queries has been steady.
118 | - He remembers that this disk I/O is largely caused by file **system cache** (page cache) **misses**.
119 | - *Sumit* checks the file system cache hit ratio using `cachestat` and finds it is currently at 91%.
120 | - A development project has a prototype application that is consuming a growing amount of memory, even though it isn’t under production load yet. This memory is taken from what is available for the file system cache, reducing its hit rate and causing more file system reads to become disk reads 😮💨.
121 |
122 | ### 1.11.2 Software Change
123 |
124 | - The application developers have developed a new core feature and are unsure whether its introduction could hurt performance.
125 | - Pamela decides to perform **non-regression** testing of the new app version, before it is deployed in production by **stress-testing** it.
126 | - She plots the results, showing completed request rate versus load, to visually identify the scalability profile. Both appear to reach an **abrupt ceiling**.
127 | - To investigate the client software she performs thread state analysis and finds that it is **single threaded**! That one thread is spending 100% of its time executing on-CPU. This convinces her that this is the limiter of the test.
128 |
--------------------------------------------------------------------------------