├── .gitignore ├── README.md ├── SystemDesign ├── RDBMS.md ├── cache.md ├── consistency_consensus.md ├── internet_protocol_suite.md ├── load_balancer.md ├── navigate_url.md ├── nosql_db.md ├── replication_partition.md ├── scale_web_app.md ├── storage_system.md └── transaction_isolation.md ├── Templates ├── backtrack.md ├── binary_search.md ├── dijkstra.md ├── graph_SCC.md ├── graph_traversal.md ├── linked_list.md ├── matrix_traversal.md ├── merge_sort.md ├── monotonic_stack.md ├── prim_spanning_tree.md ├── quick_sort.md ├── sliding_window.md ├── topological_sort.md ├── tree_traversal.md ├── trie.md └── union_find.md └── images ├── Inorder.png ├── Postorder.png ├── Preorder.png ├── System-Components.png ├── Tree-DFS.png └── how-to-use-the-repo.png /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | *.iml 3 | *.docx 4 | 5 | # Windows & Unix temporary files 6 | ~$* 7 | *~ 8 | 9 | # OS generated files # 10 | .DS_Store 11 | .DS_Store? 12 | ._* 13 | .Spotlight-V100 14 | .Trashes 15 | ehthumbs.db 16 | Thumbs.db 17 | 18 | # Python 19 | *.pyc 20 | __pycache__ 21 | 22 | *.egg-info 23 | build 24 | dist 25 | 26 | # IDE 27 | .vscode 28 | .vscode/ 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The Coding Interview Guide 2 | 3 | > This is a collection of my notes from when I was/am studying for interviews (or just for learning), 4 | > and it is also intended to become a systematic guide for people who would like to 5 | > become a software development engineer (SDE). 6 | 7 | > I graduated as an Electrical Engineer, and after working 1.5 years as an Electrical 8 | > Engineer I decided to switch to the programming world. 
I only had one programming course 9 | > at university, and I have taught myself all the most common data structures and 10 | > algorithms, machine learning fundamentals, and software architectures. I started working 11 | > on LeetCode problems at the end of Feb 2020, and since then I have solved at least 1 problem 12 | > every day. 13 | 14 | > During my learning journey, I found that I learned everything piece by piece 15 | > when I needed it, but I always lacked a systematic understanding/overview. Since 16 | > I made lots of notes while studying, I think it's necessary to summarize them and turn 17 | > them into a systematic guide for whoever wants to become a software development engineer. 18 | 19 | > Hope this repo is helpful and best of luck to you! 20 | 21 | 22 | ## Preface 1: How to use this repo 23 | 24 | In this repo I'm going to show you how to get started, whether you want to switch careers, 25 | improve your interviewing techniques, or revisit the fundamentals. 26 | 27 | - I'll build a systematic path for data structures, algorithms, and system design problems 28 | - I'll be sharing tips and tricks on how to practice the algorithms 29 | - I'll be sharing techniques for answering Behavioral Questions (BQs) 30 | - I'll be giving the resources that I learned from 31 | - MAKE SURE YOU READ THE [QA SECTION](#appendix-1-question--answer) 32 | 33 | Most importantly, learning is a lifelong journey, and I'm still learning every day. 34 | Hopefully you can learn with me, and I can help people get into the field more easily 35 | or improve their technical abilities. 36 | 37 | ***How to use this repo:*** 38 | ![how to use this repo](./images/how-to-use-the-repo.png) 39 | 40 | Basically, you should use this repo as a guide (yeah, the name kinda indicates that) to build your own knowledge base and templates. 41 | 42 | And if you have enough time, you should go over every topic. If not, you can just read the topics you are interested in. 
43 | 44 | 45 | ## Preface 2: Before get started, REMEMBER the following facts 46 | 47 | - **Programming is about writing the code, not reading**. So don't just read, IMPLEMENT it! 48 | - **You can't memorize everything the first time**. So keep repeating and practicing, and WRITE things down! 49 | - **Don't feel you aren't smart enough**. In fact, a lot of programming questions are tricky simply because you've never seen them before. Once you've seen enough, you'll know the tricks. 50 | - **Review, review, review**. Review your code and your notes once in a while, try to refactor/optimize your code, and refresh your memory of the fundamentals. 51 | - **Learning is a lifelong journey**. So keep learning and reading! 52 | 53 | The points above are really important; keep them in mind and they'll help you throughout your career. 54 | 55 | Now let's begin our journey! 56 | 57 | 58 | ## Table of Content 59 | 60 | - [The Coding Interview Guide](#the-coding-interview-guide) 61 | - [Preface 1: How to use this repo](#preface-1-how-to-use-this-repo) 62 | - [Preface 2: Before get started, REMEMBER the following facts](#preface-2-before-get-started-remember-the-following-facts) 63 | - [Table of Content](#table-of-content) 64 | - [Section 1: Data Structures and Algorithms](#section-1-data-structures-and-algorithms) 65 | - [Chapter 1: Data Structures](#chapter-1-data-structures) 66 | - [1.1 Array](#11-array) 67 | - [1.2 Linked List](#12-linked-list) 68 | - [1.3 Stack](#13-stack) 69 | - [1.3.1 Arithmetic Expressions](#131-arithmetic-expressions) 70 | - [1.4 Queue](#14-queue) 71 | - [1.5 Hash Table](#15-hash-table) 72 | - [1.6 Trees](#16-trees) 73 | - [1.6.1 Tree Traversal: access the nodes of the tree](#161-tree-traversal-access-the-nodes-of-the-tree) 74 | - [1.6.2 Binary Search Tree (BST)](#162-binary-search-tree-bst) 75 | - [1.6.3 Heap / Priority Queue / Binary Heap](#163-heap--priority-queue--binary-heap) 76 | - [1.6.4 More Trees](#164-more-trees) 77 | - [1.7 Graph](#17-graph) 78 | - 
[1.7.1 Vocabulary and Definitions](#171-vocabulary-and-definitions) 79 | - [1.7.2 Graph Representation](#172-graph-representation) 80 | - [1.7.3 Graph Algorithms](#173-graph-algorithms) 81 | - [Chapter 2: Common Algorithm Types](#chapter-2-common-algorithm-types) 82 | - [2.1 Brute Force](#21-brute-force) 83 | - [2.2 Search](#22-search) 84 | - [2.2.1 Sequential Search](#221-sequential-search) 85 | - [2.2.2 Binary Search](#222-binary-search) 86 | - [2.3 Sort](#23-sort) 87 | - [2.3.1 Bubble Sort](#231-bubble-sort) 88 | - [2.3.2 Selection Sort](#232-selection-sort) 89 | - [2.3.3 Insertion Sort](#233-insertion-sort) 90 | - [2.3.4 Shell Sort](#234-shell-sort) 91 | - [2.3.5 Merge Sort](#235-merge-sort) 92 | - [2.3.6 Quick Sort](#236-quick-sort) 93 | - [2.3.7 Heap Sort](#237-heap-sort) 94 | - [2.4 Recursion](#24-recursion) 95 | - [2.4.1 Recursive function in Python](#241-recursive-function-in-python) 96 | - [2.5 Backtracking](#25-backtracking) 97 | - [2.6 Dynamic Programming](#26-dynamic-programming) 98 | - [2.7 Divide and Conquer](#27-divide-and-conquer) 99 | - [2.8 Greedy](#28-greedy) 100 | - [2.9 Branch and Bound](#29-branch-and-bound) 101 | - [Chapter 3: Frequently Used Technics and Algorithms](#chapter-3-frequently-used-technics-and-algorithms) 102 | - [3.1 Must know for interview](#31-must-know-for-interview) 103 | - [3.2 Good to know but can be skipped](#32-good-to-know-but-can-be-skipped) 104 | - [Summary](#summary) 105 | - [Section 2: System Design](#section-2-system-design) 106 | - [Chapter 4: System Design Interview Template](#chapter-4-system-design-interview-template) 107 | - [Chapter 5: System Design Components](#chapter-5-system-design-components) 108 | - [Chapter 6: Classic Designs](#chapter-6-classic-designs) 109 | - [Chapter 7: System Design Case Study](#chapter-7-system-design-case-study) 110 | - [Section 3: Transferrable Skills and Offer](#section-3-transferrable-skills-and-offer) 111 | - [Chapter 8: Behavioral Questions 
(BQ)](#chapter-8-behavioral-questions-bq) 112 | - [8.1 Four things to remember for the BQ](#81-four-things-to-remember-for-the-bq) 113 | - [8.2 How to prepare for BQ](#82-how-to-prepare-for-bq) 114 | - [Chapter 9: Offer Negotiation](#chapter-9-offer-negotiation) 115 | - [Appendix 1: Question & Answer](#appendix-1-question--answer) 116 | - [A1.1 Technical Questions](#a11-technical-questions) 117 | - [A1.1.1 How to use LeetCode as a beginner](#a111-how-to-use-leetcode-as-a-beginner) 118 | - [A1.1.2 How to solve LeetCode problem EFFECTIVELY](#a112-how-to-solve-leetcode-problem-effectively) 119 | - [A1.1.3 How to solve LeetCode problem EFFICIENTLY](#a113-how-to-solve-leetcode-problem-efficiently) 120 | - [A1.1.4 Pay close attention to these when solving problem (gain max value of leetcode problem)](#a114-pay-close-attention-to-these-when-solving-problem-gain-max-value-of-leetcode-problem) 121 | - [A1.1.5 Why you should use a template for the algorithm and data structures](#a115-why-you-should-use-a-template-for-the-algorithm-and-data-structures) 122 | - [A1.1.6 What should I do if I lost confident when practicing leetcode](#a116-what-should-i-do-if-i-lost-confident-when-practicing-leetcode) 123 | - [A1.1.7 I still can't solve new problems even if I finished x number of problems on LeetCode](#a117-i-still-cant-solve-new-problems-even-if-i-finished-x-number-of-problems-on-leetcode) 124 | - [A1.2 Interview Questions](#a12-interview-questions) 125 | - [A1.2.1 What's the interview process look like](#a121-whats-the-interview-process-look-like) 126 | - [A1.2.2 How to write an effective resume](#a122-how-to-write-an-effective-resume) 127 | - [A1.2.3 I have applied to many jobs but still no interview](#a123-i-have-applied-to-many-jobs-but-still-no-interview) 128 | - [A1.2.4 How to solve an algorithm/data structure problem in interview](#a124-how-to-solve-an-algorithmdata-structure-problem-in-interview) 129 | - [A1.3 General Questions](#a13-general-questions) 130 | - [A1.3.1 
Large Company VS Small Company](#a131-large-company-vs-small-company) 131 | - [A1.3.2 How to get your FIRST job! (How to become more competitive among the candidates)](#a132-how-to-get-your-first-job-how-to-become-more-competitive-among-the-candidates) 132 | - [Appendix 2: Resources](#appendix-2-resources) 133 | - [A2.1 Learning Experience](#a21-learning-experience) 134 | - [A2.1.1 Online MOOC courses](#a211-online-mooc-courses) 135 | - [A2.2 How to solve Algorithm Questions](#a22-how-to-solve-algorithm-questions) 136 | - [A2.3 OOD (Object Oriented Design)](#a23-ood-object-oriented-design) 137 | - [A2.3.1 SOLID Principals](#a231-solid-principals) 138 | - [A2.3.2 Clean Code - Uncle Bob lessons](#a232-clean-code---uncle-bob-lessons) 139 | - [A2.4 Design Patterns](#a24-design-patterns) 140 | - [A2.5 Async in Python](#a25-async-in-python) 141 | - [A2.6 System Design](#a26-system-design) 142 | - [A2.7 Machine Learning](#a27-machine-learning) 143 | - [A2.8 Reinforcement Learning](#a28-reinforcement-learning) 144 | - [Postface](#postface) 145 | 146 | --- 147 | 148 | ## Section 1: Data Structures and Algorithms 149 | 150 | **Book:** [Problem Solving with Algorithms and Data Structures using Python](https://runestone.academy/runestone/books/published/pythonds/index.html) 151 | 152 | - For those who need to study the fundamental data structures and algorithms, I highly recommend going over the above textbook thoroughly first, and then coming back to the following content, or practicing on LeetCode or another platform 153 | 154 | 155 | **Basic data structures**: 156 | 157 | - Array 158 | - Linked List 159 | - Stack 160 | - Queue 161 | - Hash Table 162 | - Tree 163 | - Graph 164 | 165 | **Common Algorithm Types**: 166 | 167 | - Brute Force 168 | - Search and Sort 169 | - Recursion 170 | - Backtracking 171 | - Dynamic Programming 172 | - Divide and Conquer 173 | - Greedy 174 | - Branch and Bound 175 | 176 | **Big O Notations**: 177 | 178 | - It is critical that you understand and are able 
to calculate the Big O for the code you wrote. 179 | - **The order of magnitude function describes the part of T(n) that increases the fastest as the value of n increases. Order of magnitude is often called Big-O notation (for “order”) and written as O(f(n)).** 180 | 181 | - Roughly speaking, Big O counts the number of basic operations (e.g. assignment statements) performed as a function of the input size n 182 | 183 | | f(n) | Name | 184 | | :----- | :---- | 185 | | 1 | Constant | 186 | | log n | Logarithmic | 187 | | n | Linear | 188 | | n log n | Log Linear | 189 | | n^2 | Quadratic | 190 | | n^3 | Cubic | 191 | | 2^n | Exponential | 192 | 193 | ![BigO Image](https://runestone.academy/runestone/books/published/pythonds/_images/newplot.png) 194 | 195 | 196 | ### Chapter 1: Data Structures 197 | 198 | #### 1.1 Array 199 | 200 | - An array (in Python it's called a *list*) is a collection of items where each item holds a relative position with respect to the others. 201 | 202 | #### 1.2 Linked List 203 | 204 | - Similar to an array, but requires O(n) time on average to visit an element by index 205 | - Linked lists utilize memory better than arrays, since they can use discontiguous memory space, whereas an array must use contiguous memory space 206 | - [Details and Templates](./Templates/linked_list.md) 207 | 208 | #### 1.3 Stack 209 | 210 | - Stacks are fundamentally important, as they can be used to reverse the order of items. 211 | - The order of insertion is the reverse of the order of removal. 212 | - Stacks maintain a FILO (first in, last out) ordering property. 213 | - When pop is called on the end of the list it takes O(1), but when pop is called on the first element in the list or anywhere in the middle it is O(n) (in Python). 214 | 215 | ##### 1.3.1 Arithmetic Expressions 216 | 217 | - Infix: the operator is in between the two operands that it is working on (i.e. A+B) 218 | - Fully Parenthesized expression: uses one pair of parentheses for each operator. (i.e. 
((A + (B * C)) + D)) 219 | - Prefix: all operators precede the two operands that they work on (i.e. +AB) 220 | - Postfix: operators come after the corresponding operands (i.e. AB+) 221 | 222 | | Infix Expression | Prefix Expression | Postfix Expression | 223 | | ----------------- | ----------------- | ------------------ | 224 | | A + B | + A B | A B + | 225 | | A + B * C | + A * B C | A B C * + | 226 | | (A + B) * C | * + A B C | A B + C * | 227 | | A + B * C + D | + + A * B C D | A B C * + D + | 228 | | (A + B) * (C + D) | * + A B + C D | A B + C D + * | 229 | | A * B + C * D | + * A B * C D | A B * C D * + | 230 | | A + B + C + D | + + + A B C D | A B + C + D + | 231 | 232 | - **NOTE:** 233 | - Only infix notation requires parentheses to determine precedence 234 | - The order of operations within prefix and postfix expressions is completely determined by the position of the operator and nothing else 235 | 236 | #### 1.4 Queue 237 | 238 | - A queue is structured as an ordered collection of items which are added at one end, called the “rear,” and removed from the other end, called the “front.” 239 | - Queues maintain a FIFO ordering property. 240 | - A ***deque***, also known as a double-ended queue, is an ordered collection of items similar to the queue. 241 | - It has two ends, a front and a rear, and the items remain positioned in the collection. 242 | - New items can be added at either the front or the rear. 243 | - Likewise, existing items can be removed from either end. 244 | 245 | #### 1.5 Hash Table 246 | 247 | - A **hash table** is a collection of items which are stored in such a way as to make it easy to find them later. 248 | - Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. 249 | - The mapping between an item and the slot where that item belongs in the hash table is called the **hash function**. 
250 | - **Remainder method** takes an item and divides it by the table size, returning the remainder as its hash value (i.e. `h(item) = item % 11`) 251 | - **load factor** is the number of items divided by the table size 252 | - **collision** refers to the situation where multiple items have the same hash value 253 | - **folding method** for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). These pieces are then added together to give the resulting hash value. 254 | - **mid-square method** first squares the item, and then extracts some portion of the resulting digits. For example, 44^2 = 1936, extract the middle two digits 93, then perform the remainder step (93%11=5). 255 | - **Collision Resolution** is the process of systematically placing the second item in the hash table when two items hash to the same slot. 256 | - **Open addressing (linear probing):** sequentially find the next open slot or address in the hash table 257 | - A disadvantage of linear probing is the tendency for clustering; items become clustered in the table. 258 | - **Rehashing** is one way to deal with clustering: skip slots when looking sequentially for the next open slot, thereby distributing the items that have caused collisions more evenly. 259 | - **Quadratic probing:** instead of using a constant “skip” value, we use a rehash function that increments the hash value by 1, 3, 5, 7, 9, and so on. This means that if the first hash value is h, the successive values are h+1, h+4, h+9, h+16, and so on. 260 | - **Chaining** allows many items to exist at the same location in the hash table. 261 | - When collisions happen, the item is still placed in the proper slot of the hash table. 262 | - As more and more items hash to the same location, the difficulty of searching for the item in the collection increases. 
263 | ![](http://interactivepython.org/runestone/static/pythonds/_images/chaining.png) 264 | - The initial size of the hash table should be a prime number so that the collision resolution algorithm can be as efficient as possible. 265 | 266 | #### 1.6 Trees 267 | 268 | * A tree data structure has its root at the top and its leaves on the bottom. 269 | * Three properties of a tree: 270 | 1. we start at the top of the tree and follow a path made of circles and arrows all the way to the bottom. 271 | 2. all of the children of one node are independent of the children of another node. 272 | 3. each leaf node is unique. 273 | * **binary tree:** each node in the tree has a maximum of two children. 274 | * A **balanced binary tree** has roughly the same number of nodes in the left and right subtrees of the root. 275 | 276 | ##### 1.6.1 Tree Traversal: access the nodes of the tree 277 | 278 | - Tree traversal is the foundation of all tree-related problems. 279 | - Here are a few different ways to traverse a tree: 280 | - BFS: Level-order 281 | - DFS: Pre-order, In-order, Post-order 282 | - [Details and Templates](./Templates/tree_traversal.md) 283 | 284 | 285 | ##### 1.6.2 Binary Search Tree (BST) 286 | 287 | - BST Property (left subtree < root < right subtree): 288 | 1. The value in each node must be `greater than (or equal to)` any values stored in its left subtree. 289 | 2. The value in each node must be `less than (or equal to)` any values stored in its right subtree. 290 | - `Inorder traversal` of a BST visits the keys in `ascending order`. Therefore, inorder traversal is the most frequently used traversal method for a BST. 
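As a minimal sketch of this property (the `Node` class here is hypothetical, not from the repo's templates), an inorder traversal visiting left subtree, root, then right subtree yields the keys in ascending order:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(node):
    """Visit the left subtree, then the root, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

# BST:       5
#          /   \
#         3     8
#        / \   /
#       1   4 7
root = Node(5, Node(3, Node(1), Node(4)), Node(8, Node(7)))
print(inorder(root))  # → [1, 3, 4, 5, 7, 8]
```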
291 | - **successor:** the node that has the next-largest key in the tree 292 | - it has no more than one child 293 | - You could go over the [Leetcode Binary Search Tree topic](https://leetcode.com/explore/learn/card/introduction-to-data-structure-binary-search-tree/) for details 294 | 295 | ##### 1.6.3 Heap / Priority Queue / Binary Heap 296 | 297 | - **Priority Queue:** 298 | - the logical order of items inside the queue is determined by their priority. 299 | - The highest priority items are at the front of the queue and the lowest priority items are at the back. 300 | - **Binary Heap:** the classic way to implement a priority queue. 301 | - both enqueue and dequeue are **O(log n)** 302 | - **min heap:** the smallest key is always at the front 303 | - **max heap:** the largest key is always at the front 304 | - **complete binary tree:** a tree in which each level has all of its nodes (except possibly the bottom level, which fills from left to right) 305 | - can be implemented using a single list 306 | - Because the tree is complete, the left child of a parent (at position **p**) is the node that is found in position **2p** in the list. Similarly, the right child of the parent is at position **2p+1** in the list. 307 | ![](http://interactivepython.org/runestone/static/pythonds/_images/heapOrder.png) 308 | - **heap order property:** In a heap, for every node **x** with parent **p**, the key in **p** is smaller than or equal to the key in **x**. 
309 | * For example, the root of the tree must be the smallest item in the tree 310 | - When to use a heap: 311 | - Priority Queue implementation 312 | - whenever you need quick access to the largest/smallest item 313 | - instant access (peek) to the top item 314 | - insertions are fast, and heaps allow in-place sorting (heap sort) 315 | - More details can be seen in [this discussion](https://stackoverflow.com/questions/749199/when-would-i-want-to-use-a-heap) 316 | 317 | ##### 1.6.4 More Trees 318 | 319 | - ***Parse tree*** can be used to represent real-world constructions like sentences or mathematical expressions. 320 | - A simple solution for keeping track of parents as we traverse the tree is to use a stack. 321 | - When we want to descend to a child of the current node, we first push the current node on the stack. 322 | - When we want to return to the parent of the current node, we pop the parent off the stack. 323 | - ***AVL Tree***: a self-balancing binary search tree, named for its inventors G.M. Adelson-Velskii and E.M. Landis. 324 | - For each node: *balanceFactor* = *height(leftSubTree)* − *height(rightSubTree)* 325 | - a subtree is left-heavy if *balance_factor > 0* 326 | - a subtree is right-heavy if *balance_factor < 0* 327 | - a subtree is perfectly in balance if *balance_factor = 0* 328 | - For simplicity we can define a tree to be in balance if the balance factor is -1, 0, or 1. 
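A minimal sketch of the balance factor definition (hypothetical `Node` class; this recomputes heights on every call, whereas a real AVL implementation caches heights and rebalances with rotations):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    # An empty subtree has height 0.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # balanceFactor = height(leftSubTree) - height(rightSubTree)
    return height(node.left) - height(node.right)

balanced = Node(2, Node(1), Node(3))
left_heavy = Node(3, Node(2, Node(1)))   # 3 -> 2 -> 1, all left children

print(balance_factor(balanced))    # → 0  (in balance)
print(balance_factor(left_heavy))  # → 2  (out of balance, needs rotation)
```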
329 | - The minimum number of nodes for a given height follows the pattern of the *Fibonacci sequence*; as the number of elements gets larger, the ratio Fi/Fi-1 approaches the golden ratio, from which the height, and thus the search time, is derived to be **O(log n)** 330 | - ***Red-Black Tree*** 331 | - [Details in Wiki](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) 332 | - ***B+ Tree***: N-ary tree 333 | - [Details in Wiki](https://en.wikipedia.org/wiki/B%2B_tree) 334 | - ***Trie*** 335 | - *This is a common data structure in interviews* 336 | - [Template](./Templates/trie.md) 337 | - ***Binary Index Tree (Fenwick Tree)*** 338 | - [Binary Index Tree (Fenwick Tree)](https://www.geeksforgeeks.org/binary-indexed-tree-or-fenwick-tree-2/) 339 | - [315. Count of Smaller Numbers After Self](https://leetcode.com/problems/count-of-smaller-numbers-after-self/) 340 | 341 | 342 | #### 1.7 Graph 343 | 344 | ##### 1.7.1 Vocabulary and Definitions 345 | 346 | - **Vertex (or Node):** the name is called the "key" and the additional information is called the "payload" 347 | - **Edge (or arc):** connects two vertices to show that there is a relationship between them. 348 | - A graph with one-way edges is called a **directed graph (or digraph)** 349 | - **Weight:** edges may be weighted to show that there is a cost to go from one vertex to another. 350 | - **Path:** a sequence of vertices that are connected by edges 351 | - Unweighted path length is the number of edges in the path, specifically n-1 for a path of n vertices 352 | - Weighted path length is the sum of the weights of all the edges in the path 353 | - **Cycle:** a path that starts and ends at the same vertex 354 | - A graph with no cycles is called an **acyclic graph**. 355 | - A directed graph with no cycles is called a **directed acyclic graph (or DAG)** 356 | - **Graph:** a graph (G) is composed of a set of vertices (V) and a set of edges (E). Each edge is a tuple of two vertices and, optionally, a weight: (v,w). 
G=(V,E) where v,w∈V 357 | 358 | ##### 1.7.2 Graph Representation 359 | 360 | - Adjacency Matrix (2D matrix) 361 | - Good when the number of edges is large (a dense graph) 362 | - Each of the rows and columns represents a vertex in the graph. 363 | - The value in the cell at the intersection of row v and column w indicates if there is an edge from vertex v to vertex w. It also represents the weight of the edge from vertex v to vertex w. 364 | - When two vertices are connected by an edge, we say that they are **adjacent** 365 | ![](http://interactivepython.org/runestone/static/pythonds/_images/adjMat.png) 366 | - **sparse:** most of the cells in the matrix are empty 367 | - Adjacency List 368 | - a space-efficient way to implement sparsely connected graphs 369 | - keep a master list of all the vertices in the Graph object; each vertex is an element of the list, with the vertex as ID and a list of its adjacent vertices as the value 370 | ![](http://interactivepython.org/runestone/static/pythonds/_images/adjlist.png) 371 | 372 | ##### 1.7.3 Graph Algorithms 373 | 374 | - Graph traversal: BFS & DFS 375 | - [Template](./Templates/graph_traversal.md) 376 | - Graph Algorithms: 377 | - Shortest Path: 378 | - Dijkstra’s Algorithm (single source) 379 | - ***Essentially, this is a BFS using a priority queue instead of a queue*** 380 | - [Template](./Templates/dijkstra.md) 381 | - Floyd-Warshall Algorithm (all pairs of sources) 382 | - Topological Sort 383 | - [Template](./Templates/topological_sort.md) 384 | - Strongly Connected Components 385 | - [More Info](./Templates/graph_SCC.md) 386 | - Prim’s Spanning Tree Algorithm 387 | - [More Info](./Templates/prim_spanning_tree.md) 388 | 389 | ### Chapter 2: Common Algorithm Types 390 | 391 | #### 2.1 Brute Force 392 | 393 | - The most common algorithm type 394 | - Whenever you are facing a problem without many clues, solve it using brute force first, then observe the process and try to optimize your solution 395 | 396 | #### 2.2 Search 397 | 398 | ##### 2.2.1 Sequential 
Search 399 | 400 | - Sequential Search: visit the stored values in sequence (using a loop) 401 | 402 | ##### 2.2.2 Binary Search 403 | 404 | - Examine the middle item of an ordered list 405 | - The KEY is how you maintain and shrink the search interval 406 | - [Template](./Templates/binary_search.md) 407 | 408 | #### 2.3 Sort 409 | 410 | ##### 2.3.1 Bubble Sort 411 | 412 | - Compares adjacent items and exchanges those that are out of order. 413 | - **Short bubble:** stop early if it finds that the list has become sorted. 414 | - time complexity: O(n^2) 415 | 416 | ##### 2.3.2 Selection Sort 417 | 418 | - Looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. 419 | - time complexity: O(n^2) 420 | 421 | ##### 2.3.3 Insertion Sort 422 | 423 | - Maintains a sorted sub-list in the lower positions of the list. 424 | - Each new item is then “inserted” back into the previous sub-list such that the sorted sub-list is one item larger. 425 | - time complexity: O(n^2) 426 | 427 | ##### 2.3.4 Shell Sort 428 | 429 | - Breaks the original list into a number of smaller sub-lists, each of which is sorted using an insertion sort. 430 | - the shell sort uses an increment *i*, sometimes called the **gap**, to create a sub-list by choosing all items that are *i* items apart. 431 | - After all the sub-lists are sorted, it finally does a standard insertion sort 432 | - time complexity falls between O(n) and O(n^2); by changing the increment, a shell sort can perform at O(n^(3/2)). 433 | 434 | ##### 2.3.5 Merge Sort 435 | 436 | - A recursive algorithm that continually splits a list in half. 437 | - [Details and Templates](./Templates/merge_sort.md) 438 | 439 | ##### 2.3.6 Quick Sort 440 | 441 | - First selects a value (the **pivot value**), and then uses this value to assist with splitting the list. 
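A minimal sketch of the pivot idea (not in place, and it copies sub-lists; the linked template covers the in-place partition version):

```python
def quick_sort(items):
    # Pick a pivot, split the rest into smaller/larger halves, recurse.
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

print(quick_sort([5, 2, 9, 1, 5, 6]))  # → [1, 2, 5, 5, 6, 9]
```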
442 | - [Details and Templates](./Templates/quick_sort.md) 443 | 444 | ##### 2.3.7 Heap Sort 445 | 446 | - Uses the heap property to sort the list 447 | 448 | #### 2.4 Recursion 449 | 450 | **Recursion** is a method of solving problems that involves breaking a problem down into smaller and smaller sub-problems until you get to a small enough problem that it can be solved trivially. Usually recursion involves a function calling itself. 451 | 452 | Three Laws of Recursion: 453 | 454 | 1. A recursive algorithm must have a base case. 455 | 2. A recursive algorithm must change its state and move toward the base case. 456 | 3. A recursive algorithm must call itself, recursively. 457 | 458 | Recursive visualization: Fractal tree 459 | 460 | - A **fractal** is something that looks the same at all different levels of magnification. 461 | - A fractal tree: a small twig has the same shape and characteristics as a whole tree. 462 | 463 | ##### 2.4.1 Recursive function in Python 464 | 465 | * When a function is called in Python, a stack frame is allocated to handle the local variables of the function. 466 | * When the function returns, the return value is left on top of the stack for the calling function to access. 467 | * Even though we are calling the same function over and over, each call creates a new scope for the variables that are local to the function. 468 | 469 | #### 2.5 Backtracking 470 | 471 | - a general algorithm for finding all (or some) solutions to constraint satisfaction problems (i.e. chess, puzzles, crosswords, verbal arithmetic, Sudoku, etc) 472 | - [Template](./Templates/backtrack.md) 473 | 474 | 475 | #### 2.6 Dynamic Programming 476 | 477 | **Dynamic Programming (DP)** is an algorithmic technique which is usually based on a recurrent formula and one (or more) starting states. 478 | - A sub-solution of the problem is constructed from previously found ones. 479 | - Usually used to find optimal results, such as the shortest path, best fit, smallest set, etc. 
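As a concrete illustration of "a sub-solution constructed from previously found ones" (a hypothetical example, not from the repo's templates), here is bottom-up Fibonacci:

```python
def fib(n):
    # Starting states: dp[0] = 0, dp[1] = 1.
    # Recurrent formula: dp[i] = dp[i - 1] + dp[i - 2].
    if n < 2:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]  # built from previously found sub-solutions
    return dp[n]

print(fib(10))  # → 55
```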
480 | 481 | #### 2.7 Divide and Conquer 482 | 483 | - **Divide**: break the problem into non-overlapping sub-problems of the same type 484 | - **Conquer**: solve the sub-problems 485 | - the algorithm keeps dividing and conquering, and finally combines the sub-solutions to get the overall solution 486 | - the algorithm can be written recursively or with a loop 487 | 488 | #### 2.8 Greedy 489 | 490 | **Greedy algorithm:** 491 | 492 | - find a safe move first 493 | - prove its safety 494 | - solve the subproblem (which should be similar to the original problem) 495 | - estimate the running time 496 | 497 | **Optimization:** 498 | 499 | - assume everything is sorted (if not, maybe sort first) 500 | - decide the sort order 501 | - the final running time can be O(n log n) (i.e. the sort is O(n log n) and the greedy pass can be O(n)) 502 | 503 | - [More details](https://www.hackerearth.com/practice/algorithms/greedy/basics-of-greedy-algorithms/tutorial/) 504 | 505 | #### 2.9 Branch and Bound 506 | - Similar to backtracking, but aimed at optimization problems: prune a branch of the search tree whenever a bound on the best solution reachable from that branch is worse than the best solution found so far 507 | ### Chapter 3: Frequently Used Technics and Algorithms 508 | 509 | #### 3.1 Must know for interview 510 | 511 | - Matrix Traversal 512 | - Focusing on various ways of traversing a 2D matrix 513 | - [Template](./Templates/matrix_traversal.md) 514 | - Sliding Window 515 | - ***Fundamentally this is a two-pointer approach*** 516 | - [Template](./Templates/sliding_window.md) 517 | - Union find 518 | - Essentially it's a list representation of the joined (unioned) sets of data points 519 | - [Template](./Templates/union_find.md) 520 | - Bit Manipulation 521 | - Prefix Sum 522 | - monotonic stack/queue 523 | - [Monotonic Stack template](./Templates/monotonic_stack.md) 524 | 525 | #### 3.2 Good to know but can be skipped 526 | 527 | - Segment Tree 528 | - Kadane's algorithm 529 | - Reservoir Sampling 530 | - Line sweep 531 | - KMP algorithm (pattern match) 532 | - Manacher (Longest Palindromic Substring) 533 | - Skip List 534 | 535 | ### Summary 536 | 537 | --- 538 | 539 | ## Section 2: System Design 540 | 541 | ### Chapter 4: System Design Interview 
Template 542 | 543 | System design questions can be very difficult to prepare for, because they cover a wide range of areas. 544 | 545 | Here is a template I use for the system design interview: 546 | 547 | 1. Feature expectations (5 mins) - gather requirements: 548 | 549 | - Functional requirements: 550 | - Use cases 551 | - Scenarios that will NOT be covered 552 | - End-user (who will use it) 553 | - Capacity (how many people will use it, DAU (daily active users)) 554 | - How to use it 555 | - Non-Functional requirements: 556 | - Scalability 557 | - Availability 558 | - Performance/Latency 559 | - Consistency 560 | - Durability/Fault tolerance 561 | 562 | 2. Estimations (2-5 mins) - estimate scale: 563 | 564 | - Throughput (QPS for read and write queries) 565 | - Latency expected from the system (for read and write queries) 566 | - Read/Write ratio (heavy read, heavy write, or similar) 567 | - Traffic estimates (QPS for read and write) 568 | - Storage estimates (media files, text/photo/video) 569 | - Memory estimates 570 | - Cache: what kind of data we want to store in the cache 571 | - How much RAM and how many machines 572 | - How much data stored on disk/SSD 573 | 574 | 3. High Level Design (5-10 min) - discuss at a very high level with the interviewer: 575 | 576 | - System components (load balancer, services, cache, database, etc) 577 | - Database schema 578 | - APIs for Read/Write scenarios for crucial components 579 | - Request flow process (from client to database) 580 | 581 | 4. 
Deep Dive (15-20 mins) - focus on any part of the system: 582 | 583 | - Scaling individual components 584 | - Availability, Consistency and Scale story for each component 585 | - Consistency and availability patterns 586 | - Deep dive on any of the following components 587 | - DNS 588 | - CDN (Pull vs Push vs Hybrid) 589 | - Load Balancer/Reverse Proxy 590 | - LB types 591 | - LB algorithms 592 | - Application layer scaling (Microservice, Service Discovery, Service Mesh) 593 | - Database (RDBMS vs NoSQL) 594 | - RDBMS: 595 | - Leader-follower, Multi-leader, Leaderless, Federation, Sharding, Denormalization, SQL Tuning 596 | - NoSQL: 597 | - Key-Value, Wide-Column, Graph, Document 598 | - RAM [Bounded size] => Redis, Memcached 599 | - AP [Unbounded size] => Cassandra, RIAK, Voldemort 600 | - CP [Unbounded size] => HBase, MongoDB, Couchbase, DynamoDB 601 | - Caches: 602 | - Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache at Query level, Cache at Object level 603 | - Cache Patterns: 604 | - Cache aside 605 | - Write through 606 | - Write behind 607 | - Refresh ahead 608 | - Eviction policies: 609 | - LRU 610 | - LFU 611 | - FIFO 612 | - Asynchronism 613 | - Message queues 614 | - Task queues 615 | - Back pressure 616 | - Communication 617 | - TCP 618 | - UDP 619 | - REST 620 | - RPC 621 | 622 | 5. 
Justify (5 mins): 623 | 624 | - Throughput of each layer 625 | - Latency caused between each layer 626 | - Overall latency justification 627 | 628 | 629 | Notes: 630 | 631 | - Treat the system design as an actual work project, for which you have to gather and clarify all the requirements and then do the design, and treat your interviewer as a colleague with whom you discuss the trade-offs of your design 632 | - Step 1 is the most important one: you'll need to know what you are about to build, after all, and figure out all the requirements 633 | - Step 2 should be asked about, but most of the time you'll be asked to design the system as a startup (i.e. you don't have many users) and then scale it as you get more customers. So you don't have to give a detailed analysis at the beginning, unless it is specifically asked for. 634 | - API design vs database schema design: you probably don't need to talk about both. DB design is asked about more frequently in my experience. 635 | - The key in system design is talking about trade-offs: why you selected certain technologies over others, and what their drawbacks are. 636 | 637 | 638 | Reference: 639 | 640 | - [System Design Template](https://leetcode.com/discuss/career/229177/My-System-Design-Template) 641 | - [System Design - InterviewBit](https://www.interviewbit.com/courses/system-design/) 642 | - [SNAKE - System Design Principles to crack a system design in 5 steps \| Bowen's blog](https://bowenli86.github.io/2016/06/28/system%20design/SNAKE-System-Design-Principles-to-crack-a-system-design-in-5-steps/) 643 | 644 | 645 | ### Chapter 5: System Design Components 646 | 647 | Network systems eventually come down to these components and design patterns, so it is critical to understand them and be able to discuss the design decisions and trade-offs for any component. 
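As a concrete illustration of the estimation step (step 2 of the template above), here is a quick back-of-envelope sketch in Python. Every number in it (DAU, requests per user, object size) is a made-up assumption for a hypothetical photo-sharing service, chosen only to show the arithmetic you would do on the whiteboard:

```python
# Back-of-envelope capacity estimation -- every constant here is a
# hypothetical assumption, chosen only to demonstrate the arithmetic.
DAU = 10_000_000           # daily active users (assumed)
READS_PER_USER = 20        # read requests per user per day (assumed)
WRITES_PER_USER = 2        # photo uploads per user per day (assumed)
AVG_OBJECT_SIZE = 500_000  # bytes per uploaded photo (assumed)
SECONDS_PER_DAY = 86_400

# Traffic: derive average QPS from daily volume.
read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY
peak_read_qps = 2 * read_qps  # common rule of thumb: peak ~= 2x average

# Storage: derive yearly growth from daily write volume.
yearly_storage_tb = DAU * WRITES_PER_USER * AVG_OBJECT_SIZE * 365 / 1e12

print(f"average read QPS : ~{read_qps:,.0f}")
print(f"average write QPS: ~{write_qps:,.0f}")
print(f"peak read QPS    : ~{peak_read_qps:,.0f}")
print(f"storage per year : ~{yearly_storage_tb:,.0f} TB")
```

The exact numbers matter far less than the method: derive QPS from DAU and storage from write volume, and state every assumption out loud so the interviewer can correct them.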
648 | 649 | ![System Components](./images/System-Components.png) 650 | 651 | - [Internet Protocol Suite](./SystemDesign/internet_protocol_suite.md) 652 | - OSI model 653 | - Internet protocol suite 654 | - TCP, UDP, QUIC, SCTP, TCP/IP 655 | - HTTP, HTTPS 656 | - socket, websocket, long-polling 657 | - REST, SOAP 658 | - HTTP response status codes 659 | - [Load Balancer, Reverse Proxy, API Gateway](./SystemDesign/load_balancer.md) 660 | - LB types: layer 4 and layer 7 661 | - LB algorithms: least connection, least response time, least bandwidth, round robin, IP hash 662 | - Reverse Proxy 663 | - API Gateway 664 | - An example: The Architecture of Uber’s API gateway 665 | - [Cache](./SystemDesign/cache.md) 666 | - Cache Usage Patterns 667 | - Cache Aside 668 | - Cache-as-SoR (system-of-record): Read through, write through, write behind 669 | - Cache Eviction Policies 670 | - Redis vs Memcached 671 | - Data Store 672 | - Database Management Systems 673 | - Design Principles 674 | - [Replication & Partition](./SystemDesign/replication_partition.md) 675 | - Leader-follower replication, Sync/Async replication 676 | - Handling node outages 677 | - Replication logs 678 | - Eventual consistency 679 | - Multi-leader replication topology, write conflict resolution 680 | - Leaderless replication, Quorum, sloppy quorum, hinted handoff 681 | - Key-value store partitioning 682 | - Local and Global indexes 683 | - Rebalancing partitions 684 | - Coordination service, gossip protocol 685 | - [Transaction & Isolation](./SystemDesign/transaction_isolation.md) 686 | - ACID 687 | - Read committed 688 | - Read skew 689 | - Snapshot isolation 690 | - MVCC 691 | - Lost update 692 | - Write skew 693 | - Phantom 694 | - Two-phase locking (2PL): Shared lock, exclusive lock, predicate lock, index-range lock 695 | - Serializable Snapshot Isolation (SSI) 696 | - [Consistency & Consensus](./SystemDesign/consistency_consensus.md) 697 | - Linearizability 698 | - CAP theorem 699 | - Causal dependency, 
consistent with causality, causally consistent 700 | - Total order, partially ordered 701 | - Lamport timestamp 702 | - Total Order Broadcast 703 | - Fencing token: monotonically increasing number for locks 704 | - Epoch number: monotonically increasing number for elections 705 | - 2PC, 3PC, XA transaction 706 | - Major Types 707 | - [RDBMS](SystemDesign/RDBMS.md) 708 | - Postgres vs MySQL 709 | - [NoSQL](./SystemDesign/nosql_db.md) 710 | - NoSQL database types 711 | - Cassandra vs MongoDB 712 | - [Data Storage Systems](./SystemDesign/storage_system.md) 713 | - File Storage 714 | - Block Storage 715 | - Object Storage 716 | - HDFS and Map Reduce 717 | - Architectural Patterns 718 | 719 | Now, putting them together, here is something you should know: 720 | 721 | - [What happens when you navigate to a URL](./SystemDesign/navigate_url.md) 722 | - [How to scale a web app from monolithic to distributed](./SystemDesign/scale_web_app.md) 723 | 724 | ### Chapter 6: Classic Designs 725 | 726 | - notification system 727 | - rate limiter 728 | - top k problem 729 | - distributed message queue 730 | - distributed cache 731 | 732 | ### Chapter 7: System Design Case Study 733 | 734 | - chat system (slack, etc) 735 | - streaming system (youtube, etc) 736 | - map system (google map, etc) 737 | - booking system (ticket master, etc) 738 | - notification system 739 | - news feed 740 | - payment system 741 | - top k (recommendation system, etc) 742 | - url shortener 743 | - distributed web crawler 744 | - search auto-completion system 745 | - file system (dropbox, google drive) 746 | 747 | --- 748 | 749 | ## Section 3: Transferable Skills and Offer 750 | 751 | ### Chapter 8: Behavioral Questions (BQ) 752 | 753 | #### 8.1 Four things to remember for the BQ 754 | 755 | - Behavioral questions are being evaluated more and more in interviews, so make sure you are well prepared before you go to an interview 756 | - There are many, many articles online talking about behavioral questions, so if you 
are looking for an answer to a specific question, just go ahead and search that question on Google and YouTube. 757 | - Be prepared to TALK ABOUT YOUR RESUME 758 | - Make sure you can answer anything you put on your resume: technologies, projects, experience, etc 759 | - use ***STAR*** to structure your stories 760 | - situation: briefly describe the background 761 | - task: briefly describe what needed to be done 762 | - action: describe what you did, focus on what YOU did 763 | - results: show the results, especially YOUR impact 764 | 765 | #### 8.2 How to prepare for BQ 766 | 767 | Follow these steps: 768 | 769 | 1. Prepare to talk about your resume 770 | - know all the technologies you've listed on your resume 771 | - be ready to explain why you left each of your jobs (at least the most recent one) 772 | - be ready to talk about the projects you listed on your resume 773 | - technologies 774 | - challenges 775 | - YOUR impact, what YOU did 776 | - collaborations 777 | 778 | 2. There are three questions you must prepare: 779 | 1. Introduce yourself (a good way to prepare is the elevator pitch, google it if you don't know it) 780 | 2. Why ABC Company (i.e. why do you want to apply to/work for our company) 781 | 3. Why do you want to leave (or why did you leave) your job (if you ever had one) 782 | 783 | 3. 
Go through the [Amazon Leadership Principles](https://www.amazon.jobs/en/principles) 784 | - Prepare 2-3 stories for each principle, and you should be good for most interviews at ANY company 785 | - [Amazon's 14 Leadership Principles Video via Jeff Bezos](https://www.youtube.com/watch?v=B-xdfQv3I1k) is really great 786 | - This list: [Amazon asks these 35 questions in 95% of job interviews](https://www.youtube.com/watch?v=dse8OTDlRcM&list=PLLucmoeZjtMTarjnBcV5qOuAI4lE5ZinV&index=18) should give you enough details for the most common questions 787 | - Make sure you note down the stories you prepared, and practice talking to others about them 788 | 789 | 4. Make sure you give the following questions extra attention: 790 | - Your strengths and weaknesses 791 | - The most challenging problem you've solved or project you've worked on 792 | - A follow-up question could be: if you were doing it again now, how would you do it differently 793 | - Conflict with your colleagues 794 | - Disagreement with your colleagues/boss 795 | - Mistakes/failures you made and what you learned from them 796 | - Leading teams (if you are a senior or a manager) 797 | 798 | 5. Now you are prepared to answer questions, but you'll also need to prepare questions to ASK 799 | - Asking good questions will show the interviewers that you are interested in the company, the position, and the job itself, and will show your professionalism 800 | - Ask probing questions about the team and technology 801 | - Programming languages (Python, Java, etc) 802 | - Development technologies (Docker, K8s, etc) 803 | - Frameworks (Django, Spring, etc) and their versions (from the versions you'll know how up-to-date their tech stack is) 804 | - Development tools (IDEs, OS, Cloud providers, etc) 805 | - Generic questions 806 | - What is a day like in your company (this may seem too generic, but is quite important). 
For example, what's the sprint like (do you have sprints), do you have standups (frequency and time), how many working hours per week, when do you start your day, and much more. Pick those that you are most interested in 807 | - What's the team like, what's the tech stack for the team, how many BE/FE/QA, etc 808 | - These kinds of questions show that you are really interested in the job and the team. You'll need to know this info anyway if there is an offer and you choose to accept it 809 | - Interview related questions 810 | - What's the interview process like (sometimes the HR/interviewer will let you know clearly; if not, you should ask), how many rounds, etc 811 | - It's even possible to probe for the potential questions: what areas the interview will cover (algorithms, system design, take-home project, etc) 812 | - It's ok to ask; it's your HR's decision whether to tell you 813 | - This article shows you [How to predict your interview questions](https://interviewgenie.com/blog-1/2020/5/4/how-to-predict-your-interview-questions) 814 | - Ask any questions that you might be interested in throughout your interview 815 | - For example, when certain technologies are mentioned in your interview, you may ask your interviewer how those technologies are used in the company 816 | - **NOTE** this step is very important: it will not only show that you are interested in the company and the position, but also give you a chance to learn the company culture and tech stack, so you can then decide if you really want to work for this company or not. 817 | - From the technical perspective, asking about the tech stack and even specific versions will let you know if the team has lots of tech debt. 
Discussing certain technologies will also show that you are strong in the area 818 | - From the company culture perspective, how your interviewers deal with certain situations will give you a sneak peek of how the company operates and what the company culture looks like 819 | 820 | - Here are some really good resources for you to prepare: 821 | - AGAIN, if you want to prepare for certain questions, there are lots of examples online; just Google and use YouTube 822 | - There are also interview tips on the big companies' websites, make sure you check them out 823 | - [Leetcode Interview Thoughts Amazon and Google](https://leetcode.com/discuss/interview-question/455991/i-got-an-offer-from-amazon-sde-i-and-google-l3-heres-my-thoughts) 824 | - [How to sell yourself in interviews — Interview Genie](https://interviewgenie.com/blog-1/2018/6/6/how-to-sell-yourself-in-interviews) 825 | - [How to answer interview questions about the Amazon leadership principle “Frugality” — Interview Genie](https://interviewgenie.com/blog-1/2019/4/9/how-to-answer-amazon-frugality-interview-questions) 826 | 827 | - **One more thing**. During your daily work, make notes on the achievements you've had. Write down the details in the STAR format as mentioned above so you won't forget them when you need them, but make sure you don't leak any sensitive data! 828 | 829 | ### Chapter 9: Offer Negotiation 830 | 831 | - Congratulations, you got an offer!! But should you accept it immediately? 832 | - Let me put it this way: it's for your own sake to negotiate the offer 833 | - Offer negotiation will not only show that you are seriously considering joining the company, but will also make you happier when you actually accept the offer 834 | - It is really unlikely that a company will revoke the offer because you negotiated, but it is possible. Frankly speaking, do you really want to work at a place where you can't ask for anything? 
835 | - There is really only ONE resource that I'd like to share: [Ten Rules for Negotiating a Job Offer](https://haseebq.com/my-ten-rules-for-negotiating-a-job-offer/). Read it carefully and thoroughly, and you are good to go 836 | 837 | --- 838 | 839 | ## Appendix 1: Question & Answer 840 | 841 | **DISCLAIMER:** These QAs are my personal opinions and experience. They are not guaranteed to be the perfect solution to the question, but they are something I found really helpful from my own experience. 842 | 843 | **NOTE:** You should read these QAs first before jumping into the content and resources, since these answers may save you lots of time preparing for the interview and potentially help you ace the interview. 844 | 845 | ### A1.1 Technical Questions 846 | 847 | #### A1.1.1 How to use LeetCode as a beginner 848 | 849 | - First of all, if you don't know what LeetCode is, google it and thank me later. 850 | 851 | - As a beginner or someone new to algorithm questions, LeetCode can get overwhelming because there are almost 2000 (at the time of writing) problems! 852 | 853 | - If you are new to algorithms and data structures, go to the "**Explore**" tab on the top navigation bar, then go to the "**Learn**" row and learn all of them. 854 | 855 | - If you already know all the data structures and would like to practice, do the questions from the tags, and do them from easy to hard 856 | - Note that most companies rarely test hard ones, but some highly frequent hard problems have been coming up more often recently 857 | 858 | - If you are really familiar with all the data structures and common algorithms, do the problems randomly, so you can think about the best data structure for solving each problem most efficiently 859 | 860 | - If you are time sensitive/critical (i.e. 
you have an interview in the near future or you are actively looking for jobs), do the company-based questions (a LeetCode premium feature) 861 | 862 | #### A1.1.2 How to solve LeetCode problems EFFECTIVELY 863 | 864 | **Rule of thumb: make every question count!** 865 | 866 | What I mean is that you have to really understand the question after you've solved it. 867 | 868 | It doesn't really matter if you solved it by yourself or looked at the answers. 869 | 870 | Here is a list of ***CRITICAL*** things you always have to think about when you are working on problems: 871 | 872 | - What's the best data structure(s) to solve this problem 873 | - What's the time and space complexity (Big O's) 874 | - What's the tradeoff of the current approach (i.e. more space or more time) 875 | 876 | After solving the question (again, whether you solved it yourself or looked at the solution/discussion): 877 | 878 | - Is your solution the best way to solve it? If not, is there a way to optimize your solution? 879 | - If you couldn't solve it yourself, what was the reason? 880 | - Have you seen the data structure/algorithm before? 881 | - If not, you should stop working on more problems and study it immediately 882 | - If so, you should practice more on this type of problem 883 | - Are there any tricks for solving the problem? 884 | - If not, just keep practicing 885 | - If so, NOTE them! 886 | - Did you have no clue when seeing the problem? 887 | - Practice more problems of this type, and summarize the solution for each problem solved 888 | 889 | If you strictly do the above things while working on a problem and after you've solved it, it's just a matter of time until you are an expert. 
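To make the time/space trade-off point above concrete, here is a small sketch using the classic Two Sum problem (the function names and structure are mine, not from this repo's templates): the same problem solved with a hash map (more space, less time) and with in-place sorting plus two pointers (less extra space, more time):

```python
from typing import List, Optional, Tuple

def two_sum_hash(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """O(n) time, O(n) extra space: trade memory for speed.
    Returns the indices of the two numbers that sum to target."""
    seen = {}  # value -> index of where we saw it
    for i, x in enumerate(nums):
        if target - x in seen:
            return (seen[target - x], i)
        seen[x] = i
    return None

def two_sum_sorted(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """O(n log n) time, O(1) extra space: sort in place, then two pointers.
    Trade-off: the input is modified and original indices are lost,
    so this variant returns the pair of values instead."""
    nums.sort()
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == target:
            return (nums[lo], nums[hi])
        if s < target:
            lo += 1  # need a bigger sum
        else:
            hi -= 1  # need a smaller sum
    return None
```

Being able to state why you would pick one over the other (memory budget, whether the input may be mutated, whether indices are needed) is exactly the trade-off discussion interviewers look for.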
890 | 891 | #### A1.1.3 How to solve LeetCode problems EFFICIENTLY 892 | 893 | **Rule of thumb: Don't work on a single problem for too long, and don't be afraid to look at the solution!** 894 | 895 | I know many people don't want to look at the solution if they can't solve the problem, but spending too much time (i.e. hours) on a single problem isn't efficient at all! After all, you only have 24 hours a day. 896 | 897 | So here is what you should do: 898 | 899 | - If you have no clue at all after reading the question, look at the solution directly 900 | - It may sound a little cheesy, but this is the most efficient way, because you'll probably still have no clue after 1 hour. Once you have solved enough questions, this won't happen 901 | - If you have some clue but are not sure how to do it, then spend some time working on it 902 | - Normally spend 15-30 mins, and if you still can't solve it, look at the solution 903 | - If you have an idea of how to solve it, do it! 904 | - In this case, spend as much time as you need, even if it's one hour! 905 | - The reason is that you know how to solve it but are not really familiar with the approach, so you need more practice. By solving it on your own after much trial and error, you will be very familiar with this question and should be able to solve it very quickly next time 906 | 907 | To summarize, there are really just two points: 908 | 909 | 1. Don't be afraid of looking at the solution 910 | 2. If you are blocked, see point 1 911 | 912 | #### A1.1.4 Pay close attention to these when solving problems (gain the max value from leetcode problems) 913 | 914 | **Rule of thumb: Consider edge cases, explain it step by step, analyze complexities, walk through the code with a test case** 915 | 916 | To gain the max value from leetcode problems, you need to do more than just solve the problem. 
917 | 918 | Here are a few things you need to pay close attention to when solving a problem, because doing so will get you better prepared for an interview: 919 | 920 | - When you first see the problem, ask many questions about boundaries, edge cases, etc. 921 | - Leetcode problems are quite straightforward: they show you pretty much everything. However, in an interview you'll have to work with the interviewer to get all the details of the problem you are about to solve. Make sure you fully understand the question and are aware of the boundaries and any possible edge cases 922 | - When solving the problem, don't just jump right into writing the code; try to explain what you are about to do first by writing some pseudo code to illustrate your thinking process. 923 | - Doing so will allow your interviewer to understand your approach, and possibly correct you (or guide you) onto the right path 924 | - You can actually follow this process when working on leetcode problems. For example, you can first write down the pseudo code as comments, and then fill in the actual code 925 | - Since communication is also a really important factor in the interview process, this explanation step will greatly prepare you for an actual interview 926 | - You can mention the space and time complexities prior to solving the problem, but most of the time you should discuss them after finishing the problem. 927 | - Make sure you analyze both time and space complexity 928 | - Last but not least, once you have the solution, make sure to walk through your code with a test case. 929 | - Believe it or not, a lot of people cannot debug their own code! 930 | - Doing so will also show that you review your code before pushing it out, which is something you should do in your daily job 931 | 932 | #### A1.1.5 Why you should use templates for algorithms and data structures 933 | 934 | Should you use templates? Many people have asked this question, and the answer is always YES. 
935 | 936 | Here are the reasons: 937 | 938 | - An interview is a stressful and time-restricted process (remember, it's time-restricted!), so knowing the template will enable you to focus on solving the problem and communicating your thinking process to the interviewer 939 | - Some of the algorithms may look easy but are really difficult to implement correctly due to the various boundary/edge cases (such as binary search), so knowing a template will enable you to write bug-free code more easily 940 | - Templates are summarized from solving many problems, so it's easier and more efficient to learn from the templates 941 | - When you have some templates, it is also easier to pick things up and get prepared for an interview if you haven't had one for more than a year 942 | - Some algorithms are difficult to implement, or at least to implement nicely. Having a good template will make your code look much better 943 | - Templates are the main reason for this repo :) 944 | 945 | #### A1.1.6 What should I do if I lose confidence when practicing leetcode 946 | 947 | I know it can be super frustrating when you first start on algorithm and data structure problems. 948 | 949 | I've been there and I know how it feels. 950 | 951 | Here are the things you must know: 952 | 953 | - Any algorithm with a name is not meant to be easily figured out by yourself. Check out those algorithms on the wiki and you can see their history. This is why we need the templates, and why we need to study them. 954 | - Think of algorithms and data structures as a math course, and the interview as a final exam. You need to learn the formulas (i.e. each algorithm and data structure) one by one, and the interview is just a way to test some of them. 
It is totally fine not to know all of them at first; what's important is to learn them step by step 955 | - Build your confidence step by step 956 | - this is why you need to solve problems based on TAGs, and solve them from easy to hard 957 | - know a template, so you can at least start programming 958 | - You are not alone, we all feel the same, so don't worry, just keep working! 959 | 960 | You should also refer to the other QA questions for more details. 961 | 962 | #### A1.1.7 I still can't solve new problems even after finishing x number of problems on LeetCode 963 | 964 | This can be a common thing, and it means you weren't using Leetcode very effectively or efficiently. 965 | 966 | Basically, you should refer to the other QA questions for more details. 967 | 968 | Here is a short summary: 969 | 970 | - Make notes on the problems you solved, and revisit them later 971 | - It's ok to work on the same problem multiple times at different times; practice makes perfect after all 972 | - AGAIN, solve problems by tags, so you can master one type at a time 973 | - Start with templates, and go beyond the templates so you can solve slightly varied problems in the same category 974 | - Once you are familiar with the basic algorithms and data structures, medium and hard problems are just combinations of two or three algorithms/data structures 975 | 976 | ### A1.2 Interview Questions 977 | 978 | #### A1.2.1 What does the interview process look like 979 | 980 | The interview process normally follows the steps below, but of course each company is different: 981 | 982 | 1. You'll get an email asking you to finish an Online Assessment (OA) quiz, normally algorithm questions or a simple knowledge test of the company's tech stack 983 | - Note this is mostly used by large companies (and some mid-size companies). Small companies rarely have it. 984 | 2. 
The company contacts you to set up a screening phone interview 985 | - This step is normally just talking about yourself, going over your resume/experience, and learning more about the company and position 986 | 3. A technical phone interview. 987 | - This could be anything, from algorithm questions to language/framework features. 988 | - Each company is different. 989 | 4. Another round of technical phone interviews or a take-home quiz/project 990 | - If this is not a take-home quiz/project, then it's referred to as the "On-site", where you spend many hours finishing many rounds of interviews (3-5 rounds depending on the company and experience level) 991 | 5. This step varies: it could be another technical interview, or a company culture interview 992 | - Again, not all companies have this step, but if you get one, it's mostly just to check the candidate's fit with the company culture, or some high-level open technical discussions. 993 | 6. Offer! 994 | 995 | The above steps are just a summary from my own experience (and from what I learnt from my friends). 996 | 997 | DO ask the interviewer/HR about the interview process when you get contacted! 998 | 999 | #### A1.2.2 How to write an effective resume 1000 | 1001 | Well, it's all about years of experience (YOE). Here are some tips you should know: 1002 | 1003 | - If you have less experience, you should keep your resume to 1 page. If you have lots of experience, you should keep it to 2 pages. Just don't go over two pages. 1004 | - You can have more details in the more recent experience, and fewer details in the early-year experience. 1005 | - Try to tailor your resume towards the job posting requirements, especially making sure your KEYWORDS align with the job posting. 1006 | - Make sure you emphasize what YOU did and YOUR contributions to the projects listed 1007 | - I know it's often hard, but try to give specific analytical numbers (i.e. 
improved efficiency by 200%) 1008 | - Highlight your skills and achievements 1009 | - DOUBLE CHECK SPELLING AND GRAMMAR, and have another person proofread for you 1010 | 1011 | #### A1.2.3 I have applied to many jobs but still have no interview 1012 | 1013 | - If you are a new grad, this is common. It's difficult for a new grad to find a job in pretty much any industry. My advice is to work on your resume as much as you can, make sure each resume is targeted to the job posting really well, write a cover letter to further show your enthusiasm for the job, and keep applying. 1014 | - If you have a few years of experience, make sure your resume is great. 1015 | - Make sure your resume is up-to-date, with no silly spelling and grammar issues 1016 | - Make sure you have concentrated on your contributions in your resume 1017 | - Make sure you have highlighted your skills and achievements 1018 | - Maybe find a professional service to help you with your resume if it's still not as polished as you want it to be 1019 | - Keep applying; you never know if there is a good opportunity waiting for you 1020 | 1021 | #### A1.2.4 How to solve an algorithm/data structure problem in an interview 1022 | 1023 | In other QAs I explained how to grind Leetcode efficiently and effectively. 1024 | 1025 | With enough practice you'll get an interview, and here are some tips that can increase your success rate. 1026 | 1027 | - Make sure you ask clarifying questions at the beginning: what are the inputs and input types, what is given, what should the output be, etc 1028 | - Before writing code, THINK OUT LOUD 1029 | - Discuss your thinking process with your interviewer 1030 | - what data structures you chose, why you chose them, and what the steps to implement are 1031 | - During the process, write them down. For example, write them as comments, step by step 1032 | - Once your interviewer knows your thoughts, ask if it's ok to code, then code. Sometimes you may even be able to skip the coding entirely. 
1033 | - Be ready to discuss time and space complexity 1034 | - Be ready for the follow-up questions 1035 | 1036 | ### A1.3 General Questions 1037 | 1038 | #### A1.3.1 Large Company VS Small Company 1039 | 1040 | Large companies such as the "Big Five" (Google, Amazon, Facebook, Apple, Microsoft) or FAANG (Facebook, Amazon, Apple, Netflix, Google) are very attractive to developers, but there are also many smaller companies and startups, which make up the majority of the market. 1041 | 1042 | There are also FAANGMULA or FAANGULTAD, feel free to look them up. 1043 | 1044 | The debate over whether to work for a large company or a small company is ongoing and probably will never stop, since they can be quite different. 1045 | 1046 | The short conclusion is: 1047 | 1048 | - Large company 1049 | - A large company is a big platform where you can build your network very effectively and gain insight into how large companies and complex architectures work 1050 | - big companies have their own tech stacks that you'll have to learn and can't use elsewhere, so it's important to know how they work under the hood 1051 | - the downside is that there might be micromanagement and office politics (to some degree at least), it is difficult to learn the full development cycle, the software development process may be long and cumbersome, and promotion could take a long time but has a clear path 1052 | - Small company 1053 | - a small company allows you to learn the full development cycle, gain a lot of project experience quickly, and release a complete product from idea to production. 1054 | - You'll spend the majority of your time coding and reviewing code, so the development cycle and shipping are pretty fast. 1055 | - small companies mostly use open-source tools and frequently keep the tech stack updated with the latest technologies. 
1056 | - the scope of the projects may be smaller, and promotion may be faster, but the career path may not be very clear 1057 | - The downside is that your company is not well known, so your skills could potentially be questioned, building connections is a bit more difficult, and the company structure and processes may be chaotic for a period of time 1058 | 1059 | I've created a table for you. (It is subjective and depends on the team; the numbers are not accurate but should give you an idea) 1060 | 1061 | | | Large Company | Small Company | 1062 | | :-: | :-----------: | :-----------: | 1063 | | Networking | Easy to connect to highly talented people | Have to work on networking | 1064 | | Programming (varies between companies) | less | more | 1065 | | Feature release process | could have many rounds of review and approval | normally just peer review | 1066 | | Tech stack | depends on the team, lots of internal tools | depends on the project and company culture, frequently popular open-source tools | 1067 | | Interview process | Many rounds, from 5 to 10+ | Normally 3-5 rounds, sometimes with a take-home project | 1068 | | Interview Content | Mostly focused on algorithm/data structure problems | Mostly about hands-on experience with the company tech stack | 1069 | | Career Path | Clear but takes a longer time | Not super clear but could be promoted to a high level (C-level is also possible) | 1070 | 1071 | 1072 | 1073 | #### A1.3.2 How to get your FIRST job! (How to become more competitive among the candidates) 1074 | 1075 | To begin with, this is not limited to how to get your first job, but is meant to show you how to stand out from the crowd. 1076 | 1077 | **Rule of thumb: build your reputation, gain more project experience, networking!!** 1078 | 1079 | Essentially, you'll need to build your reputation. How to do that? There are a few ways: 1080 | 1081 | - **Attending meetups!** First and foremost! 
1082 | - Meetups, especially local ones, give you the opportunity to talk directly to people who have jobs, and whose companies might be hiring. So if you go to meetups frequently and make connections, job opportunities will come to you! 1083 | 1084 | - Make sure your resume and **LinkedIn profile** (especially the latter!) are up to date! 1085 | - Recruiters and agents are quite active on LinkedIn, and I received a lot of interest from them. 1086 | 1087 | - Gain more project experience. **Either contribute to an open-source project or build your own personal project.** 1088 | - This is not only for new grads, but also for those who have a few years of experience but no projects to show. 1089 | - I'm sure you all have lots of project experience from work, but I'm afraid you can't show it to the interviewer! A personal project is a way to SHOW OFF your skills and code, and a way to convince the HR/interviewer that you are a strong candidate. 1090 | - NOTE: most of the time the HR/interviewer will not look at your pet project, but you can still mention it and even brag about it during your interview; it always earns you points! 1091 | - Lastly, make sure your pet project has quality code and follows standards! Otherwise it won't help. 1092 | 1093 | - Apply to many jobs and **don't be shy about asking for help**. 1094 | - **Don't restrict yourself!** For your first job, make sure you are fully prepared and apply to every company you can; don't limit yourself to only large companies or certain companies. After all, once you have your first job, it'll be easier to find the second one, third one, etc. 1095 | - Ask your network about potential job opportunities. Don't be shy about asking around, but make sure you are polite and not harassing people. It doesn't hurt to ask, and sometimes the result may surprise you. 1096 | 1097 | - **Don't give up!** If you can't find a job or aren't even receiving responses, **it's not your fault**.
1098 | - Nowadays, most companies won't respond if you are not a good match. I know it doesn't feel good and you feel like you are just left hanging, but be aware that it's not your fault. 1099 | 1100 | A lot of the time, finding a job is more about luck than anything else, so be prepared, be patient, and don't give up! 1101 | 1102 | Good luck!!! 1103 | 1104 | 1105 | ## Appendix 2: Resources 1106 | 1107 | ### A2.1 Learning Experience 1108 | 1109 | - [Everything About Python — Beginner To Advanced](https://medium.com/fintechexplained/everything-about-python-from-beginner-to-advance-level-227d52ef32d2) 1110 | - [Coding Interview University](https://github.com/jwasham/coding-interview-university) 1111 | - [GitHub - TheAlgorithms/Python: All Algorithms implemented in Python](https://github.com/TheAlgorithms/Python) 1112 | - [VisuAlgo - visualising data structures and algorithms through animation](https://visualgo.net/en) 1113 | 1114 | #### A2.1.1 Online MOOC courses 1115 | 1116 | - [CS 61A: Structure and Interpretation of Computer Programs](https://cs61a.org/) 1117 | - [CS 61B: Data Structures Spring 2019](https://sp19.datastructur.es/) 1118 | - [CS 61C: Computer Architecture (Machine Structures)](https://cs61c.org/) 1119 | - [MIT6.046: Design and Analysis of Algorithms](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2015/) 1120 | - [MIT6.824: Distributed Systems](https://pdos.csail.mit.edu/6.824/) 1121 | 1122 | 1123 | ### A2.2 How to solve Algorithm Questions 1124 | 1125 | - [Fucking Algorithm - Algorithm template - Java, Python - English, Chinese](https://github.com/labuladong/fucking-algorithm/tree/english) 1126 | - [Algorithm Pattern - Algorithm template - Go - Chinese](https://github.com/greyireland/algorithm-pattern) 1127 | - [Hello Algorithm - Leetcode Solution - Java - English, Chinese](https://github.com/geekxh/hello-algorithm) 1128 | - [LeetCode Topics and Interview Questions Collection - Leetcode
Solution - Java - English, Chinese](https://github.com/yuanguangxin/LeetCode) 1129 | - [Top 10 algorithms in Interview Questions](https://www.geeksforgeeks.org/top-10-algorithms-in-interview-questions/#algo3) 1130 | - [CHEATSHEET: LEETCODE COMMON TEMPLATES & COMMON CODE PROBLEMS - English](https://cheatsheet.dennyzhang.com/cheatsheet-leetcode-a4) 1131 | - [LeetCode 101 - A LeetCode Grinding Guide (C++ Version) - Chinese](https://github.com/changgyhub/leetcode_101/blob/master/LeetCode%20101%20-%20A%20LeetCode%20Grinding%20Guide%20(C%2B%2B%20Version).pdf) 1132 | 1133 | ### A2.3 OOD (Object Oriented Design) 1134 | 1135 | #### A2.3.1 SOLID Principles 1136 | 1137 | - [S.O.L.I.D. Principles of Object-Oriented Design - A Tutorial on Object-Oriented Design](https://www.youtube.com/watch?v=GtZtQ2VFweA&ab_channel=LaraconEU) 1138 | - [Understanding the Single Responsibility Principle](https://www.youtube.com/watch?v=L2m-S0Pj_Xk&ab_channel=edutechional) 1139 | - [Understanding the Open Closed Principle](https://www.youtube.com/watch?v=Ryhy7333mqQ&ab_channel=edutechional) 1140 | - [Understanding the Liskov Substitution Principle](https://www.youtube.com/watch?v=Mmy1EUKC_iE&ab_channel=edutechional) 1141 | - [OOP Design Principles: Interface Segregation Principle](https://www.youtube.com/watch?v=Ye1h3zKl1lg&ab_channel=edutechional) 1142 | - [OOP Design Principles: Dependency Inversion Principle](https://www.youtube.com/watch?v=qL2-5g_lJTs&ab_channel=edutechional) 1143 | - [Refactoring From Trash to SOLID](https://medium.com/swlh/refactoring-from-trash-to-solid-74b10005ccd3) 1144 | 1145 | #### A2.3.2 Clean Code - Uncle Bob lessons 1146 | 1147 | Uncle Bob is a software engineer who introduced the S.O.L.I.D. principles for writing clean code. 1148 | 1149 | Here is a recent series of his public talks; I feel it's worth spending the time to watch them at least once.
1150 | 1151 | If you don't want to read the book, you should at least watch this series. 1152 | 1153 | - [Clean Code - Uncle Bob / Lesson 1: SOLID principles, refactoring, DRY](https://www.youtube.com/watch?v=7EmboKQH8lM&ab_channel=UnityCoin) 1154 | - [Clean Code - Uncle Bob / Lesson 2: Comments, docs, naming, reviews](https://www.youtube.com/watch?v=2a_ytyt9sf8&ab_channel=UnityCoin) 1155 | - [Clean Code - Uncle Bob / Lesson 3: Software growth, QA, teamwork](https://www.youtube.com/watch?v=Qjywrq2gM8o&ab_channel=UnityCoin) 1156 | - [Clean Code - Uncle Bob / Lesson 4: TDD](https://www.youtube.com/watch?v=58jGpV2Cg50&ab_channel=UnityCoin) 1157 | - [Clean Code - Uncle Bob / Lesson 5: Architecture, project development](https://www.youtube.com/watch?v=sn0aFEMVTpA&ab_channel=UnityCoin) 1158 | - [Clean Code - Uncle Bob / Lesson 6: Project management](https://www.youtube.com/watch?v=l-gF0vDhJVI&ab_channel=UnityCoin) 1159 | 1160 | ### A2.4 Design Patterns 1161 | 1162 | - [Design Patterns](https://www.tutorialspoint.com/design_pattern/filter_pattern.htm) 1163 | - You'll all need to learn the design patterns eventually 1164 | - [Design Patterns in Python](https://github.com/faif/python-patterns) 1165 | 1166 | 1167 | ### A2.5 Async in Python 1168 | 1169 | - [Demystifying Python's Async and Await Keywords](https://www.youtube.com/watch?v=F19R_M4Nay4&ab_channel=JetBrainsTV) 1170 | - [Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018](https://www.youtube.com/watch?v=0kXaLh8Fz3k&ab_channel=PyCon2018) 1171 | - [Advanced asyncio: Solving Real-world Production Problems - PyCon 2019](https://www.youtube.com/watch?v=bckD_GK80oY&ab_channel=PyCon2019) 1172 | 1173 | 1174 | ### A2.6 System Design 1175 | 1176 | - [System Design Interview](https://www.youtube.com/c/SystemDesignInterview/videos) 1177 | - A must-watch; clearly explained by a senior Amazon engineer, and this is what you should expect in an interview 1178 | - [The System Design
Primer](https://github.com/donnemartin/system-design-primer) 1179 | - The repo with explanations, examples, and case studies 1180 | - [Distributed systems for fun and profit](http://book.mixu.net/distsys/single-page.html) 1181 | - A free book of about 100 pages 1182 | - [System Design Cheatsheet](https://gist.github.com/vasanthk/485d1c25737e8e72759f) 1183 | - [System Design Cheatsheet - Guvi - Medium](https://medium.com/guvi/system-design-cheatsheet-251c6fe7f20c) 1184 | - [CheatSheet: System Design For Job Interview – CheatSheet](https://cheatsheet.dennyzhang.com/cheatsheet-systemdesign-a4) 1185 | - [GitHub - puncsky/system-design-and-architecture: Learn how to design large-scale systems. Prep for the system design interview.](https://github.com/puncsky/system-design-and-architecture) 1186 | - [GitHub - checkcheckzz/system-design-interview: System design interview for IT companies](https://github.com/checkcheckzz/system-design-interview) 1187 | - [High Performance Browser Networking (O'Reilly)](https://hpbn.co/) 1188 | - [GitHub - binhnguyennus/awesome-scalability: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems](https://github.com/binhnguyennus/awesome-scalability) 1189 | - [The Architecture of Open Source Applications (Volume 2): Scalable Web Architecture and Distributed Systems](http://www.aosabook.org/en/distsys.html) 1190 | 1191 | This repo has the full list of company engineering blogs: 1192 | 1193 | - [Engineering Blogs](https://github.com/kilimchoi/engineering-blogs) 1194 | 1195 | Papers: 1196 | 1197 | - [Google MapReduce: Simplified Data Processing on Large Clusters](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) 1198 | - [The Google File System](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf) 1199 | - [TAO: Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf) 1200 | -
[Dynamo: Amazon’s Highly Available Key-value Store](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) 1201 | 1202 | ### A2.7 Machine Learning 1203 | 1204 | - [100 Days of Machine Learning Coding](https://github.com/Avik-Jain/100-Days-Of-ML-Code) 1205 | 1206 | 1207 | ### A2.8 Reinforcement Learning 1208 | 1209 | - [Reinforcement Learning Methods and Tutorials](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow) 1210 | 1211 | --- 1212 | 1213 | ## Postface 1214 | 1215 | The content ends here, but the learning never stops. 1216 | 1217 | If you like my content, feel free to star/like/fork on GitHub. 1218 | 1219 | [Here is my Patreon page](https://www.patreon.com/CZTechHut); any support is much appreciated and will motivate me a lot in creating more content. 1220 | 1221 | Thanks again and best wishes. 1222 | -------------------------------------------------------------------------------- /SystemDesign/RDBMS.md: -------------------------------------------------------------------------------- 1 | # RDBMS (Relational Database Management System) 2 | 3 | - Relational databases are normally row-based 4 | - Postgres and MySQL are the most widely used 5 | 6 | ## Postgres vs MySQL 7 | 8 | - Postgres: 9 | - object-relational database 10 | - open source, easy to install, highly extensible 11 | - implements Multi-version Concurrency Control (MVCC) without read locks 12 | - protects data integrity at the transaction level 13 | - MySQL 14 | - purely relational database 15 | - most popular 16 | - better performance at large scale (millions of rows and above) 17 | 18 | 19 | 20 | Reference 21 | 22 | - [MySQL vs PostgreSQL -- Choose the Right Database for Your Project \| Okta Developer](https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres#:~:text=Postgres%20is%20an%20object%2Drelational,more%20closely%20to%20SQL%20standards.)
23 | - [Why Uber Engineering Switched from Postgres to MySQL \| Uber Engineering Blog](https://eng.uber.com/postgres-to-mysql-migration/) 24 | 25 | -------------------------------------------------------------------------------- /SystemDesign/cache.md: -------------------------------------------------------------------------------- 1 | # Cache 2 | 3 | When talking about caches here, we mean caching as it relates to web development. 4 | 5 | [Here is a good article to start with](https://www.digitalocean.com/community/tutorials/web-caching-basics-terminology-http-headers-and-caching-strategies) 6 | 7 | Here we are focusing on database caching. 8 | 9 | - Cache Usage Patterns 10 | - **Cache Aside**: the application is responsible for reading from and writing to the database; the cache doesn't interact with the database at all 11 | - The application queries the cache first 12 | - if the cache contains the data, it is returned directly, bypassing the database 13 | - if not, the data is fetched from the database and then stored in the cache 14 | - The most common cache-aside systems are Memcached and Redis 15 | - **Cache-as-SoR (system-of-record)**: the application treats the cache as the main data store, reading data from it and writing data to it 16 | - **Read through** 17 | - the cache is configured with a loader component that knows how to load data from the database 18 | - if an entry does not exist within the cache, the cache invokes the loader to retrieve the value from the database, caches the value, then returns it to the caller 19 | - **Write through** 20 | - the cache is configured with a writer component that knows how to write data to the database 21 | - when the cache is asked to store a value for a key, the cache invokes the writer to store the value in the SoR, as well as updating the cache 22 | - **Write behind** 23 | - similar to write-through, but rather than writing to the database while the thread making the update waits, write-behind queues the data for writing at a later time
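The cache-aside flow described above can be sketched in a few lines. This is only an illustration: plain dicts stand in for the cache (e.g. Redis or Memcached) and for the database, and the names (`get_user`, `update_user`, the `user:1` key) are made up for the example:

```python
cache = {}                            # fast store, may miss (stands in for Redis/Memcached)
db = {"user:1": {"name": "Alice"}}    # system of record (stands in for the database)

def get_user(key):
    value = cache.get(key)
    if value is not None:
        return value                  # cache hit: bypass the database entirely
    value = db.get(key)               # cache miss: fetch from the database...
    if value is not None:
        cache[key] = value            # ...then populate the cache for later reads
    return value

def update_user(key, value):
    # On writes, update the database and invalidate the cached copy,
    # so the next read repopulates the cache with fresh data.
    db[key] = value
    cache.pop(key, None)
```

Note that the cache itself never talks to the database; the application orchestrates both stores, which is exactly what distinguishes cache-aside from the read-through/write-through patterns.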
24 | - Cache Eviction Policies 25 | - LRU: least recently used 26 | - LFU: least frequently used 27 | - FIFO: first in, first out 28 | - LIFO: last in, first out 29 | - FILO: first in, last out 30 | - and many more 31 | 32 | 33 | ## Redis 34 | 35 | ### Memory management 36 | 37 | - Redis keeps all key information in memory, but not all data storage occurs in memory 38 | - When physical memory is full, Redis may swap values not used for a long time to disk. 39 | - When memory usage exceeds the threshold value, Redis triggers the swap operation. 40 | - Redis calculates which values to swap to disk based on *“swappability = age\*log(size_in_memory)”* 41 | - The machine's memory must hold all the keys; Redis will not swap out all the data. 42 | - When Redis swaps in-memory data to disk, the main thread that serves requests and the child thread performing the swap share this part of memory. 43 | - So if you update data that is being swapped, Redis blocks the operation until the child thread completes the swap. 44 | 45 | ### Multi-threading 46 | 47 | - Redis was known to be single-threaded, but that has changed 48 | - Since Redis 4.0, it also has background threads to process slow operations such as cleanup, releasing useless connections, bulk deletes, etc. 49 | - Redis 6.0, which supports multi-threading, was finally released on 2020-05-02 50 | - There are two main directions for optimization: 51 | - improve network I/O performance, with typical implementations such as using DPDK to replace the kernel network stack 52 | - use multi-threading to make full use of multiple cores, as in typical implementations such as Memcached 53 | 54 | 55 | 56 | ## Redis vs.
Memcached: In-Memory Data Storage Systems 57 | 58 | | Comparison | Redis | Memcached | 59 | | :---: | :--------: | :---: | 60 | | Data types supported | string, hash, list, set, sorted set | hash table with strings and integers | 61 | | Server-side data operations | owns more data structures and supports richer data operations | needs to copy data to the client for similar changes and then set it back, which increases I/O counts and data sizes | 62 | | Memory management | encapsulated malloc/free | slab allocation mechanism | 63 | | Memory use efficiency | lower | higher memory utilization rate | 64 | | Data persistence | RDB snapshot and AOF log | none | 65 | | Performance | single-core, so higher performance for small data | multi-core, so it outperforms Redis when storing data of 100k or above | 66 | 67 | 68 | Reference: 69 | 70 | - [Caching for Resiliency](https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-4-85afa66d6341#:~:text=There%20are%20two%20basic%20caching,%E2%80%94%20also%20called%20inline%2Dcache.) 71 | - [Database Caching](https://aws.amazon.com/caching/database-caching/) 72 | - [Cache Usage Patterns](https://www.ehcache.org/documentation/3.3/caching-patterns.html) 73 | - [Using Read-Through and Write-Through in Distributed Cache - DZone Database](https://dzone.com/articles/using-read-through-amp-write-through-in-distribute) 74 | - [Cache replacement policies](https://en.wikipedia.org/wiki/Cache_replacement_policies) 75 | - [Redis vs.
Memcached: In-Memory Data Storage Systems](https://medium.com/@Alibaba_Cloud/redis-vs-memcached-in-memory-data-storage-systems-3395279b0941) 76 | - [Redis 6.0, which supports multi-threading, is finally released New features serial 13 questions - Programmer Sought](https://www.programmersought.com/article/30635498543/) 77 | -------------------------------------------------------------------------------- /SystemDesign/consistency_consensus.md: -------------------------------------------------------------------------------- 1 | # Consistency and Consensus 2 | 3 | ## Linearizability 4 | 5 | - Imagine each operation (read or write) is marked with a vertical line at the time of execution; the requirement of linearizability is that the lines joining up the operation markers always move forward in time, never backward 6 | - once a new value has been written or read, all subsequent reads see the value that was written (until it is overwritten again) 7 | - it doesn’t assume any transaction isolation: another client may change a value at any time 8 | - essentially means “behave as though there is only a single copy of the data, and all operations on it are atomic” 9 | - Used for leader election in coordination services (such as ZooKeeper) 10 | - Used for uniqueness constraints in databases 11 | 12 | ### Linearizability vs Serializability 13 | 14 | - Serializability 15 | - an isolation property of transactions 16 | - It guarantees that transactions behave the same as if they had executed in some serial order 17 | - It is okay for that serial order to be different from the order in which transactions were actually run 18 | - Linearizability 19 | - a recency guarantee on reads and writes of a register (object) 20 | - It doesn’t group operations together into transactions, so it does not prevent problems such as write skew 21 | 22 | ## CAP theorem 23 | 24 | - **CAP (Consistency, Availability, Partition tolerance) theorem**: you can only pick 2 out of 3 25 | - A better way to describe it is **either Consistent or Available
when Partitioned** 26 | 27 | ## Ordering Guarantees 28 | 29 | - *causal dependency*: e.g. a question and its answer, or a git commit history and its branches 30 | - *consistent with causality*: the effects of all operations that happened causally before that point in time are visible, but no operations that happened causally afterward can be seen. 31 | - For example, if the snapshot contains an answer, it must also contain the question being answered 32 | - *causally consistent*: the system obeys the ordering imposed by causality 33 | - A *total order* always allows any two elements to be compared (e.g. unique sequence numbers) 34 | - *partially ordered*: in some cases one set is greater than another, but in other cases they are incomparable 35 | - Tracking causal dependencies: 36 | - Explicitly tracking all the data that has been read would mean a large overhead. 37 | - a better way is to use sequence numbers or timestamps to order events 38 | - **Lamport timestamp**: a simple method for generating sequence numbers that is consistent with causality 39 | - Each node has a unique identifier, and each node keeps a counter of the number of operations it has processed 40 | - The Lamport timestamp is then simply a pair of (counter, node ID) 41 | - A Lamport timestamp bears no relationship to a physical time-of-day clock, but it provides total ordering 42 | - Total Order Broadcast: a protocol for exchanging messages between nodes 43 | - two safety properties should always be satisfied 44 | - *Reliable delivery*: No messages are lost: if a message is delivered to one node, it is delivered to all nodes. 45 | - *Totally ordered delivery*: Messages are delivered to every node in the same order.
46 | - Implementation: 47 | - assume that every time the lock server grants a lock or lease, it also returns a **fencing token**, which is a number that increases every time a lock is granted 48 | - every time a client sends a write request to the storage service, it must include its current fencing token 49 | - For example, if a node has delivered message 4 and receives an incoming message with a sequence number of 6, it knows that it must wait for message 5 before it can deliver message 6. 50 | - Usages: 51 | - Consensus services (e.g. ZooKeeper) 52 | - Logs (replication log, transaction log, or write-ahead log) 53 | 54 | 55 | ## Consensus: get several nodes to agree on something 56 | 57 | ### Two-Phase Commit (2PC): transaction commit across multiple nodes 58 | 59 | - performed with the help of a ***transaction manager (coordinator)*** and *participants* (nodes that participate in reads/writes for the transaction) 60 | - Phase 1: the coordinator sends a prepare request to each participant, asking if it is able to commit. 61 | - Phase 2: if all are good to commit, the commit is performed. Otherwise, abort. 62 | - process: 63 | 1. When the application wants to begin a distributed transaction, it requests a **globally unique transaction ID** from the coordinator 64 | 2. The application begins a single-node transaction on each of the participants, and attaches the globally unique transaction ID to the single-node transaction. All reads/writes are done atomically within one of these single-node transactions 65 | 3. When the application is ready to commit, the coordinator sends a prepare request to all participants with the global transaction ID. If any of these requests fails, the coordinator sends an abort request for that transaction ID to all participants. 66 | 4. When a participant receives the prepare request, it makes sure that it can definitely commit the transaction under all circumstances. By replying “yes,” the participant promises not to abort the transaction, without actually committing it yet. 67 | 5.
Once the coordinator’s decision has been written to disk, the commit or abort request is sent to all participants. If this request fails or times out, the coordinator must retry forever until it succeeds. 68 | - two crucial “points of no return” (these ensure the atomicity of 2PC): 69 | 1. when a participant votes “yes,” it promises that it will definitely be able to commit later (although the coordinator may still choose to abort) 70 | 2. once the coordinator decides, that decision is irrevocable 71 | - If the coordinator fails: 72 | - The only way 2PC can complete is by waiting for the coordinator to recover. 73 | - This is why the coordinator must write its commit or abort decision to a transaction log on disk before sending commit or abort requests to participants 74 | - when the coordinator recovers, it determines the status of all in-doubt transactions by reading its transaction log. 75 | - Any transactions that don’t have a commit record in the coordinator’s log are aborted. 76 | - Thus, the commit point of 2PC comes down to a regular single-node atomic commit on the coordinator.
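The prepare/commit flow above can be condensed into a toy sketch. This is only an illustration with in-memory objects (the `Participant` class and `two_phase_commit` function are invented for the example); a real coordinator would durably log its decision between the two phases:

```python
class Participant:
    """Toy participant: votes in phase 1, applies the decision in phase 2."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self, txid):
        # Phase 1: by voting "yes" the participant promises it can commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txid):
        self.state = "committed"

    def abort(self, txid):
        self.state = "aborted"

def two_phase_commit(txid, participants):
    """Toy coordinator: collect votes, then broadcast the decision."""
    votes = [p.prepare(txid) for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Point of no return: a real coordinator writes `decision` to its
    # transaction log on disk here, before sending any phase-2 request.
    for p in participants:
        if decision == "commit":
            p.commit(txid)
        else:
            p.abort(txid)
    return decision
```

This omits everything that makes 2PC hard in practice: durable logging, retrying phase-2 requests forever, and recovering in-doubt transactions after a coordinator crash.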
77 | - 2PC is thus called a ***blocking atomic commit protocol*** 78 | - ***Three-phase commit (3PC)*** is a ***nonblocking atomic commit*** protocol, but it requires a ***perfect failure detector*** to guarantee this 79 | 80 | 81 | ### Exactly-once message processing 82 | 83 | - implemented by atomically committing the message acknowledgment and the database writes in a single transaction 84 | - If either the message delivery or the database transaction fails, both are aborted, so the message can safely be retried later 85 | - all systems affected by the transaction are required to use the same atomic commit protocol 86 | 87 | 88 | ### XA (eXtended Architecture) transactions 89 | 90 | - XA is not a network protocol but merely a C API for interfacing with a transaction coordinator 91 | - XA assumes that your application uses a network driver or client library to communicate with the participant databases or messaging services 92 | - The standard does not specify how it should be implemented, but in practice the coordinator is often a library that is loaded into the **same process** as the application issuing the transaction 93 | - It keeps track of the participants in a transaction, collects their responses after asking them to prepare (via a callback into the driver), and uses a log on the local disk to keep track of the commit/abort decision for each transaction. 94 | - If the application process crashes, the coordinator goes down with it. Since the logs are on the application server's local disk, that server must be restarted. 95 | - The database server cannot contact the coordinator directly, since all communication must go via its client library.
96 | - If the logs are corrupted so that the outcome of in-doubt transactions cannot be determined, only manual intervention (by an administrator) can resolve it; otherwise the locks are held forever 97 | - one solution: ***heuristic decisions***: allowing a participant to unilaterally decide to abort or commit an in-doubt transaction without a definitive decision from the coordinator, but this MAY break atomicity. Thus, it is intended only for getting out of catastrophic situations, and not for regular use. 98 | - Limitations: 99 | - The coordinator can become a single point of failure 100 | - when the coordinator is part of a **stateless** application server, it changes the nature of the deployment: since the coordinator's logs become a crucial part, the server is no longer stateless 101 | - Compatibility: XA cannot detect deadlocks across different systems (since that would require a standardized protocol for systems to exchange information on the locks that each transaction is waiting for), and it does not work with SSI, since that would require a protocol for identifying conflicts across different systems. 102 | 103 | ### Fault-tolerant consensus 104 | 105 | - a fault-tolerant consensus algorithm must satisfy the following properties: 106 | 107 | - Uniform agreement: No two nodes decide differently. 108 | - Integrity: No node decides twice. 109 | - Validity: If a node decides value v, then v was proposed by some node. 110 | - Termination: Every node that does not crash eventually decides some value. 111 | - The system model of consensus assumes that when a node “crashes,” it suddenly disappears and never comes back. 112 | - any consensus algorithm requires at least a majority of nodes to be functioning correctly in order to assure termination. That majority can safely form a quorum. 113 | - The best-known fault-tolerant consensus algorithms are Viewstamped Replication (VSR), Paxos, Raft, and Zab.
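The majority requirement above can be checked with a little arithmetic: a quorum larger than half the nodes guarantees that any two quorums overlap in at least one node, which is what prevents two groups from deciding differently. A small sanity check (not a consensus implementation; the `majority` helper is made up for the example):

```python
from itertools import combinations

def majority(n):
    # Smallest integer strictly greater than n/2: two sets of this size
    # drawn from n nodes must share at least one member.
    return n // 2 + 1

# With 5 nodes a quorum is 3, and any two quorums of 3 intersect,
# since two disjoint 3-node sets would need 6 nodes in total.
nodes = {1, 2, 3, 4, 5}
q = majority(len(nodes))
assert all(set(a) & set(b)
           for a in combinations(nodes, q)
           for b in combinations(nodes, q))
```

This also shows why you need at least 3 nodes: with n = 3 a quorum is 2, so the system keeps operating when one node is down, whereas with n = 2 a quorum of 2 means any single failure blocks progress.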
114 | 115 | - ***epoch number***: a number such that within each epoch, the leader is unique 116 | - each time the leader dies, the nodes start a vote to elect a new leader 117 | - the election is given an incremented epoch number (epoch numbers are totally ordered) 118 | - if there is a conflict, the leader with the higher epoch number wins 119 | - before the elected leader does anything, it must make sure there is no leader with a higher epoch number: it collects votes from a quorum of nodes 120 | - so there are two rounds of voting in the process: one to elect a leader, and one to vote on the leader's proposal. If the results of the two votes are inconsistent, the leader cannot be promoted 121 | 122 | - Limitations of consensus 123 | - The process by which nodes vote on proposals before they are decided is a kind of synchronous replication; some committed data can potentially be lost on failover 124 | - Consensus systems always require a strict majority to operate (you need at least 3 nodes to tolerate one failure) 125 | - Most consensus algorithms assume a fixed set of nodes that participate in voting, so it's difficult to scale 126 | - Consensus systems generally rely on timeouts to detect failed nodes 127 | 128 | 129 | ## Reference: 130 | 131 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) 132 |
understand is the `Open Systems Interconnection Model (OSI Model)` 4 | 5 | ![OSI model](https://img.router-switch.com/media/wysiwyg/Help-Center-FAQ/Router/osi-model.png) 6 | 7 | For system design purposes, we don't need to go lower than layer 4. 8 | 9 | The most commonly used protocols are summarized in this table: 10 | 11 | ![Internet Protocol Suite](http://2.bp.blogspot.com/-8spz6AylxBQ/UWKFo86yYjI/AAAAAAAAANI/XKyMikMWn7c/s1600/tcpip.jpg) 12 | 13 | It is definitely not a complete table, and we are particularly interested in the following areas: 14 | 15 | - TCP vs UDP: 16 | - TCP: 17 | - connection-oriented protocol 18 | - a connection is established between the peer entities prior to transmission 19 | - transmission flow is controlled such that a fast sender does not overwhelm a slow receiver 20 | - UDP: 21 | - message-oriented protocol 22 | - essentially just broadcasts messages, with no connection or ordering guarantees 23 | 24 | - Other transport layer protocols: 25 | - QUIC: based on UDP, initially designed by Google. 26 | - SCTP: a combination of TCP and UDP, used for telephony over the Internet. 27 | 28 | - TCP/IP: 29 | - People also talk about TCP/IP, which refers to a protocol stack containing the different protocols required to transfer data from sender to receiver 30 | - Details can be seen [here](https://stackoverflow.com/questions/31473578/tcp-ip-and-tcp-and-ip-difference) and [here](https://www.fortinet.com/resources/cyberglossary/tcp-ip) 31 | 32 | - HTTP: how it works when a client wants to communicate with a server 33 | - Open a TCP connection 34 | - Send an HTTP message 35 | - Read the response sent by the server 36 | - Close the connection (or reuse it for further communication) 37 | 38 | - HTTPS: 39 | - Extension of HTTP, but more secure 40 | - Uses SSL/TLS to secure data in transit 41 | 42 | - socket: 43 | - A socket is one endpoint of a two-way communication link between two programs running on the network.
44 | - A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to. 45 | 46 | - websocket: 47 | - A WebSocket is a persistent connection between a client and server 48 | - WebSockets provide a bidirectional, full-duplex communications channel that operates over HTTP through a single TCP/IP socket connection 49 | 50 | - HTTP vs long-polling vs WebSocket: 51 | - HTTP is a strictly unidirectional protocol 52 | - Long-polling is an HTTP request with a long timeout period 53 | - resources on the server are tied up throughout the length of the long-poll, even when no data is available to send 54 | - WebSocket: allows sending message-based data, similar in style to UDP, but over TCP 55 | - uses HTTP as the initial transport mechanism (i.e. HTTP request headers), but keeps the TCP connection alive after the HTTP response is received 56 | - Once the TCP connection is established, it uses the WebSocket protocol to communicate 57 | - WebSocket is a framed protocol, meaning that a chunk of data (a message) is divided into a number of discrete chunks, with the size of each chunk encoded in the frame. 58 | - A frame includes a frame type, a payload length, and a data portion. 59 | - More comparison between WebSocket and HTTP can be seen [here](https://www.geeksforgeeks.org/what-is-web-socket-and-how-it-is-different-from-the-http/) 60 | 61 | - REST: 62 | - a software architectural style that was created to guide the design and development of the architecture for the World Wide Web 63 | - Any web service that obeys the REST constraints is informally described as **RESTful** 64 | - The goal of REST is to increase performance, scalability, simplicity, modifiability, visibility, portability, and reliability.
65 | - Six guiding constraints define a RESTful system: 66 | - Client–server architecture 67 | - client application and server application MUST be able to evolve separately without any dependency on each other 68 | - Statelessness 69 | - The server will not store anything about the latest HTTP request the client made. It will treat every request as new. No session, no history. 70 | - Cacheability 71 | - caching shall be applied to resources when applicable 72 | - Caching can be implemented on the server side or the client side. 73 | - Layered system 74 | - allows you to use a layered system architecture where you deploy the APIs on server A, store data on server B, and authenticate requests on server C 75 | - Uniform interface 76 | - A resource in the system should have only one logical URI, and that should provide a way to fetch related or additional data. 77 | - Code on demand (optional) 78 | - you are free to return executable code to support a part of your application 79 | 80 | - REST vs SOAP 81 | - REST is an architectural style, while SOAP is a protocol 82 | - REST is not a standard in itself, but RESTful implementations make use of standards 83 | 84 | - HTTP response status codes 85 | - For a full list please see [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) 86 | - Some common ones: 87 | - 200: OK/success 88 | - 201: created 89 | - 202: accepted 90 | - 204: no content 91 | - 300: more than one possible response 92 | - 301: permanent redirect 93 | - 302: temporary redirect 94 | - 400: The server could not understand the request due to invalid syntax.
95 | - 401: unauthenticated 96 | - 403: Permission denied 97 | - 404: The server can not find the requested resource (URL not recognized) 98 | - 500: Unhandled error on server 99 | - 502: Server got an invalid response 100 | 101 | 102 | Reference: 103 | 104 | - [Network Layers & Network Layer in OSI Model](https://www.router-switch.com/faq/network-layers-in-osi-model-features-of-osi.html) 105 | - [Application Layer (Internet protocol Suite) ~ Networking Space](http://walkwidnetwork.blogspot.com/2013/04/application-layer-internet-protocol.html) 106 | - [The Internet protocol suite (article) \| Khan Academy](https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:the-internet/xcae6f4a7ff015e7d:the-internet-protocol-suite/a/the-internet-protocols) 107 | - [QUIC - Wikipedia](https://en.wikipedia.org/wiki/QUIC) 108 | - [An overview of HTTP - HTTP \| MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview) 109 | - [What Is a Socket? (The Java™ Tutorials > Custom Networking > All About Sockets)](https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html) 110 | - [WebSockets - A Conceptual Deep Dive \| Ably Realtime](https://ably.com/topic/websockets) 111 | - [How Do Websockets Work? 
- Kevin Sookocheff](https://sookocheff.com/post/networking/how-do-websockets-work/) 112 | - [Short Polling vs Long Polling vs WebSockets - System Design](https://www.youtube.com/watch?v=ZBM28ZPlin8&ab_channel=BeABetterDev) 113 | - [Representational state transfer - Wikipedia](https://en.wikipedia.org/wiki/Representational_state_transfer) 114 | - [REST Principles and Architectural Constraints](https://restfulapi.net/rest-architectural-constraints/) 115 | -------------------------------------------------------------------------------- /SystemDesign/load_balancer.md: -------------------------------------------------------------------------------- 1 | # Load Balancer 2 | 3 | - A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client. 4 | - A load balancer can also enhance the user experience by reducing the number of error responses the client sees. 5 | - Session persistence (sending all requests from a particular client to the same server) is also available for some load balancers 6 | 7 | ## Types of Load Balancers: Layer 4 and Layer 7 8 | 9 | Layer 4 load balancing: 10 | 11 | - “Layer 4 load balancing” most commonly refers to a deployment where the load balancer’s IP address is the one advertised to clients for a web site or service (via DNS, for example) 12 | - Layer 4 load balancing operates at the intermediate transport layer, which deals with delivery of messages with no regard to the content of the messages. 13 | - When it receives a request and makes the load balancing decision, it also performs Network Address Translation (NAT) on the request packet, changing the recorded destination IP address from its own to that of the content server it has chosen on the internal network 14 | - Before forwarding server responses to clients, the load balancer changes the source address recorded in the packet header from the server’s IP address to its own. 
15 | - Layer 4 load balancing was a popular architectural approach to traffic handling when commodity hardware was not as powerful as it is now, and the interaction between clients and application servers was much less complex. 16 | 17 | Layer 7 load balancing: 18 | 19 | - Layer 7 load balancing operates at the high‑level application layer, which deals with the actual content of each message. 20 | - HTTP is the predominant Layer 7 protocol for website traffic on the Internet. 21 | - It terminates the network traffic and reads the message within. 22 | - It can make a load‑balancing decision based on the content of the message (the URL or cookie, for example). 23 | - It then makes a new TCP connection to the selected upstream server (or reuses an existing one, by means of HTTP keepalives) and writes the request to the server. 24 | - Layer 7 load balancing is more CPU‑intensive than packet‑based Layer 4 load balancing 25 | 26 | ## Load Balancing Algorithms 27 | 28 | - Least connection 29 | - Selects the server with the fewest active connections 30 | - Weighted least connection 31 | - Similar to least connection, but each server is assigned a weight 32 | - Least response time 33 | - Selects the server with the lowest response time 34 | - Weighted least response time 35 | - Similar to least response time, but each server is assigned a weight 36 | - Least bandwidth 37 | - Selects the server currently serving the least traffic (bandwidth) 38 | - Round robin 39 | - Cycles through the servers one by one 40 | - Weighted round-robin 41 | - Cycles through the servers one by one, in proportion to the weight the admin assigns to each server 42 | - IP hash 43 | - combines source and destination IP addresses of the client and server to generate a unique hash key, which is used to allocate the client to a particular server. 44 | - the client request is directed to the same server it was using previously. 45 | 46 | # Reverse Proxy 47 | 48 | - A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server's response to the client.
49 | - Increased security – No information about your backend servers is visible outside your internal network 50 | - Has security features to help protect backend servers from distributed denial-of-service (DDoS) attacks (e.g. IP address blacklisting) 51 | - Increased scalability and flexibility – Because clients see only the reverse proxy's IP address, you are free to change the configuration of your backend infrastructure. 52 | - Increased performance on response time 53 | - Compression (of server responses to reduce bandwidth) 54 | - SSL termination (encryption) 55 | - Caching (of server responses for repeated requests) 56 | 57 | # API Gateway 58 | 59 | - An API Gateway is the element that coordinates and orchestrates how all the requests are processed in a Microservices architecture 60 | - An API Gateway includes an HTTP server where routes are associated with a Microservice or with a FaaS function 61 | - When an API Gateway receives a request, it looks up the Microservice that can serve the request and forwards the request to it. 62 | - Besides this pure routing task, an API gateway can also be the part that performs **authentication**, **input validation**, **load balancing** and **centralized middleware** functionality, among other tasks. 63 | - Like a reverse proxy, an API gateway often makes sense even with just one web server or application server. 64 | - Drawbacks of an API gateway: 65 | - It creates a tight coupling between the client and the backend. 66 | - It has limited choice of communication protocols for services. 67 | - It could become a bottleneck for your application 68 | 69 | ## An example: The Architecture of Uber's API gateway 70 | 71 | Components in a request lifecycle: 72 | 73 | 1. **Protocol manager**: provides the ability to implement APIs that can ingest any type of relevant protocol payload 74 | 2.
**Middleware**: implements composable logic before the endpoint handler is invoked 75 | - Middleware implements cross-cutting concerns, such as authentication, authorization, rate limiting, circuit breaking, etc. 76 | 3. **Endpoint handler**: responsible for request validation, payload transformation, and converting the endpoint request object to the client request object. 77 | 4. **Client**: performs a request to a back-end service 78 | 79 | 80 | # Reference: 81 | 82 | - [System Design: What is Load Balancing? - YouTube](https://www.youtube.com/watch?v=gMIslJN44P0&ab_channel=BeABetterDev) 83 | - [System Design — Load Balancing. Concepts about load balancers and… \| by Larry | Peng Yang | Computer Science Fundamentals | Medium](https://medium.com/must-know-computer-science/system-design-load-balancing-1c2e7675fc27) 84 | - [What Is Layer 4 Load Balancing? \| NGINX Load Balancer](https://www.nginx.com/resources/glossary/layer-4-load-balancing/) 85 | - [Benefits of Layer 7 Load Balancing \| NGINX Load Balancer](https://www.nginx.com/resources/glossary/layer-7-load-balancing/) 86 | - [Load Balancing Algorithms, Types and Techniques](https://kemptechnologies.com/load-balancer/load-balancing-algorithms-techniques/) 87 | - [What is a Proxy? \| System Design - YouTube](https://www.youtube.com/watch?v=xiUmXVcLdCw&ab_channel=BeABetterDev) 88 | - [What is a Reverse Proxy vs. Load Balancer? - NGINX](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/) 89 | - [Stupid question of the day: What is an API Gateway and what it has to do with a Serverless model? \| by Gabry Martinez | Medium](https://gabrymartinez.medium.com/stupid-question-of-the-day-what-is-an-api-gateway-and-what-it-has-to-do-with-a-serverless-model-2acee3e3eeba) 90 | - [What is API Gateway?. 
In microservices architecture, there… \| by Vivek Kumar Singh | System Design Blog | Medium](https://medium.com/system-design-blog/what-is-api-gateway-68a11d4ab322) 91 | - [The Architecture of Uber's API gateway \| Uber Engineering Blog](https://eng.uber.com/architecture-api-gateway/) 92 | -------------------------------------------------------------------------------- /SystemDesign/navigate_url.md: -------------------------------------------------------------------------------- 1 | # What really happens when you enter a URL 2 | 3 | 1. enter the URL in the browser 4 | 2. browser looks up the IP address for the domain name 5 | - DNS lookup process: 6 | - **Browser cache**: the browser caches DNS records 7 | - **OS cache**: if the browser cache doesn't have the desired record, the OS has its own cache 8 | - **Router cache**: the request goes to the router, which has its own DNS cache 9 | - **ISP DNS cache**: next, the cache of the ISP's DNS server is checked 10 | - **Recursive search**: the ISP's DNS server performs a recursive search 11 | - DNS search bottleneck solutions: 12 | - **Round-robin DNS**: DNS lookup returns multiple IP addresses, rather than just one. 13 | - **Load-balancer**: the piece of hardware that listens on a particular IP address and forwards the requests to other servers 14 | - **Geographic DNS**: mapping a domain name to different IP addresses, depending on the client's geographic location. 15 | - **Anycast**: a routing technique where a single IP address maps to multiple physical servers. However, it doesn't fit well with TCP. 16 | 3. Browser sends an HTTP request (e.g. `GET`) to the web server 17 | 4. The web server (e.g. facebook.com) responds with a permanent redirect 18 | - The redirect may be for search-engine reasons: it keeps one canonical URL 19 | 5. Browser follows the redirect 20 | 6. Server handles the request 21 | 7. Server sends back an HTML response 22 | 8. Browser begins rendering the HTML 23 | 9. Browser sends requests for objects embedded in the HTML (e.g. images, stylesheets) 24 | 10.
Browser sends further asynchronous (AJAX) requests 25 | 26 | ### Reference 27 | 28 | - [What really happens when you navigate to a URL](http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/comment-page-3/) 29 | -------------------------------------------------------------------------------- /SystemDesign/nosql_db.md: -------------------------------------------------------------------------------- 1 | # NoSQL databases 2 | 3 | ## NoSQL database types 4 | 5 | - Column-oriented databases: Cassandra, HBase, Hypertable, BigTable 6 | - Key-value stores: Redis, Voldemort, Riak, and Amazon's Dynamo 7 | - Document stores: MongoDB and CouchDB 8 | - Graph databases: Neo4j 9 | 10 | 11 | ## Cassandra 12 | 13 | - Cassandra is decentralized, which means all nodes are the same (this is called **server symmetry**). There is no leader-follower structure; all nodes follow a P2P (peer-to-peer) gossip protocol, so it's **highly available** 14 | - Cassandra is **highly scalable**: new nodes are discovered automatically, and no reboot is needed 15 | 16 | ## Cassandra vs MongoDB 17 | 18 | | Difference | Cassandra | MongoDB | 19 | | :--------: | :-------: | :-----: | 20 | | DB structure | Wide-column tables | JSON-like documents | 21 | | Index | Primary key; secondary indexes are supported but limited | Rich secondary indexes; without an index every document is scanned, which slows reads | 22 | | Query | CQL (SQL-like) | JSON-like query documents | 23 | | Replication | leaderless (peer-to-peer) | leader-follower with automatic leader election | 24 | 25 | 26 | 27 | ## Reference 28 | 29 | - [NoSQL Database Types - DZone Database](https://dzone.com/articles/nosql-database-types-1) 30 | - [A Comprehensive Guide to Cassandra Architecture](https://www.instaclustr.com/cassandra-architecture/) 31 | - [Cassandra vs MongoDB in 2018](https://blog.panoply.io/cassandra-vs-mongodb) 32 | - [Cassandra vs. MongoDB vs.
Hbase: A Comparison of NoSQL Databases \| Logz.io](https://logz.io/blog/nosql-database-comparison/) 33 | - [Introduction to Amazon DynamoDB for Cassandra developers \| AWS Database Blog](https://aws.amazon.com/blogs/database/introduction-to-amazon-dynamodb-for-cassandra-developers/) 34 | - [Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database](https://www.youtube.com/watch?v=yvBR71D0nAQ&ab_channel=AmazonWebServices) 35 | - [Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB](https://www.youtube.com/watch?v=HaEPXoXVf2k&ab_channel=AmazonWebServices) 36 | - [HBase Tutorial for Beginners: Learn in 3 Days!](https://www.guru99.com/hbase-tutorials.html) 37 | -------------------------------------------------------------------------------- /SystemDesign/replication_partition.md: -------------------------------------------------------------------------------- 1 | # Replication & Partition 2 | 3 | ## Replication 4 | 5 | ### Leader-follower (single-leader) 6 | 7 | - Writes are only accepted on the leader, followers are read only 8 | - Synchronous Replication: the leader waits until follower has confirmed that it received the write before reporting success to the user, and before making the write visible to other clients 9 | - Asynchronous Replication: the leader sends the message, but doesn’t wait for a response from the follower. 
10 | - Setting Up New Followers: 11 | - Take a consistent snapshot of the leader's database at some point in time 12 | - Copy the snapshot to the new follower node 13 | - The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken (according to the leader's replication log) 14 | - Handling Node Outages 15 | - Follower failure: Catch-up recovery 16 | - Similar to setting up new followers: each follower keeps a log of the data changes it has received from the leader, so it knows where it left off 17 | - Leader failure: Failover - one of the followers needs to be promoted to be the new leader, manually or automatically 18 | - Determining that the leader has failed (i.e. timeout: if a node doesn't respond for some period of time). 19 | - Choosing a new leader: the best candidate is usually the replica with the most up-to-date data changes from the old leader 20 | - Reconfiguring the system to use the new leader 21 | - Implementation of Replication Logs 22 | - Statement-based replication: the leader logs every write request (statement) that it executes and sends that statement log to its followers 23 | - Nondeterministic statements (random numbers, timestamps, custom functions) may produce different values on each replica 24 | - Write-ahead log (WAL) shipping: every write is appended to a log 25 | - Replication is closely coupled to the storage engine, since the log describes the data at a very low level 26 | - Logical (row-based) log replication: use different log formats for replication and for the storage engine 27 | - Trigger-based replication: only replicate a subset of the data 28 | - **Eventual consistency**: if an application reads from an asynchronous follower, it may see outdated information if the follower has fallen behind, but the followers will eventually catch up and become consistent with the leader 29 | 30 | ### Multi-leader 31 | 32 | - Multi-leader replication 33 | - Circular topology 34 | - Star topology 35 | - All-to-all topology 36 | - Write conflict resolution 37 | - make sure all writes for a particular record go
through the same leader 38 | - Unique write ID: last write wins (LWW) 39 | - Unique replica ID: the write from the replica with the higher ID wins 40 | - Somehow merge conflicts 41 | - Record conflicts and resolve/report them later 42 | - Custom conflict resolution: 43 | - On-write: the db detects a conflict in the log and calls the conflict handler 44 | - On-read: all conflicting writes are stored; the next time the data is read, all versions are returned to the application to resolve (manually or automatically) and write the result back to the db. 45 | - Automatic conflict resolution 46 | - Conflict-free replicated datatypes (CRDTs) 47 | - Mergeable persistent data structures 48 | - Operational transformation (Google Docs) 49 | 50 | 51 | ### Leaderless 52 | 53 | - How to catch up when a node comes back: use version numbers 54 | - Read repair: when a client reads, stale replicas are repaired based on version numbers 55 | - Anti-entropy process: a background process constantly looks for differences 56 | - **Quorum Consistency** 57 | - With ***N*** nodes, operations are considered successful if 58 | - each write is confirmed by at least ***W*** nodes 59 | - each read queries at least ***R*** nodes 60 | - and ***W + R > N***, so every read overlaps with the latest write 61 | - **Sloppy quorum**: writes and reads still require *w* and *r* successful responses, but those may include nodes that are not among the designated *n* "home" nodes for a value 62 | - **Hinted handoff**: Once the network interruption is fixed, any writes that one node temporarily accepted on behalf of another node are sent to the appropriate "home" nodes 63 | 64 | 65 | ## Partition (sharding): query throughput can be scaled by adding more nodes 66 | 67 | - For key-value stores 68 | - Partition by key range (e.g.
alphabetical) and keep sorted in each partition 69 | - may result in skew or hot spots 70 | - Solution: use a combined key 71 | - For example, you could prefix each timestamp with the sensor name so that the partitioning is first by name and then by time 72 | - Partition by hash of key (uniformly distributed) 73 | - Loses the ability to do efficient range queries 74 | - Solution: a **compound primary key** consisting of several columns 75 | - For example, only the first part of the key is hashed to determine the partition; the other columns are used as a concatenated index for sorting the data 76 | 77 | - Handle skew and hot spots: the responsibility of the application 78 | - if one key is known to be very hot, a simple technique is to add a random number to the beginning or end of the key 79 | - it only makes sense to append the random number for the small number of hot keys 80 | - need some way of keeping track of which keys are being split 81 | 82 | - Partitioning and Secondary Indexes 83 | - Partitioning Secondary Indexes by Document (***local index***) 84 | - each partition is completely separate: each partition maintains its own secondary indexes, covering only the documents in that partition 85 | - doesn't care what data is stored in other partitions 86 | - **scatter/gather** problem: a read query on a secondary index of a partitioned database needs to query all partitions 87 | - Partitioning Secondary Indexes by Term (***global index***) 88 | - a global index that covers data in all partitions; the global index must also be partitioned, but can be partitioned differently from the primary key index 89 | - We call this kind of index term-partitioned, because the term we're looking for determines the partition of the index 90 | - Reads are faster, but writes are slower and more complex 91 | - updates to global secondary indexes are often asynchronous 92 | 93 | - Strategies for rebalancing partitions 94 | - Do NOT use hash mod N, since changing N would move most keys between nodes 95 | - Use
a fixed number of partitions: create many more partitions than nodes and assign several partitions to each node 96 | - Dynamic partitioning: auto-split when 97 | - an existing partition reaches a configured size 98 | - this keeps the number of partitions proportional to the size of the dataset 99 | - Partitioning proportionally to nodes: keep a fixed number of partitions per node 100 | 101 | - Request Routing among partitions and nodes 102 | - Coordination service (keeps track of cluster metadata) to route requests (e.g. ZooKeeper) 103 | - **gossip protocol**: Requests can be sent to any node, and that node forwards them to the appropriate node for the requested partition 104 | 105 | 106 | Reference: 107 | 108 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?dchild=1&gclid=Cj0KCQjw_8mHBhClARIsABfFgpg5q5IQvE2s5OBULx6LQFDETV41haS67EE3JAfvobPADJUJHN7dUbsaAjjrEALw_wcB&hvadid=285888202784&hvdev=c&hvlocphy=9001327&hvnetw=g&hvqmt=e&hvrand=12070511852976413586&hvtargid=kwd-407664346480&hydadcr=16109_9598899&keywords=design+data+intensive+application&qid=1626587742&sr=8-1) 109 | -------------------------------------------------------------------------------- /SystemDesign/scale_web_app.md: -------------------------------------------------------------------------------- 1 | # Web app scale from monolithic to distributed 2 | 3 | ## 1. Single Server + Database 4 | 5 | ![01-initial-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/01-initial-700.png) 6 | 7 | MVC Design: 8 | - Advantages: 9 | - Easy to debug 10 | - Easy to deploy 11 | 12 | - Disadvantages: 13 | - very difficult to scale: every request is handled by the single server, and one page can trigger many requests, which consumes lots of server resources 14 | - whenever there is a server issue, the entire site goes down 15 | - the front end and back end must be deployed together 16 | 17 | ## 2.
Adding a Reverse Proxy 18 | 19 | ![02-reverse-proxy-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/02-reverse-proxy-700.png) 20 | 21 | - **Health Checks** make sure that our actual server is still up and running 22 | - **Routing** forwards a request to the right endpoint 23 | - **Authentication** makes sure that a user is actually permitted to access the server 24 | - **Firewalling** ensures that users only have access to the parts of our network they are allowed to use ... and more 25 | 26 | ## 3. Add a Load Balancer with multiple servers 27 | 28 | ![03-load-balancer-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/03-load-balancer-700.png) 29 | 30 | - A Reverse Proxy can act as a load balancer 31 | 32 | ## 4. Add more databases 33 | 34 | ![04-database-scale-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/04-database-scale-700.png) 35 | 36 | How to ensure data consistency: 37 | - **Master/slave setup (writes with read replicas)**: writes go to a single master, which replicates them to read-only replicas that serve reads; alternatively, split the database into multiple parts where each part does its own thing 38 | 39 | ## 5. Microservices 40 | 41 | ![05-microservices-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/05-microservices-700.png) 42 | 43 | - each service can be scaled individually, enabling us to better adjust to demand 44 | - development teams can work independently, each being responsible for their own microservice's lifecycle (creation, deployment, updating etc.) 45 | - each microservice can use its own resources 46 | 47 | ## 6. Caching & CDN (Content Delivery Network) 48 | 49 | ![06-cdn-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/06-cdn-700.png) 50 | 51 | - cache the static content 52 | 53 | ## 7. Message Queues 54 | 55 | ![07-message-queue-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/07-message-queue-700.png) 56 | 57 | Advantages: 58 | - it decouples tasks and processors.
Sometimes a lot of images need to be processed, sometimes only a few. Sometimes a lot of processors are available, sometimes it's just a couple. By simply adding tasks to a backlog rather than processing them directly we ensure that our system stays responsive and no tasks get lost. 59 | - it allows us to scale on demand. Starting up more processors takes time - so by the time a lot of users try to upload images, it's already too late. By adding our tasks to a queue we can defer the need to provision additional capacity to process them 60 | 61 | ## 8. Sharding 62 | 63 | ![08-sharding-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/08-sharding-700.png) 64 | 65 | - Sharding can be based on any number of factors, e.g. letters, location, usage frequency (power-users are routed to the good hardware) and so on. 66 | - You can shard servers, databases or almost any aspect of your stack this way, depending on your needs. 67 | 68 | ## 9. Load-balancing the load-balancer 69 | 70 | ![09-dns-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/09-dns-700.png) 71 | 72 | - DNS (Domain Name System): allows us to specify multiple IPs per domain name, each leading to a different load balancer.
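The DNS-level round robin described above can be sketched as a toy model. The `RoundRobinDNS` class and the IP addresses below are hypothetical, purely for illustration — real DNS servers rotate the answer list internally:

```python
class RoundRobinDNS:
    """Toy round-robin DNS (hypothetical class, made-up IPs): each lookup
    returns the list of load-balancer IPs rotated by one position, so
    successive clients tend to connect to different load balancers."""

    def __init__(self, ips):
        self.ips = list(ips)

    def resolve(self):
        answer = list(self.ips)           # the answer for this lookup
        self.ips.append(self.ips.pop(0))  # rotate for the next lookup
        return answer


dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = dns.resolve()   # ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
second = dns.resolve()  # ["10.0.0.2", "10.0.0.3", "10.0.0.1"]
```

Clients typically try the first address in the answer, so rotating the list spreads connections across the load balancers. Note that plain round-robin DNS has no health awareness: a dead load balancer keeps receiving traffic until the DNS record is changed.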
73 | 74 | ## Reference: 75 | 76 | - [Scaling webapps for newbs & non-techies](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/) 77 | 78 | ## More resources 79 | 80 | - [Scaling Up to Your First 10 Million Users](https://www.youtube.com/watch?v=Ma3xWDXTxRg&ab_channel=AmazonWebServices) 81 | - [Web Scalability for Startup Engineers](https://www.amazon.ca/Scalability-Startup-Engineers-Artur-Ejsmont/dp/0071843655) 82 | - [A Beginner's Guide to Scaling to 11 Million+ Users on Amazon's AWS - High Scalability -](http://highscalability.com/blog/2016/1/11/a-beginners-guide-to-scaling-to-11-million-users-on-amazons.html) 83 | -------------------------------------------------------------------------------- /SystemDesign/storage_system.md: -------------------------------------------------------------------------------- 1 | # File Storage, Block Storage, Object Storage, and HDFS 2 | 3 | ## File storage 4 | 5 | - Data is stored as a single piece of information inside a folder (hierarchical directories structure) 6 | - Just like you’d organize pieces of paper inside a manila folder 7 | - When you need to access that piece of data, your computer needs to know the path to find it (i.e. `/home/images/beach.jpeg`) 8 | 9 | Pros: 10 | 11 | - oldest and most widely used data storage system for direct and network-attached storage (NAS) systems 12 | - has broad capabilities and can store just about anything 13 | - great for storing an array of complex files and is fairly fast for users to navigate 14 | 15 | Cons: 16 | 17 | - File-based storage systems must scale out by adding more systems, rather than scale up by adding more capacity. 18 | 19 | ## Block storage 20 | 21 | - Data is chopped into blocks, each block of data is given a unique identifier, which allows a storage system to place the smaller pieces of data wherever is most convenient. 22 | - Some data can be stored in a Linux environment and some can be stored in a Windows unit. 
23 | - When data is requested, the underlying storage software reassembles the blocks of data from these environments and presents them back to the user. 24 | 25 | Pros: 26 | 27 | - doesn't rely on a single path to data, so it's fast 28 | - gives the user complete freedom to configure their data 29 | - easy to use and manage, efficient and reliable 30 | - the more data you need to store, the better off you'll be with block storage. 31 | 32 | Cons: 33 | 34 | - Can be expensive 35 | - limited capability to handle metadata 36 | 37 | ## Object storage 38 | 39 | - A flat structure in which files are broken into pieces and spread out among hardware. 40 | - Data is broken into discrete units called objects and is kept in a single repository, instead of being kept as files in folders or as blocks on servers. 41 | - Object storage volumes work as modular units: 42 | - each is a self-contained repository that owns the data and the metadata that describes the data 43 | - each has a unique identifier that allows the object to be found over a distributed system 44 | - To retrieve the data, the storage operating system uses the metadata and identifiers 45 | - The metadata can be extremely detailed (e.g. for videos and photos) 46 | - Object storage is accessed through a simple HTTP API 47 | 48 | Pros: 49 | 50 | - Cost-efficient: pay only for what you use 51 | - well suited for static data 52 | - can scale to extremely large quantities of data 53 | - good at storing unstructured data. 54 | 55 | Cons: 56 | 57 | - Objects can't be modified in place; you have to rewrite the whole object 58 | - doesn't work well with traditional databases, because writing objects is a slow process and writing an app to use an object storage API isn't as simple as using file storage 59 | 60 | ## Hadoop Distributed File System (HDFS) 61 | 62 | - A distributed file system designed to run on commodity hardware. 63 | - It stores each file as a sequence of blocks 64 | - all blocks in a file except the last block are the same size.
65 | - The blocks of a file are replicated for fault tolerance (HDFS requires block storage) 66 | - The block size and replication factor are configurable per file. 67 | - Files in HDFS are write-once and have strictly one writer at any time. 68 | 69 | Pros: 70 | 71 | - highly fault-tolerant and is designed to be deployed on low-cost hardware 72 | - provides high throughput access to application data and is suitable for applications that have large data sets 73 | - Enables streaming access to file system data 74 | 75 | Cons: 76 | 77 | - Problems with small files 78 | - reads and writes go through the disk, which makes in-memory computation difficult and adds processing overhead 79 | - Supports only batch processing 80 | 81 | ### MapReduce 82 | 83 | - MapReduce is a programming model or pattern within the Hadoop framework that is used to access big data stored in the HDFS. 84 | - It is a core component, integral to the functioning of the Hadoop framework. 85 | - With MapReduce, rather than sending data to where the application or logic resides, the logic is executed on the server where the data already resides, to expedite processing. 86 | 87 | How does it work: 88 | 89 | - Essentially there are two functions: **Map** and **Reduce** 90 | - The **Map** function takes input from the disk as `<key, value>` pairs, processes them, and produces another set of intermediate `<key, value>` pairs as output. 91 | - The **Reduce** function also takes inputs as `<key, value>` pairs, and produces `<key, value>` pairs as output. 92 | - **Combine** is an optional process. 93 | - The combiner is a reducer that runs individually on each mapper server. 94 | - It reduces the data on each mapper further to a simplified form before passing it downstream. 95 | - **Partition** is the process that translates the `<key, value>` pairs resulting from mappers to another set of `<key, value>` pairs to feed into the reducer.
96 | - It decides how the data has to be presented to the reducer and also assigns it to a particular reducer. 97 | 98 | ## Database vs Storage Systems 99 | 100 | Conclusion: you shouldn't use database for file storage, mainly because of performance 101 | 102 | - [Database vs File system storage - Stack Overflow](https://stackoverflow.com/questions/38120895/database-vs-file-system-storage) 103 | - [Is it a bad practice to store large files (10 MB) in a database? - Software Engineering Stack Exchange](https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database) 104 | 105 | 106 | ## Reference: 107 | 108 | - [File storage, block storage, or object storage?](https://www.redhat.com/en/topics/data-storage/file-block-object-storage) 109 | - [System Design — Storage. Storage concepts and considerations in… \| by Larry | Peng Yang | Computer Science Fundamentals | Medium](https://medium.com/must-know-computer-science/system-desing-storage-d8ef4a8d952c) 110 | - [Hadoop - Pros and Cons - GeeksforGeeks](https://www.geeksforgeeks.org/hadoop-pros-and-cons/) 111 | - [Hadoop In 5 Minutes \| What Is Hadoop? | Introduction To Hadoop | Hadoop Explained |Simplilearn - YouTube](https://www.youtube.com/watch?v=aReuLtY0YMI&ab_channel=Simplilearn) 112 | - [MapReduce 101: What It Is & How to Get Started - Talend](https://www.talend.com/resources/what-is-mapreduce/) 113 | -------------------------------------------------------------------------------- /SystemDesign/transaction_isolation.md: -------------------------------------------------------------------------------- 1 | # Transaction & Isolation 2 | 3 | ## Transaction 4 | 5 | - Transaction 6 | - A transaction is a way for an application to group several reads and writes together into a logical unit. 7 | - all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback). 
8 | 9 | ### ACID (Atomicity, Consistency, Isolation, and Durability) 10 | 11 | - Atomicity 12 | - describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed (gives an all-or-nothing guarantee) 13 | - if the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction. 14 | - Without atomicity, if an error occurs, it’s difficult to know which changes have taken effect and which haven’t. 15 | 16 | - Consistency 17 | - the idea is that you have certain statements about your data (invariants) that must always be true 18 | - However, this idea of consistency depends on the application’s notion of invariants (This is not something that the database can guarantee) 19 | 20 | - Isolation 21 | - Most databases are accessed by several clients at the same time; isolation is used to handle concurrency problems (race conditions) 22 | - it means that concurrently executing transactions are isolated from each other: they cannot step on each other’s toes 23 | - In theory, textbooks formalize isolation as serializability, which means transactions behave as if they happen one by one 24 | - In practice, serializable isolation is rarely used, because it carries a performance 25 | penalty 26 | 27 | - Durability 28 | - ensures that once a transaction has committed successfully, any data it has written will not be forgotten (regardless of failures and crashes). 29 | - In a single-node database, durability typically means that the data has been written to nonvolatile storage such as a hard drive or SSD. 30 | - It usually has a write-ahead log (or similar) for recovery 31 | - In a replicated database, durability may mean that the data has been successfully copied to some number of nodes.
32 | - a database must wait until these writes or replications are complete before reporting a transaction as successfully committed 33 | - Replication durability cases: 34 | - if you write to disk and the machine dies, the data is inaccessible until you fix the machine, but a replicated system remains available 35 | - If a bug crashes every node on a particular input, all replicas can go down at once, and in-memory data will be lost 36 | - In an asynchronously replicated system, recent writes may be lost when the leader becomes unavailable 37 | - Hardware disks may not be reliable (e.g. SSDs can lose data on a power cut, firmware can have bugs, etc.) 38 | - Files may be corrupted after a crash due to software (file system, storage engine, etc.) bugs 39 | 40 | ## Isolation 41 | 42 | ### Weak Isolation 43 | 44 | - **Read Committed**: 45 | - Two guarantees 46 | 1. No dirty reads: When reading from the database, you will only see data that has been committed. 47 | - dirty reads: reading uncommitted data from another transaction 48 | 2. No dirty writes: When writing to the database, you will only overwrite data that has been committed. 49 | - dirty writes: a later write overwrites an (earlier) uncommitted value 50 | - Implementation 51 | - Row level locks (to prevent dirty writes): 52 | - acquire a lock on the object (when updating), and hold the lock until the transaction is committed or aborted. 53 | - Only one transaction can hold the lock for any given object; others must wait 54 | - To prevent dirty reads: 55 | - only see the old value (not the new value that is being committed to the database) 56 | - Only when the new value is committed do transactions switch over to reading the new value. 57 | - Typically read committed uses a separate snapshot for each query 58 | 59 | - **read skew (nonrepeatable read)** 60 | - if a transaction reads different parts of the database at different points in time while another transaction is writing, it can observe the database in an inconsistent state.
But once the transaction is complete, the values will be consistent 61 | - Read skew is considered acceptable under read committed isolation 62 | - this may be a problem for database backups or Analytic queries and integrity checks 63 | 64 | - **Snapshot isolation** is the most common solution to read skew 65 | - Transaction sees all the data that was committed in the database at the start of transaction 66 | - each transaction sees only the old data from that particular point in time 67 | - Implementation: ***multiversion concurrency control (MVCC)*** 68 | - From a performance point of view, a key principle of snapshot isolation is readers never block writers, and writers never block readers 69 | - database must potentially keep several different committed versions of an object 70 | - Typically snapshot isolation uses the same snapshot for an entire transaction 71 | - readers never block writers, and writers never block readers 72 | 73 | - MVCC implementation (for postgres) 74 | - When a transaction is started, it is given a unique, always-increasing transaction ID (*txid*). 75 | - Whenever a transaction writes anything to the database, the data it writes is tagged with the transaction ID of the writer. 76 | - Each row in a table has a *created_by* field, containing the ID of the transaction that inserted this row into the table 77 | - each row has a *deleted_by* field (initially empty) 78 | - If a transaction deletes a row, row is soft deleted by marking the *deleted_by* field to the ID of the transaction that requested the deletion (row is not deleted from database) 79 | - when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes the soft deleted rows 80 | - An update is internally translated into a delete and a create. 81 | - transaction IDs are used to decide which objects it can see and which are invisible for reads. 
Visibility rules for both creation and deletion: 82 | - At the time when the reader’s transaction started, the transaction that created the object had already committed 83 | - The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed at the time when the reader’s transaction started. 84 | - By never updating values in place but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead. 85 | - Database index point to all versions of an object and filter out those invisible ones 86 | 87 | - To prevent **Lost update** (in a read-modify-write cycle): when two transactions do this concurrently, one of the modifications can be lost 88 | - Atomic update operations: taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied (***cursor stability***). 89 | - Explicit locking: explicitly lock objects that are going to be updated 90 | - Automatically detecting lost updates: allow transactions to execute in parallel, abort transaction and force it to retry if lost updated is detected 91 | - Compare-and-set: allow an update to happen only if the value has not changed since you last read it. If it changes, force retry. 92 | - For replicated databases: allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact. 
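As an illustration, the compare-and-set approach to avoiding lost updates can be sketched in Python. This is a toy model, not a real client: the `db` dict stands in for the database, and in a real system the compare and the set happen atomically on the server (e.g. `UPDATE ... SET value = new WHERE value = old`).

```python
def cas_update(db, key, transform, max_retries=5):
    """Read-modify-write that only commits if the value is unchanged since the read.

    `db` is a plain dict standing in for a database; `transform` computes
    the new value from the old one. A real database performs the
    compare-and-set step atomically.
    """
    for _ in range(max_retries):
        old = db[key]            # read
        new = transform(old)     # modify
        if db[key] == old:       # compare: has anyone changed it since the read?
            db[key] = new        # set
            return new
        # another writer got in first: retry from a fresh read
    raise RuntimeError("update kept losing races; giving up")
```

If the compare fails, the whole read-modify-write cycle is retried from a fresh read, which is how the lost update is avoided without holding a lock for the duration of the cycle.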
93 | 94 | - **Write Skew**: two transactions read the same objects, and then each updates some of those objects (the transactions update different objects, so this is neither a dirty write nor a lost update, yet the combined result can still violate an invariant) 95 | - Solution: explicitly lock the rows that the transaction depends on 96 | - **Lock the row for update** 97 | - **Unique constraint for create** 98 | 99 | - **phantom**: a write in one transaction changes the result of a search query in another transaction (i.e. objects that do not yet exist in the database, but which might be added in the future) 100 | - A serializable isolation level is much preferable in most cases 101 | - **materializing conflicts** (last resort if no alternative is possible): it takes a phantom and turns it into a lock conflict on a concrete set of rows that exist in the database (i.e. pre-create all the rows for the different combinations in the database, so new requests become checks against existing rows rather than row creations) 102 | 103 | 104 | ### Serializable isolation (strongest isolation level) 105 | 106 | - Literally executing transactions in a serial order 107 | - single-threaded execution 108 | - **stored procedure**: the application submits the entire transaction code to the database ahead of time, and the database executes all transactions on a single thread (in-memory) 109 | 110 | - **Two-phase locking (2PL)** 111 | - ***pessimistic*** concurrency control mechanism: if anything may go wrong, it's better to wait until it's safe to do anything (similar to *mutual exclusion*) 112 | - Concurrent transactions are allowed to read the same object when nobody is writing to it.
113 | - When a write happens (update/delete): 114 | - the second transaction (read or write) must wait until the first transaction (read or write) commits or aborts 115 | - Reading an old version is forbidden 116 | - writers don’t just block other writers; they also block readers and vice versa 117 | - Implementation: a lock (in shared mode or exclusive mode) on each object in the database 118 | - Read: acquire the lock in shared mode (several transactions may hold it at once), but the transaction must wait if another transaction holds the lock in exclusive mode 119 | - Write: acquire the lock in exclusive mode (only 1 transaction can hold it, others must wait) 120 | - First read then write: upgrade the shared lock to an exclusive lock 121 | - First phase: acquire locks; second phase: release locks. 122 | - Performance is poor, and deadlocks may happen 123 | 124 | - **Predicate lock**: applies to all objects that match some search condition 125 | - Read for some condition: acquire a shared-mode predicate lock on the conditions of the query; if another transaction has an exclusive lock on the objects matching the conditions, the current transaction must wait 126 | - Write: first check whether either the old or the new value matches any existing predicate lock; if there is a match, the current transaction must wait 127 | 128 | - **Index-range lock (next-key locking)**: 129 | - Similar to predicate locks, but is based on indices of the rows for the search condition 130 | - Better performance but may lock a bigger range of objects 131 | - If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table 132 | 133 | - Optimistic concurrency control techniques such as **Serializable Snapshot Isolation (SSI)** 134 | - ***optimistic*** concurrency control mechanism: instead of blocking if something may go wrong, transactions continue anyway 135 | - all reads within a transaction are made from a consistent snapshot of the database 136 | - when a
transaction wants to commit, database checks whether isolation was violated (a query result might have changed), and if so the transaction is aborted and has to be retried 137 | - By avoiding unnecessary aborts, SSI preserves snapshot isolation’s support for long-running reads from a consistent snapshot. 138 | - How database checks whether isolation was violated 139 | - Detecting stale MVCC reads (uncommitted write occurred before the read due to visibility rules) 140 | - When the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted. 141 | - Detecting writes that affect prior reads (the write occurs after the read) 142 | - Similar to the index-range locks, index is recorded for the transaction when reading; when writing, the index will be checked and notify the transaction to abort 143 | - Performance: 144 | - one trade-off is the granularity at which transactions’ reads and writes are tracked 145 | - Compared to two-phase locking, the big advantage is that one transaction doesn’t need to block waiting for locks held by another transaction. 146 | 147 | 148 | ## Reference: 149 | 150 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?dchild=1&gclid=Cj0KCQjw_8mHBhClARIsABfFgpg5q5IQvE2s5OBULx6LQFDETV41haS67EE3JAfvobPADJUJHN7dUbsaAjjrEALw_wcB&hvadid=285888202784&hvdev=c&hvlocphy=9001327&hvnetw=g&hvqmt=e&hvrand=12070511852976413586&hvtargid=kwd-407664346480&hydadcr=16109_9598899&keywords=design+data+intensive+application&qid=1626587742&sr=8-1) 151 | -------------------------------------------------------------------------------- /Templates/backtrack.md: -------------------------------------------------------------------------------- 1 | # Backtracking 2 | 3 | This is essentially the traversal of a decision tree. 4 | 5 | There are three things we need to consider: 6 | 7 | 1. 
Path (or track) 8 | 2. the list of choices 9 | 3. End condition 10 | 11 | ## Template 12 | 13 | ```python 14 | result = [] 15 | 16 | def backtrack(track, result, choices): 17 | if satisfies_end_condition: 18 | # If track is a list we need to copy it here; otherwise we would append a reference to the same (mutable) list 19 | result.append(track.copy()) 20 | return 21 | 22 | for choice in choices: 23 | # Make the choice 24 | track.append(choice) 25 | backtrack(track, result, choices) 26 | # Undo the choice (i.e. backtrack) 27 | track.pop() 28 | ``` 29 | 30 | 31 | Reference: 32 | 33 | - [Leetcode Backtracking Template](https://leetcode.com/explore/learn/card/recursion-ii/472/backtracking/2793/) 34 | - [Backtracking Template](https://github.com/labuladong/fucking-algorithm/blob/english/think_like_computer/DetailsaboutBacktracking.md) 35 | 36 | Practice: 37 | 38 | - Permutations: 39 | - [46. Permutations](https://leetcode.com/problems/permutations/) 40 | - [784. Letter Case Permutation](https://leetcode.com/problems/letter-case-permutation/) 41 | - [47. Permutations II](https://leetcode.com/problems/permutations-ii/) 42 | - [17. Letter Combinations of a Phone Number](https://leetcode.com/problems/letter-combinations-of-a-phone-number/) 43 | -------------------------------------------------------------------------------- /Templates/binary_search.md: -------------------------------------------------------------------------------- 1 | # Binary Search 2 | 3 | ***The KEY is the search interval*** 4 | 5 | ## Generic template 6 | 7 | Find a number in a list of distinct sorted numbers, return -1 if there is no such number.
8 | 9 | Time complexity: `O(log n)` 10 | 11 | ```python 12 | def generic_binary_search(nums, target): 13 | left = 0 14 | right = len(nums) - 1 15 | 16 | while left <= right: 17 | mid = left + (right - left) // 2 18 | if nums[mid] < target: 19 | left = mid + 1 20 | elif nums[mid] > target: 21 | right = mid - 1 22 | elif nums[mid] == target: 23 | return mid 24 | 25 | return -1 26 | ``` 27 | 28 | Search interval: `[left, right]` 29 | 30 | Practice: [704. Binary Search](https://leetcode.com/problems/binary-search/submissions/) 31 | 32 | ## Search left 33 | 34 | Find the first target in a list of sorted numbers with duplication, return -1 if there is no such number. 35 | 36 | ```python 37 | def binary_search_left(nums, target): 38 | if not nums: 39 | return -1 40 | 41 | left = 0 42 | right = len(nums) 43 | 44 | while left < right: 45 | mid = left + (right - left) // 2 46 | if nums[mid] < target: 47 | left = mid + 1 48 | elif nums[mid] > target: 49 | right = mid 50 | elif nums[mid] == target: 51 | right = mid 52 | 53 | return left if left < len(nums) and nums[left] == target else -1 54 | ``` 55 | 56 | Search interval: `[left, right)` 57 | 58 | ## Search right 59 | 60 | Find the last target in a list of sorted numbers with duplication, return -1 if there is no such number.
61 | 62 | ```python 63 | def binary_search_right(nums, target): 64 | if not nums: 65 | return -1 66 | 67 | left = 0 68 | right = len(nums) 69 | 70 | while left < right: 71 | mid = left + (right - left) // 2 72 | if nums[mid] < target: 73 | left = mid + 1 74 | elif nums[mid] > target: 75 | right = mid 76 | elif nums[mid] == target: 77 | left = mid + 1 78 | 79 | return left - 1 if left > 0 and nums[left - 1] == target else -1 80 | ``` 81 | 82 | Search interval: `[left, right)` 83 | 84 | ## Universal template 85 | 86 | ```python 87 | def generic_binary_search(nums, target): 88 | left = 0 89 | right = len(nums) - 1 90 | 91 | while left <= right: 92 | mid = left + (right - left) // 2 93 | if nums[mid] < target: 94 | left = mid + 1 95 | elif nums[mid] > target: 96 | right = mid - 1 97 | elif nums[mid] == target: 98 | return mid 99 | 100 | # # search left 101 | # right = mid - 1 102 | 103 | # # search right 104 | # left = mid + 1 105 | 106 | return -1 107 | 108 | # # search left 109 | # return -1 if left >= len(nums) or nums[left] != target else left 110 | 111 | # # search right 112 | # return -1 if right < 0 or nums[right] != target else right 113 | ``` 114 | 115 | Search interval: `[left, right]` 116 | 117 | ## Python builtin function 118 | 119 | You can also use the builtin `bisect` module to do the same thing. 120 | 121 | [Details here](https://docs.python.org/3/library/bisect.html) 122 | 123 | 124 | Practice: 125 | 126 | - [34.
Find First and Last Position of Element in Sorted Array](https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/) 127 | -------------------------------------------------------------------------------- /Templates/dijkstra.md: -------------------------------------------------------------------------------- 1 | # Dijkstra’s algorithm (Shortest Path) 2 | 3 | ## Implementation: 4 | 5 | - Use an instance variable to keep track of the total cost from the start node to each destination 6 | - The instance variable will contain the current total weight of the smallest weight path from the start to the vertex in question 7 | - The algorithm iterates once for every vertex in the graph (the order is determined by the priority queue) 8 | - The PriorityQueue class stores tuples of key, value pairs 9 | - the key in the priority queue must match the key of the vertex in the graph 10 | - the value is used for deciding the priority, and thus the position of the key in the priority queue 11 | - Dijkstra’s algorithm works correctly only when all the weights are non-negative; with negative weights it can return paths that are not actually shortest 12 | - time complexity: O((V+E)log(V)) 13 | 14 | 15 | 16 | ***Essentially, this is a BFS using a priority queue instead of a queue*** 17 | 18 | ## Template 19 | 20 | First check the graph traversal template [**HERE**](./graph_traversal.md) if you haven't seen it. It is highly recommended that you fully understand the graph traversal first.
21 | 22 | ```python 23 | from heapq import heappop, heappush 24 | 25 | heap = [[0, start]] 26 | seen = set() 27 | while heap: 28 | # Use dist to set the priority; vertex is the value that stores the info 29 | dist, vertex = heappop(heap) 30 | if vertex not in seen: 31 | seen.add(vertex) 32 | if vertex == end: 33 | return dist 34 | for end_vertex in vertices[vertex]: 35 | if end_vertex not in seen: 36 | heappush(heap, [dist+1, end_vertex])  # dist+1 assumes unit weights; use dist + edge weight for weighted graphs 37 | ``` 38 | 39 | Note that the above template doesn't iterate through the entire maze or graph or whatever input you received. 40 | This is simply because shortest path problems normally have a start point and an end point, so we just need to start from the start point; there is no need to check all points. 41 | 42 | 43 | Reference: 44 | 45 | - [Dijkstra’s Algorithm](https://runestone.academy/runestone/books/published/pythonds/Graphs/DijkstrasAlgorithm.html) 46 | 47 | Practice: 48 | 49 | - [1091. Shortest Path in Binary Matrix](https://leetcode.com/problems/shortest-path-in-binary-matrix/) 50 | - [Please Share dijkstra's algorithm questions - Leetcode](https://leetcode.com/discuss/interview-question/731911/please-share-dijkstras-algorithm-questions) 51 | -------------------------------------------------------------------------------- /Templates/graph_SCC.md: -------------------------------------------------------------------------------- 1 | # Strongly Connected Components 2 | 3 | - This probably won't come up in an interview, so a code implementation is not provided, but the concepts are good to know 4 | - **Strongly Connected Component (SCC):** a strongly connected component C of a graph G is the largest subset of vertices C⊂V such that for every pair of vertices v,w∈C we have a path from v to w and a path from w to v.
![](http://interactivepython.org/runestone/static/pythonds/_images/scc1.png) 6 | - Once the strongly connected components have been identified we can show a simplified view of the graph by combining all the vertices in one strongly connected component into a single larger vertex. 7 | ![](http://interactivepython.org/runestone/static/pythonds/_images/scc2.png) 8 | - The transposition of a graph G is defined as the graph G^T where all the edges in the graph have been reversed. 9 | 10 | ## Implementation 11 | 12 | 1. Call DFS for the graph G to compute the finish times for each vertex. 13 | 2. Compute G^T (i.e. the transposition of G) 14 | 3. Call DFS for the graph G^T but in the main loop of DFS explore each vertex in decreasing order of finish time 15 | 4. Each tree in the forest computed in step 3 is a strongly connected component. Output the vertex ids for each vertex in each tree in the forest to identify the component 16 | 17 | ## Reference: 18 | 19 | - [Strongly Connected Components](https://runestone.academy/runestone/books/published/pythonds/Graphs/StronglyConnectedComponents.html) 20 | -------------------------------------------------------------------------------- /Templates/graph_traversal.md: -------------------------------------------------------------------------------- 1 | # Graph Traversal - BFS/DFS 2 | 3 | First check the matrix traversal template [**HERE**](./matrix_traversal.md) if you haven't seen it. It is highly recommended that you fully understand the matrix traversal first. 4 | 5 | We can use either BFS or DFS to traverse the graph 6 | 7 | - Key Differences Between BFS and DFS 8 | 1. BFS is a vertex-based algorithm while DFS is an edge-based algorithm. 9 | 2. The queue data structure is used in BFS. On the other hand, DFS uses a stack or recursion. 10 | 3. Memory space is efficiently utilized in DFS while space utilization in BFS is not effective.
11 | - DFS takes linear space because we have to remember a single path with unexplored nodes, while BFS keeps every node in memory. 12 | 4. BFS is optimal (it finds shortest paths in unweighted graphs) while DFS is not. 13 | 5. DFS constructs narrow and long trees. In contrast, BFS constructs wide and short trees. 14 | 15 | 16 | ## Recursive template 17 | 18 | ```python 19 | def traverse(matrix): 20 | """Recursive""" 21 | 22 | rows, cols = len(matrix), len(matrix[0]) 23 | visited = set() 24 | directions = ((0, 1), (0, -1), (1, 0), (-1, 0)) # Much faster than list 25 | 26 | def dfs(x, y): 27 | if (x, y) in visited: 28 | return 29 | visited.add((x, y)) 30 | 31 | # Traverse neighbors 32 | for dx, dy in directions: 33 | nx, ny = dx + x, dy + y 34 | if 0 <= nx < rows and 0 <= ny < cols: # Check boundary 35 | # Add any other checking here ^ 36 | dfs(nx, ny) 37 | 38 | for i in range(rows): 39 | for j in range(cols): 40 | dfs(i, j) 41 | ``` 42 | 43 | ## Iterative template 44 | 45 | ```python 46 | # BFS Template for LC 200 47 | 48 | def numIslands(self, grid: List[List[str]]) -> int: 49 | count = 0 50 | 51 | if not grid: 52 | return count 53 | 54 | rows = len(grid) 55 | cols = len(grid[0]) 56 | directions = ((0, 1), (0, -1), (1, 0), (-1, 0)) 57 | visited = set() 58 | 59 | for row in range(rows): 60 | for col in range(cols): 61 | if grid[row][col] == '1': 62 | # The position here is critical 63 | # count here means counting all the "1"s 64 | count += 1 65 | visited.add((row, col)) # Mark current node as visited 66 | queue = [(row, col)] # Initiate a queue for bfs 67 | 68 | while queue: 69 | for _ in range(len(queue)): 70 | # Must use names other than row and col 71 | # Otherwise there will be a collision 72 | _x, _y = queue.pop(0) # deque and popleft() is better here 73 | 74 | for dx, dy in directions: 75 | # Traverse all directions 76 | x, y = dx + _x, dy + _y 77 | 78 | if 0 <= x < rows and 0 <= y < cols and grid[x][y] == "1" and (x, y) not in visited: 79 | visited.add((x, y)) 80 | queue.append((x, y)) 81 | 82
| # if count is here (i.e. inside all loops) it'll count how many "1"s 83 | 84 | # If count is here, (i.e. out of the two for loops) 85 | # it'll count how many steps it'll need from the position 86 | # to go to all the other places (i.e. leetcode 994) 87 | 88 | return count 89 | ``` 90 | 91 | Sometimes you are asked to keep moving until you hit a wall (e.g. LC 505); this requires an inner while loop, and the stop condition can be tricky. The trick is this: 92 | 93 | ```python 94 | while queue: 95 | _x, _y, step = queue.popleft() 96 | for dx, dy in directions: 97 | x, y, steps = _x, _y, step 98 | 99 | while 0 <= x + dx < rows and 0 <= y + dy < cols and maze[x+dx][y+dy] == 0: 100 | x += dx 101 | y += dy 102 | steps += 1 103 | ``` 104 | 105 | This way the stop condition is handled properly. 106 | 107 | 108 | Reference: 109 | 110 | - [Tips for all DFS in matrix question](https://leetcode.com/problems/pacific-atlantic-water-flow/discuss/90739/python-dfs-bests-85-tips-for-all-dfs-in-matrix-question) 111 | 112 | Practice: 113 | 114 | - [200. Number of Islands](https://leetcode.com/problems/number-of-islands/) 115 | - [130. Surrounded Regions](https://leetcode.com/problems/surrounded-regions/) 116 | - [695. Max Area of Island](https://leetcode.com/problems/max-area-of-island/) 117 | - [994. Rotting Oranges](https://leetcode.com/problems/rotting-oranges/) 118 | - [505. The Maze II](https://leetcode.com/problems/the-maze-ii/) 119 | -------------------------------------------------------------------------------- /Templates/linked_list.md: -------------------------------------------------------------------------------- 1 | # Linked List 2 | 3 | - Each node object must hold at least two pieces of information. 4 | 1. the node must contain the list item itself (i.e. the data field). 5 | 2. each node must hold a reference to the next node.
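The two requirements above translate directly into a minimal node class. The `val`/`next` naming follows the Leetcode convention used in the practice problems; the `build_list`/`to_list` helpers are illustrative additions, handy for testing the templates locally:

```python
class ListNode:
    """Minimal singly linked list node: a data field plus a next-node reference."""
    def __init__(self, val=0, next=None):
        self.val = val    # 1. the list item itself (data field)
        self.next = next  # 2. reference to the next node

def build_list(values):
    """Helper: build a linked list from a Python list, returning the head (or None)."""
    dummy = ListNode()
    node = dummy
    for v in values:
        node.next = ListNode(v)
        node = node.next
    return dummy.next

def to_list(head):
    """Helper: collect node values back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```

Note how `build_list` already uses the dummy-head trick discussed below, so the empty-list case needs no special handling.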
6 | 7 | ## Tips and Template 8 | 9 | - Traverse a linked list 10 | 11 | ```python 12 | node = root 13 | while node: 14 | print(node.val) 15 | node = node.next 16 | ``` 17 | 18 | - When we talk about a linked list, we normally mean a singly linked list. There is also a doubly linked list, defined as follows: 19 | 20 | ![Leetcode Linked List Learn Doubly Linked List](https://s3-lc-upload.s3.amazonaws.com/uploads/2018/04/17/screen-shot-2018-04-17-at-161130.png) 21 | 22 | ```python 23 | class Node: 24 | def __init__(self, val): 25 | self.val = val 26 | self.prev = None 27 | self.next = None 28 | ``` 29 | 30 | - Linked list in-place operations (i.e. insert or delete) can be confusing; it's better to ***draw the structure*** and it'll become much more obvious 31 | 32 | For example, when deleting a node in the middle: 33 | 34 | ![Leetcode Linked List Learn Delete Operation](https://s3-lc-upload.s3.amazonaws.com/uploads/2018/04/26/screen-shot-2018-04-26-at-203640.png) 35 | 36 | ```python 37 | node = self.get_node(index) 38 | prev_node = node.prev 39 | next_node = node.next 40 | prev_node.next = next_node 41 | next_node.prev = prev_node 42 | ``` 43 | 44 | - It's better to use a ***dummy head*** most of the time, especially when deleting nodes is required 45 | 46 | For example: 47 | 48 | ```python 49 | def linked_list(root): 50 | dummy = Node("x") 51 | dummy.next = root 52 | 53 | # Do your logic here 54 | 55 | return dummy.next 56 | ``` 57 | 58 | - In many cases, you need to track the previous node of the current node.
59 | 60 | ```python 61 | dummy = Node("x") 62 | dummy.next = root 63 | 64 | prev = dummy 65 | node = root 66 | 67 | while node: 68 | prev = node 69 | node = node.next 70 | ``` 71 | 72 | - The ***Two pointer*** approach is widely used in linked lists, e.g. detecting a cycle, removing the n-th node, etc. 73 | 74 | For example, when detecting a cycle: 75 | ```python 76 | def hasCycle(self, head: ListNode) -> bool: 77 | slow = head 78 | fast = head 79 | 80 | while fast and fast.next: 81 | slow = slow.next 82 | fast = fast.next.next 83 | 84 | if slow == fast: 85 | return True 86 | 87 | return False 88 | ``` 89 | 90 | - It is common to use a ***hashmap or hashset*** to store visited nodes; it's widely used to find the intersection or the beginning of a cyclic linked list 91 | 92 | ```python 93 | def detectCycle(self, head: ListNode) -> ListNode: 94 | visited = set() 95 | while head: 96 | if head.next in visited: 97 | return head.next 98 | visited.add(head) 99 | head = head.next 100 | return None 101 | ``` 102 | 103 | 104 | Here is a great comparison of ***time complexity*** between the linked list and the array from Leetcode. 105 | 106 | ![Leetcode Linked List Learn Conclusion](https://assets.leetcode.com/uploads/2020/10/02/comparison_of_time_complexity.png) 107 | 108 | 109 | Reference: 110 | 111 | - [Leetcode Introduction to Linked List](https://leetcode.com/explore/learn/card/linked-list/) 112 | 113 | 114 | Practice: 115 | 116 | - Basic questions: 117 | - [707. Design Linked List](https://leetcode.com/problems/design-linked-list/) 118 | - [141. Linked List Cycle](https://leetcode.com/problems/linked-list-cycle/) 119 | - [206. Reverse Linked List](https://leetcode.com/problems/reverse-linked-list/) 120 | - [21. Merge Two Sorted Lists](https://leetcode.com/problems/merge-two-sorted-lists/) 121 | 122 | - Tricky questions: 123 | - [142. Linked List Cycle II - Find linked list cycle start point](https://leetcode.com/problems/linked-list-cycle-ii/) 124 | - [430.
Flatten a Multilevel Doubly Linked List](https://leetcode.com/problems/flatten-a-multilevel-doubly-linked-list/) 125 | - [138. Copy List with Random Pointer](https://leetcode.com/problems/copy-list-with-random-pointer/) 126 | -------------------------------------------------------------------------------- /Templates/matrix_traversal.md: -------------------------------------------------------------------------------- 1 | # Matrix Traversal 2 | 3 | Here we are using a 2D matrix as an example, but the idea can be applied to multi-dimensional matrices 4 | 5 | 6 | 7 | ## Generic Template 8 | 9 | ```python 10 | def traverse(matrix): 11 | rows, cols = len(matrix), len(matrix[0]) 12 | 13 | for row in range(rows): 14 | for col in range(cols): 15 | print(matrix[row][col]) 16 | ``` 17 | 18 | ## Transpose Matrix (assumes a square matrix) 19 | 20 | ```python 21 | def transpose(matrix): 22 | rows, cols = len(matrix), len(matrix[0]) 23 | 24 | for row in range(rows): 25 | for col in range(row, cols): 26 | matrix[row][col], matrix[col][row] = matrix[col][row], matrix[row][col] 27 | ``` 28 | 29 | ## Reverse Row 30 | 31 | ```python 32 | def reverse_row(matrix): 33 | rows, cols = len(matrix), len(matrix[0]) 34 | 35 | for row in range(rows): 36 | for col in range(cols//2): 37 | matrix[row][col], matrix[row][cols-1-col] = matrix[row][cols-1-col], matrix[row][col] 38 | ``` 39 | 40 | ## Reverse Column 41 | 42 | ```python 43 | def reverse_col(matrix): 44 | rows, cols = len(matrix), len(matrix[0]) 45 | 46 | for row in range(rows//2): 47 | for col in range(cols): 48 | matrix[row][col], matrix[rows-1-row][col] = matrix[rows-1-row][col], matrix[row][col] 49 | ``` 50 | 51 | ## Rotation Template 52 | 53 | - Rotate 90 degrees clockwise: transpose + reverse row 54 | - Rotate 90 degrees anti-clockwise: transpose + reverse column 55 | - Rotate 180 degrees: reverse row + reverse column 56 | 57 | ## Spiral Traverse 58 | 59 | This one is a bit tricky, and the idea is a little different from the above.
60 | 61 | Instead of looping through each column and then row, we can iterate through the total number of coordinates and increment the row and col indices accordingly. 62 | 63 | This idea can also be used in the rotations above. 64 | 65 | ```python 66 | def spiral_matrix(matrix): 67 | rows, cols = len(matrix), len(matrix[0]) 68 | 69 | row, col, k = 0, 0, 0 # k is the border level 70 | for _ in range(rows * cols): # visit matrix[row][col] here, e.g. res.append(matrix[row][col]) 71 | # Note the order here is important 72 | if row == k: # Top 73 | if col == cols - 1 - k: # Top right 74 | row += 1 75 | else: 76 | col += 1 77 | continue 78 | 79 | if col == cols - 1 - k: # Right 80 | if row == rows - 1 - k: # Bottom Right 81 | col -= 1 82 | else: 83 | row += 1 84 | continue 85 | 86 | if row == rows - 1 - k: # Bottom 87 | if col == k: # Bottom Left 88 | row -= 1 89 | else: 90 | col -= 1 91 | continue 92 | 93 | if col == k: # Left 94 | row -= 1 95 | if row == k: # Border complete, go 1 layer inside 96 | k += 1 97 | row += 1 98 | col += 1 99 | ``` 100 | 101 | 102 | Reference: 103 | 104 | - [Rotate 90 clockwise, anti-clockwise, and rotate 180 degree](https://leetcode.com/problems/rotate-image/discuss/401356/Rotate-90-clockwise-anti-clockwise-and-rotate-180-degree) 105 | 106 | Practice: 107 | 108 | - [766. Toeplitz Matrix](https://leetcode.com/problems/toeplitz-matrix/) 109 | - [74. Search a 2D Matrix](https://leetcode.com/problems/search-a-2d-matrix/) 110 | - [867. Transpose Matrix](https://leetcode.com/problems/transpose-matrix/) 111 | - [832. Flipping an Image](https://leetcode.com/problems/flipping-an-image/) 112 | - [48. Rotate Image](https://leetcode.com/problems/rotate-image/) 113 | - [54. 
Spiral Matrix](https://leetcode.com/problems/spiral-matrix/) 114 | -------------------------------------------------------------------------------- /Templates/merge_sort.md: -------------------------------------------------------------------------------- 1 | # Merge Sort 2 | 3 | ## Implementation: 4 | 5 | - A recursive algorithm that continually splits a list in half. 6 | - If the list is empty or has one item, it is sorted by definition (the base case). 7 | - Once the two halves are sorted, the **merge** operation takes the two smaller sorted lists and combines them into a single sorted list. 8 | - time complexity: O(n log n) 9 | 10 | 11 | ## Template 12 | 13 | ```python 14 | class Solution: 15 | def sortArray(self, nums: List[int]) -> List[int]: 16 | """Merge sort solution""" 17 | 18 | if not nums: 19 | return nums 20 | 21 | start = 0 22 | end = len(nums) - 1 23 | 24 | return self.merge_sort(nums, start, end, temp=[-1 for i in nums]) 25 | 26 | def merge_sort(self, nums, start, end, temp): 27 | if start >= end: 28 | return nums 29 | 30 | mid = (start + end) // 2 31 | 32 | self.merge_sort(nums, start, mid, temp) 33 | self.merge_sort(nums, mid+1, end, temp) 34 | return self.merge(nums, start, mid, end, temp) 35 | 36 | def merge(self, nums, start, mid, end, temp): 37 | left = start 38 | right = mid + 1 39 | index = start 40 | 41 | while left <= mid and right <= end: 42 | if nums[left] <= nums[right]: 43 | temp[index] = nums[left] 44 | left += 1 45 | else: 46 | temp[index] = nums[right] 47 | right += 1 48 | 49 | index += 1 50 | 51 | while left <= mid: 52 | temp[index] = nums[left] 53 | index += 1 54 | left += 1 55 | 56 | while right <= end: 57 | temp[index] = nums[right] 58 | index += 1 59 | right += 1 60 | 61 | for i in range(start, end+1): 62 | nums[i] = temp[i] # copy the merged range back into nums 63 | 64 | 65 | return nums 66 | ``` 67 | 68 | 69 | Reference: 70 | 71 | - [Merge sort in Python - stackabuse](https://stackabuse.com/merge-sort-in-python/) 72 | - [Merge sort in 
Python - educative](https://www.educative.io/edpresso/merge-sort-in-python) 73 | 74 | Practice: 75 | 76 | - [912. Sort an Array](https://leetcode.com/problems/sort-an-array/) 77 | -------------------------------------------------------------------------------- /Templates/monotonic_stack.md: -------------------------------------------------------------------------------- 1 | # Monotonic Stack 2 | 3 | - A monotonic stack is a special form of stack in which all elements are sorted in either ascending or descending order 4 | 5 | ## Template 6 | 7 | ```python 8 | def monotonic_stack(nums): 9 | """An increasing monotonic stack, all elements are sorted in ascending order""" 10 | 11 | stack = [] 12 | res = [-1] * len(nums) 13 | 14 | # Template for descending traversal 15 | for i in range(len(nums)-1, -1, -1): 16 | while stack and nums[stack[-1]] <= nums[i]: 17 | stack.pop() # Pop elements that are smaller than or equal to the incoming element 18 | if stack: 19 | res[i] = nums[stack[-1]] 20 | stack.append(i) # use for next iteration 21 | 22 | # # Template for ascending traversal 23 | # for i in range(len(nums)): 24 | # while stack and (nums[stack[-1]] < nums[i]): 25 | # res[stack.pop()] = nums[i] 26 | # stack.append(i) 27 | return res 28 | ``` 29 | 30 | Reference: 31 | 32 | - [Using monotonic stack w/ analysis](https://leetcode.com/problems/next-greater-element-i/discuss/1113246/Cpp-Using-monotonic-stack-w-analysis) 33 | 34 | Practice: 35 | 36 | - [496. Next Greater Element I](https://leetcode.com/problems/next-greater-element-i/) 37 | - [503. Next Greater Element II](https://leetcode.com/problems/next-greater-element-ii/) 38 | - [739. Daily Temperatures](https://leetcode.com/problems/daily-temperatures/) 39 | - [1019. Next Greater Node In Linked List](https://leetcode.com/problems/next-greater-node-in-linked-list/) 40 | - [316. Remove Duplicate Letters](https://leetcode.com/problems/remove-duplicate-letters/) 41 | - [402. 
Remove K Digits](https://leetcode.com/problems/remove-k-digits/) 42 | - [42. Trapping Rain Water](https://leetcode.com/problems/trapping-rain-water/) 43 | - [84. Largest Rectangle in Histogram](https://leetcode.com/problems/largest-rectangle-in-histogram/) 44 | -------------------------------------------------------------------------------- /Templates/prim_spanning_tree.md: -------------------------------------------------------------------------------- 1 | # Prim’s Spanning Tree Algorithm 2 | 3 | - This probably won't come up in an interview, so a code implementation is not provided, but the concepts are good to know 4 | - The algorithm solves the problem of efficiently transferring a piece of information to anyone and everyone who may be listening. 5 | - **uncontrolled flooding:** the broadcast host sends a single copy of the broadcast message and lets the routers sort things out (i.e. the brute force way) 6 | - Each message starts with a time to live (*ttl*) value set to some number greater than or equal to the number of edges between the broadcast host and its most distant listener 7 | - Each router gets a copy of the message and passes the message on to all of its neighboring routers. 8 | - Each time the message is passed on, the *ttl* is decreased. Each router continues to send copies of the message to all its neighbors until the *ttl* value reaches 0. 9 | - **Minimum weight spanning tree:** define a minimum spanning tree T for a graph G=(V,E) such that T is an acyclic subset of E that connects all the vertices in V. The sum of the weights of the edges in T is minimized. 
10 | - the broadcast host sends a single copy of the broadcast message into the network 11 | - Each router forwards the message to any neighbor that is part of the spanning tree, excluding the neighbor that just sent it the message 12 | 13 | ## Prim’s algorithm 14 | 15 | - Prim’s algorithm belongs to the “greedy algorithms” family because at each step it chooses the cheapest next step, which in the spanning tree case is to follow the edge with the lowest weight 16 | - Algorithm: 17 | - while T is not yet a spanning tree 18 | - Find an edge that is safe to add to the tree 19 | - Add the new edge to T 20 | - A **safe edge** is any edge that connects a vertex that is in the spanning tree to a vertex that is not in the spanning tree 21 | - The algorithm is similar to Dijkstra’s algorithm and also uses a priority queue to select the next vertex to add to the growing graph 22 | - Select a starting node, and initialize all the other vertices to infinity 23 | - A node is not considered to be part of the spanning tree until it is removed from the priority queue. 24 | - Always examine the smallest distance and update the predecessor links 25 | 26 | ## Reference: 27 | 28 | - [Prim’s Spanning Tree Algorithm](https://runestone.academy/runestone/books/published/pythonds/Graphs/PrimsSpanningTreeAlgorithm.html) 29 | -------------------------------------------------------------------------------- /Templates/quick_sort.md: -------------------------------------------------------------------------------- 1 | # Quick Sort 2 | 3 | ## Implementation: 4 | 5 | - First select a value (the **pivot value**), and then use it to split the list. 6 | - The actual position where the pivot value belongs in the final sorted list, commonly called the **split point**, will be used to divide the list for subsequent calls to the quick sort. 
7 | - The **partition** process then finds the split point and at the same time moves the other items to the appropriate side of the list, either less than or greater than the pivot value. 8 | - Partitioning begins by locating two position markers (i.e. *leftmark* and *rightmark*) at the beginning and end of the remaining items in the list. 9 | - At the point where rightmark becomes less than leftmark, the position of rightmark is the split point. The pivot value can be exchanged with the contents of the split point, and the pivot value is then in place: all the items to the left of the split point are less than the pivot value, and all the items to the right of the split point are greater than the pivot value. The list can now be divided at the split point and the quick sort invoked recursively on the two halves. 10 | - **Median of three:** choose the median value of the first, the middle, and the last element in the list (a common way to pick a better pivot). 
11 | - time complexity: best case: O(n log n), worst case: O(n²) 12 | 13 | 14 | ## Template 15 | 16 | ```python 17 | class Solution: 18 | def sortArray(self, nums: List[int]) -> List[int]: 19 | """Quick sort solution""" 20 | 21 | if not nums: 22 | return nums 23 | 24 | start = 0 25 | end = len(nums) - 1 26 | 27 | self.quick_sort(nums, start, end) 28 | return nums 29 | 30 | def quick_sort(self, nums, start, end): 31 | if start >= end: 32 | return nums 33 | 34 | left, right = start, end 35 | pivot = nums[(start+end)//2] 36 | 37 | while left <= right: 38 | while left <= right and nums[left] < pivot: 39 | left += 1 40 | 41 | while left <= right and nums[right] > pivot: 42 | right -= 1 43 | 44 | if left <= right: 45 | nums[left], nums[right] = nums[right], nums[left] 46 | left += 1 47 | right -= 1 48 | 49 | self.quick_sort(nums, start, right) 50 | self.quick_sort(nums, left, end) 51 | 52 | 53 | class Solution: 54 | def sortArray(self, nums: List[int]) -> List[int]: 55 | """ 56 | Quick sort with 3-way partition 57 | The main idea is that we still have the same pivot point, but 58 | instead of comparing with just smaller and greater or equal, 59 | we take out the equal case and process it individually. 
60 | a[lo,lt-1] < pivot 61 | a[lt, i-1] = pivot 62 | a[i,gt] = unseen 63 | a[gt+1, hi] > pivot 64 | """ 65 | 66 | if not nums: 67 | return nums 68 | 69 | start = 0 70 | end = len(nums) - 1 71 | 72 | self.quick_sort(nums, start, end) 73 | 74 | return nums 75 | 76 | def quick_sort(self, nums, start, end): 77 | if start >= end: 78 | return nums 79 | 80 | left = start 81 | right = end 82 | index = start 83 | 84 | pivot = nums[(start+end)//2] 85 | 86 | while index <= right: 87 | if nums[index] < pivot: 88 | nums[index], nums[left] = nums[left], nums[index] 89 | left += 1 90 | index += 1 91 | elif nums[index] > pivot: 92 | nums[index], nums[right] = nums[right], nums[index] 93 | right -= 1 94 | else: 95 | index += 1 96 | 97 | self.quick_sort(nums, start, left - 1) 98 | self.quick_sort(nums, right + 1, end) 99 | ``` 100 | 101 | 102 | Reference: 103 | 104 | - [Quick sort in Python](https://stackabuse.com/quicksort-in-python/) 105 | - [Quicksort 3 way partition](https://gist.github.com/adonese/4bf34d5b57ee0358626c) 106 | 107 | Practice: 108 | 109 | - [912. Sort an Array](https://leetcode.com/problems/sort-an-array/) 110 | -------------------------------------------------------------------------------- /Templates/sliding_window.md: -------------------------------------------------------------------------------- 1 | # Sliding Window 2 | 3 | ***Fundamentally this is a two pointer approach*** 4 | 5 | How it works: 6 | 7 | - A general way is to use a hashmap assisted with two pointers. 8 | - Use two pointers: start and end to represent a window. 9 | - Move end to find a valid window. 10 | - When a valid window is found, move start to find a smaller window. 11 | - To check if a window is valid, we use a map to store (char, count) for chars in t, and use counter for the number of chars of t to be found in s. 12 | 13 | There are two keys here: 14 | 1. the two pointers both start from the beginning of the second string initially 15 | 2. 
move j until `j - i == len(first_string)` 16 | 17 | 18 | ```python 19 | def sliding_window(nums): 20 | left = 0 # initiate the left boundary of window 21 | for right in range(len(nums)): # iterate the right boundary of window 22 | while is_valid_window: # placeholder: replace with the problem's window condition 23 | left += 1 # Reduce left boundary to shrink window size 24 | ``` 25 | 26 | The template code is fairly simple. 27 | 28 | Basically we iterate through the given list, and create a window. 29 | 30 | When the condition is satisfied (i.e. the current window satisfies the required returning condition), we advance the left side to shrink the window, to find the best solution (in most cases the minimum window over the given list) 31 | 32 | 33 | Reference: 34 | 35 | - [Leetcode 567 detailed explanation](https://leetcode.com/problems/permutation-in-string/discuss/638531/Java-or-C%2B%2B-or-Python3-or-Detailed-explanation-or-O(N)-time) 36 | - [10-line template that can solve most 'substring' problems 37 | ](https://leetcode.com/problems/minimum-window-substring/discuss/26808/here-is-a-10-line-template-that-can-solve-most-substring-problems) 38 | - [Sliding Window algorithm template to solve all the Leetcode substring search problem.](https://leetcode.com/problems/find-all-anagrams-in-a-string/discuss/92007/Sliding-Window-algorithm-template-to-solve-all-the-Leetcode-substring-search-problem.) 39 | 40 | 41 | Practice: 42 | 43 | - [594. Longest Harmonious Subsequence](https://leetcode.com/problems/longest-harmonious-subsequence/) 44 | - [3. Longest Substring Without Repeating Characters](https://leetcode.com/problems/longest-substring-without-repeating-characters/) 45 | - [159. Longest Substring with At Most Two Distinct Characters](https://leetcode.com/problems/longest-substring-with-at-most-two-distinct-characters/) 46 | - [438. Find All Anagrams in a String](https://leetcode.com/problems/find-all-anagrams-in-a-string/) 47 | - [1156. 
Swap For Longest Repeated Character Substring](https://leetcode.com/problems/swap-for-longest-repeated-character-substring/) 48 | - [1004. Max Consecutive Ones III](https://leetcode.com/problems/max-consecutive-ones-iii/) 49 | - [76. Minimum Window Substring](https://leetcode.com/problems/minimum-window-substring/) 50 | - [30. Substring with Concatenation of All Words](https://leetcode.com/problems/substring-with-concatenation-of-all-words/) 51 | -------------------------------------------------------------------------------- /Templates/topological_sort.md: -------------------------------------------------------------------------------- 1 | # Topological Sort 2 | 3 | A topological sort takes a directed acyclic graph and produces a linear ordering of all its vertices such that if the graph G contains an edge (v,w) then the vertex v comes before the vertex w in the ordering. 4 | 5 | The two important steps for topological sort are: 6 | 7 | 1. Find the in-degree for each node 8 | 2. Construct the adjacency list for the graph 9 | 10 | 11 | ```python 12 | def findOrder(self, numCourses: int, prerequisites: List[List[int]]) -> List[int]: 13 | """ 14 | Topological sort, works for a DAG (Directed Acyclic Graph) 15 | 1. Calculate the in-degree for all points, initiate an empty adj_list 16 | 2. Put all the 0-in-degree points (i.e. points that have no incoming edges) into the BFS queue 17 | 3. Pop a point from the BFS queue and put it in the topo queue. Each time, visit 18 | all the neighbor points and reduce their in-degrees by 1 19 | 4. if the in-degree becomes 0, put the point in the queue 20 | 5. End when the queue is empty 21 | 22 | https://sugarac.gitbooks.io/facebook-interview-handbook/jiu-zhang-dai-ma-mo-ban.html 23 | https://leetcode.com/problems/course-schedule-ii/discuss/368716/Python3-Breadth-first-search 24 | """ 25 | 26 | # Init a map that stores the No. 
of incoming edges of each vertex and 27 | # Init map (an adjacency list) to record the node's children 28 | in_degree = defaultdict(int) 29 | adj_list = defaultdict(list) 30 | 31 | # Build map to put the child into parent's list 32 | for current_course, prev_course in prerequisites: 33 | adj_list[prev_course].append(current_course) 34 | in_degree[current_course] += 1 # a directed edge 35 | 36 | # a queue of all vertices with no incoming edge 37 | queue = [] 38 | for node in range(numCourses): 39 | if node not in in_degree: 40 | queue.append(node) 41 | 42 | res = [] 43 | while queue: 44 | node = queue.pop(0) # BFS, pops vertex 45 | res.append(node) 46 | for neighbor in adj_list[node]: 47 | # for each descendant of current vertex, reduce its in-degree by 1 48 | in_degree[neighbor] -= 1 49 | 50 | if in_degree[neighbor] == 0: 51 | queue.append(neighbor) 52 | del in_degree[neighbor] 53 | return res if len(res)==numCourses else [] 54 | ``` 55 | 56 | Practice: 57 | 58 | - [207. Course Schedule](https://leetcode.com/problems/course-schedule/) 59 | - [210. Course Schedule II](https://leetcode.com/problems/course-schedule-ii/) 60 | - [1136. Parallel Courses](https://leetcode.com/problems/parallel-courses/) 61 | - [269. Alien Dictionary](https://leetcode.com/problems/alien-dictionary/) 62 | - [1203. 
Sort Items by Groups Respecting Dependencies](https://leetcode.com/problems/sort-items-by-groups-respecting-dependencies/) 63 | 64 | 65 | Reference: 66 | 67 | - [Topological sort explain (Chinese)](https://sugarac.gitbooks.io/facebook-interview-handbook/content/jiu-zhang-dai-ma-mo-ban.html) 68 | - [Solution for 210](https://leetcode.com/problems/course-schedule-ii/discuss/368716/Python3-Breadth-first-search) 69 | -------------------------------------------------------------------------------- /Templates/tree_traversal.md: -------------------------------------------------------------------------------- 1 | # Tree Traversal 2 | 3 | Tree traversal is critical for solving tree and graph related problems 4 | 5 | Here are a few different ways to traverse a tree 6 | 7 | - BFS (Level order) 8 | - DFS 9 | - **preorder:** visit the root first, then recursively do a traversal of the left subtree, then a recursive traversal of the right subtree. 10 | - root -> left -> right 11 | - **inorder:** recursively do a traversal of the left subtree, then the root node, and finally do a traversal of the right subtree. 12 | - left -> root -> right 13 | - **postorder:** do a traversal of the left subtree and the right subtree, and finally visit the root node. 14 | - left -> right -> root 15 | 16 | To better understand this, here are some plots: 17 | 18 | Pre-order 19 | 20 | ![Pre-order traversal](../images/Preorder.png) 21 | 22 | In-order 23 | 24 | ![In-order traversal](../images/Inorder.png) 25 | 26 | Post-order 27 | 28 | ![Post-order traversal](../images/Postorder.png) 29 | 30 | DFS Summary 31 | 32 | ![DFS traversal summary](../images/Tree-DFS.png) 33 | 34 | ## Template 35 | 36 | ### BFS 37 | 38 | ```python 39 | def bfs_traverse(root): 40 | if not root: 41 | return [] 42 | 43 | res = [] 44 | queue = [root] # It's better to use a deque here, i.e. 
deque([root]) 45 | while queue: 46 | level = [] 47 | queue_size = len(queue) 48 | for i in range(queue_size): 49 | node = queue.pop(0) 50 | level.append(node.val) 51 | if node.left: 52 | queue.append(node.left) 53 | 54 | if node.right: 55 | queue.append(node.right) 56 | 57 | res.append(level) 58 | 59 | return res 60 | ``` 61 | 62 | ### DFS 63 | 64 | ```python 65 | # Recursive 66 | def dfs_traverse(root): 67 | if not root: 68 | return 69 | 70 | # Pre-Order 71 | dfs_traverse(root.left) 72 | # In-Order 73 | dfs_traverse(root.right) 74 | # Post-Order 75 | 76 | # Iterative 77 | def preorder(root): 78 | stack = [] 79 | res = [] 80 | 81 | while root or stack: 82 | if root: 83 | res.append(root.val) 84 | stack.append(root) 85 | root = root.left 86 | elif stack: 87 | node = stack.pop() 88 | root = node.right 89 | 90 | return res 91 | 92 | def inorder(root): 93 | stack = [] 94 | res = [] 95 | 96 | while root or stack: 97 | if root: 98 | stack.append(root) 99 | root = root.left 100 | elif stack: 101 | node = stack.pop() 102 | res.append(node.val) 103 | root = node.right 104 | 105 | return res 106 | 107 | def postorder(root): 108 | """ 109 | There are multiple ways to do a post order traversal iteratively, such as 110 | using two stacks, one stack, reversing pre-order, or Morris traversal 111 | Here is a template for using one stack 112 | """ 113 | stack = [] 114 | res = [] 115 | 116 | while stack or root: 117 | 118 | while root: 119 | if root.right: 120 | stack.append(root.right) 121 | stack.append(root) 122 | root = root.left 123 | 124 | root = stack.pop() 125 | 126 | if root.right and stack and stack[-1] == root.right: 127 | stack.pop() 128 | stack.append(root) 129 | root = root.right 130 | 131 | else: 132 | res.append(root.val) 133 | root = None 134 | return res 135 | ``` 136 | 137 | 138 | Reference: 139 | 140 | - [Iterative Postorder Traversal | Set 2 (Using One Stack)](https://www.geeksforgeeks.org/iterative-postorder-traversal-using-stack/) 141 | - [Morris traversal 
(O(1) space pre-order traversal)](https://www.educative.io/edpresso/what-is-morris-traversal) 142 | - [Leetcode Binary Tree](https://leetcode.com/explore/learn/card/data-structure-tree/) 143 | 144 | Practice: 145 | 146 | - [144. Binary Tree Preorder Traversal](https://leetcode.com/problems/binary-tree-preorder-traversal/) 147 | - [94. Binary Tree Inorder Traversal](https://leetcode.com/problems/binary-tree-inorder-traversal/) 148 | - [145. Binary Tree Postorder Traversal](https://leetcode.com/problems/binary-tree-postorder-traversal/) 149 | - [102. Binary Tree Level Order Traversal](https://leetcode.com/problems/binary-tree-level-order-traversal/) 150 | - [106. Construct Binary Tree from Inorder and Postorder Traversal](https://leetcode.com/problems/construct-binary-tree-from-inorder-and-postorder-traversal/) 151 | - [105. Construct Binary Tree from Preorder and Inorder Traversal](https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/) 152 | - [236. Lowest Common Ancestor of a Binary Tree](https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-tree/) 153 | - [297. 
Serialize and Deserialize Binary Tree](https://leetcode.com/problems/serialize-and-deserialize-binary-tree/) 154 | -------------------------------------------------------------------------------- /Templates/trie.md: -------------------------------------------------------------------------------- 1 | # Trie (Prefix Tree) 2 | 3 | ```python 4 | class TrieNode: 5 | def __init__(self): 6 | self.children = collections.defaultdict(TrieNode) 7 | self.is_word = False 8 | 9 | class Trie: 10 | 11 | def __init__(self): 12 | self.root = TrieNode() 13 | 14 | def insert(self, word): 15 | current = self.root 16 | for letter in word: 17 | current = current.children[letter] 18 | current.is_word = True 19 | 20 | def search(self, word): 21 | current = self.root 22 | for letter in word: 23 | current = current.children.get(letter) 24 | if current is None: 25 | return False 26 | return current.is_word 27 | 28 | def startsWith(self, prefix): 29 | current = self.root 30 | for letter in prefix: 31 | current = current.children.get(letter) 32 | if current is None: 33 | return False 34 | return True 35 | ``` 36 | 37 | Practice: 38 | 39 | - [720. Longest Word in Dictionary](https://leetcode.com/problems/longest-word-in-dictionary/) 40 | - [208. Implement Trie (Prefix Tree)](https://leetcode.com/problems/implement-trie-prefix-tree/) 41 | - [648. Replace Words](https://leetcode.com/problems/replace-words/) 42 | - [677. Map Sum Pairs](https://leetcode.com/problems/map-sum-pairs/) 43 | - [1268. Search Suggestions System](https://leetcode.com/problems/search-suggestions-system/) 44 | - [676. Implement Magic Dictionary](https://leetcode.com/problems/implement-magic-dictionary/) 45 | - [1023. 
Camelcase Matching](https://leetcode.com/problems/camelcase-matching/) 46 | 47 | Reference: 48 | 49 | - [Implementing a Trie in Python (in less than 100 lines of code)](https://towardsdatascience.com/implementing-a-trie-data-structure-in-python-in-less-than-100-lines-of-code-a877ea23c1a1) 50 | -------------------------------------------------------------------------------- /Templates/union_find.md: -------------------------------------------------------------------------------- 1 | # Union Find 2 | 3 | - Union Find (or disjoint-set) is a very elegant data structure 4 | - Essentially it utilizes a list representation for the joint data points 5 | - the index of the data point indicates its linkage status 6 | - For detailed explanation, please see this [Lecture Notes](https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf) 7 | 8 | ## Template 9 | 10 | ```python 11 | class UnionFind: 12 | def __init__(self, n): 13 | self.count = n 14 | self.parent = list(range(n)) # root list 15 | self.size = [1] * n # weight 16 | 17 | def find(self, i): 18 | while self.parent[i] != i: 19 | self.parent[i] = self.parent[self.parent[i]] # Path compression 20 | i = self.parent[i] 21 | return i 22 | 23 | def union(self, p, q): 24 | i, j = self.find(p), self.find(q) 25 | 26 | if i == j: 27 | return 28 | 29 | # merge smaller tree into larger tree to obtain a flat structure 30 | if self.size[i] < self.size[j]: 31 | self.parent[i] = j 32 | self.size[j] += self.size[i] 33 | else: 34 | self.parent[j] = i 35 | self.size[i] += self.size[j] 36 | 37 | self.count -= 1 38 | ``` 39 | 40 | 41 | Reference: 42 | 43 | - [Union Find with Explanations (Java / Python)](https://leetcode.com/problems/redundant-connection/discuss/123819/Beats-97.96-Union-Find-Java-with-Explanations) 44 | - [[Python] Solved by Union Find Template](https://leetcode.com/problems/couples-holding-hands/discuss/391618/python-solved-by-union-find-template) 45 | - [Python, Weighted Union+Find with Path 
Compression](https://leetcode.com/problems/number-of-provinces/discuss/1244070/Python-Weighted-Union%2BFind-with-Path-Compression) 46 | - [[Python3] Classic Union Find Solution](https://leetcode.com/problems/number-of-islands-ii/discuss/1083904/Python3-Classic-Union-Find-Solution) 47 | 48 | Practice: 49 | 50 | - [684. Redundant Connection](https://leetcode.com/problems/redundant-connection/) 51 | - [547. Number of Provinces](https://leetcode.com/problems/number-of-provinces/) 52 | - [765. Couples Holding Hands](https://leetcode.com/problems/couples-holding-hands/) 53 | - [305. Number of Islands II](https://leetcode.com/problems/number-of-islands-ii/) 54 | -------------------------------------------------------------------------------- /images/Inorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Inorder.png -------------------------------------------------------------------------------- /images/Postorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Postorder.png -------------------------------------------------------------------------------- /images/Preorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Preorder.png -------------------------------------------------------------------------------- /images/System-Components.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/System-Components.png -------------------------------------------------------------------------------- 
/images/Tree-DFS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Tree-DFS.png -------------------------------------------------------------------------------- /images/how-to-use-the-repo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/how-to-use-the-repo.png --------------------------------------------------------------------------------