├── .gitignore ├── README.md ├── SystemDesign ├── RDBMS.md ├── cache.md ├── consistency_consensus.md ├── internet_protocol_suite.md ├── load_balancer.md ├── navigate_url.md ├── nosql_db.md ├── replication_partition.md ├── scale_web_app.md ├── storage_system.md └── transaction_isolation.md ├── Templates ├── backtrack.md ├── binary_search.md ├── dijkstra.md ├── graph_SCC.md ├── graph_traversal.md ├── linked_list.md ├── matrix_traversal.md ├── merge_sort.md ├── monotonic_stack.md ├── prim_spanning_tree.md ├── quick_sort.md ├── sliding_window.md ├── topological_sort.md ├── tree_traversal.md ├── trie.md └── union_find.md └── images ├── Inorder.png ├── Postorder.png ├── Preorder.png ├── System-Components.png ├── Tree-DFS.png └── how-to-use-the-repo.png /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | *.iml 3 | *.docx 4 | 5 | # Windows & Unix temporary files 6 | ~$* 7 | *~ 8 | 9 | # OS generated files # 10 | .DS_Store 11 | .DS_Store? 12 | ._* 13 | .Spotlight-V100 14 | .Trashes 15 | ehthumbs.db 16 | Thumbs.db 17 | 18 | # Python 19 | *.pyc 20 | __pycache__ 21 | 22 | *.egg-info 23 | build 24 | dist 25 | 26 | # IDE 27 | .vscode 28 | .vscode/ 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The Coding Interview Guide 2 | 3 | > This is a collection of my notes from when I was/am studying for interviews (or just for learning), 4 | > and it is also intended to become a systematic guide for people who would like to 5 | > become a software development engineer (SDE). 6 | 7 | > I graduated as an Electrical Engineer, and after working 1.5 years as an Electrical 8 | > Engineer I decided to switch to the programming world. 
I only had one programming course 9 | > at university, and I have taught myself all the most common data structures and 10 | > algorithms, machine learning fundamentals, and software architectures. I started working 11 | > on LeetCode problems at the end of Feb 2020, and since then I have solved at least 1 problem 12 | > every day. 13 | 14 | > During my learning journey, I found that I learned everything piece by piece 15 | > when I needed it, but I always lacked a systematic understanding/overview. Since 16 | > I made lots of notes while studying, I think it's necessary to summarize them and turn 17 | > them into a systematic guide for whoever wants to become a software development engineer. 18 | 19 | > Hope this repo is helpful and best of luck to you! 20 | 21 | 22 | ## Preface 1: How to use this repo 23 | 24 | In this repo I'm going to show you how to get started, whether you want to switch careers, 25 | improve your interviewing techniques, or revisit the fundamentals. 26 | 27 | - I'll build a systematic path for data structures, algorithms, and system design problems 28 | - I'll be sharing tips and tricks on how to practice the algorithms 29 | - I'll be sharing techniques for answering Behavioral Questions (BQs) 30 | - I'll be giving the resources that I learned from 31 | - MAKE SURE YOU READ THE [QA SECTION](#appendix-1-question--answer) 32 | 33 | Most importantly, learning is a lifelong journey, and I'm still learning every day. 34 | Hopefully you can learn with me, and I can help people get into the field more easily 35 | or improve their technical abilities. 36 | 37 | ***How to use this repo:*** 38 | ![how to use this repo](./images/how-to-use-the-repo.png) 39 | 40 | Basically, you should use this repo as a guide (yeah, the name kinda indicates that) to build your own knowledge base and templates. 41 | 42 | And if you have enough time, you should go over every topic. If not, you can just read the topics you are interested in. 
43 | 44 | 45 | ## Preface 2: Before get started, REMEMBER the following facts 46 | 47 | - **Programming is about writing the code, not reading**. So don't just read, IMPLEMENT it! 48 | - **You can't memorize everything the first time**. So keep repeating and practicing, and WRITE things down! 49 | - **Don't feel you aren't smart enough**. In fact, a lot of programming questions are tricky simply because you've never seen them before. Once you've seen enough, you'll know the tricks. 50 | - **Review, review, review**. Review your code and your notes once in a while, try to refactor/optimize your code, and refresh your memory of the fundamentals. 51 | - **Learning is a lifelong journey**. So keep learning and reading! 52 | 53 | The points above are really important; keep them in mind and they'll help you throughout your career. 54 | 55 | Now let's begin our journey! 56 | 57 | 58 | ## Table of Content 59 | 60 | - [The Coding Interview Guide](#the-coding-interview-guide) 61 | - [Preface 1: How to use this repo](#preface-1-how-to-use-this-repo) 62 | - [Preface 2: Before get started, REMEMBER the following facts](#preface-2-before-get-started-remember-the-following-facts) 63 | - [Table of Content](#table-of-content) 64 | - [Section 1: Data Structures and Algorithms](#section-1-data-structures-and-algorithms) 65 | - [Chapter 1: Data Structures](#chapter-1-data-structures) 66 | - [1.1 Array](#11-array) 67 | - [1.2 Linked List](#12-linked-list) 68 | - [1.3 Stack](#13-stack) 69 | - [1.3.1 Arithmetic Expressions](#131-arithmetic-expressions) 70 | - [1.4 Queue](#14-queue) 71 | - [1.5 Hash Table](#15-hash-table) 72 | - [1.6 Trees](#16-trees) 73 | - [1.6.1 Tree Traversal: access the nodes of the tree](#161-tree-traversal-access-the-nodes-of-the-tree) 74 | - [1.6.2 Binary Search Tree (BST)](#162-binary-search-tree-bst) 75 | - [1.6.3 Heap / Priority Queue / Binary Heap](#163-heap--priority-queue--binary-heap) 76 | - [1.6.4 More Trees](#164-more-trees) 77 | - [1.7 Graph](#17-graph) 78 | - 
[1.7.1 Vocabulary and Definitions](#171-vocabulary-and-definitions) 79 | - [1.7.2 Graph Representation](#172-graph-representation) 80 | - [1.7.3 Graph Algorithms](#173-graph-algorithms) 81 | - [Chapter 2: Common Algorithm Types](#chapter-2-common-algorithm-types) 82 | - [2.1 Brute Force](#21-brute-force) 83 | - [2.2 Search](#22-search) 84 | - [2.2.1 Sequential Search](#221-sequential-search) 85 | - [2.2.2 Binary Search](#222-binary-search) 86 | - [2.3 Sort](#23-sort) 87 | - [2.3.1 Bubble Sort](#231-bubble-sort) 88 | - [2.3.2 Selection Sort](#232-selection-sort) 89 | - [2.3.3 Insertion Sort](#233-insertion-sort) 90 | - [2.3.4 Shell Sort](#234-shell-sort) 91 | - [2.3.5 Merge Sort](#235-merge-sort) 92 | - [2.3.6 Quick Sort](#236-quick-sort) 93 | - [2.3.7 Heap Sort](#237-heap-sort) 94 | - [2.4 Recursion](#24-recursion) 95 | - [2.4.1 Recursive function in Python](#241-recursive-function-in-python) 96 | - [2.5 Backtracking](#25-backtracking) 97 | - [2.6 Dynamic Programming](#26-dynamic-programming) 98 | - [2.7 Divide and Conquer](#27-divide-and-conquer) 99 | - [2.8 Greedy](#28-greedy) 100 | - [2.9 Branch and Bound](#29-branch-and-bound) 101 | - [Chapter 3: Frequently Used Technics and Algorithms](#chapter-3-frequently-used-technics-and-algorithms) 102 | - [3.1 Must know for interview](#31-must-know-for-interview) 103 | - [3.2 Good to know but can be skipped](#32-good-to-know-but-can-be-skipped) 104 | - [Summary](#summary) 105 | - [Section 2: System Design](#section-2-system-design) 106 | - [Chapter 4: System Design Interview Template](#chapter-4-system-design-interview-template) 107 | - [Chapter 5: System Design Components](#chapter-5-system-design-components) 108 | - [Chapter 6: Classic Designs](#chapter-6-classic-designs) 109 | - [Chapter 7: System Design Case Study](#chapter-7-system-design-case-study) 110 | - [Section 3: Transferrable Skills and Offer](#section-3-transferrable-skills-and-offer) 111 | - [Chapter 8: Behavioral Questions 
(BQ)](#chapter-8-behavioral-questions-bq) 112 | - [8.1 Four things to remember for the BQ](#81-four-things-to-remember-for-the-bq) 113 | - [8.2 How to prepare for BQ](#82-how-to-prepare-for-bq) 114 | - [Chapter 9: Offer Negotiation](#chapter-9-offer-negotiation) 115 | - [Appendix 1: Question & Answer](#appendix-1-question--answer) 116 | - [A1.1 Technical Questions](#a11-technical-questions) 117 | - [A1.1.1 How to use LeetCode as a beginner](#a111-how-to-use-leetcode-as-a-beginner) 118 | - [A1.1.2 How to solve LeetCode problem EFFECTIVELY](#a112-how-to-solve-leetcode-problem-effectively) 119 | - [A1.1.3 How to solve LeetCode problem EFFICIENTLY](#a113-how-to-solve-leetcode-problem-efficiently) 120 | - [A1.1.4 Pay close attention to these when solving problem (gain max value of leetcode problem)](#a114-pay-close-attention-to-these-when-solving-problem-gain-max-value-of-leetcode-problem) 121 | - [A1.1.5 Why you should use a template for the algorithm and data structures](#a115-why-you-should-use-a-template-for-the-algorithm-and-data-structures) 122 | - [A1.1.6 What should I do if I lost confident when practicing leetcode](#a116-what-should-i-do-if-i-lost-confident-when-practicing-leetcode) 123 | - [A1.1.7 I still can't solve new problems even if I finished x number of problems on LeetCode](#a117-i-still-cant-solve-new-problems-even-if-i-finished-x-number-of-problems-on-leetcode) 124 | - [A1.2 Interview Questions](#a12-interview-questions) 125 | - [A1.2.1 What's the interview process look like](#a121-whats-the-interview-process-look-like) 126 | - [A1.2.2 How to write an effective resume](#a122-how-to-write-an-effective-resume) 127 | - [A1.2.3 I have applied to many jobs but still no interview](#a123-i-have-applied-to-many-jobs-but-still-no-interview) 128 | - [A1.2.4 How to solve an algorithm/data structure problem in interview](#a124-how-to-solve-an-algorithmdata-structure-problem-in-interview) 129 | - [A1.3 General Questions](#a13-general-questions) 130 | - [A1.3.1 
Large Company VS Small Company](#a131-large-company-vs-small-company) 131 | - [A1.3.2 How to get your FIRST job! (How to become more competitive among the candidates)](#a132-how-to-get-your-first-job-how-to-become-more-competitive-among-the-candidates) 132 | - [Appendix 2: Resources](#appendix-2-resources) 133 | - [A2.1 Learning Experience](#a21-learning-experience) 134 | - [A2.1.1 Online MOOC courses](#a211-online-mooc-courses) 135 | - [A2.2 How to solve Algorithm Questions](#a22-how-to-solve-algorithm-questions) 136 | - [A2.3 OOD (Object Oriented Design)](#a23-ood-object-oriented-design) 137 | - [A2.3.1 SOLID Principals](#a231-solid-principals) 138 | - [A2.3.2 Clean Code - Uncle Bob lessons](#a232-clean-code---uncle-bob-lessons) 139 | - [A2.4 Design Patterns](#a24-design-patterns) 140 | - [A2.5 Async in Python](#a25-async-in-python) 141 | - [A2.6 System Design](#a26-system-design) 142 | - [A2.7 Machine Learning](#a27-machine-learning) 143 | - [A2.8 Reinforcement Learning](#a28-reinforcement-learning) 144 | - [Postface](#postface) 145 | 146 | --- 147 | 148 | ## Section 1: Data Structures and Algorithms 149 | 150 | **Book:** [Problem Solving with Algorithms and Data Structures using Python](https://runestone.academy/runestone/books/published/pythonds/index.html) 151 | 152 | - For those who need to study the fundamental data structures and algorithms, I highly recommend going over the above textbook thoroughly first, and then coming back to the following content, or practicing on LeetCode or another platform 153 | 154 | 155 | **Basic data structures**: 156 | 157 | - Array 158 | - Linked List 159 | - Stack 160 | - Queue 161 | - Hash Table 162 | - Tree 163 | - Graph 164 | 165 | **Common Algorithm Types**: 166 | 167 | - Brute Force 168 | - Search and Sort 169 | - Recursion 170 | - Backtracking 171 | - Dynamic Programming 172 | - Divide and Conquer 173 | - Greedy 174 | - Branch and Bound 175 | 176 | **Big O Notations**: 177 | 178 | - It is critical that you understand and are able 
to calculate the Big O for the code you wrote. 179 | - **The order of magnitude function describes the part of T(n) that increases the fastest as the value of n increases. Order of magnitude is often called Big-O notation (for “order”) and written as O(f(n)).** 180 | 181 | - Roughly speaking, Big O counts the number of basic operations (e.g. assignment statements) performed as a function of the input size n 182 | 183 | | f(n) | Name | 184 | | :----- | :---- | 185 | | 1 | Constant | 186 | | log n | Logarithmic | 187 | | n | Linear | 188 | | n log n | Log Linear | 189 | | n^2 | Quadratic | 190 | | n^3 | Cubic | 191 | | 2^n | Exponential | 192 | 193 | ![BigO Image](https://runestone.academy/runestone/books/published/pythonds/_images/newplot.png) 194 | 195 | 196 | ### Chapter 1: Data Structures 197 | 198 | #### 1.1 Array 199 | 200 | - An array (in Python it's called a *list*) is a collection of items where each item holds a relative position with respect to the others. 201 | 202 | #### 1.2 Linked List 203 | 204 | - Similar to an array, but requires O(n) time on average to visit an element by index 205 | - Linked lists utilize memory better than arrays, since they can use discontiguous memory space, whereas an array must use contiguous memory space 206 | - [Details and Templates](./Templates/linked_list.md) 207 | 208 | #### 1.3 Stack 209 | 210 | - Stacks are fundamentally important, as they can be used to reverse the order of items. 211 | - The order of insertion is the reverse of the order of removal. 212 | - Stacks maintain a FILO (first in, last out) ordering property. 213 | - When pop is called on the end of the list it takes O(1), but when pop is called on the first element in the list or anywhere in the middle it is O(n) (in Python). 214 | 215 | ##### 1.3.1 Arithmetic Expressions 216 | 217 | - Infix: the operator is in between the two operands that it is working on (i.e. A+B) 218 | - Fully Parenthesized expression: uses one pair of parentheses for each operator. (i.e. 
((A + (B * C)) + D)) 219 | - Prefix: all operators precede the two operands that they work on (i.e. +AB) 220 | - Postfix: operators come after the corresponding operands (i.e. AB+) 221 | 222 | | Infix Expression | Prefix Expression | Postfix Expression | 223 | | ----------------- | ----------------- | ------------------ | 224 | | A + B | + A B | A B + | 225 | | A + B * C | + A * B C | A B C * + | 226 | | (A + B) * C | * + A B C | A B + C * | 227 | | A + B * C + D | + + A * B C D | A B C * + D + | 228 | | (A + B) * (C + D) | * + A B + C D | A B + C D + * | 229 | | A * B + C * D | + * A B * C D | A B * C D * + | 230 | | A + B + C + D | + + + A B C D | A B + C + D + | 231 | 232 | - **NOTE:** 233 | - Only infix notation requires parentheses to determine precedence 234 | - The order of operations within prefix and postfix expressions is completely determined by the position of the operator and nothing else 235 | 236 | #### 1.4 Queue 237 | 238 | - A queue is structured as an ordered collection of items which are added at one end, called the “rear,” and removed from the other end, called the “front.” 239 | - Queues maintain a FIFO ordering property. 240 | - A ***deque***, also known as a double-ended queue, is an ordered collection of items similar to the queue. 241 | - It has two ends, a front and a rear, and the items remain positioned in the collection. 242 | - New items can be added at either the front or the rear. 243 | - Likewise, existing items can be removed from either end. 244 | 245 | #### 1.5 Hash Table 246 | 247 | - A **hash table** is a collection of items which are stored in such a way as to make it easy to find them later. 248 | - Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. 249 | - The mapping between an item and the slot where that item belongs in the hash table is called the **hash function**. 
250 | - **Remainder method** takes an item and divides it by the table size, returning the remainder as its hash value (i.e. `h(item) = item % 11`) 251 | - **load factor** is the number of items divided by the table size 252 | - **collision** refers to the situation where multiple items have the same hash value 253 | - **folding method** for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). These pieces are then added together to give the resulting hash value. 254 | - **mid-square method** first squares the item, and then extracts some portion of the resulting digits. For example, 44^2 = 1936, extract the middle two digits 93, then perform the remainder step (93%11=5). 255 | - **Collision Resolution** is the process of systematically placing the second item in the hash table when two items hash to the same slot. 256 | - **Open addressing (linear probing):** sequentially find the next open slot or address in the hash table 257 | - A disadvantage of linear probing is the tendency for clustering; items become clustered in the table. 258 | - **Rehashing** is one way to deal with clustering: skip slots when looking sequentially for the next open slot, thereby distributing the items that have caused collisions more evenly. 259 | - **Quadratic probing:** instead of using a constant “skip” value, we use a rehash function that increments the hash value by 1, 3, 5, 7, 9, and so on. This means that if the first hash value is h, the successive values are h+1, h+4, h+9, h+16, and so on. 260 | - **Chaining** allows many items to exist at the same location in the hash table. 261 | - When collisions happen, the item is still placed in the proper slot of the hash table. 262 | - As more and more items hash to the same location, the difficulty of searching for the item in the collection increases. 
263 | ![](http://interactivepython.org/runestone/static/pythonds/_images/chaining.png) 264 | - The initial size of the hash table should be a prime number so that the collision resolution algorithm can be as efficient as possible. 265 | 266 | #### 1.6 Trees 267 | 268 | * A tree data structure has its root at the top and its leaves on the bottom. 269 | * Three properties of a tree: 270 | 1. we start at the top of the tree and follow a path made of circles and arrows all the way to the bottom. 271 | 2. all of the children of one node are independent of the children of another node. 272 | 3. each leaf node is unique. 273 | * **binary tree:** each node in the tree has a maximum of two children. 274 | * A **balanced binary tree** has roughly the same number of nodes in the left and right subtrees of the root. 275 | 276 | ##### 1.6.1 Tree Traversal: access the nodes of the tree 277 | 278 | - Tree traversal is the foundation of all tree-related problems. 279 | - Here are a few different ways to traverse a tree: 280 | - BFS: Level-order 281 | - DFS: Pre-order, In-order, Post-order 282 | - [Details and Templates](./Templates/tree_traversal.md) 283 | 284 | 285 | ##### 1.6.2 Binary Search Tree (BST) 286 | 287 | - BST Property (left subtree < root < right subtree): 288 | 1. The value in each node must be `greater than (or equal to)` any values stored in its left subtree. 289 | 2. The value in each node must be `less than (or equal to)` any values stored in its right subtree. 290 | - `Inorder traversal` of a BST visits the keys in `ascending order`. Therefore, inorder traversal is the most frequently used traversal method for a BST. 
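As a minimal sketch of this property (the `Node` class here is hypothetical, not from the repo's templates), an inorder traversal visiting left subtree, root, then right subtree yields the keys in ascending order:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(node):
    """Visit the left subtree, then the root, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

# BST:       5
#          /   \
#         3     8
#        / \   /
#       1   4 7
root = Node(5, Node(3, Node(1), Node(4)), Node(8, Node(7)))
print(inorder(root))  # → [1, 3, 4, 5, 7, 8]
```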
291 | - **successor:** the node that has the next-largest key in the tree 292 | - it has no more than one child 293 | - You could go over the [Leetcode Binary Search Tree topic](https://leetcode.com/explore/learn/card/introduction-to-data-structure-binary-search-tree/) for details 294 | 295 | ##### 1.6.3 Heap / Priority Queue / Binary Heap 296 | 297 | - **Priority Queue:** 298 | - the logical order of items inside the queue is determined by their priority. 299 | - The highest priority items are at the front of the queue and the lowest priority items are at the back. 300 | - **Binary Heap:** the classic way to implement a priority queue. 301 | - both enqueue and dequeue are **O(log n)** 302 | - **min heap:** the smallest key is always at the front 303 | - **max heap:** the largest key is always at the front 304 | - **complete binary tree:** a tree in which each level has all of its nodes (except possibly the bottom level, which fills from left to right) 305 | - can be implemented using a single list 306 | - Because the tree is complete, the left child of a parent (at position **p**) is the node that is found in position **2p** in the list. Similarly, the right child of the parent is at position **2p+1** in the list. 307 | ![](http://interactivepython.org/runestone/static/pythonds/_images/heapOrder.png) 308 | - **heap order property:** In a heap, for every node **x** with parent **p**, the key in **p** is smaller than or equal to the key in **x**. 
309 | * For example, the root of the tree must be the smallest item in the tree 310 | - When to use a heap: 311 | - Priority Queue implementation 312 | - whenever you need quick access to the largest/smallest item 313 | - instant access (peek) to the top item 314 | - insertions are fast, and heaps allow in-place sorting (heap sort) 315 | - More details can be seen in [this discussion](https://stackoverflow.com/questions/749199/when-would-i-want-to-use-a-heap) 316 | 317 | ##### 1.6.4 More Trees 318 | 319 | - ***Parse tree*** can be used to represent real-world constructions like sentences or mathematical expressions. 320 | - A simple solution for keeping track of parents as we traverse the tree is to use a stack. 321 | - When we want to descend to a child of the current node, we first push the current node on the stack. 322 | - When we want to return to the parent of the current node, we pop the parent off the stack. 323 | - ***AVL Tree***: a self-balancing binary search tree, named for its inventors G.M. Adelson-Velskii and E.M. Landis. 324 | - For each node: *balanceFactor* = *height(leftSubTree)* − *height(rightSubTree)* 325 | - a subtree is left-heavy if *balance_factor > 0* 326 | - a subtree is right-heavy if *balance_factor < 0* 327 | - a subtree is perfectly in balance if *balance_factor = 0* 328 | - For simplicity we can define a tree to be in balance if the balance factor is -1, 0, or 1. 
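A minimal sketch of the balance factor definition (hypothetical `Node` class; this recomputes heights on every call, whereas a real AVL implementation caches heights and rebalances with rotations):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    # An empty subtree has height 0.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # balanceFactor = height(leftSubTree) - height(rightSubTree)
    return height(node.left) - height(node.right)

balanced = Node(2, Node(1), Node(3))
left_heavy = Node(3, Node(2, Node(1)))   # 3 -> 2 -> 1, all left children

print(balance_factor(balanced))    # → 0  (in balance)
print(balance_factor(left_heavy))  # → 2  (out of balance, needs rotation)
```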
329 | - The minimum number of nodes for a given height follows the pattern of the *Fibonacci sequence*; as the number of elements gets larger, the ratio Fi/Fi-1 approaches the golden ratio, from which the height, and thus the search time, is derived to be **O(log n)** 330 | - ***Red-Black Tree*** 331 | - [Details in Wiki](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) 332 | - ***B+ Tree***: N-ary tree 333 | - [Details in Wiki](https://en.wikipedia.org/wiki/B%2B_tree) 334 | - ***Trie*** 335 | - *This is a common data structure in interviews* 336 | - [Template](./Templates/trie.md) 337 | - ***Binary Index Tree (Fenwick Tree)*** 338 | - [Binary Index Tree (Fenwick Tree)](https://www.geeksforgeeks.org/binary-indexed-tree-or-fenwick-tree-2/) 339 | - [315. Count of Smaller Numbers After Self](https://leetcode.com/problems/count-of-smaller-numbers-after-self/) 340 | 341 | 342 | #### 1.7 Graph 343 | 344 | ##### 1.7.1 Vocabulary and Definitions 345 | 346 | - **Vertex (or Node):** the name is called the "key" and the additional information is called the "payload" 347 | - **Edge (or arc):** connects two vertices to show that there is a relationship between them. 348 | - A graph with one-way edges is called a **directed graph (or digraph)** 349 | - **Weight:** edges may be weighted to show that there is a cost to go from one vertex to another. 350 | - **Path:** a sequence of vertices that are connected by edges 351 | - Unweighted path length is the number of edges in the path, specifically n-1 for a path of n vertices 352 | - Weighted path length is the sum of the weights of all the edges in the path 353 | - **Cycle:** a path that starts and ends at the same vertex 354 | - A graph with no cycles is called an **acyclic graph**. 355 | - A directed graph with no cycles is called a **directed acyclic graph (or DAG)** 356 | - **Graph:** a graph (G) is composed of a set of vertices (V) and a set of edges (E). Each edge is a tuple of two vertices and, optionally, a weight: (v,w). 
G=(V,E) where v,w∈V 357 | 358 | ##### 1.7.2 Graph Representation 359 | 360 | - Adjacency Matrix (2D matrix) 361 | - Good when the number of edges is large (a dense graph) 362 | - Each of the rows and columns represents a vertex in the graph. 363 | - The value in the cell at the intersection of row v and column w indicates if there is an edge from vertex v to vertex w. It also represents the weight of the edge from vertex v to vertex w. 364 | - When two vertices are connected by an edge, we say that they are **adjacent** 365 | ![](http://interactivepython.org/runestone/static/pythonds/_images/adjMat.png) 366 | - **sparse:** most of the cells in the matrix are empty 367 | - Adjacency List 368 | - a space-efficient way to implement sparsely connected graphs 369 | - keep a master list of all the vertices in the Graph object; each vertex is an element of the list, with the vertex as ID and a list of its adjacent vertices as the value 370 | ![](http://interactivepython.org/runestone/static/pythonds/_images/adjlist.png) 371 | 372 | ##### 1.7.3 Graph Algorithms 373 | 374 | - Graph traversal: BFS & DFS 375 | - [Template](./Templates/graph_traversal.md) 376 | - Graph Algorithms: 377 | - Shortest Path: 378 | - Dijkstra’s Algorithm (single source) 379 | - ***Essentially, this is a BFS using a priority queue instead of a queue*** 380 | - [Template](./Templates/dijkstra.md) 381 | - Floyd-Warshall Algorithm (all pairs of sources) 382 | - Topological Sort 383 | - [Template](./Templates/topological_sort.md) 384 | - Strongly Connected Components 385 | - [More Info](./Templates/graph_SCC.md) 386 | - Prim’s Spanning Tree Algorithm 387 | - [More Info](./Templates/prim_spanning_tree.md) 388 | 389 | ### Chapter 2: Common Algorithm Types 390 | 391 | #### 2.1 Brute Force 392 | 393 | - The most common algorithm type 394 | - Whenever you are facing a problem without many clues, solve it using brute force first, then observe the process and try to optimize your solution 395 | 396 | #### 2.2 Search 397 | 398 | ##### 2.2.1 Sequential 
Search 399 | 400 | - Sequential Search: visit the stored values in sequence (using a loop) 401 | 402 | ##### 2.2.2 Binary Search 403 | 404 | - Examine the middle item of an ordered list 405 | - The KEY is how you maintain and shrink the search interval 406 | - [Template](./Templates/binary_search.md) 407 | 408 | #### 2.3 Sort 409 | 410 | ##### 2.3.1 Bubble Sort 411 | 412 | - Compares adjacent items and exchanges those that are out of order. 413 | - **Short bubble:** stop early if it finds that the list has become sorted. 414 | - time complexity: O(n^2) 415 | 416 | ##### 2.3.2 Selection Sort 417 | 418 | - Looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. 419 | - time complexity: O(n^2) 420 | 421 | ##### 2.3.3 Insertion Sort 422 | 423 | - Maintains a sorted sub-list in the lower positions of the list. 424 | - Each new item is then “inserted” back into the previous sub-list such that the sorted sub-list is one item larger. 425 | - time complexity: O(n^2) 426 | 427 | ##### 2.3.4 Shell Sort 428 | 429 | - Breaks the original list into a number of smaller sub-lists, each of which is sorted using an insertion sort. 430 | - the shell sort uses an increment *i*, sometimes called the **gap**, to create a sub-list by choosing all items that are *i* items apart. 431 | - After all the sub-lists are sorted, it finally does a standard insertion sort 432 | - time complexity falls between O(n) and O(n^2); by changing the increment, a shell sort can perform at O(n^(3/2)). 433 | 434 | ##### 2.3.5 Merge Sort 435 | 436 | - A recursive algorithm that continually splits a list in half. 437 | - [Details and Templates](./Templates/merge_sort.md) 438 | 439 | ##### 2.3.6 Quick Sort 440 | 441 | - First selects a value (the **pivot value**), and then uses this value to assist with splitting the list. 
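A minimal sketch of the pivot idea (not in place, and it copies sub-lists; the linked template covers the in-place partition version):

```python
def quick_sort(items):
    # Pick a pivot, split the rest into smaller/larger halves, recurse.
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

print(quick_sort([5, 2, 9, 1, 5, 6]))  # → [1, 2, 5, 5, 6, 9]
```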
442 | - [Details and Templates](./Templates/quick_sort.md) 443 | 444 | ##### 2.3.7 Heap Sort 445 | 446 | - Uses the heap property to sort the list 447 | 448 | #### 2.4 Recursion 449 | 450 | **Recursion** is a method of solving problems that involves breaking a problem down into smaller and smaller sub-problems until you get to a small enough problem that it can be solved trivially. Usually recursion involves a function calling itself. 451 | 452 | Three Laws of Recursion: 453 | 454 | 1. A recursive algorithm must have a base case. 455 | 2. A recursive algorithm must change its state and move toward the base case. 456 | 3. A recursive algorithm must call itself, recursively. 457 | 458 | Recursive visualization: Fractal tree 459 | 460 | - A **fractal** is something that looks the same at all different levels of magnification. 461 | - A fractal tree: a small twig has the same shape and characteristics as a whole tree. 462 | 463 | ##### 2.4.1 Recursive function in Python 464 | 465 | * When a function is called in Python, a stack frame is allocated to handle the local variables of the function. 466 | * When the function returns, the return value is left on top of the stack for the calling function to access. 467 | * Even though we are calling the same function over and over, each call creates a new scope for the variables that are local to the function. 468 | 469 | #### 2.5 Backtracking 470 | 471 | - a general algorithm for finding all (or some) solutions to constraint satisfaction problems (i.e. chess, puzzles, crosswords, verbal arithmetic, Sudoku, etc) 472 | - [Template](./Templates/backtrack.md) 473 | 474 | 475 | #### 2.6 Dynamic Programming 476 | 477 | **Dynamic Programming (DP)** is an algorithmic technique which is usually based on a recurrent formula and one (or more) starting states. 478 | - A sub-solution of the problem is constructed from previously found ones. 479 | - Usually used to find optimal results, such as the shortest path, best fit, smallest set, etc. 
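As a concrete illustration of "a sub-solution constructed from previously found ones" (a hypothetical example, not from the repo's templates), here is bottom-up Fibonacci:

```python
def fib(n):
    # Starting states: dp[0] = 0, dp[1] = 1.
    # Recurrent formula: dp[i] = dp[i - 1] + dp[i - 2].
    if n < 2:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]  # built from previously found sub-solutions
    return dp[n]

print(fib(10))  # → 55
```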
480 | 481 | #### 2.7 Divide and Conquer 482 | 483 | - **Divide**: break the problem into non-overlapping sub-problems of the same type 484 | - **Conquer**: solve the sub-problems 485 | - the algorithm keeps dividing and conquering, and finally combines the sub-solutions to get the overall solution 486 | - the algorithm can be written recursively or with a loop 487 | 488 | #### 2.8 Greedy 489 | 490 | **Greedy algorithm:** 491 | 492 | - find a safe move first 493 | - prove its safety 494 | - solve the subproblem (which should be similar to the original problem) 495 | - estimate the running time 496 | 497 | **Optimization:** 498 | 499 | - assume everything is sorted (if not, maybe sort first) 500 | - decide the sort order 501 | - the final running time can be O(n log n) (i.e. the sort is O(n log n) and the greedy pass can be O(n)) 502 | 503 | - [More details](https://www.hackerearth.com/practice/algorithms/greedy/basics-of-greedy-algorithms/tutorial/) 504 | 505 | #### 2.9 Branch and Bound 506 | - Similar to backtracking, but aimed at optimization problems: prune a branch of the search tree whenever a bound on the best solution reachable from that branch is worse than the best solution found so far 507 | ### Chapter 3: Frequently Used Technics and Algorithms 508 | 509 | #### 3.1 Must know for interview 510 | 511 | - Matrix Traversal 512 | - Focusing on various ways of traversing a 2D matrix 513 | - [Template](./Templates/matrix_traversal.md) 514 | - Sliding Window 515 | - ***Fundamentally this is a two-pointer approach*** 516 | - [Template](./Templates/sliding_window.md) 517 | - Union find 518 | - Essentially it's a list representation of the joined (unioned) sets of data points 519 | - [Template](./Templates/union_find.md) 520 | - Bit Manipulation 521 | - Prefix Sum 522 | - monotonic stack/queue 523 | - [Monotonic Stack template](./Templates/monotonic_stack.md) 524 | 525 | #### 3.2 Good to know but can be skipped 526 | 527 | - Segment Tree 528 | - Kadane's algorithm 529 | - Reservoir Sampling 530 | - Line sweep 531 | - KMP algorithm (pattern match) 532 | - Manacher (Longest Palindromic Substring) 533 | - Skip List 534 | 535 | ### Summary 536 | 537 | --- 538 | 539 | ## Section 2: System Design 540 | 541 | ### Chapter 4: System Design Interview 
Template 542 | 543 | System design questions can be very difficult to prepare for, because they cover a wide range of areas. 544 | 545 | Here is a template I use for the system design interview: 546 | 547 | 1. Feature expectations (5 mins) - gather requirements: 548 | 549 | - Functional requirements: 550 | - Use cases 551 | - Scenarios that will NOT be covered 552 | - End-user (who will use it) 553 | - Capacity (how many people will use it, DAU (daily active users)) 554 | - How to use it 555 | - Non-Functional requirements: 556 | - Scalability 557 | - Availability 558 | - Performance/Latency 559 | - Consistency 560 | - Durability/Fault tolerance 561 | 562 | 2. Estimations (2-5 mins) - estimate scale: 563 | 564 | - Throughput (QPS for read and write queries) 565 | - Latency expected from the system (for read and write queries) 566 | - Read/Write ratio (heavy read, heavy write, or similar) 567 | - Traffic estimates (QPS for read and write) 568 | - Storage estimates (media files, text/photo/video) 569 | - Memory estimates 570 | - Cache: what kind of data we want to store in the cache 571 | - How much RAM and how many machines 572 | - How much data stored on disk/SSD 573 | 574 | 3. High Level Design (5-10 min) - discuss at a very high level with the interviewer: 575 | 576 | - System components (load balancer, services, cache, database, etc) 577 | - Database schema 578 | - APIs for Read/Write scenarios for crucial components 579 | - Request flow process (from client to database) 580 | 581 | 4. 
Deep Dive (15-20 mins) - focus on any part of the system: 582 | 583 | - Scaling individual components 584 | - Availability, Consistency and Scale story for each component 585 | - Consistency and availability patterns 586 | - Deep dive on any of the following components 587 | - DNS 588 | - CDN (Pull vs Push vs Hybrid) 589 | - Load Balancer/Reverse Proxy 590 | - LB types 591 | - LB algorithms 592 | - Application layer scaling (Microservice, Service Discovery, Service Mesh) 593 | - Database (RDBMS vs NoSQL) 594 | - RDBMS: 595 | - Leader-follower, Multi-leader, Leaderless, Federation, Sharding, Denormalization, SQL Tuning 596 | - NoSQL: 597 | - Key-Value, Wide-Column, Graph, Document 598 | - RAM [Bounded size] => Redis, Memcached 599 | - AP [Unbounded size] => Cassandra, RIAK, Voldemort 600 | - CP [Unbounded size] => HBase, MongoDB, Couchbase, DynamoDB 601 | - Caches: 602 | - Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache at Query level, Cache at Object level 603 | - Cache Patterns: 604 | - Cache aside 605 | - Write through 606 | - Write behind 607 | - Refresh ahead 608 | - Eviction policies: 609 | - LRU 610 | - LFU 611 | - FIFO 612 | - Asynchronism 613 | - Message queues 614 | - Task queues 615 | - Back pressure 616 | - Communication 617 | - TCP 618 | - UDP 619 | - REST 620 | - RPC 621 | 622 | 5. 
Justify (5 mins): 623 | 624 | - Throughput of each layer 625 | - Latency caused between each layer 626 | - Overall latency justification 627 | 628 | 629 | Notes: 630 | 631 | - Treat the system design as an actual work project, for which you have to gather and clarify all the requirements and then do the design, and treat your interviewer as a colleague with whom you discuss the trade-offs of your design 632 | - Step 1 is the most important one: you'll need to know what you are about to build, after all, and figure out all the requirements 633 | - Step 2 should be asked about, but most of the time you'll be asked to design the system as a startup (i.e. you don't have many users) and then scale it as you get more customers. So you don't have to give a detailed analysis at the beginning, unless it is specifically asked for. 634 | - API design vs database schema design: you probably don't need to talk about both. DB design is asked about more frequently in my experience. 635 | - The key in system design is talking about trade-offs: why you selected certain technologies over others, and what their drawbacks are. 636 | 637 | 638 | Reference: 639 | 640 | - [System Design Template](https://leetcode.com/discuss/career/229177/My-System-Design-Template) 641 | - [System Design - InterviewBit](https://www.interviewbit.com/courses/system-design/) 642 | - [SNAKE - System Design Principles to crack a system design in 5 steps \| Bowen's blog](https://bowenli86.github.io/2016/06/28/system%20design/SNAKE-System-Design-Principles-to-crack-a-system-design-in-5-steps/) 643 | 644 | 645 | ### Chapter 5: System Design Components 646 | 647 | Network systems eventually come down to these components and design patterns, so it is critical to understand them and be able to discuss the design decisions and trade-offs for any component. 
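As a concrete illustration of the estimation step (step 2 of the template above), here is a quick back-of-envelope sketch in Python. Every number in it (DAU, requests per user, object size) is a made-up assumption for a hypothetical photo-sharing service, chosen only to show the arithmetic you would do on the whiteboard:

```python
# Back-of-envelope capacity estimation -- every constant here is a
# hypothetical assumption, chosen only to demonstrate the arithmetic.
DAU = 10_000_000           # daily active users (assumed)
READS_PER_USER = 20        # read requests per user per day (assumed)
WRITES_PER_USER = 2        # photo uploads per user per day (assumed)
AVG_OBJECT_SIZE = 500_000  # bytes per uploaded photo (assumed)
SECONDS_PER_DAY = 86_400

# Traffic: derive average QPS from daily volume.
read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY
peak_read_qps = 2 * read_qps  # common rule of thumb: peak ~= 2x average

# Storage: derive yearly growth from daily write volume.
yearly_storage_tb = DAU * WRITES_PER_USER * AVG_OBJECT_SIZE * 365 / 1e12

print(f"average read QPS : ~{read_qps:,.0f}")
print(f"average write QPS: ~{write_qps:,.0f}")
print(f"peak read QPS    : ~{peak_read_qps:,.0f}")
print(f"storage per year : ~{yearly_storage_tb:,.0f} TB")
```

The exact numbers matter far less than the method: derive QPS from DAU and storage from write volume, and state every assumption out loud so the interviewer can correct them.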
648 | 649 | ![System Components](./images/System-Components.png) 650 | 651 | - [Internet Protocol Suite](./SystemDesign/internet_protocol_suite.md) 652 | - OSI model 653 | - Internet protocol suite 654 | - TCP, UDP, QUIC, SCTP, TCP/IP 655 | - HTTP, HTTPS 656 | - socket, websocket, long-polling 657 | - REST, SOAP 658 | - HTTP response status codes 659 | - [Load Balancer, Reverse Proxy, API Gateway](./SystemDesign/load_balancer.md) 660 | - LB types: layer 4 and layer 7 661 | - LB algorithms: least connection, least response time, least bandwidth, round robin, IP hash 662 | - Reverse Proxy 663 | - API Gateway 664 | - An example: The Architecture of Uber’s API gateway 665 | - [Cache](./SystemDesign/cache.md) 666 | - Cache Usage Patterns 667 | - Cache Aside 668 | - Cache-as-SoR (system-of-record): Read through, write through, write behind 669 | - Cache Eviction Policies 670 | - Redis vs Memcached 671 | - Data Store 672 | - Database Management Systems 673 | - Design Principles 674 | - [Replication & Partition](./SystemDesign/replication_partition.md) 675 | - Leader-follower replication, Sync/Async replication 676 | - Handling node outages 677 | - Replication logs 678 | - Eventual consistency 679 | - Multi-leader replication topology, write conflict resolution 680 | - Leaderless replication, Quorum, sloppy quorum, hinted handoff 681 | - Key-value store partitioning 682 | - Local and Global indexes 683 | - Rebalancing partitions 684 | - Coordination service, gossip protocol 685 | - [Transaction & Isolation](./SystemDesign/transaction_isolation.md) 686 | - ACID 687 | - Read committed 688 | - Read skew 689 | - Snapshot isolation 690 | - MVCC 691 | - Lost update 692 | - Write skew 693 | - Phantom 694 | - Two-phase locking (2PL): Shared lock, exclusive lock, predicate lock, index-range lock 695 | - Serializable Snapshot Isolation (SSI) 696 | - [Consistency & Consensus](./SystemDesign/consistency_consensus.md) 697 | - Linearizability 698 | - CAP theorem 699 | - Causal dependency, 
consistent with causality, causally consistent 700 | - Total order, partially ordered 701 | - Lamport timestamp 702 | - Total Order Broadcast 703 | - Fencing token: monotonically increasing number for locks 704 | - Epoch number: monotonically increasing number for elections 705 | - 2PC, 3PC, XA transaction 706 | - Major Types 707 | - [RDBMS](SystemDesign/RDBMS.md) 708 | - Postgres vs MySQL 709 | - [NoSQL](./SystemDesign/nosql_db.md) 710 | - NoSQL database types 711 | - Cassandra vs MongoDB 712 | - [Data Storage Systems](./SystemDesign/storage_system.md) 713 | - File Storage 714 | - Block Storage 715 | - Object Storage 716 | - HDFS and Map Reduce 717 | - Architectural Patterns 718 | 719 | Now, putting them together, here is something you should know: 720 | 721 | - [What happens when you navigate to a URL](./SystemDesign/navigate_url.md) 722 | - [How to scale a web app from monolithic to distributed](./SystemDesign/scale_web_app.md) 723 | 724 | ### Chapter 6: Classic Designs 725 | 726 | - notification system 727 | - rate limiter 728 | - top k problem 729 | - distributed message queue 730 | - distributed cache 731 | 732 | ### Chapter 7: System Design Case Study 733 | 734 | - chat system (slack, etc) 735 | - streaming system (youtube, etc) 736 | - map system (google map, etc) 737 | - booking system (ticket master, etc) 738 | - notification system 739 | - news feed 740 | - payment system 741 | - top k (recommendation system, etc) 742 | - url shortener 743 | - distributed web crawler 744 | - search auto-completion system 745 | - file system (dropbox, google drive) 746 | 747 | --- 748 | 749 | ## Section 3: Transferable Skills and Offer 750 | 751 | ### Chapter 8: Behavioral Questions (BQ) 752 | 753 | #### 8.1 Four things to remember for the BQ 754 | 755 | - Behavioral questions are being evaluated more and more in interviews, so make sure you are well prepared before you go to an interview 756 | - There are many, many articles online talking about behavioral questions, so if you 
are looking for an answer to a specific question, just go ahead and search that question on Google and YouTube. 757 | - Be prepared to TALK ABOUT YOUR RESUME 758 | - Make sure you can answer anything you put on your resume: technologies, projects, experience, etc 759 | - use ***STAR*** to structure your stories 760 | - situation: briefly describe the background 761 | - task: briefly describe what needed to be done 762 | - action: describe what you did, focus on what YOU did 763 | - results: show the results, especially YOUR impact 764 | 765 | #### 8.2 How to prepare for BQ 766 | 767 | Follow these steps: 768 | 769 | 1. Prepare to talk about your resume 770 | - know all the technologies you've listed on your resume 771 | - be ready to explain why you left each of your jobs (at least the most recent one) 772 | - be ready to talk about the projects you listed on your resume 773 | - technologies 774 | - challenges 775 | - YOUR impact, what YOU did 776 | - collaborations 777 | 778 | 2. There are three questions you must prepare: 779 | 1. Introduce yourself (a good way to prepare is the elevator pitch, google it if you don't know it) 780 | 2. Why ABC Company (i.e. why do you want to apply to/work for our company) 781 | 3. Why do you want to leave (or why did you leave) your job (if you ever had one) 782 | 783 | 3. 
Go through the [Amazon Leadership Principles](https://www.amazon.jobs/en/principles) 784 | - Prepare 2-3 stories for each principle, and you should be good for most interviews at ANY company 785 | - [Amazon's 14 Leadership Principles Video via Jeff Bezos](https://www.youtube.com/watch?v=B-xdfQv3I1k) is really great 786 | - This list: [Amazon asks these 35 questions in 95% of job interviews](https://www.youtube.com/watch?v=dse8OTDlRcM&list=PLLucmoeZjtMTarjnBcV5qOuAI4lE5ZinV&index=18) should give you enough details for the most common questions 787 | - Make sure you note down the stories you prepared, and practice talking to others about them 788 | 789 | 4. Make sure you give the following questions extra attention: 790 | - Your strengths and weaknesses 791 | - The most challenging problem you've solved or project you've worked on 792 | - A follow-up question could be: if you were doing it again now, how would you do it differently 793 | - Conflict with your colleagues 794 | - Disagreement with your colleagues/boss 795 | - Mistakes/failures you made and what you learned from them 796 | - Leading teams (if you are a senior or a manager) 797 | 798 | 5. Now you are prepared to answer questions, but you'll also need to prepare questions to ASK 799 | - Asking good questions will show the interviewers that you are interested in the company, the position, and the job itself, and will show your professionalism 800 | - Ask probing questions about the team and technology 801 | - Programming languages (Python, Java, etc) 802 | - Development technologies (Docker, K8s, etc) 803 | - Frameworks (Django, Spring, etc) and their versions (from the versions you'll know how up-to-date their tech stack is) 804 | - Development tools (IDEs, OS, Cloud providers, etc) 805 | - Generic questions 806 | - What is a day like in your company (this may seem too generic, but is quite important). 
For example, what's the sprint like (do you have sprints), do you have standups (frequency and time), how many working hours per week, when do you start your day, and much more. Pick those that you are most interested in 807 | - What's the team like, what's the tech stack for the team, how many BE/FE/QA, etc 808 | - These kinds of questions show that you are really interested in the job and the team. You'll need to know this info anyway if there is an offer and you choose to accept it 809 | - Interview related questions 810 | - What's the interview process like (sometimes the HR/interviewer will let you know clearly; if not, you should ask), how many rounds, etc 811 | - It's even possible to probe for the potential questions: what areas the interview will cover (algorithms, system design, take-home project, etc) 812 | - It's ok to ask; it's your HR's decision whether to tell you 813 | - This article shows you [How to predict your interview questions](https://interviewgenie.com/blog-1/2020/5/4/how-to-predict-your-interview-questions) 814 | - Ask any questions that you might be interested in throughout your interview 815 | - For example, when certain technologies are mentioned in your interview, you may ask your interviewer how those technologies are used in the company 816 | - **NOTE** this step is very important: it will not only show that you are interested in the company and the position, but also give you a chance to learn the company culture and tech stack, so you can then decide if you really want to work for this company or not. 817 | - From the technical perspective, asking about the tech stack and even specific versions will let you know if the team has lots of tech debt. 
Discussing certain technologies will also show that you are strong in the area 818 | - From the company culture perspective, how your interviewers deal with certain situations will give you a sneak peek of how the company operates and what the company culture looks like 819 | 820 | - Here are some really good resources for you to prepare: 821 | - AGAIN, if you want to prepare for certain questions, there are lots of examples online; just Google and use YouTube 822 | - There are also interview tips on the big companies' websites, make sure you check them out 823 | - [Leetcode Interview Thoughts Amazon and Google](https://leetcode.com/discuss/interview-question/455991/i-got-an-offer-from-amazon-sde-i-and-google-l3-heres-my-thoughts) 824 | - [How to sell yourself in interviews — Interview Genie](https://interviewgenie.com/blog-1/2018/6/6/how-to-sell-yourself-in-interviews) 825 | - [How to answer interview questions about the Amazon leadership principle “Frugality” — Interview Genie](https://interviewgenie.com/blog-1/2019/4/9/how-to-answer-amazon-frugality-interview-questions) 826 | 827 | - **One more thing**. During your daily work, make notes on the achievements you've had. Write down the details in the STAR format as mentioned above so you won't forget them when you need them, but make sure you don't leak any sensitive data! 828 | 829 | ### Chapter 9: Offer Negotiation 830 | 831 | - Congratulations, you got an offer!! But should you accept it immediately? 832 | - Let me put it this way: it's for your own sake to negotiate the offer 833 | - Offer negotiation will not only show that you are seriously considering joining the company, but will also make you happier when you actually accept the offer 834 | - It is really unlikely that a company will revoke the offer because you negotiated, but it is possible. Frankly speaking, do you really want to work at a place where you can't ask for anything? 
835 | - There is really only ONE resource that I'd like to share: [Ten Rules for Negotiating a Job Offer](https://haseebq.com/my-ten-rules-for-negotiating-a-job-offer/). Read it carefully and thoroughly, and you are good to go 836 | 837 | --- 838 | 839 | ## Appendix 1: Question & Answer 840 | 841 | **DISCLAIMER:** These QAs are my personal opinions and experience. They are not guaranteed to be the perfect solution to the question, but they are something I found really helpful from my own experience. 842 | 843 | **NOTE:** You should read these QAs first before jumping into the content and resources, since these answers may save you lots of time preparing for the interview and potentially help you ace the interview. 844 | 845 | ### A1.1 Technical Questions 846 | 847 | #### A1.1.1 How to use LeetCode as a beginner 848 | 849 | - First of all, if you don't know what LeetCode is, google it and thank me later. 850 | 851 | - As a beginner or someone new to algorithm questions, LeetCode can get overwhelming because there are almost 2000 (at the time of writing) problems! 852 | 853 | - If you are new to algorithms and data structures, go to the "**Explore**" tab on the top navigation bar, then go to the "**Learn**" row and learn all of them. 854 | 855 | - If you already know all the data structures and would like to practice, do the questions from the tags, and do them from easy to hard 856 | - Note that most companies rarely test hard ones, but some highly frequent hard problems have been coming up more often recently 857 | 858 | - If you are really familiar with all the data structures and common algorithms, do the problems randomly, so you can think about the best data structure for solving each problem most efficiently 859 | 860 | - If you are time sensitive/critical (i.e. 
you have an interview in the near future or you are actively looking for jobs), do the company-based questions (a LeetCode premium feature) 861 | 862 | #### A1.1.2 How to solve LeetCode problems EFFECTIVELY 863 | 864 | **Rule of thumb: make every question count!** 865 | 866 | What I mean is that you have to really understand the question after you've solved it. 867 | 868 | It doesn't really matter if you solved it by yourself or looked at the answers. 869 | 870 | Here is a list of ***CRITICAL*** things you always have to think about when you are working on problems: 871 | 872 | - What's the best data structure(s) to solve this problem 873 | - What's the time and space complexity (Big O's) 874 | - What's the tradeoff of the current approach (i.e. more space or more time) 875 | 876 | After solving the question (again, whether you solved it yourself or looked at the solution/discussion): 877 | 878 | - Is your solution the best way to solve it? If not, is there a way to optimize your solution? 879 | - If you couldn't solve it yourself, what was the reason? 880 | - Have you seen the data structure/algorithm before? 881 | - If not, you should stop working on more problems and study it immediately 882 | - If so, you should practice more on this type of problem 883 | - Are there any tricks for solving the problem? 884 | - If not, just keep practicing 885 | - If so, NOTE them! 886 | - Did you have no clue when seeing the problem? 887 | - Practice more problems of this type, and summarize the solution for each problem solved 888 | 889 | If you strictly do the above things while working on a problem and after you've solved it, it's just a matter of time until you are an expert. 
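To make the time/space trade-off point above concrete, here is a small sketch using the classic Two Sum problem (the function names and structure are mine, not from this repo's templates): the same problem solved with a hash map (more space, less time) and with in-place sorting plus two pointers (less extra space, more time):

```python
from typing import List, Optional, Tuple

def two_sum_hash(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """O(n) time, O(n) extra space: trade memory for speed.
    Returns the indices of the two numbers that sum to target."""
    seen = {}  # value -> index of where we saw it
    for i, x in enumerate(nums):
        if target - x in seen:
            return (seen[target - x], i)
        seen[x] = i
    return None

def two_sum_sorted(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """O(n log n) time, O(1) extra space: sort in place, then two pointers.
    Trade-off: the input is modified and original indices are lost,
    so this variant returns the pair of values instead."""
    nums.sort()
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == target:
            return (nums[lo], nums[hi])
        if s < target:
            lo += 1  # need a bigger sum
        else:
            hi -= 1  # need a smaller sum
    return None
```

Being able to state why you would pick one over the other (memory budget, whether the input may be mutated, whether indices are needed) is exactly the trade-off discussion interviewers look for.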
890 | 891 | #### A1.1.3 How to solve LeetCode problems EFFICIENTLY 892 | 893 | **Rule of thumb: Don't work on a single problem for too long, and don't be afraid to look at the solution!** 894 | 895 | I know many people don't want to look at the solution if they can't solve the problem, but spending too much time (i.e. hours) on a single problem isn't efficient at all! After all, you only have 24 hours a day. 896 | 897 | So here is what you should do: 898 | 899 | - If you have no clue at all after reading the question, look at the solution directly 900 | - It may sound a little cheesy, but this is the most efficient way, because you'll probably still have no clue after 1 hour. Once you have solved enough questions, this won't happen 901 | - If you have some clue but are not sure how to do it, then spend some time working on it 902 | - Normally spend 15-30 mins, and if you still can't solve it, look at the solution 903 | - If you have an idea of how to solve it, do it! 904 | - In this case, spend as much time as you need, even if it's one hour! 905 | - The reason is that you know how to solve it but are not really familiar with the approach, so you need more practice. By solving it on your own after much trial and error, you will be very familiar with this question and should be able to solve it very quickly next time 906 | 907 | To summarize, there are really just two points: 908 | 909 | 1. Don't be afraid of looking at the solution 910 | 2. If you are blocked, see point 1 911 | 912 | #### A1.1.4 Pay close attention to these when solving problems (gain the max value from leetcode problems) 913 | 914 | **Rule of thumb: Consider edge cases, explain it step by step, analyze complexities, walk through the code with a test case** 915 | 916 | To gain the max value from leetcode problems, you need to do more than just solve the problem. 
917 | 918 | Here are a few things you need to pay close attention to when solving a problem, because doing so will get you better prepared for an interview: 919 | 920 | - When you first see the problem, ask many questions about boundaries, edge cases, etc. 921 | - Leetcode problems are quite straightforward: they show you pretty much everything. However, in an interview you'll have to work with the interviewer to get all the details of the problem you are about to solve. Make sure you fully understand the question and are aware of the boundaries and any possible edge cases 922 | - When solving the problem, don't just jump right into writing the code; try to explain what you are about to do first by writing some pseudo code to illustrate your thinking process. 923 | - Doing so will allow your interviewer to understand your approach, and possibly correct you (or guide you) onto the right path 924 | - You can actually follow this process when working on leetcode problems. For example, you can first write down the pseudo code as comments, and then fill in the actual code 925 | - Since communication is also a really important factor in the interview process, this explanation step will greatly prepare you for an actual interview 926 | - You can mention the space and time complexities prior to solving the problem, but most of the time you should discuss them after finishing the problem. 927 | - Make sure you analyze both time and space complexity 928 | - Last but not least, once you have the solution, make sure to walk through your code with a test case. 929 | - Believe it or not, a lot of people cannot debug their own code! 930 | - Doing so will also show that you review your code before pushing it out, which is something you should do in your daily job 931 | 932 | #### A1.1.5 Why you should use templates for algorithms and data structures 933 | 934 | Should you use templates? Many people have asked this question, and the answer is always YES. 
935 | 936 | Here are the reasons: 937 | 938 | - An interview is a stressful and time-restricted process (remember, it's time-restricted!), so knowing the template will enable you to focus on solving the problem and communicating your thinking process to the interviewer 939 | - Some of the algorithms may look easy but are really difficult to implement correctly due to the various boundary/edge cases (such as binary search), so knowing a template will enable you to write bug-free code more easily 940 | - Templates are summarized from solving many problems, so it's easier and more efficient to learn from the templates 941 | - When you have some templates, it is also easier to pick things up and get prepared for an interview if you haven't had one for more than a year 942 | - Some algorithms are difficult to implement, or at least to implement nicely. Having a good template will make your code look much better 943 | - Templates are the main reason for this repo :) 944 | 945 | #### A1.1.6 What should I do if I lose confidence when practicing leetcode 946 | 947 | I know it can be super frustrating when you first start on algorithm and data structure problems. 948 | 949 | I've been there and I know how it feels. 950 | 951 | Here are the things you must know: 952 | 953 | - Any algorithm with a name is not meant to be easily figured out by yourself. Check out those algorithms on the wiki and you can see their history. This is why we need the templates, and why we need to study them. 954 | - Think of algorithms and data structures as a math course, and the interview as a final exam. You need to learn the formulas (i.e. each algorithm and data structure) one by one, and the interview is just a way to test some of them. 
It is totally fine not to know all of them at first; what's important is to learn them step by step 955 | - Build your confidence step by step 956 | - this is why you need to solve problems based on TAGs, and solve them from easy to hard 957 | - know a template, so you can at least start programming 958 | - You are not alone, we all feel the same, so don't worry, just keep working! 959 | 960 | You should also refer to the other QA questions for more details. 961 | 962 | #### A1.1.7 I still can't solve new problems even after finishing x number of problems on LeetCode 963 | 964 | This can be a common thing, and it means you weren't using Leetcode very effectively or efficiently. 965 | 966 | Basically, you should refer to the other QA questions for more details. 967 | 968 | Here is a short summary: 969 | 970 | - Make notes on the problems you solved, and revisit them later 971 | - It's ok to work on the same problem multiple times at different times; practice makes perfect after all 972 | - AGAIN, solve problems by tags, so you can master one type at a time 973 | - Start with templates, and go beyond the templates so you can solve slightly varied problems in the same category 974 | - Once you are familiar with the basic algorithms and data structures, medium and hard problems are just combinations of two or three algorithms/data structures 975 | 976 | ### A1.2 Interview Questions 977 | 978 | #### A1.2.1 What does the interview process look like 979 | 980 | The interview process normally follows the steps below, but of course each company is different: 981 | 982 | 1. You'll get an email asking you to finish an Online Assessment (OA) quiz, normally algorithm questions or a simple knowledge test of the company's tech stack 983 | - Note this is mostly used by large companies (and some mid-size companies). Small companies rarely have it. 984 | 2. 
The company contacts you to set up a screening phone interview 985 | - This step is normally just talking about yourself, going over your resume/experience, and learning more about the company and position 986 | 3. A technical phone interview. 987 | - This could be anything, from algorithm questions to language/framework features. 988 | - Each company is different. 989 | 4. Another round of technical phone interviews or a take-home quiz/project 990 | - If this is not a take-home quiz/project, then it's referred to as the "On-site", where you spend many hours finishing many rounds of interviews (3-5 rounds depending on the company and experience level) 991 | 5. This step varies: it could be another technical interview, or a company culture interview 992 | - Again, not all companies have this step, but if you get one, it's mostly just to check the candidate's fit with the company culture, or some high-level open technical discussions. 993 | 6. Offer! 994 | 995 | The above steps are just a summary from my own experience (and from what I learnt from my friends). 996 | 997 | DO ask the interviewer/HR about the interview process when you get contacted! 998 | 999 | #### A1.2.2 How to write an effective resume 1000 | 1001 | Well, it's all about years of experience (YOE). Here are some tips you should know: 1002 | 1003 | - If you have less experience, you should keep your resume to 1 page. If you have lots of experience, you should keep it to 2 pages. Just don't go over two pages. 1004 | - You can have more details in the more recent experience, and fewer details in the early-year experience. 1005 | - Try to tailor your resume towards the job posting requirements, especially making sure your KEYWORDS align with the job posting. 1006 | - Make sure you emphasize what YOU did and YOUR contributions to the projects listed 1007 | - I know it's often hard, but try to give specific analytical numbers (i.e. 
improved efficiency by 200%) 1008 | - Highlight your skills and achievements 1009 | - DOUBLE CHECK SPELLING AND GRAMMAR, and have another person proofread for you 1010 | 1011 | #### A1.2.3 I have applied to many jobs but still have no interview 1012 | 1013 | - If you are a new grad, this is common. It's difficult for a new grad to find a job in pretty much any industry. My advice is to work on your resume as much as you can, make sure each resume is targeted to the job posting really well, write a cover letter to further show your enthusiasm for the job, and keep applying. 1014 | - If you have a few years of experience, make sure your resume is great. 1015 | - Make sure your resume is up-to-date, with no silly spelling and grammar issues 1016 | - Make sure you have concentrated on your contributions in your resume 1017 | - Make sure you have highlighted your skills and achievements 1018 | - Maybe find a professional service to help you with your resume if it's still not as polished as you want it to be 1019 | - Keep applying; you never know if there is a good opportunity waiting for you 1020 | 1021 | #### A1.2.4 How to solve an algorithm/data structure problem in an interview 1022 | 1023 | In other QAs I explained how to grind Leetcode efficiently and effectively. 1024 | 1025 | With enough practice you'll get an interview, and here are some tips that can increase your success rate. 1026 | 1027 | - Make sure you ask clarifying questions at the beginning: what are the inputs and input types, what is given, what should the output be, etc 1028 | - Before writing code, THINK OUT LOUD 1029 | - Discuss your thinking process with your interviewer 1030 | - what data structures you chose, why you chose them, and what the steps to implement are 1031 | - During the process, write them down. For example, write them as comments, step by step 1032 | - Once your interviewer knows your thoughts, ask if it's ok to code, then code. Sometimes you may even be able to skip the coding entirely. 
1033 | - Be ready to discuss time and space complexity 1034 | - Be ready for the follow-up questions 1035 | 1036 | ### A1.3 General Questions 1037 | 1038 | #### A1.3.1 Large Company VS Small Company 1039 | 1040 | Large companies such as the "Big Five" (Google, Amazon, Facebook, Apple, Microsoft) or FAANG (Facebook, Amazon, Apple, Netflix, Google) are very attractive to developers, but there are also many smaller companies and startups, which make up the majority of the market. 1041 | 1042 | There are also FAANGMULA or FAANGULTAD, feel free to look them up. 1043 | 1044 | The debate over whether to work for a large company or a small company is ongoing and probably will never stop, since they can be quite different. 1045 | 1046 | The short conclusion is: 1047 | 1048 | - Large company 1049 | - A large company is a big platform where you can build your network very effectively and gain insight into how large companies and complex architectures work 1050 | - big companies have their own tech stacks that you'll have to learn and can't use elsewhere, so it's important to know how they work under the hood 1051 | - the downside is that there might be micromanagement and office politics (to some degree at least), it is difficult to learn the full development cycle, the software development process may be long and cumbersome, and promotion could take a long time but has a clear path 1052 | - Small company 1053 | - a small company allows you to learn the full development cycle, gain a lot of project experience quickly, and release a complete product from idea to production. 1054 | - You'll spend the majority of your time coding and reviewing code, so the development cycle and shipping are pretty fast. 1055 | - small companies mostly use open-source tools and frequently keep the tech stack updated with the latest technologies. 
1056 | - the scope of the projects may be smaller, and promotion may be faster, but the career path may not be very clear 1057 | - The downside is that your company is not well known, so your skills could potentially be questioned, building connections is a bit more difficult, and the company structure and processes may be chaotic for a period of time 1058 | 1059 | I've created a table for you. (It is subjective and depends on the team; the numbers are not accurate but should give you an idea) 1060 | 1061 | | | Large Company | Small Company | 1062 | | :-: | :-----------: | :-----------: | 1063 | | Networking | Easy to connect to highly talented people | Have to work on networking | 1064 | | Programming (varies between companies) | less | more | 1065 | | Feature release process | could have many rounds of review and approval | normally just peer review | 1066 | | Tech stack | depends on the team, lots of internal tools | depends on the project and company culture, frequently popular open-source tools | 1067 | | Interview process | Many rounds, from 5 to 10+ | Normally 3-5 rounds, sometimes with a take-home project | 1068 | | Interview Content | Mostly focused on algorithm/data structure problems | Mostly about hands-on experience with the company tech stack | 1069 | | Career Path | Clear but takes a longer time | Not super clear but could be promoted to a high level (C-level is also possible) | 1070 | 1071 | 1072 | 1073 | #### A1.3.2 How to get your FIRST job! (How to become more competitive among the candidates) 1074 | 1075 | To begin with, this is not limited to how to get your first job, but is meant to show you how to stand out from the crowd. 1076 | 1077 | **Rule of thumb: build your reputation, gain more project experience, networking!!** 1078 | 1079 | Essentially, you'll need to build your reputation. How to do that? There are a few ways: 1080 | 1081 | - **Attending meetups!** First and foremost! 
1082 | - Meetups, especially local ones, give you the opportunity to talk directly to people who have jobs, and whose companies might be hiring. So if you go to meetups frequently and make connections, job opportunities will come to you! 1083 | 1084 | - Make sure your resume and **LinkedIn profile** (especially the latter!) are up to date! 1085 | - Recruiters and agents are quite active on LinkedIn, and I received a lot of interest from them. 1086 | 1087 | - Gain more project experience. **Either contribute to an open-source project or build your own personal project.** 1088 | - This is not only for new grads, but also for those who have a few years of experience but no projects to show. 1089 | - I'm sure you all have lots of project experience from work, but I'm afraid you can't show it to the interviewer! A personal project is a way to SHOW OFF your skills and code, and a way to convince the HR/interviewer that you are a strong candidate. 1090 | - NOTE: most of the time the HR/interviewer will not look at your pet project, but you can still mention it and even brag about it during your interview; it always earns you points! 1091 | - Lastly, make sure your pet project has quality code and follows standards! Otherwise it won't help. 1092 | 1093 | - Apply to many jobs and **don't be shy about asking for help**. 1094 | - **Don't restrict yourself!** For your first job, make sure you are fully prepared and apply to every company you can; don't limit yourself to only large companies or certain companies. After all, once you have your first job, it'll be easier to find the second one, third one, etc. 1095 | - Ask your network about potential job opportunities. Don't be shy about asking around, but make sure you are polite and not harassing people. It doesn't hurt to ask, and sometimes the result may surprise you. 1096 | 1097 | - **Don't give up!** If you can't find a job or aren't even receiving responses, **it's not your fault**.
1098 | - Nowadays, most companies won't respond if you are not a good match. I know it doesn't feel good and you feel like you are just left hanging, but be aware that it's not your fault. 1099 | 1100 | A lot of the time, finding a job is more about luck than anything else, so be prepared, be patient, and don't give up! 1101 | 1102 | Good luck!!! 1103 | 1104 | 1105 | ## Appendix 2: Resources 1106 | 1107 | ### A2.1 Learning Experience 1108 | 1109 | - [Everything About Python — Beginner To Advanced](https://medium.com/fintechexplained/everything-about-python-from-beginner-to-advance-level-227d52ef32d2) 1110 | - [Coding Interview University](https://github.com/jwasham/coding-interview-university) 1111 | - [GitHub - TheAlgorithms/Python: All Algorithms implemented in Python](https://github.com/TheAlgorithms/Python) 1112 | - [VisuAlgo - visualising data structures and algorithms through animation](https://visualgo.net/en) 1113 | 1114 | #### A2.1.1 Online MOOC courses 1115 | 1116 | - [CS 61A: Structure and Interpretation of Computer Programs](https://cs61a.org/) 1117 | - [CS 61B: Data Structures Spring 2019](https://sp19.datastructur.es/) 1118 | - [CS 61C: Computer Architecture (Machine Structures)](https://cs61c.org/) 1119 | - [MIT6.046: Design and Analysis of Algorithms](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2015/) 1120 | - [MIT6.824: Distributed Systems](https://pdos.csail.mit.edu/6.824/) 1121 | 1122 | 1123 | ### A2.2 How to solve Algorithm Questions 1124 | 1125 | - [Fucking Algorithm - Algorithm template - Java, Python - English, Chinese](https://github.com/labuladong/fucking-algorithm/tree/english) 1126 | - [Algorithm Pattern - Algorithm template - Go - Chinese](https://github.com/greyireland/algorithm-pattern) 1127 | - [Hello Algorithm - Leetcode Solution - Java - English, Chinese](https://github.com/geekxh/hello-algorithm) 1128 | - [LeetCode Topics and Interview Questions Collection - Leetcode
Solution - Java - English, Chinese](https://github.com/yuanguangxin/LeetCode) 1129 | - [Top 10 algorithms in Interview Questions](https://www.geeksforgeeks.org/top-10-algorithms-in-interview-questions/#algo3) 1130 | - [CHEATSHEET: LEETCODE COMMON TEMPLATES & COMMON CODE PROBLEMS - English](https://cheatsheet.dennyzhang.com/cheatsheet-leetcode-a4) 1131 | - [LeetCode 101 - A LeetCode Grinding Guide (C++ Version) - Chinese](https://github.com/changgyhub/leetcode_101/blob/master/LeetCode%20101%20-%20A%20LeetCode%20Grinding%20Guide%20(C%2B%2B%20Version).pdf) 1132 | 1133 | ### A2.3 OOD (Object Oriented Design) 1134 | 1135 | #### A2.3.1 SOLID Principles 1136 | 1137 | - [S.O.L.I.D. Principles of Object-Oriented Design - A Tutorial on Object-Oriented Design](https://www.youtube.com/watch?v=GtZtQ2VFweA&ab_channel=LaraconEU) 1138 | - [Understanding the Single Responsibility Principle](https://www.youtube.com/watch?v=L2m-S0Pj_Xk&ab_channel=edutechional) 1139 | - [Understanding the Open Closed Principle](https://www.youtube.com/watch?v=Ryhy7333mqQ&ab_channel=edutechional) 1140 | - [Understanding the Liskov Substitution Principle](https://www.youtube.com/watch?v=Mmy1EUKC_iE&ab_channel=edutechional) 1141 | - [OOP Design Principles: Interface Segregation Principle](https://www.youtube.com/watch?v=Ye1h3zKl1lg&ab_channel=edutechional) 1142 | - [OOP Design Principles: Dependency Inversion Principle](https://www.youtube.com/watch?v=qL2-5g_lJTs&ab_channel=edutechional) 1143 | - [Refactoring From Trash to SOLID](https://medium.com/swlh/refactoring-from-trash-to-solid-74b10005ccd3) 1144 | 1145 | #### A2.3.2 Clean Code - Uncle Bob lessons 1146 | 1147 | Uncle Bob is a software engineer who introduced the S.O.L.I.D. principles for writing clean code. 1148 | 1149 | Here is a recent series of his public talks; I feel it's worth spending the time to watch them at least once.
1150 | 1151 | If you don't want to read the book, you should at least watch this series. 1152 | 1153 | - [Clean Code - Uncle Bob / Lesson 1: SOLID principles, refactoring, DRY](https://www.youtube.com/watch?v=7EmboKQH8lM&ab_channel=UnityCoin) 1154 | - [Clean Code - Uncle Bob / Lesson 2: Comments, docs, naming, reviews](https://www.youtube.com/watch?v=2a_ytyt9sf8&ab_channel=UnityCoin) 1155 | - [Clean Code - Uncle Bob / Lesson 3: Software growth, QA, teamwork](https://www.youtube.com/watch?v=Qjywrq2gM8o&ab_channel=UnityCoin) 1156 | - [Clean Code - Uncle Bob / Lesson 4: TDD](https://www.youtube.com/watch?v=58jGpV2Cg50&ab_channel=UnityCoin) 1157 | - [Clean Code - Uncle Bob / Lesson 5: Architecture, project development](https://www.youtube.com/watch?v=sn0aFEMVTpA&ab_channel=UnityCoin) 1158 | - [Clean Code - Uncle Bob / Lesson 6: Project management](https://www.youtube.com/watch?v=l-gF0vDhJVI&ab_channel=UnityCoin) 1159 | 1160 | ### A2.4 Design Patterns 1161 | 1162 | - [Design Patterns](https://www.tutorialspoint.com/design_pattern/filter_pattern.htm) 1163 | - You'll all need to learn the design patterns eventually 1164 | - [Design Patterns in Python](https://github.com/faif/python-patterns) 1165 | 1166 | 1167 | ### A2.5 Async in Python 1168 | 1169 | - [Demystifying Python's Async and Await Keywords](https://www.youtube.com/watch?v=F19R_M4Nay4&ab_channel=JetBrainsTV) 1170 | - [Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018](https://www.youtube.com/watch?v=0kXaLh8Fz3k&ab_channel=PyCon2018) 1171 | - [Advanced asyncio: Solving Real-world Production Problems - PyCon 2019](https://www.youtube.com/watch?v=bckD_GK80oY&ab_channel=PyCon2019) 1172 | 1173 | 1174 | ### A2.6 System Design 1175 | 1176 | - [System Design Interview](https://www.youtube.com/c/SystemDesignInterview/videos) 1177 | - A must-watch; clearly explained by a senior Amazon engineer, and this is what you should expect in an interview 1178 | - [The System Design
Primer](https://github.com/donnemartin/system-design-primer) 1179 | - The repo with explanations, examples, and case studies 1180 | - [Distributed systems for fun and profit](http://book.mixu.net/distsys/single-page.html) 1181 | - A free book of about 100 pages 1182 | - [System Design Cheatsheet](https://gist.github.com/vasanthk/485d1c25737e8e72759f) 1183 | - [System Design Cheatsheet - Guvi - Medium](https://medium.com/guvi/system-design-cheatsheet-251c6fe7f20c) 1184 | - [CheatSheet: System Design For Job Interview – CheatSheet](https://cheatsheet.dennyzhang.com/cheatsheet-systemdesign-a4) 1185 | - [GitHub - puncsky/system-design-and-architecture: Learn how to design large-scale systems. Prep for the system design interview.](https://github.com/puncsky/system-design-and-architecture) 1186 | - [GitHub - checkcheckzz/system-design-interview: System design interview for IT companies](https://github.com/checkcheckzz/system-design-interview) 1187 | - [High Performance Browser Networking (O'Reilly)](https://hpbn.co/) 1188 | - [GitHub - binhnguyennus/awesome-scalability: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems](https://github.com/binhnguyennus/awesome-scalability) 1189 | - [The Architecture of Open Source Applications (Volume 2): Scalable Web Architecture and Distributed Systems](http://www.aosabook.org/en/distsys.html) 1190 | 1191 | This repo has the full list of company engineering blogs: 1192 | 1193 | - [Engineering Blogs](https://github.com/kilimchoi/engineering-blogs) 1194 | 1195 | Papers: 1196 | 1197 | - [Google MapReduce: Simplified Data Processing on Large Clusters](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) 1198 | - [The Google File System](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf) 1199 | - [TAO: Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf) 1200 | -
[Dynamo: Amazon’s Highly Available Key-value Store](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) 1201 | 1202 | ### A2.7 Machine Learning 1203 | 1204 | - [100 Days of Machine Learning Coding](https://github.com/Avik-Jain/100-Days-Of-ML-Code) 1205 | 1206 | 1207 | ### A2.8 Reinforcement Learning 1208 | 1209 | - [Reinforcement Learning Methods and Tutorials](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow) 1210 | 1211 | --- 1212 | 1213 | ## Postface 1214 | 1215 | The content ends here, but the learning never stops. 1216 | 1217 | If you like my content, feel free to star/like/fork on GitHub. 1218 | 1219 | [Here is my Patreon page](https://www.patreon.com/CZTechHut); any support is much appreciated and will motivate me a lot in creating more content. 1220 | 1221 | Thanks again and best wishes. 1222 | -------------------------------------------------------------------------------- /SystemDesign/RDBMS.md: -------------------------------------------------------------------------------- 1 | # RDBMS (Relational Database Management System) 2 | 3 | - Relational databases are normally row-based 4 | - Postgres and MySQL are the most widely used 5 | 6 | ## Postgres vs MySQL 7 | 8 | - Postgres: 9 | - object-relational database 10 | - open source, easy to install, highly extensible 11 | - implements Multi-version Concurrency Control (MVCC) without read locks 12 | - protects data integrity at the transaction level 13 | - MySQL 14 | - purely relational database 15 | - most popular 16 | - better performance at large scale (millions of rows and above) 17 | 18 | 19 | 20 | Reference 21 | 22 | - [MySQL vs PostgreSQL -- Choose the Right Database for Your Project \| Okta Developer](https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres#:~:text=Postgres%20is%20an%20object%2Drelational,more%20closely%20to%20SQL%20standards.)
23 | - [Why Uber Engineering Switched from Postgres to MySQL \| Uber Engineering Blog](https://eng.uber.com/postgres-to-mysql-migration/) 24 | 25 | -------------------------------------------------------------------------------- /SystemDesign/cache.md: -------------------------------------------------------------------------------- 1 | # Cache 2 | 3 | When talking about caches here, we mean caching as it relates to web development. 4 | 5 | [Here is a good article to start with](https://www.digitalocean.com/community/tutorials/web-caching-basics-terminology-http-headers-and-caching-strategies) 6 | 7 | Here we are focusing on database caching. 8 | 9 | - Cache Usage Patterns 10 | - **Cache Aside**: the application is responsible for reading from and writing to the database; the cache doesn't interact with the database at all 11 | - The application queries the cache first 12 | - if the cache contains the data, it is returned directly, bypassing the database 13 | - if not, the data is fetched from the database and then stored in the cache 14 | - The most common cache-aside systems are Memcached and Redis 15 | - **Cache-as-SoR (system-of-record)**: the application treats the cache as the main data store, reading data from it and writing data to it 16 | - **Read through** 17 | - the cache is configured with a loader component that knows how to load data from the database 18 | - if an entry does not exist within the cache, the cache invokes the loader to retrieve the value from the database, caches the value, then returns it to the caller 19 | - **Write through** 20 | - the cache is configured with a writer component that knows how to write data to the database 21 | - when the cache is asked to store a value for a key, the cache invokes the writer to store the value in the SoR, as well as updating the cache 22 | - **Write behind** 23 | - similar to write-through, but rather than writing to the database while the thread making the update waits, write-behind queues the data for writing at a later time
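The cache-aside flow described above can be sketched in a few lines. This is only an illustration: plain dicts stand in for the cache (e.g. Redis or Memcached) and for the database, and the names (`get_user`, `update_user`, the `user:1` key) are made up for the example:

```python
cache = {}                            # fast store, may miss (stands in for Redis/Memcached)
db = {"user:1": {"name": "Alice"}}    # system of record (stands in for the database)

def get_user(key):
    value = cache.get(key)
    if value is not None:
        return value                  # cache hit: bypass the database entirely
    value = db.get(key)               # cache miss: fetch from the database...
    if value is not None:
        cache[key] = value            # ...then populate the cache for later reads
    return value

def update_user(key, value):
    # On writes, update the database and invalidate the cached copy,
    # so the next read repopulates the cache with fresh data.
    db[key] = value
    cache.pop(key, None)
```

Note that the cache itself never talks to the database; the application orchestrates both stores, which is exactly what distinguishes cache-aside from the read-through/write-through patterns.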
24 | - Cache Eviction Policies 25 | - LRU: least recently used 26 | - LFU: least frequently used 27 | - FIFO: first in, first out 28 | - LIFO: last in, first out 29 | - FILO: first in, last out 30 | - and many more 31 | 32 | 33 | ## Redis 34 | 35 | ### Memory management 36 | 37 | - Redis keeps all key information in memory, but not all data storage occurs in memory 38 | - When physical memory is full, Redis may swap values not used for a long time to disk. 39 | - When memory usage exceeds the threshold value, Redis triggers the swap operation. 40 | - Redis calculates which values to swap to disk based on *“swappability = age\*log(size_in_memory)”* 41 | - The machine's memory must hold all the keys; Redis will not swap out all the data. 42 | - When Redis swaps in-memory data to disk, the main thread that serves requests and the child thread performing the swap share this part of memory. 43 | - So if you update data that is being swapped, Redis blocks the operation until the child thread completes the swap. 44 | 45 | ### Multi-threading 46 | 47 | - Redis was known to be single-threaded, but that has changed 48 | - Since Redis 4.0, it also has background threads to process slow operations such as cleanup, releasing useless connections, bulk deletes, etc. 49 | - Redis 6.0, which supports multi-threading, was finally released on 2020-05-02 50 | - There are two main directions for optimization: 51 | - improve network I/O performance, with typical implementations such as using DPDK to replace the kernel network stack 52 | - use multi-threading to make full use of multiple cores, as in typical implementations such as Memcached 53 | 54 | 55 | 56 | ## Redis vs.
Memcached: In-Memory Data Storage Systems 57 | 58 | | Comparison | Redis | Memcached | 59 | | :---: | :--------: | :---: | 60 | | Data types supported | string, hash, list, set, sorted set | hash table with strings and integers | 61 | | Server-side data operations | owns more data structures and supports richer data operations | needs to copy data to the client for similar changes and then set it back, which increases I/O counts and data sizes | 62 | | Memory management | encapsulated malloc/free | slab allocation mechanism | 63 | | Memory use efficiency | lower | higher memory utilization rate | 64 | | Data persistence | RDB snapshot and AOF log | none | 65 | | Performance | single-core, so higher performance for small data | multi-core, so it outperforms Redis when storing data of 100k or above | 66 | 67 | 68 | Reference: 69 | 70 | - [Caching for Resiliency](https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-4-85afa66d6341#:~:text=There%20are%20two%20basic%20caching,%E2%80%94%20also%20called%20inline%2Dcache.) 71 | - [Database Caching](https://aws.amazon.com/caching/database-caching/) 72 | - [Cache Usage Patterns](https://www.ehcache.org/documentation/3.3/caching-patterns.html) 73 | - [Using Read-Through and Write-Through in Distributed Cache - DZone Database](https://dzone.com/articles/using-read-through-amp-write-through-in-distribute) 74 | - [Cache replacement policies](https://en.wikipedia.org/wiki/Cache_replacement_policies) 75 | - [Redis vs.
Memcached: In-Memory Data Storage Systems](https://medium.com/@Alibaba_Cloud/redis-vs-memcached-in-memory-data-storage-systems-3395279b0941) 76 | - [Redis 6.0, which supports multi-threading, is finally released New features serial 13 questions - Programmer Sought](https://www.programmersought.com/article/30635498543/) 77 | -------------------------------------------------------------------------------- /SystemDesign/consistency_consensus.md: -------------------------------------------------------------------------------- 1 | # Consistency and Consensus 2 | 3 | ## Linearizability 4 | 5 | - Imagine each operation (read or write) is marked with a vertical line at the time of execution; the requirement of linearizability is that the lines joining up the operation markers always move forward in time, never backward 6 | - once a new value has been written or read, all subsequent reads see the value that was written (until it is overwritten again) 7 | - it doesn’t assume any transaction isolation: another client may change a value at any time 8 | - essentially means “behave as though there is only a single copy of the data, and all operations on it are atomic” 9 | - Used for leader election in coordination services (such as ZooKeeper) 10 | - Used for uniqueness constraints in databases 11 | 12 | ### Linearizability vs Serializability 13 | 14 | - Serializability 15 | - an isolation property of transactions 16 | - It guarantees that transactions behave the same as if they had executed in some serial order 17 | - It is okay for that serial order to be different from the order in which transactions were actually run 18 | - Linearizability 19 | - a recency guarantee on reads and writes of a register (object) 20 | - It doesn’t group operations together into transactions, so it does not prevent problems such as write skew 21 | 22 | ## CAP theorem 23 | 24 | - **CAP (Consistency, Availability, Partition tolerance) theorem**: you can only pick 2 out of 3 25 | - A better way to describe it is **either Consistent or Available
when Partitioned** 26 | 27 | ## Ordering Guarantees 28 | 29 | - *causal dependency*: e.g. a question and its answer, or a git commit history and its branches 30 | - *consistent with causality*: the effects of all operations that happened causally before that point in time are visible, but no operations that happened causally afterward can be seen. 31 | - For example, if the snapshot contains an answer, it must also contain the question being answered 32 | - *causally consistent*: the system obeys the ordering imposed by causality 33 | - A *total order* always allows any two elements to be compared (e.g. unique sequence numbers) 34 | - *partially ordered*: in some cases one set is greater than another, but in other cases they are incomparable 35 | - Tracking causal dependencies: 36 | - Explicitly tracking all the data that has been read would mean a large overhead. 37 | - a better way is to use sequence numbers or timestamps to order events 38 | - **Lamport timestamp**: a simple method for generating sequence numbers that is consistent with causality 39 | - Each node has a unique identifier, and each node keeps a counter of the number of operations it has processed 40 | - The Lamport timestamp is then simply a pair of (counter, node ID) 41 | - A Lamport timestamp bears no relationship to a physical time-of-day clock, but it provides total ordering 42 | - Total Order Broadcast: a protocol for exchanging messages between nodes 43 | - two safety properties should always be satisfied 44 | - *Reliable delivery*: No messages are lost: if a message is delivered to one node, it is delivered to all nodes. 45 | - *Totally ordered delivery*: Messages are delivered to every node in the same order.
46 | - Implementation: 47 | - assume that every time the lock server grants a lock or lease, it also returns a **fencing token**, which is a number that increases every time a lock is granted 48 | - every time a client sends a write request to the storage service, it must include its current fencing token 49 | - For example, if a node has delivered message 4 and receives an incoming message with a sequence number of 6, it knows that it must wait for message 5 before it can deliver message 6. 50 | - Usages: 51 | - Consensus services (e.g. ZooKeeper) 52 | - Logs (replication log, transaction log, or write-ahead log) 53 | 54 | 55 | ## Consensus: get several nodes to agree on something 56 | 57 | ### Two-Phase Commit (2PC): transaction commit across multiple nodes 58 | 59 | - performed with the help of a ***transaction manager (coordinator)*** and *participants* (nodes that participate in reads/writes for the transaction) 60 | - Phase 1: the coordinator sends a prepare request to each participant, asking if it is able to commit. 61 | - Phase 2: if all are good to commit, the commit is performed. Otherwise, abort. 62 | - process: 63 | 1. When the application wants to begin a distributed transaction, it requests a **globally unique transaction ID** from the coordinator 64 | 2. The application begins a single-node transaction on each of the participants, and attaches the globally unique transaction ID to the single-node transaction. All reads/writes are done atomically within one of these single-node transactions 65 | 3. When the application is ready to commit, the coordinator sends a prepare request to all participants with the global transaction ID. If any of these requests fails, the coordinator sends an abort request for that transaction ID to all participants. 66 | 4. When a participant receives the prepare request, it makes sure that it can definitely commit the transaction under all circumstances. By replying “yes,” the participant promises not to abort the transaction, without actually committing it yet. 67 | 5.
Once the coordinator’s decision has been written to disk, the commit or abort request is sent to all participants. If this request fails or times out, the coordinator must retry forever until it succeeds. 68 | - two crucial “points of no return” (these ensure the atomicity of 2PC): 69 | 1. when a participant votes “yes,” it promises that it will definitely be able to commit later (although the coordinator may still choose to abort) 70 | 2. once the coordinator decides, that decision is irrevocable 71 | - If the coordinator fails: 72 | - The only way 2PC can complete is by waiting for the coordinator to recover. 73 | - This is why the coordinator must write its commit or abort decision to a transaction log on disk before sending commit or abort requests to participants 74 | - when the coordinator recovers, it determines the status of all in-doubt transactions by reading its transaction log. 75 | - Any transactions that don’t have a commit record in the coordinator’s log are aborted. 76 | - Thus, the commit point of 2PC comes down to a regular single-node atomic commit on the coordinator.
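The prepare/commit flow above can be condensed into a toy sketch. This is only an illustration with in-memory objects (the `Participant` class and `two_phase_commit` function are invented for the example); a real coordinator would durably log its decision between the two phases:

```python
class Participant:
    """Toy participant: votes in phase 1, applies the decision in phase 2."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self, txid):
        # Phase 1: by voting "yes" the participant promises it can commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txid):
        self.state = "committed"

    def abort(self, txid):
        self.state = "aborted"

def two_phase_commit(txid, participants):
    """Toy coordinator: collect votes, then broadcast the decision."""
    votes = [p.prepare(txid) for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Point of no return: a real coordinator writes `decision` to its
    # transaction log on disk here, before sending any phase-2 request.
    for p in participants:
        if decision == "commit":
            p.commit(txid)
        else:
            p.abort(txid)
    return decision
```

This omits everything that makes 2PC hard in practice: durable logging, retrying phase-2 requests forever, and recovering in-doubt transactions after a coordinator crash.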
77 | - 2PC is thus called a ***blocking atomic commit protocol*** 78 | - ***Three-phase commit (3PC)*** is a ***nonblocking atomic commit*** protocol, but it requires a ***perfect failure detector*** to guarantee this 79 | 80 | 81 | ### Exactly-once message processing 82 | 83 | - implemented by atomically committing the message acknowledgment and the database writes in a single transaction 84 | - If either the message delivery or the database transaction fails, both are aborted, so the message can safely be retried later 85 | - all systems affected by the transaction are required to use the same atomic commit protocol 86 | 87 | 88 | ### XA (eXtended Architecture) transactions 89 | 90 | - XA is not a network protocol but merely a C API for interfacing with a transaction coordinator 91 | - XA assumes that your application uses a network driver or client library to communicate with the participant databases or messaging services 92 | - The standard does not specify how it should be implemented, but in practice the coordinator is often a library that is loaded into the **same process** as the application issuing the transaction 93 | - It keeps track of the participants in a transaction, collects their responses after asking them to prepare (via a callback into the driver), and uses a log on the local disk to keep track of the commit/abort decision for each transaction. 94 | - If the application process crashes, the coordinator goes down with it. Since the logs are on the application server's local disk, that server must be restarted. 95 | - The database server cannot contact the coordinator directly, since all communication must go via its client library.
96 | - If the logs are corrupted so that the outcome of in-doubt transactions cannot be determined, only manual intervention (by an administrator) can resolve it; otherwise the locks are held forever 97 | - one solution: ***heuristic decisions***: allowing a participant to unilaterally decide to abort or commit an in-doubt transaction without a definitive decision from the coordinator, but this MAY break atomicity. Thus, it is intended only for getting out of catastrophic situations, and not for regular use. 98 | - Limitations: 99 | - The coordinator can become a single point of failure 100 | - when the coordinator is part of a **stateless** application server, it changes the nature of the deployment: since the coordinator's logs become a crucial part, the server is no longer stateless 101 | - Compatibility: XA cannot detect deadlocks across different systems (since that would require a standardized protocol for systems to exchange information on the locks that each transaction is waiting for), and it does not work with SSI, since that would require a protocol for identifying conflicts across different systems. 102 | 103 | ### Fault-tolerant consensus 104 | 105 | - a fault-tolerant consensus algorithm must satisfy the following properties: 106 | 107 | - Uniform agreement: No two nodes decide differently. 108 | - Integrity: No node decides twice. 109 | - Validity: If a node decides value v, then v was proposed by some node. 110 | - Termination: Every node that does not crash eventually decides some value. 111 | - The system model of consensus assumes that when a node “crashes,” it suddenly disappears and never comes back. 112 | - any consensus algorithm requires at least a majority of nodes to be functioning correctly in order to assure termination. That majority can safely form a quorum. 113 | - The best-known fault-tolerant consensus algorithms are Viewstamped Replication (VSR), Paxos, Raft, and Zab.
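The majority requirement above can be checked with a little arithmetic: a quorum larger than half the nodes guarantees that any two quorums overlap in at least one node, which is what prevents two groups from deciding differently. A small sanity check (not a consensus implementation; the `majority` helper is made up for the example):

```python
from itertools import combinations

def majority(n):
    # Smallest integer strictly greater than n/2: two sets of this size
    # drawn from n nodes must share at least one member.
    return n // 2 + 1

# With 5 nodes a quorum is 3, and any two quorums of 3 intersect,
# since two disjoint 3-node sets would need 6 nodes in total.
nodes = {1, 2, 3, 4, 5}
q = majority(len(nodes))
assert all(set(a) & set(b)
           for a in combinations(nodes, q)
           for b in combinations(nodes, q))
```

This also shows why you need at least 3 nodes: with n = 3 a quorum is 2, so the system keeps operating when one node is down, whereas with n = 2 a quorum of 2 means any single failure blocks progress.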
114 | 115 | - ***epoch number***: a number such that within each epoch, the leader is unique 116 | - each time the leader dies, the nodes start a vote to elect a new leader 117 | - the election is given an incremented epoch number (epoch numbers are totally ordered) 118 | - if there is a conflict, the leader with the higher epoch number wins 119 | - before the elected leader does anything, it must make sure there is no leader with a higher epoch number: it collects votes from a quorum of nodes 120 | - so there are two rounds of voting in the process: one to elect a leader, and one to vote on the leader's proposal. If the results of the two votes are inconsistent, the leader cannot be promoted 121 | 122 | - Limitations of consensus 123 | - The process by which nodes vote on proposals before they are decided is a kind of synchronous replication; some committed data can potentially be lost on failover 124 | - Consensus systems always require a strict majority to operate (you need at least 3 nodes to tolerate one failure) 125 | - Most consensus algorithms assume a fixed set of nodes that participate in voting, so it's difficult to scale 126 | - Consensus systems generally rely on timeouts to detect failed nodes 127 | 128 | 129 | ## Reference: 130 | 131 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) 132 |
understand is the `Open Systems Interconnection Model (OSI Model)` 4 | 5 | ![OSI model](https://img.router-switch.com/media/wysiwyg/Help-Center-FAQ/Router/osi-model.png) 6 | 7 | For system design purposes, we don't need to go lower than layer 4. 8 | 9 | The most commonly used protocols are summarized in this table: 10 | 11 | ![Internet Protocol Suite](http://2.bp.blogspot.com/-8spz6AylxBQ/UWKFo86yYjI/AAAAAAAAANI/XKyMikMWn7c/s1600/tcpip.jpg) 12 | 13 | It is definitely not a complete table, and we are particularly interested in the following areas: 14 | 15 | - TCP vs UDP: 16 | - TCP: 17 | - connection-oriented protocol 18 | - a connection is established between the peer entities prior to transmission 19 | - transmission flow is controlled such that a fast sender does not overwhelm a slow receiver 20 | - UDP: 21 | - message-oriented protocol 22 | - essentially just broadcasts messages, with no connection or ordering guarantees 23 | 24 | - Other transport layer protocols: 25 | - QUIC: based on UDP, initially designed by Google. 26 | - SCTP: a combination of TCP and UDP, used for telephony over the Internet. 27 | 28 | - TCP/IP: 29 | - People also talk about TCP/IP, which refers to a protocol stack containing the different protocols required to transfer data from sender to receiver 30 | - Details can be seen [here](https://stackoverflow.com/questions/31473578/tcp-ip-and-tcp-and-ip-difference) and [here](https://www.fortinet.com/resources/cyberglossary/tcp-ip) 31 | 32 | - HTTP: how it works when a client wants to communicate with a server 33 | - Open a TCP connection 34 | - Send an HTTP message 35 | - Read the response sent by the server 36 | - Close the connection (or reuse it for further communication) 37 | 38 | - HTTPS: 39 | - Extension of HTTP, but more secure 40 | - Uses SSL/TLS to secure data in transit 41 | 42 | - socket: 43 | - A socket is one endpoint of a two-way communication link between two programs running on the network.
44 | - A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to. 45 | 46 | - websocket: 47 | - A WebSocket is a persistent connection between a client and server 48 | - WebSockets provide a bidirectional, full-duplex communications channel that operates over HTTP through a single TCP/IP socket connection 49 | 50 | - HTTP vs long-polling vs WebSocket: 51 | - HTTP is a strictly unidirectional protocol 52 | - Long-polling is an HTTP request with a long timeout period 53 | - resources on the server are tied up throughout the length of the long-poll, even when no data is available to send 54 | - WebSocket: allows sending message-based data, similar in style to UDP, but over TCP 55 | - uses HTTP as the initial transport mechanism (i.e. HTTP request headers), but keeps the TCP connection alive after the HTTP response is received 56 | - Once the TCP connection is established, it uses the WebSocket protocol to communicate 57 | - WebSocket is a framed protocol, meaning that a chunk of data (a message) is divided into a number of discrete chunks, with the size of each chunk encoded in the frame. 58 | - A frame includes a frame type, a payload length, and a data portion. 59 | - More comparison between WebSocket and HTTP can be seen [here](https://www.geeksforgeeks.org/what-is-web-socket-and-how-it-is-different-from-the-http/) 60 | 61 | - REST: 62 | - a software architectural style that was created to guide the design and development of the architecture for the World Wide Web 63 | - Any web service that obeys the REST constraints is informally described as **RESTful** 64 | - The goal of REST is to increase performance, scalability, simplicity, modifiability, visibility, portability, and reliability.
65 | - Six guiding constraints define a RESTful system: 66 | - Client–server architecture 67 | - client application and server application MUST be able to evolve separately without any dependency on each other 68 | - Statelessness 69 | - The server will not store anything about the latest HTTP request the client made. It will treat every request as new. No session, no history. 70 | - Cacheability 71 | - caching shall be applied to resources when applicable 72 | - Caching can be implemented on the server side or the client side. 73 | - Layered system 74 | - allows you to use a layered system architecture where you deploy the APIs on server A, store data on server B, and authenticate requests on server C 75 | - Uniform interface 76 | - A resource in the system should have only one logical URI, and that should provide a way to fetch related or additional data. 77 | - Code on demand (optional) 78 | - you are free to return executable code to support a part of your application 79 | 80 | - REST vs SOAP 81 | - REST is an architectural style, while SOAP is a protocol 82 | - REST is not a standard in itself, but RESTful implementations make use of standards 83 | 84 | - HTTP response status codes 85 | - For a full list please see [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) 86 | - Some common ones: 87 | - 200: OK/success 88 | - 201: created 89 | - 202: accepted 90 | - 204: no content 91 | - 300: more than one possible response 92 | - 301: permanent redirect 93 | - 302: temporary redirect 94 | - 400: The server could not understand the request due to invalid syntax.
95 | - 401: unauthenticated 96 | - 403: Permission denied 97 | - 404: The server can not find the requested resource (URL not recognized) 98 | - 500: Unhandled error on server 99 | - 502: Server got an invalid response 100 | 101 | 102 | Reference: 103 | 104 | - [Network Layers & Network Layer in OSI Model](https://www.router-switch.com/faq/network-layers-in-osi-model-features-of-osi.html) 105 | - [Application Layer (Internet protocol Suite) ~ Networking Space](http://walkwidnetwork.blogspot.com/2013/04/application-layer-internet-protocol.html) 106 | - [The Internet protocol suite (article) \| Khan Academy](https://www.khanacademy.org/computing/computers-and-internet/xcae6f4a7ff015e7d:the-internet/xcae6f4a7ff015e7d:the-internet-protocol-suite/a/the-internet-protocols) 107 | - [QUIC - Wikipedia](https://en.wikipedia.org/wiki/QUIC) 108 | - [An overview of HTTP - HTTP \| MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview) 109 | - [What Is a Socket? (The Java™ Tutorials > Custom Networking > All About Sockets)](https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html) 110 | - [WebSockets - A Conceptual Deep Dive \| Ably Realtime](https://ably.com/topic/websockets) 111 | - [How Do Websockets Work? 
- Kevin Sookocheff](https://sookocheff.com/post/networking/how-do-websockets-work/) 112 | - [Short Polling vs Long Polling vs WebSockets - System Design](https://www.youtube.com/watch?v=ZBM28ZPlin8&ab_channel=BeABetterDev) 113 | - [Representational state transfer - Wikipedia](https://en.wikipedia.org/wiki/Representational_state_transfer) 114 | - [REST Principles and Architectural Constraints](https://restfulapi.net/rest-architectural-constraints/) 115 | -------------------------------------------------------------------------------- /SystemDesign/load_balancer.md: -------------------------------------------------------------------------------- 1 | # Load Balancer 2 | 3 | - A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client. 4 | - A load balancer can also enhance the user experience by reducing the number of error responses the client sees. 5 | - Session persistence (sending all requests from a particular client to the same server) is also available for some load balancers 6 | 7 | ## Types of Load Balancers: Layer 4 and Layer 7 8 | 9 | Layer 4 load balancing: 10 | 11 | - “Layer 4 load balancing” most commonly refers to a deployment where the load balancer’s IP address is the one advertised to clients for a web site or service (via DNS, for example) 12 | - Layer 4 load balancing operates at the intermediate transport layer, which deals with delivery of messages with no regard to the content of the messages. 13 | - When it receives a request and makes the load balancing decision, it also performs Network Address Translation (NAT) on the request packet, changing the recorded destination IP address from its own to that of the content server it has chosen on the internal network 14 | - Before forwarding server responses to clients, the load balancer changes the source address recorded in the packet header from the server’s IP address to its own. 
15 | - Layer 4 load balancing was a popular architectural approach to traffic handling when commodity hardware was not as powerful as it is now, and the interaction between clients and application servers was much less complex. 16 | 17 | Layer 7 load balancing: 18 | 19 | - Layer 7 load balancing operates at the high‑level application layer, which deals with the actual content of each message. 20 | - HTTP is the predominant Layer 7 protocol for website traffic on the Internet. 21 | - It terminates the network traffic and reads the message within. 22 | - It can make a load‑balancing decision based on the content of the message (the URL or cookie, for example). 23 | - It then makes a new TCP connection to the selected upstream server (or reuses an existing one, by means of HTTP keepalives) and writes the request to the server. 24 | - Layer 7 load balancing is more CPU‑intensive than packet‑based Layer 4 load balancing 25 | 26 | ## Load Balancing Algorithms 27 | 28 | - Least connection 29 | - Selects the server with the fewest active connections 30 | - Weighted least connection 31 | - Similar to least connection, but each server is assigned a weight 32 | - Least response time 33 | - Selects the server with the lowest response time 34 | - Weighted least response time 35 | - Similar to least response time, but each server is assigned a weight 36 | - Least bandwidth 37 | - Selects the server currently serving the least traffic (bandwidth) 38 | - Round robin 39 | - Cycles through the servers one by one 40 | - Weighted round-robin 41 | - Cycles through the servers one by one, in proportion to the weight the admin assigns to each server 42 | - IP hash 43 | - combines source and destination IP addresses of the client and server to generate a unique hash key, which is used to allocate the client to a particular server. 44 | - the client request is directed to the same server it was using previously. 45 | 46 | # Reverse Proxy 47 | 48 | - A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server's response to the client.
49 | - Increased security – No information about your backend servers is visible outside your internal network 50 | - Has security features to help protect backend servers from distributed denial-of-service (DDoS) attacks (e.g. IP address blacklisting) 51 | - Increased scalability and flexibility – Because clients see only the reverse proxy's IP address, you are free to change the configuration of your backend infrastructure. 52 | - Increased performance on response time 53 | - Compression (of server responses to reduce bandwidth) 54 | - SSL termination (encryption) 55 | - Caching (of server responses for repeated requests) 56 | 57 | # API Gateway 58 | 59 | - An API Gateway is the element that coordinates and orchestrates how all the requests are processed in a Microservices architecture 60 | - An API Gateway includes an HTTP server where routes are associated with a Microservice or with a FaaS function 61 | - When an API Gateway receives a request, it looks up the Microservice that can serve the request and forwards the request to it. 62 | - Besides this pure routing task, an API gateway can also be the part that performs **authentication**, **input validation**, **load balancing** and **centralized middleware** functionality, among other tasks. 63 | - Like a reverse proxy, an API gateway often makes sense even with just one web server or application server. 64 | - Drawbacks of an API gateway: 65 | - It creates a tight coupling between the client and the backend. 66 | - It has limited choice of communication protocols for services. 67 | - It could become a bottleneck for your application 68 | 69 | ## An example: The Architecture of Uber's API gateway 70 | 71 | Components in a request lifecycle: 72 | 73 | 1. **Protocol manager**: provides the ability to implement APIs that can ingest any type of relevant protocol payload 74 | 2.
**Middleware**: implements composable logic before the endpoint handler is invoked 75 | - Middleware implements cross-cutting concerns, such as authentication, authorization, rate limiting, circuit breaking, etc. 76 | 3. **Endpoint handler**: responsible for request validation, payload transformation, and converting the endpoint request object to the client request object. 77 | 4. **Client**: performs a request to a back-end service 78 | 79 | 80 | # Reference: 81 | 82 | - [System Design: What is Load Balancing? - YouTube](https://www.youtube.com/watch?v=gMIslJN44P0&ab_channel=BeABetterDev) 83 | - [System Design — Load Balancing. Concepts about load balancers and… \| by Larry | Peng Yang | Computer Science Fundamentals | Medium](https://medium.com/must-know-computer-science/system-design-load-balancing-1c2e7675fc27) 84 | - [What Is Layer 4 Load Balancing? \| NGINX Load Balancer](https://www.nginx.com/resources/glossary/layer-4-load-balancing/) 85 | - [Benefits of Layer 7 Load Balancing \| NGINX Load Balancer](https://www.nginx.com/resources/glossary/layer-7-load-balancing/) 86 | - [Load Balancing Algorithms, Types and Techniques](https://kemptechnologies.com/load-balancer/load-balancing-algorithms-techniques/) 87 | - [What is a Proxy? \| System Design - YouTube](https://www.youtube.com/watch?v=xiUmXVcLdCw&ab_channel=BeABetterDev) 88 | - [What is a Reverse Proxy vs. Load Balancer? - NGINX](https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/) 89 | - [Stupid question of the day: What is an API Gateway and what it has to do with a Serverless model? \| by Gabry Martinez | Medium](https://gabrymartinez.medium.com/stupid-question-of-the-day-what-is-an-api-gateway-and-what-it-has-to-do-with-a-serverless-model-2acee3e3eeba) 90 | - [What is API Gateway?. 
In microservices architecture, there… \| by Vivek Kumar Singh | System Design Blog | Medium](https://medium.com/system-design-blog/what-is-api-gateway-68a11d4ab322) 91 | - [The Architecture of Uber's API gateway \| Uber Engineering Blog](https://eng.uber.com/architecture-api-gateway/) 92 | -------------------------------------------------------------------------------- /SystemDesign/navigate_url.md: -------------------------------------------------------------------------------- 1 | # What really happens when you enter a URL 2 | 3 | 1. enter the URL in the browser 4 | 2. browser looks up the IP address for the domain name 5 | - DNS lookup process: 6 | - **Browser cache**: the browser caches DNS records 7 | - **OS cache**: if the browser cache doesn't have the desired record, the OS has its own cache 8 | - **Router cache**: the request goes to the router, which has its own DNS cache 9 | - **ISP DNS cache**: next, the cache of the ISP's DNS server is checked 10 | - **Recursive search**: the ISP's DNS server performs a recursive search 11 | - DNS search bottleneck solutions: 12 | - **Round-robin DNS**: DNS lookup returns multiple IP addresses, rather than just one. 13 | - **Load-balancer**: the piece of hardware that listens on a particular IP address and forwards the requests to other servers 14 | - **Geographic DNS**: mapping a domain name to different IP addresses, depending on the client's geographic location. 15 | - **Anycast**: a routing technique where a single IP address maps to multiple physical servers. However, it doesn't fit well with TCP. 16 | 3. Browser sends an HTTP request (e.g. `GET`) to the web server 17 | 4. The web server (e.g. facebook.com) responds with a permanent redirect 18 | - The redirect may be for search-engine reasons: it keeps one canonical URL 19 | 5. Browser follows the redirect 20 | 6. Server handles the request 21 | 7. Server sends back an HTML response 22 | 8. Browser begins rendering the HTML 23 | 9. Browser sends requests for objects embedded in the HTML (e.g. images, stylesheets) 24 | 10.
Browser sends further asynchronous (AJAX) requests 25 | 26 | ### Reference 27 | 28 | - [What really happens when you navigate to a URL](http://igoro.com/archive/what-really-happens-when-you-navigate-to-a-url/comment-page-3/) 29 | -------------------------------------------------------------------------------- /SystemDesign/nosql_db.md: -------------------------------------------------------------------------------- 1 | # NoSQL databases 2 | 3 | ## NoSQL database types 4 | 5 | - Column-oriented databases: Cassandra, HBase, Hypertable, BigTable 6 | - Key-value stores: Redis, Voldemort, Riak, and Amazon's Dynamo 7 | - Document stores: MongoDB and CouchDB 8 | - Graph databases: Neo4j 9 | 10 | 11 | ## Cassandra 12 | 13 | - Cassandra is decentralized, which means all nodes are the same (this is called **server symmetry**). There is no leader-follower structure; all nodes follow a P2P (peer-to-peer) gossip protocol, so it's **highly available** 14 | - Cassandra is **highly scalable**: new nodes are discovered automatically, and no reboot is needed 15 | 16 | ## Cassandra vs MongoDB 17 | 18 | | Difference | Cassandra | MongoDB | 19 | | :--------: | :-------: | :-----: | 20 | | DB structure | Wide-column tables | JSON-like documents | 21 | | Index | Primary key; secondary indexes are supported but limited | Rich secondary indexes; without an index every document is scanned, which slows reads | 22 | | Query | CQL (SQL-like) | JSON-like query documents | 23 | | Replication | leaderless (peer-to-peer) | leader-follower with automatic leader election | 24 | 25 | 26 | 27 | ## Reference 28 | 29 | - [NoSQL Database Types - DZone Database](https://dzone.com/articles/nosql-database-types-1) 30 | - [A Comprehensive Guide to Cassandra Architecture](https://www.instaclustr.com/cassandra-architecture/) 31 | - [Cassandra vs MongoDB in 2018](https://blog.panoply.io/cassandra-vs-mongodb) 32 | - [Cassandra vs. MongoDB vs.
Hbase: A Comparison of NoSQL Databases \| Logz.io](https://logz.io/blog/nosql-database-comparison/) 33 | - [Introduction to Amazon DynamoDB for Cassandra developers \| AWS Database Blog](https://aws.amazon.com/blogs/database/introduction-to-amazon-dynamodb-for-cassandra-developers/) 34 | - [Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database](https://www.youtube.com/watch?v=yvBR71D0nAQ&ab_channel=AmazonWebServices) 35 | - [Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB](https://www.youtube.com/watch?v=HaEPXoXVf2k&ab_channel=AmazonWebServices) 36 | - [HBase Tutorial for Beginners: Learn in 3 Days!](https://www.guru99.com/hbase-tutorials.html) 37 | -------------------------------------------------------------------------------- /SystemDesign/replication_partition.md: -------------------------------------------------------------------------------- 1 | # Replication & Partition 2 | 3 | ## Replication 4 | 5 | ### Leader-follower (single-leader) 6 | 7 | - Writes are only accepted on the leader, followers are read only 8 | - Synchronous Replication: the leader waits until follower has confirmed that it received the write before reporting success to the user, and before making the write visible to other clients 9 | - Asynchronous Replication: the leader sends the message, but doesn’t wait for a response from the follower. 
10 | - Setting Up New Followers: 11 | - Take a consistent snapshot of the leader's database at some point in time 12 | - Copy the snapshot to the new follower node 13 | - The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken (according to the leader's replication log) 14 | - Handling Node Outages 15 | - Follower failure: Catch-up recovery 16 | - Similar to setting up new followers: each follower keeps a log of the data changes it has received from the leader, so it knows where it left off 17 | - Leader failure: Failover - one of the followers needs to be promoted to be the new leader, manually or automatically 18 | - Determining that the leader has failed (i.e. timeout: if a node doesn't respond for some period of time). 19 | - Choosing a new leader: the best candidate is usually the replica with the most up-to-date data changes from the old leader 20 | - Reconfiguring the system to use the new leader 21 | - Implementation of Replication Logs 22 | - Statement-based replication: the leader logs every write request (statement) that it executes and sends that statement log to its followers 23 | - Nondeterministic statements (random numbers, timestamps, custom functions) may produce different values on each replica 24 | - Write-ahead log (WAL) shipping: every write is appended to a log 25 | - Replication is closely coupled to the storage engine, since the log describes the data at a very low level 26 | - Logical (row-based) log replication: use different log formats for replication and for the storage engine 27 | - Trigger-based replication: only replicate a subset of the data 28 | - **Eventual consistency**: if an application reads from an asynchronous follower, it may see outdated information if the follower has fallen behind, but the followers will eventually catch up and become consistent with the leader 29 | 30 | ### Multi-leader 31 | 32 | - Multi-leader replication 33 | - Circular topology 34 | - Star topology 35 | - All-to-all topology 36 | - Write conflict resolution 37 | - make sure all writes for a particular record go
through the same leader 38 | - Unique write ID: last write wins (LWW) 39 | - Unique replica ID: the write from the replica with the higher ID wins 40 | - Somehow merge conflicts 41 | - Record conflicts and resolve/report them later 42 | - Custom conflict resolution: 43 | - On-write: the db detects a conflict in the log and calls the conflict handler 44 | - On-read: all conflicting writes are stored; the next time the data is read, all versions are returned to the application to resolve (manually or automatically) and write the result back to the db. 45 | - Automatic conflict resolution 46 | - Conflict-free replicated datatypes (CRDTs) 47 | - Mergeable persistent data structures 48 | - Operational transformation (Google Docs) 49 | 50 | 51 | ### Leaderless 52 | 53 | - How to catch up when a node comes back: use version numbers 54 | - Read repair: when a client reads, stale replicas are repaired based on version numbers 55 | - Anti-entropy process: a background process constantly looks for differences 56 | - **Quorum Consistency** 57 | - With ***N*** nodes, operations are considered successful if 58 | - each write is confirmed by at least ***W*** nodes 59 | - each read queries at least ***R*** nodes 60 | - and ***W + R > N***, so every read overlaps with the latest write 61 | - **Sloppy quorum**: writes and reads still require *w* and *r* successful responses, but those may include nodes that are not among the designated *n* "home" nodes for a value 62 | - **Hinted handoff**: Once the network interruption is fixed, any writes that one node temporarily accepted on behalf of another node are sent to the appropriate "home" nodes 63 | 64 | 65 | ## Partition (sharding): query throughput can be scaled by adding more nodes 66 | 67 | - For key-value stores 68 | - Partition by key range (e.g.
alphabetical) and keep sorted in each partition 69 | - may result in skew or hot spots 70 | - Solution: use a combined key 71 | - For example, you could prefix each timestamp with the sensor name so that the partitioning is first by name and then by time 72 | - Partition by hash of key (uniformly distributed) 73 | - Loses the ability to do efficient range queries 74 | - Solution: a **compound primary key** consisting of several columns 75 | - For example, only the first part of the key is hashed to determine the partition; the other columns are used as a concatenated index for sorting the data 76 | 77 | - Handle skew and hot spots: the responsibility of the application 78 | - if one key is known to be very hot, a simple technique is to add a random number to the beginning or end of the key 79 | - it only makes sense to append the random number for the small number of hot keys 80 | - need some way of keeping track of which keys are being split 81 | 82 | - Partitioning and Secondary Indexes 83 | - Partitioning Secondary Indexes by Document (***local index***) 84 | - each partition is completely separate: each partition maintains its own secondary indexes, covering only the documents in that partition 85 | - doesn't care what data is stored in other partitions 86 | - **scatter/gather** problem: a read query on a secondary index of a partitioned database needs to query all partitions 87 | - Partitioning Secondary Indexes by Term (***global index***) 88 | - a global index that covers data in all partitions; the global index must also be partitioned, but can be partitioned differently from the primary key index 89 | - We call this kind of index term-partitioned, because the term we're looking for determines the partition of the index 90 | - Reads are faster, but writes are slower and more complex 91 | - updates to global secondary indexes are often asynchronous 92 | 93 | - Strategies for rebalancing partitions 94 | - Do NOT use hash mod N, since changing N would move most keys between nodes 95 | - Use
a fixed number of partitions: create many more partitions than nodes and assign several partitions to each node 96 | - Dynamic partitioning: auto-split when 97 | - an existing partition reaches a configured size 98 | - this keeps the number of partitions proportional to the size of the dataset 99 | - Partitioning proportionally to nodes: keep a fixed number of partitions per node 100 | 101 | - Request Routing among partitions and nodes 102 | - Coordination service (keeps track of cluster metadata) to route requests (e.g. ZooKeeper) 103 | - **gossip protocol**: Requests can be sent to any node, and that node forwards them to the appropriate node for the requested partition 104 | 105 | 106 | Reference: 107 | 108 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?dchild=1&gclid=Cj0KCQjw_8mHBhClARIsABfFgpg5q5IQvE2s5OBULx6LQFDETV41haS67EE3JAfvobPADJUJHN7dUbsaAjjrEALw_wcB&hvadid=285888202784&hvdev=c&hvlocphy=9001327&hvnetw=g&hvqmt=e&hvrand=12070511852976413586&hvtargid=kwd-407664346480&hydadcr=16109_9598899&keywords=design+data+intensive+application&qid=1626587742&sr=8-1) 109 | -------------------------------------------------------------------------------- /SystemDesign/scale_web_app.md: -------------------------------------------------------------------------------- 1 | # Web app scale from monolithic to distributed 2 | 3 | ## 1. Single Server + Database 4 | 5 | ![01-initial-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/01-initial-700.png) 6 | 7 | MVC Design: 8 | - Advantages: 9 | - Easy to debug 10 | - Easy to deploy 11 | 12 | - Disadvantages: 13 | - very difficult to scale: every request is handled by the single server, and one page can trigger many requests, which consumes lots of server resources 14 | - whenever there is a server issue, the entire site goes down 15 | - the front end and back end must be deployed together 16 | 17 | ## 2.
Adding a Reverse Proxy 18 | 19 | ![02-reverse-proxy-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/02-reverse-proxy-700.png) 20 | 21 | - **Health Checks** make sure that our actual server is still up and running 22 | - **Routing** forwards a request to the right endpoint 23 | - **Authentication** makes sure that a user is actually permitted to access the server 24 | - **Firewalling** ensures that users only have access to the parts of our network they are allowed to use ... and more 25 | 26 | ## 3. Add a Load Balancer with multiple servers 27 | 28 | ![03-load-balancer-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/03-load-balancer-700.png) 29 | 30 | - A Reverse Proxy can act as a load balancer 31 | 32 | ## 4. Add more databases 33 | 34 | ![04-database-scale-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/04-database-scale-700.png) 35 | 36 | How to ensure data consistency: 37 | - **Master/slave setup (writes with read replicas)**: writes go to a single master, which replicates them to read-only replicas that serve reads; alternatively, split the database into multiple parts where each part does its own thing 38 | 39 | ## 5. Microservices 40 | 41 | ![05-microservices-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/05-microservices-700.png) 42 | 43 | - each service can be scaled individually, enabling us to better adjust to demand 44 | - development teams can work independently, each being responsible for their own microservice's lifecycle (creation, deployment, updating etc.) 45 | - each microservice can use its own resources 46 | 47 | ## 6. Caching & CDN (Content Delivery Network) 48 | 49 | ![06-cdn-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/06-cdn-700.png) 50 | 51 | - cache the static content 52 | 53 | ## 7. Message Queues 54 | 55 | ![07-message-queue-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/07-message-queue-700.png) 56 | 57 | Advantages: 58 | - it decouples tasks and processors.
Sometimes a lot of images need to be processed, sometimes only a few. Sometimes a lot of processors are available, sometimes it's just a couple. By simply adding tasks to a backlog rather than processing them directly we ensure that our system stays responsive and no tasks get lost. 59 | - it allows us to scale on demand. Starting up more processors takes time - so by the time a lot of users try to upload images, it's already too late. By adding our tasks to a queue we can defer the need to provision additional capacity to process them 60 | 61 | ## 8. Sharding 62 | 63 | ![08-sharding-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/08-sharding-700.png) 64 | 65 | - Sharding can be based on any number of factors, e.g. letters, location, usage frequency (power-users are routed to the good hardware) and so on. 66 | - You can shard servers, databases or almost any aspect of your stack this way, depending on your needs. 67 | 68 | ## 9. Load-balancing the load-balancer 69 | 70 | ![09-dns-700.png](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/09-dns-700.png) 71 | 72 | - DNS (Domain Name System): allows us to specify multiple IPs per domain name, each leading to a different load balancer.
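The DNS-level round robin described above can be sketched as a toy model. The `RoundRobinDNS` class and the IP addresses below are hypothetical, purely for illustration — real DNS servers rotate the answer list internally:

```python
class RoundRobinDNS:
    """Toy round-robin DNS (hypothetical class, made-up IPs): each lookup
    returns the list of load-balancer IPs rotated by one position, so
    successive clients tend to connect to different load balancers."""

    def __init__(self, ips):
        self.ips = list(ips)

    def resolve(self):
        answer = list(self.ips)           # the answer for this lookup
        self.ips.append(self.ips.pop(0))  # rotate for the next lookup
        return answer


dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first = dns.resolve()   # ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
second = dns.resolve()  # ["10.0.0.2", "10.0.0.3", "10.0.0.1"]
```

Clients typically try the first address in the answer, so rotating the list spreads connections across the load balancers. Note that plain round-robin DNS has no health awareness: a dead load balancer keeps receiving traffic until the DNS record is changed.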
73 | 74 | ## Reference: 75 | 76 | - [Scaling webapps for newbs & non-techies](https://arcentry.com/blog/scaling-webapps-for-newbs-and-non-techies/) 77 | 78 | ## More resources 79 | 80 | - [Scaling Up to Your First 10 Million Users](https://www.youtube.com/watch?v=Ma3xWDXTxRg&ab_channel=AmazonWebServices) 81 | - [Web Scalability for Startup Engineers](https://www.amazon.ca/Scalability-Startup-Engineers-Artur-Ejsmont/dp/0071843655) 82 | - [A Beginner's Guide to Scaling to 11 Million+ Users on Amazon's AWS - High Scalability -](http://highscalability.com/blog/2016/1/11/a-beginners-guide-to-scaling-to-11-million-users-on-amazons.html) 83 | -------------------------------------------------------------------------------- /SystemDesign/storage_system.md: -------------------------------------------------------------------------------- 1 | # File Storage, Block Storage, Object Storage, and HDFS 2 | 3 | ## File storage 4 | 5 | - Data is stored as a single piece of information inside a folder (hierarchical directories structure) 6 | - Just like you’d organize pieces of paper inside a manila folder 7 | - When you need to access that piece of data, your computer needs to know the path to find it (i.e. `/home/images/beach.jpeg`) 8 | 9 | Pros: 10 | 11 | - oldest and most widely used data storage system for direct and network-attached storage (NAS) systems 12 | - has broad capabilities and can store just about anything 13 | - great for storing an array of complex files and is fairly fast for users to navigate 14 | 15 | Cons: 16 | 17 | - File-based storage systems must scale out by adding more systems, rather than scale up by adding more capacity. 18 | 19 | ## Block storage 20 | 21 | - Data is chopped into blocks, each block of data is given a unique identifier, which allows a storage system to place the smaller pieces of data wherever is most convenient. 22 | - Some data can be stored in a Linux environment and some can be stored in a Windows unit. 
23 | - When data is requested, the underlying storage software reassembles the blocks of data from these environments and presents them back to the user. 24 | 25 | Pros: 26 | 27 | - doesn't rely on a single path to data, so it's fast 28 | - gives the user complete freedom to configure their data 29 | - easy to use and manage, efficient and reliable 30 | - the more data you need to store, the better off you'll be with block storage. 31 | 32 | Cons: 33 | 34 | - Can be expensive 35 | - limited capability to handle metadata 36 | 37 | ## Object storage 38 | 39 | - A flat structure in which files are broken into pieces and spread out among hardware. 40 | - Data is broken into discrete units called objects and is kept in a single repository, instead of being kept as files in folders or as blocks on servers. 41 | - Object storage volumes work as modular units: 42 | - each is a self-contained repository that owns the data and the metadata that describes the data 43 | - each has a unique identifier that allows the object to be found over a distributed system 44 | - To retrieve the data, the storage operating system uses the metadata and identifiers 45 | - The metadata can be extremely detailed (e.g. for videos and photos) 46 | - Object storage is accessed through a simple HTTP API 47 | 48 | Pros: 49 | 50 | - Cost-efficient: pay only for what you use 51 | - well suited for static data 52 | - can scale to extremely large quantities of data 53 | - good at storing unstructured data. 54 | 55 | Cons: 56 | 57 | - Objects can't be modified in place; you have to rewrite the whole object 58 | - doesn't work well with traditional databases, because writing objects is a slow process and writing an app to use an object storage API isn't as simple as using file storage 59 | 60 | ## Hadoop Distributed File System (HDFS) 61 | 62 | - A distributed file system designed to run on commodity hardware. 63 | - It stores each file as a sequence of blocks 64 | - all blocks in a file except the last block are the same size.
65 | - The blocks of a file are replicated for fault tolerance (HDFS requires block storage) 66 | - The block size and replication factor are configurable per file. 67 | - Files in HDFS are write-once and have strictly one writer at any time. 68 | 69 | Pros: 70 | 71 | - highly fault-tolerant and is designed to be deployed on low-cost hardware 72 | - provides high throughput access to application data and is suitable for applications that have large data sets 73 | - Enables streaming access to file system data 74 | 75 | Cons: 76 | 77 | - Problems with small files 78 | - reads and writes go through the disk, which makes in-memory computation difficult and adds processing overhead 79 | - Supports only batch processing 80 | 81 | ### MapReduce 82 | 83 | - MapReduce is a programming model or pattern within the Hadoop framework that is used to access big data stored in the HDFS. 84 | - It is a core component, integral to the functioning of the Hadoop framework. 85 | - With MapReduce, rather than sending data to where the application or logic resides, the logic is executed on the server where the data already resides, to expedite processing. 86 | 87 | How does it work: 88 | 89 | - Essentially there are two functions: **Map** and **Reduce** 90 | - The **Map** function takes input from the disk as `<key, value>` pairs, processes them, and produces another set of intermediate `<key, value>` pairs as output. 91 | - The **Reduce** function also takes inputs as `<key, value>` pairs, and produces `<key, value>` pairs as output. 92 | - **Combine** is an optional process. 93 | - The combiner is a reducer that runs individually on each mapper server. 94 | - It reduces the data on each mapper further to a simplified form before passing it downstream. 95 | - **Partition** is the process that translates the `<key, value>` pairs resulting from mappers to another set of `<key, value>` pairs to feed into the reducer.
96 | - It decides how the data has to be presented to the reducer and also assigns it to a particular reducer. 97 | 98 | ## Database vs Storage Systems 99 | 100 | Conclusion: you shouldn't use database for file storage, mainly because of performance 101 | 102 | - [Database vs File system storage - Stack Overflow](https://stackoverflow.com/questions/38120895/database-vs-file-system-storage) 103 | - [Is it a bad practice to store large files (10 MB) in a database? - Software Engineering Stack Exchange](https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database) 104 | 105 | 106 | ## Reference: 107 | 108 | - [File storage, block storage, or object storage?](https://www.redhat.com/en/topics/data-storage/file-block-object-storage) 109 | - [System Design — Storage. Storage concepts and considerations in… \| by Larry | Peng Yang | Computer Science Fundamentals | Medium](https://medium.com/must-know-computer-science/system-desing-storage-d8ef4a8d952c) 110 | - [Hadoop - Pros and Cons - GeeksforGeeks](https://www.geeksforgeeks.org/hadoop-pros-and-cons/) 111 | - [Hadoop In 5 Minutes \| What Is Hadoop? | Introduction To Hadoop | Hadoop Explained |Simplilearn - YouTube](https://www.youtube.com/watch?v=aReuLtY0YMI&ab_channel=Simplilearn) 112 | - [MapReduce 101: What It Is & How to Get Started - Talend](https://www.talend.com/resources/what-is-mapreduce/) 113 | -------------------------------------------------------------------------------- /SystemDesign/transaction_isolation.md: -------------------------------------------------------------------------------- 1 | # Transaction & Isolation 2 | 3 | ## Transaction 4 | 5 | - Transaction 6 | - A transaction is a way for an application to group several reads and writes together into a logical unit. 7 | - all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback). 
8 | 9 | ### ACID (Atomicity, Consistency, Isolation, and Durability) 10 | 11 | - Atomicity 12 | - describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed (gives an all-or-nothing guarantee) 13 | - if the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction. 14 | - Without atomicity, if an error occurs, it’s difficult to know which changes have taken effect and which haven’t. 15 | 16 | - Consistency 17 | - the idea is that you have certain statements about your data (invariants) that must always be true 18 | - However, this idea of consistency depends on the application’s notion of invariants (This is not something that the database can guarantee) 19 | 20 | - Isolation 21 | - Most databases are accessed by several clients at the same time; isolation is used to handle concurrency problems (race conditions) 22 | - it means that concurrently executing transactions are isolated from each other: they cannot step on each other’s toes 23 | - In theory, textbooks formalize isolation as serializability, which means transactions behave as if they happen one by one 24 | - In practice, serializable isolation is rarely used, because it carries a performance 25 | penalty 26 | 27 | - Durability 28 | - ensures that once a transaction has committed successfully, any data it has written will not be forgotten (regardless of failures and crashes). 29 | - In a single-node database, durability typically means that the data has been written to nonvolatile storage such as a hard drive or SSD. 30 | - It usually has a write-ahead log (or similar) for recovery 31 | - In a replicated database, durability may mean that the data has been successfully copied to some number of nodes.
32 | - a database must wait until these writes or replications are complete before reporting a transaction as successfully committed 33 | - Replication durability cases: 34 | - if you write to disk and the machine dies, the data is inaccessible until you fix the machine, but a replicated system remains available 35 | - If a bug crashes every node on a particular input, all replicas can go down at once, and in-memory data will be lost 36 | - In an asynchronously replicated system, recent writes may be lost when the leader becomes unavailable 37 | - Hardware disks may not be reliable (e.g. SSDs can lose data on a power cut, firmware can have bugs, etc.) 38 | - Files may be corrupted after a crash due to software (file system, storage engine, etc.) bugs 39 | 40 | ## Isolation 41 | 42 | ### Weak Isolation 43 | 44 | - **Read Committed**: 45 | - Two guarantees 46 | 1. No dirty reads: When reading from the database, you will only see data that has been committed. 47 | - dirty reads: reading uncommitted data from another transaction 48 | 2. No dirty writes: When writing to the database, you will only overwrite data that has been committed. 49 | - dirty writes: a later write overwrites an (earlier) uncommitted value 50 | - Implementation 51 | - Row level locks (to prevent dirty writes): 52 | - acquire a lock on the object (when updating), and hold the lock until the transaction is committed or aborted. 53 | - Only one transaction can hold the lock for any given object; others must wait 54 | - To prevent dirty reads: 55 | - only see the old value (not the new value that is being committed to the database) 56 | - Only when the new value is committed do transactions switch over to reading the new value. 57 | - Typically read committed uses a separate snapshot for each query 58 | 59 | - **read skew (nonrepeatable read)** 60 | - if a transaction reads different parts of the database at different points in time while another transaction is writing, it can observe the database in an inconsistent state.
But once the transaction is complete, the values will be consistent 61 | - Read skew is considered acceptable under read committed isolation 62 | - this may be a problem for database backups or Analytic queries and integrity checks 63 | 64 | - **Snapshot isolation** is the most common solution to read skew 65 | - Transaction sees all the data that was committed in the database at the start of transaction 66 | - each transaction sees only the old data from that particular point in time 67 | - Implementation: ***multiversion concurrency control (MVCC)*** 68 | - From a performance point of view, a key principle of snapshot isolation is readers never block writers, and writers never block readers 69 | - database must potentially keep several different committed versions of an object 70 | - Typically snapshot isolation uses the same snapshot for an entire transaction 71 | - readers never block writers, and writers never block readers 72 | 73 | - MVCC implementation (for postgres) 74 | - When a transaction is started, it is given a unique, always-increasing transaction ID (*txid*). 75 | - Whenever a transaction writes anything to the database, the data it writes is tagged with the transaction ID of the writer. 76 | - Each row in a table has a *created_by* field, containing the ID of the transaction that inserted this row into the table 77 | - each row has a *deleted_by* field (initially empty) 78 | - If a transaction deletes a row, row is soft deleted by marking the *deleted_by* field to the ID of the transaction that requested the deletion (row is not deleted from database) 79 | - when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes the soft deleted rows 80 | - An update is internally translated into a delete and a create. 81 | - transaction IDs are used to decide which objects it can see and which are invisible for reads. 
Visibility rules for both creation and deletion: 82 | - At the time when the reader’s transaction started, the transaction that created the object had already committed 83 | - The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed at the time when the reader’s transaction started. 84 | - By never updating values in place but instead creating a new version every time a value is changed, the database can provide a consistent snapshot while incurring only a small overhead. 85 | - Database index point to all versions of an object and filter out those invisible ones 86 | 87 | - To prevent **Lost update** (in a read-modify-write cycle): when two transactions do this concurrently, one of the modifications can be lost 88 | - Atomic update operations: taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied (***cursor stability***). 89 | - Explicit locking: explicitly lock objects that are going to be updated 90 | - Automatically detecting lost updates: allow transactions to execute in parallel, abort transaction and force it to retry if lost updated is detected 91 | - Compare-and-set: allow an update to happen only if the value has not changed since you last read it. If it changes, force retry. 92 | - For replicated databases: allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact. 
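As an illustration, the compare-and-set approach to avoiding lost updates can be sketched in Python. This is a toy model, not a real client: the `db` dict stands in for the database, and in a real system the compare and the set happen atomically on the server (e.g. `UPDATE ... SET value = new WHERE value = old`).

```python
def cas_update(db, key, transform, max_retries=5):
    """Read-modify-write that only commits if the value is unchanged since the read.

    `db` is a plain dict standing in for a database; `transform` computes
    the new value from the old one. A real database performs the
    compare-and-set step atomically.
    """
    for _ in range(max_retries):
        old = db[key]            # read
        new = transform(old)     # modify
        if db[key] == old:       # compare: has anyone changed it since the read?
            db[key] = new        # set
            return new
        # another writer got in first: retry from a fresh read
    raise RuntimeError("update kept losing races; giving up")
```

If the compare fails, the whole read-modify-write cycle is retried from a fresh read, which is how the lost update is avoided without holding a lock for the duration of the cycle.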
93 | 94 | - **Write Skew**: two transactions read the same objects, and then each updates some of those objects (the transactions update different objects, so this is neither a dirty write nor a lost update, yet the combined result can still violate an invariant) 95 | - Solution: explicitly lock the rows that the transaction depends on 96 | - **Lock the row for update** 97 | - **Unique constraint for create** 98 | 99 | - **phantom**: a write in one transaction changes the result of a search query in another transaction (i.e. objects that do not yet exist in the database, but which might be added in the future) 100 | - A serializable isolation level is much preferable in most cases 101 | - **materializing conflicts** (last resort if no alternative is possible): it takes a phantom and turns it into a lock conflict on a concrete set of rows that exist in the database (i.e. pre-create all the rows for the different combinations in the database, so new requests become checks against existing rows rather than row creations) 102 | 103 | 104 | ### Serializable isolation (strongest isolation level) 105 | 106 | - Literally executing transactions in a serial order 107 | - single-threaded execution 108 | - **stored procedure**: the application submits the entire transaction code to the database ahead of time, and the database executes all transactions on a single thread (in-memory) 109 | 110 | - **Two-phase locking (2PL)** 111 | - ***pessimistic*** concurrency control mechanism: if anything may go wrong, it's better to wait until it's safe to do anything (similar to *mutual exclusion*) 112 | - Concurrent transactions are allowed to read the same object when nobody is writing to it.
113 | - When a write happens (update/delete): 114 | - the second transaction (read or write) must wait until the first transaction (read or write) commits or aborts 115 | - Reading an old version is forbidden 116 | - writers don’t just block other writers; they also block readers and vice versa 117 | - Implementation: a lock (in shared mode or exclusive mode) on each object in the database 118 | - Read: acquire the lock in shared mode (several transactions may hold it at once), but the transaction must wait if another transaction holds the lock in exclusive mode 119 | - Write: acquire the lock in exclusive mode (only 1 transaction can hold it, others must wait) 120 | - First read then write: upgrade the shared lock to an exclusive lock 121 | - First phase: acquire locks; second phase: release locks. 122 | - Performance is poor, and deadlocks may happen 123 | 124 | - **Predicate lock**: applies to all objects that match some search condition 125 | - Read for some condition: acquire a shared-mode predicate lock on the conditions of the query; if another transaction has an exclusive lock on the objects matching the conditions, the current transaction must wait 126 | - Write: first check whether either the old or the new value matches any existing predicate lock; if there is a match, the current transaction must wait 127 | 128 | - **Index-range lock (next-key locking)**: 129 | - Similar to predicate locks, but is based on indices of the rows for the search condition 130 | - Better performance but may lock a bigger range of objects 131 | - If there is no suitable index where a range lock can be attached, the database can fall back to a shared lock on the entire table 132 | 133 | - Optimistic concurrency control techniques such as **Serializable Snapshot Isolation (SSI)** 134 | - ***optimistic*** concurrency control mechanism: instead of blocking if something may go wrong, transactions continue anyway 135 | - all reads within a transaction are made from a consistent snapshot of the database 136 | - when a
transaction wants to commit, database checks whether isolation was violated (a query result might have changed), and if so the transaction is aborted and has to be retried 137 | - By avoiding unnecessary aborts, SSI preserves snapshot isolation’s support for long-running reads from a consistent snapshot. 138 | - How database checks whether isolation was violated 139 | - Detecting stale MVCC reads (uncommitted write occurred before the read due to visibility rules) 140 | - When the transaction wants to commit, the database checks whether any of the ignored writes have now been committed. If so, the transaction must be aborted. 141 | - Detecting writes that affect prior reads (the write occurs after the read) 142 | - Similar to the index-range locks, index is recorded for the transaction when reading; when writing, the index will be checked and notify the transaction to abort 143 | - Performance: 144 | - one trade-off is the granularity at which transactions’ reads and writes are tracked 145 | - Compared to two-phase locking, the big advantage is that one transaction doesn’t need to block waiting for locks held by another transaction. 146 | 147 | 148 | ## Reference: 149 | 150 | - [Designing Data-Intensive Applications](https://www.amazon.ca/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?dchild=1&gclid=Cj0KCQjw_8mHBhClARIsABfFgpg5q5IQvE2s5OBULx6LQFDETV41haS67EE3JAfvobPADJUJHN7dUbsaAjjrEALw_wcB&hvadid=285888202784&hvdev=c&hvlocphy=9001327&hvnetw=g&hvqmt=e&hvrand=12070511852976413586&hvtargid=kwd-407664346480&hydadcr=16109_9598899&keywords=design+data+intensive+application&qid=1626587742&sr=8-1) 151 | -------------------------------------------------------------------------------- /Templates/backtrack.md: -------------------------------------------------------------------------------- 1 | # Backtracking 2 | 3 | This is essentially the traversal of a decision tree. 4 | 5 | There are three things we need to consider: 6 | 7 | 1. 
Path (or track) 8 | 2. the list of choices 9 | 3. End condition 10 | 11 | ## Template 12 | 13 | ```python 14 | result = [] 15 | 16 | def backtrack(track, result, choices): 17 | if satisfies_end_condition: 18 | # If track is a list we need to copy it here; otherwise we would append a reference to the same (mutable) list 19 | result.append(track.copy()) 20 | return 21 | 22 | for choice in choices: 23 | # Make the choice 24 | track.append(choice) 25 | backtrack(track, result, choices) 26 | # Undo the choice (i.e. backtrack) 27 | track.pop() 28 | ``` 29 | 30 | 31 | Reference: 32 | 33 | - [Leetcode Backtracking Template](https://leetcode.com/explore/learn/card/recursion-ii/472/backtracking/2793/) 34 | - [Backtracking Template](https://github.com/labuladong/fucking-algorithm/blob/english/think_like_computer/DetailsaboutBacktracking.md) 35 | 36 | Practice: 37 | 38 | - Permutations: 39 | - [46. Permutations](https://leetcode.com/problems/permutations/) 40 | - [784. Letter Case Permutation](https://leetcode.com/problems/letter-case-permutation/) 41 | - [47. Permutations II](https://leetcode.com/problems/permutations-ii/) 42 | - [17. Letter Combinations of a Phone Number](https://leetcode.com/problems/letter-combinations-of-a-phone-number/) 43 | -------------------------------------------------------------------------------- /Templates/binary_search.md: -------------------------------------------------------------------------------- 1 | # Binary Search 2 | 3 | ***The KEY is the search interval*** 4 | 5 | ## Generic template 6 | 7 | Find a number in a list of distinct sorted numbers, return -1 if there is no such number.
8 | 9 | Time complexity: `O(log n)` 10 | 11 | ```python 12 | def generic_binary_search(nums, target): 13 | left = 0 14 | right = len(nums) - 1 15 | 16 | while left <= right: 17 | mid = left + (right - left) // 2 18 | if nums[mid] < target: 19 | left = mid + 1 20 | elif nums[mid] > target: 21 | right = mid - 1 22 | elif nums[mid] == target: 23 | return mid 24 | 25 | return -1 26 | ``` 27 | 28 | Search interval: `[left, right]` 29 | 30 | Practice: [704. Binary Search](https://leetcode.com/problems/binary-search/submissions/) 31 | 32 | ## Search left 33 | 34 | Find the first target in a list of sorted numbers with duplication, return -1 if there is no such number. 35 | 36 | ```python 37 | def binary_search_left(nums, target): 38 | if not nums: 39 | return -1 40 | 41 | left = 0 42 | right = len(nums) 43 | 44 | while left < right: 45 | mid = left + (right - left) // 2 46 | if nums[mid] < target: 47 | left = mid + 1 48 | elif nums[mid] > target: 49 | right = mid 50 | elif nums[mid] == target: 51 | right = mid 52 | 53 | return left if left < len(nums) and nums[left] == target else -1 54 | ``` 55 | 56 | Search interval: `[left, right)` 57 | 58 | ## Search right 59 | 60 | Find the last target in a list of sorted numbers with duplication, return -1 if there is no such number.
61 | 62 | ```python 63 | def binary_search_right(nums, target): 64 | if not nums: 65 | return -1 66 | 67 | left = 0 68 | right = len(nums) 69 | 70 | while left < right: 71 | mid = left + (right - left) // 2 72 | if nums[mid] < target: 73 | left = mid + 1 74 | elif nums[mid] > target: 75 | right = mid 76 | elif nums[mid] == target: 77 | left = mid + 1 78 | 79 | return left - 1 if left > 0 and nums[left - 1] == target else -1 80 | ``` 81 | 82 | Search interval: `[left, right)` 83 | 84 | ## Universal template 85 | 86 | ```python 87 | def generic_binary_search(nums, target): 88 | left = 0 89 | right = len(nums) - 1 90 | 91 | while left <= right: 92 | mid = left + (right - left) // 2 93 | if nums[mid] < target: 94 | left = mid + 1 95 | elif nums[mid] > target: 96 | right = mid - 1 97 | elif nums[mid] == target: 98 | return mid 99 | 100 | # # search left 101 | # right = mid - 1 102 | 103 | # # search right 104 | # left = mid + 1 105 | 106 | return -1 107 | 108 | # # search left 109 | # return -1 if left >= len(nums) or nums[left] != target else left 110 | 111 | # # search right 112 | # return -1 if right < 0 or nums[right] != target else right 113 | ``` 114 | 115 | Search interval: `[left, right]` 116 | 117 | ## Python builtin function 118 | 119 | You can also use the builtin `bisect` module to do the same thing. 120 | 121 | [Details here](https://docs.python.org/3/library/bisect.html) 122 | 123 | 124 | Practice: 125 | 126 | - [34.
Find First and Last Position of Element in Sorted Array](https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/) 127 | -------------------------------------------------------------------------------- /Templates/dijkstra.md: -------------------------------------------------------------------------------- 1 | # Dijkstra’s algorithm (Shortest Path) 2 | 3 | ## Implementation: 4 | 5 | - Use an instance variable to keep track of the total cost from the start node to each destination 6 | - The instance variable will contain the current total weight of the smallest weight path from the start to the vertex in question 7 | - The algorithm iterates once for every vertex in the graph (the order is determined by the priority queue) 8 | - The PriorityQueue class stores tuples of key, value pairs 9 | - the key in the priority queue must match the key of the vertex in the graph 10 | - the value is used for deciding the priority, and thus the position of the key in the priority queue 11 | - Dijkstra’s algorithm works correctly only when all the weights are non-negative; with negative weights it can return paths that are not actually shortest 12 | - time complexity: O((V+E)log(V)) 13 | 14 | 15 | 16 | ***Essentially, this is a BFS using a priority queue instead of a queue*** 17 | 18 | ## Template 19 | 20 | First check the graph traversal template [**HERE**](./graph_traversal.md) if you haven't seen it. It is highly recommended that you fully understand the graph traversal first.
21 | 22 | ```python 23 | from heapq import heappop, heappush 24 | 25 | heap = [[0, start]] 26 | seen = set() 27 | while heap: 28 | # Use dist to set the priority; vertex is the value that stores the info 29 | dist, vertex = heappop(heap) 30 | if vertex not in seen: 31 | seen.add(vertex) 32 | if vertex == end: 33 | return dist 34 | for end_vertex in vertices[vertex]: 35 | if end_vertex not in seen: 36 | heappush(heap, [dist+1, end_vertex])  # dist+1 assumes unit weights; use dist + edge weight for weighted graphs 37 | ``` 38 | 39 | Note that the above template doesn't iterate through the entire maze or graph or whatever input you received. 40 | This is simply because shortest path problems normally have a start point and an end point, so we just need to start from the start point; there is no need to check all points. 41 | 42 | 43 | Reference: 44 | 45 | - [Dijkstra’s Algorithm](https://runestone.academy/runestone/books/published/pythonds/Graphs/DijkstrasAlgorithm.html) 46 | 47 | Practice: 48 | 49 | - [1091. Shortest Path in Binary Matrix](https://leetcode.com/problems/shortest-path-in-binary-matrix/) 50 | - [Please Share dijkstra's algorithm questions - Leetcode](https://leetcode.com/discuss/interview-question/731911/please-share-dijkstras-algorithm-questions) 51 | -------------------------------------------------------------------------------- /Templates/graph_SCC.md: -------------------------------------------------------------------------------- 1 | # Strongly Connected Components 2 | 3 | - This probably won't come up in an interview, so a code implementation is not provided, but the concepts are good to know 4 | - **Strongly Connected Component (SCC):** a strongly connected component C of a graph G is the largest subset of vertices C⊂V such that for every pair of vertices v,w∈C we have a path from v to w and a path from w to v.
![](http://interactivepython.org/runestone/static/pythonds/_images/scc1.png) 6 | - Once the strongly connected components have been identified we can show a simplified view of the graph by combining all the vertices in one strongly connected component into a single larger vertex. 7 | ![](http://interactivepython.org/runestone/static/pythonds/_images/scc2.png) 8 | - The transposition of a graph G is defined as the graph G^T where all the edges in the graph have been reversed. 9 | 10 | ## Implementation 11 | 12 | 1. Call DFS for the graph G to compute the finish times for each vertex. 13 | 2. Compute G^T (i.e. the transposition of G) 14 | 3. Call DFS for the graph G^T but in the main loop of DFS explore each vertex in decreasing order of finish time 15 | 4. Each tree in the forest computed in step 3 is a strongly connected component. Output the vertex ids for each vertex in each tree in the forest to identify the component 16 | 17 | ## Reference: 18 | 19 | - [Strongly Connected Components](https://runestone.academy/runestone/books/published/pythonds/Graphs/StronglyConnectedComponents.html) 20 | -------------------------------------------------------------------------------- /Templates/graph_traversal.md: -------------------------------------------------------------------------------- 1 | # Graph Traversal - BFS/DFS 2 | 3 | First check the matrix traversal template [**HERE**](./matrix_traversal.md) if you haven't seen it. It is highly recommended that you fully understand the matrix traversal first. 4 | 5 | We can use either BFS or DFS to traverse the graph 6 | 7 | - Key Differences Between BFS and DFS 8 | 1. BFS is a vertex-based algorithm while DFS is an edge-based algorithm. 9 | 2. The queue data structure is used in BFS. On the other hand, DFS uses a stack or recursion. 10 | 3. Memory space is efficiently utilized in DFS while space utilization in BFS is not effective.
11 | - DFS takes linear space because we have to remember a single path with unexplored nodes, while BFS keeps every node in memory. 12 | 4. BFS is optimal (it finds shortest paths in unweighted graphs) while DFS is not. 13 | 5. DFS constructs narrow and long trees. In contrast, BFS constructs wide and short trees. 14 | 15 | 16 | ## Recursive template 17 | 18 | ```python 19 | def traverse(matrix): 20 | """Recursive""" 21 | 22 | rows, cols = len(matrix), len(matrix[0]) 23 | visited = set() 24 | directions = ((0, 1), (0, -1), (1, 0), (-1, 0)) # Much faster than list 25 | 26 | def dfs(x, y): 27 | if (x, y) in visited: 28 | return 29 | visited.add((x, y)) 30 | 31 | # Traverse neighbors 32 | for dx, dy in directions: 33 | nx, ny = dx + x, dy + y 34 | if 0 <= nx < rows and 0 <= ny < cols: # Check boundary 35 | # Add any other checking here ^ 36 | dfs(nx, ny) 37 | 38 | for i in range(rows): 39 | for j in range(cols): 40 | dfs(i, j) 41 | ``` 42 | 43 | ## Iterative template 44 | 45 | ```python 46 | # BFS Template for LC 200 47 | 48 | def numIslands(self, grid: List[List[str]]) -> int: 49 | count = 0 50 | 51 | if not grid: 52 | return count 53 | 54 | rows = len(grid) 55 | cols = len(grid[0]) 56 | directions = ((0, 1), (0, -1), (1, 0), (-1, 0)) 57 | visited = set() 58 | 59 | for row in range(rows): 60 | for col in range(cols): 61 | if grid[row][col] == '1': 62 | # The position here is critical 63 | # count here means counting all the "1"s 64 | count += 1 65 | visited.add((row, col)) # Mark current node as visited 66 | queue = [(row, col)] # Initiate a queue for bfs 67 | 68 | while queue: 69 | for _ in range(len(queue)): 70 | # Must use names other than row and col 71 | # Otherwise there will be a collision 72 | _x, _y = queue.pop(0) # deque and popleft() is better here 73 | 74 | for dx, dy in directions: 75 | # Traverse all directions 76 | x, y = dx + _x, dy + _y 77 | 78 | if 0 <= x < rows and 0 <= y < cols and grid[x][y] == "1" and (x, y) not in visited: 79 | visited.add((x, y)) 80 | queue.append((x, y)) 81 | 82
| # if count is here (i.e. inside all loops) it'll count how many "1"s 83 | 84 | # If count is here, (i.e. out of the two for loops) 85 | # it'll count how many steps it'll need from the position 86 | # to go to all the other places (i.e. leetcode 994) 87 | 88 | return count 89 | ``` 90 | 91 | Sometimes you are asked to keep moving until you hit a wall (e.g. LC 505); this requires an inner while loop, and the stop condition can be tricky. The trick is this: 92 | 93 | ```python 94 | while queue: 95 | _x, _y, step = queue.popleft() 96 | for dx, dy in directions: 97 | x, y, steps = _x, _y, step 98 | 99 | while 0 <= x + dx < rows and 0 <= y + dy < cols and maze[x+dx][y+dy] == 0: 100 | x += dx 101 | y += dy 102 | steps += 1 103 | ``` 104 | 105 | This way the stop condition is handled properly. 106 | 107 | 108 | Reference: 109 | 110 | - [Tips for all DFS in matrix question](https://leetcode.com/problems/pacific-atlantic-water-flow/discuss/90739/python-dfs-bests-85-tips-for-all-dfs-in-matrix-question) 111 | 112 | Practice: 113 | 114 | - [200. Number of Islands](https://leetcode.com/problems/number-of-islands/) 115 | - [130. Surrounded Regions](https://leetcode.com/problems/surrounded-regions/) 116 | - [695. Max Area of Island](https://leetcode.com/problems/max-area-of-island/) 117 | - [994. Rotting Oranges](https://leetcode.com/problems/rotting-oranges/) 118 | - [505. The Maze II](https://leetcode.com/problems/the-maze-ii/) 119 | -------------------------------------------------------------------------------- /Templates/linked_list.md: -------------------------------------------------------------------------------- 1 | # Linked List 2 | 3 | - Each node object must hold at least two pieces of information. 4 | 1. the node must contain the list item itself (i.e. the data field). 5 | 2. each node must hold a reference to the next node.
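The two requirements above translate directly into a minimal node class. The `val`/`next` naming follows the Leetcode convention used in the practice problems; the `build_list`/`to_list` helpers are illustrative additions, handy for testing the templates locally:

```python
class ListNode:
    """Minimal singly linked list node: a data field plus a next-node reference."""
    def __init__(self, val=0, next=None):
        self.val = val    # 1. the list item itself (data field)
        self.next = next  # 2. reference to the next node

def build_list(values):
    """Helper: build a linked list from a Python list, returning the head (or None)."""
    dummy = ListNode()
    node = dummy
    for v in values:
        node.next = ListNode(v)
        node = node.next
    return dummy.next

def to_list(head):
    """Helper: collect node values back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```

Note how `build_list` already uses the dummy-head trick discussed below, so the empty-list case needs no special handling.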
6 | 7 | ## Tips and Template 8 | 9 | - Traverse a linked list 10 | 11 | ```python 12 | node = root 13 | while node: 14 | print(node.val) 15 | node = node.next 16 | ``` 17 | 18 | - When we talk about a linked list, we normally mean a singly linked list. There is also a doubly linked list, defined as follows: 19 | 20 | ![Leetcode Linked List Learn Doubly Linked List](https://s3-lc-upload.s3.amazonaws.com/uploads/2018/04/17/screen-shot-2018-04-17-at-161130.png) 21 | 22 | ```python 23 | class Node: 24 | def __init__(self, val): 25 | self.val = val 26 | self.prev = None 27 | self.next = None 28 | ``` 29 | 30 | - Linked list in-place operations (i.e. insert or delete) can be confusing; it's better to ***draw the structure*** and it'll become much more obvious 31 | 32 | For example, when deleting a node in the middle: 33 | 34 | ![Leetcode Linked List Learn Delete Operation](https://s3-lc-upload.s3.amazonaws.com/uploads/2018/04/26/screen-shot-2018-04-26-at-203640.png) 35 | 36 | ```python 37 | node = self.get_node(index) 38 | prev_node = node.prev 39 | next_node = node.next 40 | prev_node.next = next_node 41 | next_node.prev = prev_node 42 | ``` 43 | 44 | - It's better to use a ***dummy head*** most of the time, especially when deleting nodes is required 45 | 46 | For example: 47 | 48 | ```python 49 | def linked_list(root): 50 | dummy = Node("x") 51 | dummy.next = root 52 | 53 | # Do your logic here 54 | 55 | return dummy.next 56 | ``` 57 | 58 | - In many cases, you need to track the previous node of the current node.
59 | 60 | ```python 61 | dummy = Node("x") 62 | dummy.next = root 63 | 64 | prev = dummy 65 | node = root 66 | 67 | while node: 68 | prev = node 69 | node = node.next 70 | ``` 71 | 72 | - The ***Two pointer*** approach is widely used in linked lists, e.g. detecting a cycle, removing the n-th node, etc. 73 | 74 | For example, when detecting a cycle: 75 | ```python 76 | def hasCycle(self, head: ListNode) -> bool: 77 | slow = head 78 | fast = head 79 | 80 | while fast and fast.next: 81 | slow = slow.next 82 | fast = fast.next.next 83 | 84 | if slow == fast: 85 | return True 86 | 87 | return False 88 | ``` 89 | 90 | - It is common to use a ***hashmap or hashset*** to store visited nodes; it's widely used to find the intersection or the beginning of a cyclic linked list 91 | 92 | ```python 93 | def detectCycle(self, head: ListNode) -> ListNode: 94 | visited = set() 95 | while head: 96 | if head.next in visited: 97 | return head.next 98 | visited.add(head) 99 | head = head.next 100 | return None 101 | ``` 102 | 103 | 104 | Here is a great comparison of ***time complexity*** between the linked list and the array from Leetcode. 105 | 106 | ![Leetcode Linked List Learn Conclusion](https://assets.leetcode.com/uploads/2020/10/02/comparison_of_time_complexity.png) 107 | 108 | 109 | Reference: 110 | 111 | - [Leetcode Introduction to Linked List](https://leetcode.com/explore/learn/card/linked-list/) 112 | 113 | 114 | Practice: 115 | 116 | - Basic questions: 117 | - [707. Design Linked List](https://leetcode.com/problems/design-linked-list/) 118 | - [141. Linked List Cycle](https://leetcode.com/problems/linked-list-cycle/) 119 | - [206. Reverse Linked List](https://leetcode.com/problems/reverse-linked-list/) 120 | - [21. Merge Two Sorted Lists](https://leetcode.com/problems/merge-two-sorted-lists/) 121 | 122 | - Tricky questions: 123 | - [142. Linked List Cycle II - Find linked list cycle start point](https://leetcode.com/problems/linked-list-cycle-ii/) 124 | - [430.
Flatten a Multilevel Doubly Linked List](https://leetcode.com/problems/flatten-a-multilevel-doubly-linked-list/) 125 | - [138. Copy List with Random Pointer](https://leetcode.com/problems/copy-list-with-random-pointer/) 126 | -------------------------------------------------------------------------------- /Templates/matrix_traversal.md: -------------------------------------------------------------------------------- 1 | # Matrix Traversal 2 | 3 | Here we are using a 2D matrix as an example, but the idea can be applied to multi-dimensional matrices 4 | 5 | 6 | 7 | ## Generic Template 8 | 9 | ```python 10 | def traverse(matrix): 11 | rows, cols = len(matrix), len(matrix[0]) 12 | 13 | for row in range(rows): 14 | for col in range(cols): 15 | print(matrix[row][col]) 16 | ``` 17 | 18 | ## Transpose Matrix (assumes a square matrix) 19 | 20 | ```python 21 | def transpose(matrix): 22 | rows, cols = len(matrix), len(matrix[0]) 23 | 24 | for row in range(rows): 25 | for col in range(row, cols): 26 | matrix[row][col], matrix[col][row] = matrix[col][row], matrix[row][col] 27 | ``` 28 | 29 | ## Reverse Row 30 | 31 | ```python 32 | def reverse_row(matrix): 33 | rows, cols = len(matrix), len(matrix[0]) 34 | 35 | for row in range(rows): 36 | for col in range(cols//2): 37 | matrix[row][col], matrix[row][cols-1-col] = matrix[row][cols-1-col], matrix[row][col] 38 | ``` 39 | 40 | ## Reverse Column 41 | 42 | ```python 43 | def reverse_col(matrix): 44 | rows, cols = len(matrix), len(matrix[0]) 45 | 46 | for row in range(rows//2): 47 | for col in range(cols): 48 | matrix[row][col], matrix[rows-1-row][col] = matrix[rows-1-row][col], matrix[row][col] 49 | ``` 50 | 51 | ## Rotation Template 52 | 53 | - Rotate 90 degrees clockwise: transpose + reverse row 54 | - Rotate 90 degrees anti-clockwise: transpose + reverse column 55 | - Rotate 180 degrees: reverse row + reverse column 56 | 57 | ## Spiral Traverse 58 | 59 | This one is a bit tricky, and the idea is a little different from the above.
60 | 61 | Instead of looping through each column and then row, we can iterate through the total number of coordinates and increment the row and col indices accordingly. 62 | 63 | This idea can also be used in the rotations above. 64 | 65 | ```python 66 | def spiral_matrix(matrix): 67 | rows, cols = len(matrix), len(matrix[0]) 68 | 69 | row, col, k = 0, 0, 0 # k is the border level 70 | for _ in range(rows * cols): # visit matrix[row][col] here, e.g. res.append(matrix[row][col]) 71 | # Note the order here is important 72 | if row == k: # Top 73 | if col == cols - 1 - k: # Top right 74 | row += 1 75 | else: 76 | col += 1 77 | continue 78 | 79 | if col == cols - 1 - k: # Right 80 | if row == rows - 1 - k: # Bottom Right 81 | col -= 1 82 | else: 83 | row += 1 84 | continue 85 | 86 | if row == rows - 1 - k: # Bottom 87 | if col == k: # Bottom Left 88 | row -= 1 89 | else: 90 | col -= 1 91 | continue 92 | 93 | if col == k: # Left 94 | row -= 1 95 | if row == k: # Border complete, go 1 layer inside 96 | k += 1 97 | row += 1 98 | col += 1 99 | ``` 100 | 101 | 102 | Reference: 103 | 104 | - [Rotate 90 clockwise, anti-clockwise, and rotate 180 degree](https://leetcode.com/problems/rotate-image/discuss/401356/Rotate-90-clockwise-anti-clockwise-and-rotate-180-degree) 105 | 106 | Practice: 107 | 108 | - [766. Toeplitz Matrix](https://leetcode.com/problems/toeplitz-matrix/) 109 | - [74. Search a 2D Matrix](https://leetcode.com/problems/search-a-2d-matrix/) 110 | - [867. Transpose Matrix](https://leetcode.com/problems/transpose-matrix/) 111 | - [832. Flipping an Image](https://leetcode.com/problems/flipping-an-image/) 112 | - [48. Rotate Image](https://leetcode.com/problems/rotate-image/) 113 | - [54. 
Spiral Matrix](https://leetcode.com/problems/spiral-matrix/) 114 | -------------------------------------------------------------------------------- /Templates/merge_sort.md: -------------------------------------------------------------------------------- 1 | # Merge Sort 2 | 3 | ## Implementation: 4 | 5 | - A recursive algorithm that continually splits a list in half. 6 | - If the list is empty or has one item, it is sorted by definition (the base case). 7 | - Once the two halves are sorted, the **merge** operation takes the two smaller sorted lists and combines them into a single sorted list. 8 | - time complexity: O(n log n) 9 | 10 | 11 | ## Template 12 | 13 | ```python 14 | class Solution: 15 | def sortArray(self, nums: List[int]) -> List[int]: 16 | """Merge sort solution""" 17 | 18 | if not nums: 19 | return nums 20 | 21 | start = 0 22 | end = len(nums) - 1 23 | 24 | return self.merge_sort(nums, start, end, temp=[-1 for i in nums]) 25 | 26 | def merge_sort(self, nums, start, end, temp): 27 | if start >= end: 28 | return nums 29 | 30 | mid = (start + end) // 2 31 | 32 | self.merge_sort(nums, start, mid, temp) 33 | self.merge_sort(nums, mid+1, end, temp) 34 | return self.merge(nums, start, mid, end, temp) 35 | 36 | def merge(self, nums, start, mid, end, temp): 37 | left = start 38 | right = mid + 1 39 | index = start 40 | 41 | while left <= mid and right <= end: 42 | if nums[left] <= nums[right]: 43 | temp[index] = nums[left] 44 | left += 1 45 | else: 46 | temp[index] = nums[right] 47 | right += 1 48 | 49 | index += 1 50 | 51 | while left <= mid: 52 | temp[index] = nums[left] 53 | index += 1 54 | left += 1 55 | 56 | while right <= end: 57 | temp[index] = nums[right] 58 | index += 1 59 | right += 1 60 | 61 | for i in range(start, end+1): 62 | nums[i] = temp[i] # copy the merged range back into nums 63 | 64 | 65 | return nums 66 | ``` 67 | 68 | 69 | Reference: 70 | 71 | - [Merge sort in Python - stackabuse](https://stackabuse.com/merge-sort-in-python/) 72 | - [Merge sort in 
Python - educative](https://www.educative.io/edpresso/merge-sort-in-python) 73 | 74 | Practice: 75 | 76 | - [912. Sort an Array](https://leetcode.com/problems/sort-an-array/) 77 | -------------------------------------------------------------------------------- /Templates/monotonic_stack.md: -------------------------------------------------------------------------------- 1 | # Monotonic Stack 2 | 3 | - A monotonic stack is a special form of stack in which all elements are sorted in either ascending or descending order 4 | 5 | ## Template 6 | 7 | ```python 8 | def monotonic_stack(nums): 9 | """An increasing monotonic stack, all elements are sorted in ascending order""" 10 | 11 | stack = [] 12 | res = [-1] * len(nums) 13 | 14 | # Template for descending traversal 15 | for i in range(len(nums)-1, -1, -1): 16 | while stack and nums[stack[-1]] <= nums[i]: 17 | stack.pop() # Pop elements that are smaller than or equal to the incoming element 18 | if stack: 19 | res[i] = nums[stack[-1]] 20 | stack.append(i) # use for next iteration 21 | 22 | # # Template for ascending traversal 23 | # for i in range(len(nums)): 24 | # while stack and (nums[stack[-1]] < nums[i]): 25 | # res[stack.pop()] = nums[i] 26 | # stack.append(i) 27 | return res 28 | ``` 29 | 30 | Reference: 31 | 32 | - [Using monotonic stack w/ analysis](https://leetcode.com/problems/next-greater-element-i/discuss/1113246/Cpp-Using-monotonic-stack-w-analysis) 33 | 34 | Practice: 35 | 36 | - [496. Next Greater Element I](https://leetcode.com/problems/next-greater-element-i/) 37 | - [503. Next Greater Element II](https://leetcode.com/problems/next-greater-element-ii/) 38 | - [739. Daily Temperatures](https://leetcode.com/problems/daily-temperatures/) 39 | - [1019. Next Greater Node In Linked List](https://leetcode.com/problems/next-greater-node-in-linked-list/) 40 | - [316. Remove Duplicate Letters](https://leetcode.com/problems/remove-duplicate-letters/) 41 | - [402. 
Remove K Digits](https://leetcode.com/problems/remove-k-digits/) 42 | - [42. Trapping Rain Water](https://leetcode.com/problems/trapping-rain-water/) 43 | - [84. Largest Rectangle in Histogram](https://leetcode.com/problems/largest-rectangle-in-histogram/) 44 | -------------------------------------------------------------------------------- /Templates/prim_spanning_tree.md: -------------------------------------------------------------------------------- 1 | # Prim’s Spanning Tree Algorithm 2 | 3 | - This probably won't come up in an interview, so a code implementation is not provided, but the concepts are good to know 4 | - The algorithm solves the problem of efficiently transferring a piece of information to anyone and everyone who may be listening. 5 | - **uncontrolled flooding:** the broadcast host sends a single copy of the broadcast message and lets the routers sort things out (i.e. the brute force way) 6 | - Each message starts with a time to live (*ttl*) value set to some number greater than or equal to the number of edges between the broadcast host and its most distant listener 7 | - Each router gets a copy of the message and passes the message on to all of its neighboring routers. 8 | - Each time the message is passed on, the *ttl* is decreased. Each router continues to send copies of the message to all its neighbors until the *ttl* value reaches 0. 9 | - **Minimum weight spanning tree:** define a minimum spanning tree T for a graph G=(V,E) such that T is an acyclic subset of E that connects all the vertices in V. The sum of the weights of the edges in T is minimized. 
10 | - the broadcast host sends a single copy of the broadcast message into the network 11 | - Each router forwards the message to any neighbor that is part of the spanning tree, excluding the neighbor that just sent it the message 12 | 13 | ## Prim’s algorithm 14 | 15 | - Prim’s algorithm belongs to the “greedy algorithms” family because at each step it chooses the cheapest next step, which in the spanning tree case is to follow the edge with the lowest weight 16 | - Algorithm: 17 | - while T is not yet a spanning tree 18 | - Find an edge that is safe to add to the tree 19 | - Add the new edge to T 20 | - A **safe edge** is any edge that connects a vertex that is in the spanning tree to a vertex that is not in the spanning tree 21 | - The algorithm is similar to Dijkstra’s algorithm and also uses a priority queue to select the next vertex to add to the growing graph 22 | - Select a starting node, and initialize all the other vertices to infinity 23 | - A node is not considered to be part of the spanning tree until it is removed from the priority queue. 24 | - Always examine the smallest distance and update the predecessor links 25 | 26 | ## Reference: 27 | 28 | - [Prim’s Spanning Tree Algorithm](https://runestone.academy/runestone/books/published/pythonds/Graphs/PrimsSpanningTreeAlgorithm.html) 29 | -------------------------------------------------------------------------------- /Templates/quick_sort.md: -------------------------------------------------------------------------------- 1 | # Quick Sort 2 | 3 | ## Implementation: 4 | 5 | - First select a value (the **pivot value**), and then use it to split the list. 6 | - The actual position where the pivot value belongs in the final sorted list, commonly called the **split point**, will be used to divide the list for subsequent calls to the quick sort. 
7 | - The **partition** process then finds the split point and at the same time moves the other items to the appropriate side of the list, either less than or greater than the pivot value. 8 | - Partitioning begins by locating two position markers (i.e. *leftmark* and *rightmark*) at the beginning and end of the remaining items in the list. 9 | - At the point where rightmark becomes less than leftmark, the position of rightmark is the split point. The pivot value can be exchanged with the contents of the split point, and the pivot value is then in place: all the items to the left of the split point are less than the pivot value, and all the items to the right of the split point are greater than the pivot value. The list can now be divided at the split point and the quick sort invoked recursively on the two halves. 10 | - **Median of three:** choose the median value of the first, the middle, and the last element in the list (a common way to pick a better pivot). 
11 | - time complexity: best case: O(n log n), worst case: O(n²) 12 | 13 | 14 | ## Template 15 | 16 | ```python 17 | class Solution: 18 | def sortArray(self, nums: List[int]) -> List[int]: 19 | """Quick sort solution""" 20 | 21 | if not nums: 22 | return nums 23 | 24 | start = 0 25 | end = len(nums) - 1 26 | 27 | self.quick_sort(nums, start, end) 28 | return nums 29 | 30 | def quick_sort(self, nums, start, end): 31 | if start >= end: 32 | return nums 33 | 34 | left, right = start, end 35 | pivot = nums[(start+end)//2] 36 | 37 | while left <= right: 38 | while left <= right and nums[left] < pivot: 39 | left += 1 40 | 41 | while left <= right and nums[right] > pivot: 42 | right -= 1 43 | 44 | if left <= right: 45 | nums[left], nums[right] = nums[right], nums[left] 46 | left += 1 47 | right -= 1 48 | 49 | self.quick_sort(nums, start, right) 50 | self.quick_sort(nums, left, end) 51 | 52 | 53 | class Solution: 54 | def sortArray(self, nums: List[int]) -> List[int]: 55 | """ 56 | Quick sort with 3-way partition 57 | The main idea is that we still have the same pivot point, but 58 | instead of comparing with just smaller and greater or equal, 59 | we take out the equal case and process it individually. 
60 | a[lo,lt-1] < pivot 61 | a[lt, i-1] = pivot 62 | a[i,gt] = unseen 63 | a[gt+1, hi] > pivot 64 | """ 65 | 66 | if not nums: 67 | return nums 68 | 69 | start = 0 70 | end = len(nums) - 1 71 | 72 | self.quick_sort(nums, start, end) 73 | 74 | return nums 75 | 76 | def quick_sort(self, nums, start, end): 77 | if start >= end: 78 | return nums 79 | 80 | left = start 81 | right = end 82 | index = start 83 | 84 | pivot = nums[(start+end)//2] 85 | 86 | while index <= right: 87 | if nums[index] < pivot: 88 | nums[index], nums[left] = nums[left], nums[index] 89 | left += 1 90 | index += 1 91 | elif nums[index] > pivot: 92 | nums[index], nums[right] = nums[right], nums[index] 93 | right -= 1 94 | else: 95 | index += 1 96 | 97 | self.quick_sort(nums, start, left - 1) 98 | self.quick_sort(nums, right + 1, end) 99 | ``` 100 | 101 | 102 | Reference: 103 | 104 | - [Quick sort in Python](https://stackabuse.com/quicksort-in-python/) 105 | - [Quicksort 3 way partition](https://gist.github.com/adonese/4bf34d5b57ee0358626c) 106 | 107 | Practice: 108 | 109 | - [912. Sort an Array](https://leetcode.com/problems/sort-an-array/) 110 | -------------------------------------------------------------------------------- /Templates/sliding_window.md: -------------------------------------------------------------------------------- 1 | # Sliding Window 2 | 3 | ***Fundamentally this is a two pointer approach*** 4 | 5 | How it works: 6 | 7 | - A general way is to use a hashmap assisted with two pointers. 8 | - Use two pointers: start and end to represent a window. 9 | - Move end to find a valid window. 10 | - When a valid window is found, move start to find a smaller window. 11 | - To check if a window is valid, we use a map to store (char, count) for chars in t, and use counter for the number of chars of t to be found in s. 12 | 13 | There are two keys here: 14 | 1. the two pointers both start from the beginning of the second string initially 15 | 2. 
move j until `j - i == len(first_string)` 16 | 17 | 18 | ```python 19 | def sliding_window(nums): 20 | left = 0 # initiate the left boundary of window 21 | for right in range(len(nums)): # iterate the right boundary of window 22 | while is_valid_window: # placeholder: replace with the problem's window condition 23 | left += 1 # Reduce left boundary to shrink window size 24 | ``` 25 | 26 | The template code is fairly simple. 27 | 28 | Basically we iterate through the given list, and create a window. 29 | 30 | When the condition is satisfied (i.e. the current window satisfies the required returning condition), we advance the left side to shrink the window, to find the best solution (in most cases the minimum window over the given list) 31 | 32 | 33 | Reference: 34 | 35 | - [Leetcode 567 detailed explanation](https://leetcode.com/problems/permutation-in-string/discuss/638531/Java-or-C%2B%2B-or-Python3-or-Detailed-explanation-or-O(N)-time) 36 | - [10-line template that can solve most 'substring' problems 37 | ](https://leetcode.com/problems/minimum-window-substring/discuss/26808/here-is-a-10-line-template-that-can-solve-most-substring-problems) 38 | - [Sliding Window algorithm template to solve all the Leetcode substring search problem.](https://leetcode.com/problems/find-all-anagrams-in-a-string/discuss/92007/Sliding-Window-algorithm-template-to-solve-all-the-Leetcode-substring-search-problem.) 39 | 40 | 41 | Practice: 42 | 43 | - [594. Longest Harmonious Subsequence](https://leetcode.com/problems/longest-harmonious-subsequence/) 44 | - [3. Longest Substring Without Repeating Characters](https://leetcode.com/problems/longest-substring-without-repeating-characters/) 45 | - [159. Longest Substring with At Most Two Distinct Characters](https://leetcode.com/problems/longest-substring-with-at-most-two-distinct-characters/) 46 | - [438. Find All Anagrams in a String](https://leetcode.com/problems/find-all-anagrams-in-a-string/) 47 | - [1156. 
Swap For Longest Repeated Character Substring](https://leetcode.com/problems/swap-for-longest-repeated-character-substring/) 48 | - [1004. Max Consecutive Ones III](https://leetcode.com/problems/max-consecutive-ones-iii/) 49 | - [76. Minimum Window Substring](https://leetcode.com/problems/minimum-window-substring/) 50 | - [30. Substring with Concatenation of All Words](https://leetcode.com/problems/substring-with-concatenation-of-all-words/) 51 | -------------------------------------------------------------------------------- /Templates/topological_sort.md: -------------------------------------------------------------------------------- 1 | # Topological Sort 2 | 3 | A topological sort takes a directed acyclic graph and produces a linear ordering of all its vertices such that if the graph G contains an edge (v,w) then the vertex v comes before the vertex w in the ordering. 4 | 5 | The two important steps for topological sort are: 6 | 7 | 1. Find the in-degree for each node 8 | 2. Construct the adjacency list for the graph 9 | 10 | 11 | ```python 12 | def findOrder(self, numCourses: int, prerequisites: List[List[int]]) -> List[int]: 13 | """ 14 | Topological sort, works for a DAG (Directed Acyclic Graph) 15 | 1. Calculate the in-degree for all points, initiate an empty adj_list 16 | 2. Put all the 0-in-degree points (i.e. points that have no incoming edges) into the BFS queue 17 | 3. Pop a point from the BFS queue and put it in the topo queue. Each time, visit 18 | all the neighbor points and reduce their in-degrees by 1 19 | 4. if the in-degree becomes 0, put the point in the queue 20 | 5. End when the queue is empty 21 | 22 | https://sugarac.gitbooks.io/facebook-interview-handbook/jiu-zhang-dai-ma-mo-ban.html 23 | https://leetcode.com/problems/course-schedule-ii/discuss/368716/Python3-Breadth-first-search 24 | """ 25 | 26 | # Init a map that stores the No. 
of incoming edges of each vertex and 27 | # Init map (an adjacency list) to record the node's children 28 | in_degree = defaultdict(int) 29 | adj_list = defaultdict(list) 30 | 31 | # Build map to put the child into parent's list 32 | for current_course, prev_course in prerequisites: 33 | adj_list[prev_course].append(current_course) 34 | in_degree[current_course] += 1 # a directed edge 35 | 36 | # a queue of all vertices with no incoming edge 37 | queue = [] 38 | for node in range(numCourses): 39 | if node not in in_degree: 40 | queue.append(node) 41 | 42 | res = [] 43 | while queue: 44 | node = queue.pop(0) # BFS, pops vertex 45 | res.append(node) 46 | for neighbor in adj_list[node]: 47 | # for each descendant of current vertex, reduce its in-degree by 1 48 | in_degree[neighbor] -= 1 49 | 50 | if in_degree[neighbor] == 0: 51 | queue.append(neighbor) 52 | del in_degree[neighbor] 53 | return res if len(res)==numCourses else [] 54 | ``` 55 | 56 | Practice: 57 | 58 | - [207. Course Schedule](https://leetcode.com/problems/course-schedule/) 59 | - [210. Course Schedule II](https://leetcode.com/problems/course-schedule-ii/) 60 | - [1136. Parallel Courses](https://leetcode.com/problems/parallel-courses/) 61 | - [269. Alien Dictionary](https://leetcode.com/problems/alien-dictionary/) 62 | - [1203. 
Sort Items by Groups Respecting Dependencies](https://leetcode.com/problems/sort-items-by-groups-respecting-dependencies/) 63 | 64 | 65 | Reference: 66 | 67 | - [Topological sort explain (Chinese)](https://sugarac.gitbooks.io/facebook-interview-handbook/content/jiu-zhang-dai-ma-mo-ban.html) 68 | - [Solution for 210](https://leetcode.com/problems/course-schedule-ii/discuss/368716/Python3-Breadth-first-search) 69 | -------------------------------------------------------------------------------- /Templates/tree_traversal.md: -------------------------------------------------------------------------------- 1 | # Tree Traversal 2 | 3 | Tree traversal is critical for solving tree and graph related problems 4 | 5 | Here are a few different ways to traverse a tree 6 | 7 | - BFS (Level order) 8 | - DFS 9 | - **preorder:** visit the root first, then recursively do a traversal of the left subtree, then a recursive traversal of the right subtree. 10 | - root -> left -> right 11 | - **inorder:** recursively do a traversal of the left subtree, then the root node, and finally do a traversal of the right subtree. 12 | - left -> root -> right 13 | - **postorder:** do a traversal of the left subtree and the right subtree, and finally visit the root node. 14 | - left -> right -> root 15 | 16 | To better understand this, here are some plots: 17 | 18 | Pre-order 19 | 20 | ![Pre-order traversal](../images/Preorder.png) 21 | 22 | In-order 23 | 24 | ![In-order traversal](../images/Inorder.png) 25 | 26 | Post-order 27 | 28 | ![Post-order traversal](../images/Postorder.png) 29 | 30 | DFS Summary 31 | 32 | ![DFS traversal summary](../images/Tree-DFS.png) 33 | 34 | ## Template 35 | 36 | ### BFS 37 | 38 | ```python 39 | def bfs_traverse(root): 40 | if not root: 41 | return [] 42 | 43 | res = [] 44 | queue = [root] # It's better to use a deque here, i.e. 
deque([root]) 45 | while queue: 46 | level = [] 47 | queue_size = len(queue) 48 | for i in range(queue_size): 49 | node = queue.pop(0) 50 | level.append(node.val) 51 | if node.left: 52 | queue.append(node.left) 53 | 54 | if node.right: 55 | queue.append(node.right) 56 | 57 | res.append(level) 58 | 59 | return res 60 | ``` 61 | 62 | ### DFS 63 | 64 | ```python 65 | # Recursive 66 | def dfs_traverse(root): 67 | if not root: 68 | return 69 | 70 | # Pre-Order 71 | dfs_traverse(root.left) 72 | # In-Order 73 | dfs_traverse(root.right) 74 | # Post-Order 75 | 76 | # Iterative 77 | def preorder(root): 78 | stack = [] 79 | res = [] 80 | 81 | while root or stack: 82 | if root: 83 | res.append(root.val) 84 | stack.append(root) 85 | root = root.left 86 | elif stack: 87 | node = stack.pop() 88 | root = node.right 89 | 90 | return res 91 | 92 | def inorder(root): 93 | stack = [] 94 | res = [] 95 | 96 | while root or stack: 97 | if root: 98 | stack.append(root) 99 | root = root.left 100 | elif stack: 101 | node = stack.pop() 102 | res.append(node.val) 103 | root = node.right 104 | 105 | return res 106 | 107 | def postorder(root): 108 | """ 109 | There are multiple ways to do a post order traversal iteratively, such as 110 | using two stacks, one stack, reversing pre-order, or Morris traversal 111 | Here is a template for using one stack 112 | """ 113 | stack = [] 114 | res = [] 115 | 116 | while stack or root: 117 | 118 | while root: 119 | if root.right: 120 | stack.append(root.right) 121 | stack.append(root) 122 | root = root.left 123 | 124 | root = stack.pop() 125 | 126 | if root.right and stack and stack[-1] == root.right: 127 | stack.pop() 128 | stack.append(root) 129 | root = root.right 130 | 131 | else: 132 | res.append(root.val) 133 | root = None 134 | return res 135 | ``` 136 | 137 | 138 | Reference: 139 | 140 | - [Iterative Postorder Traversal | Set 2 (Using One Stack)](https://www.geeksforgeeks.org/iterative-postorder-traversal-using-stack/) 141 | - [Morris traversal 
(O(1) space pre-order traversal)](https://www.educative.io/edpresso/what-is-morris-traversal) 142 | - [Leetcode Binary Tree](https://leetcode.com/explore/learn/card/data-structure-tree/) 143 | 144 | Practice: 145 | 146 | - [144. Binary Tree Preorder Traversal](https://leetcode.com/problems/binary-tree-preorder-traversal/) 147 | - [94. Binary Tree Inorder Traversal](https://leetcode.com/problems/binary-tree-inorder-traversal/) 148 | - [145. Binary Tree Postorder Traversal](https://leetcode.com/problems/binary-tree-postorder-traversal/) 149 | - [102. Binary Tree Level Order Traversal](https://leetcode.com/problems/binary-tree-level-order-traversal/) 150 | - [106. Construct Binary Tree from Inorder and Postorder Traversal](https://leetcode.com/problems/construct-binary-tree-from-inorder-and-postorder-traversal/) 151 | - [105. Construct Binary Tree from Preorder and Inorder Traversal](https://leetcode.com/problems/construct-binary-tree-from-preorder-and-inorder-traversal/) 152 | - [236. Lowest Common Ancestor of a Binary Tree](https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-tree/) 153 | - [297. 
Serialize and Deserialize Binary Tree](https://leetcode.com/problems/serialize-and-deserialize-binary-tree/) 154 | -------------------------------------------------------------------------------- /Templates/trie.md: -------------------------------------------------------------------------------- 1 | # Trie (Prefix Tree) 2 | 3 | ```python 4 | class TrieNode: 5 | def __init__(self): 6 | self.children = collections.defaultdict(TrieNode) 7 | self.is_word = False 8 | 9 | class Trie: 10 | 11 | def __init__(self): 12 | self.root = TrieNode() 13 | 14 | def insert(self, word): 15 | current = self.root 16 | for letter in word: 17 | current = current.children[letter] 18 | current.is_word = True 19 | 20 | def search(self, word): 21 | current = self.root 22 | for letter in word: 23 | current = current.children.get(letter) 24 | if current is None: 25 | return False 26 | return current.is_word 27 | 28 | def startsWith(self, prefix): 29 | current = self.root 30 | for letter in prefix: 31 | current = current.children.get(letter) 32 | if current is None: 33 | return False 34 | return True 35 | ``` 36 | 37 | Practice: 38 | 39 | - [720. Longest Word in Dictionary](https://leetcode.com/problems/longest-word-in-dictionary/) 40 | - [208. Implement Trie (Prefix Tree)](https://leetcode.com/problems/implement-trie-prefix-tree/) 41 | - [648. Replace Words](https://leetcode.com/problems/replace-words/) 42 | - [677. Map Sum Pairs](https://leetcode.com/problems/map-sum-pairs/) 43 | - [1268. Search Suggestions System](https://leetcode.com/problems/search-suggestions-system/) 44 | - [676. Implement Magic Dictionary](https://leetcode.com/problems/implement-magic-dictionary/) 45 | - [1023. 
Camelcase Matching](https://leetcode.com/problems/camelcase-matching/) 46 | 47 | Reference: 48 | 49 | - [Implementing a Trie in Python (in less than 100 lines of code)](https://towardsdatascience.com/implementing-a-trie-data-structure-in-python-in-less-than-100-lines-of-code-a877ea23c1a1) 50 | -------------------------------------------------------------------------------- /Templates/union_find.md: -------------------------------------------------------------------------------- 1 | # Union Find 2 | 3 | - Union Find (or disjoint-set) is a very elegant data structure 4 | - Essentially it utilizes a list representation for the joint data points 5 | - the index of the data point indicates its linkage status 6 | - For detailed explanation, please see this [Lecture Notes](https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf) 7 | 8 | ## Template 9 | 10 | ```python 11 | class UnionFind: 12 | def __init__(self, n): 13 | self.count = n 14 | self.parent = list(range(n)) # root list 15 | self.size = [1] * n # weight 16 | 17 | def find(self, i): 18 | while self.parent[i] != i: 19 | self.parent[i] = self.parent[self.parent[i]] # Path compression 20 | i = self.parent[i] 21 | return i 22 | 23 | def union(self, p, q): 24 | i, j = self.find(p), self.find(q) 25 | 26 | if i == j: 27 | return 28 | 29 | # merge smaller tree into larger tree to obtain a flat structure 30 | if self.size[i] < self.size[j]: 31 | self.parent[i] = j 32 | self.size[j] += self.size[i] 33 | else: 34 | self.parent[j] = i 35 | self.size[i] += self.size[j] 36 | 37 | self.count -= 1 38 | ``` 39 | 40 | 41 | Reference: 42 | 43 | - [Union Find with Explanations (Java / Python)](https://leetcode.com/problems/redundant-connection/discuss/123819/Beats-97.96-Union-Find-Java-with-Explanations) 44 | - [[Python] Solved by Union Find Template](https://leetcode.com/problems/couples-holding-hands/discuss/391618/python-solved-by-union-find-template) 45 | - [Python, Weighted Union+Find with Path 
Compression](https://leetcode.com/problems/number-of-provinces/discuss/1244070/Python-Weighted-Union%2BFind-with-Path-Compression) 46 | - [[Python3] Classic Union Find Solution](https://leetcode.com/problems/number-of-islands-ii/discuss/1083904/Python3-Classic-Union-Find-Solution) 47 | 48 | Practice: 49 | 50 | - [684. Redundant Connection](https://leetcode.com/problems/redundant-connection/) 51 | - [547. Number of Provinces](https://leetcode.com/problems/number-of-provinces/) 52 | - [765. Couples Holding Hands](https://leetcode.com/problems/couples-holding-hands/) 53 | - [305. Number of Islands II](https://leetcode.com/problems/number-of-islands-ii/) 54 | -------------------------------------------------------------------------------- /images/Inorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Inorder.png -------------------------------------------------------------------------------- /images/Postorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Postorder.png -------------------------------------------------------------------------------- /images/Preorder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Preorder.png -------------------------------------------------------------------------------- /images/System-Components.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/System-Components.png -------------------------------------------------------------------------------- 
/images/Tree-DFS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/Tree-DFS.png -------------------------------------------------------------------------------- /images/how-to-use-the-repo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zmcddn/coding-interview-guide/0da5588f52ba92a2ce8fc03d6fca9a1ae394e4b5/images/how-to-use-the-repo.png --------------------------------------------------------------------------------