├── KeyValueStore.md ├── README.md ├── api-design-neetcode.md ├── api-design.md ├── api-paradigms.md ├── browser-storages.md ├── cdns.md ├── communication.md ├── configuration.md ├── design-requirements-neetcode.md ├── diagramming.md ├── estimation.md ├── event-loop.md ├── gathering-system-requirements.md ├── leader-election.md ├── logging-and-monitoring.md ├── only-cloud-services-neetocde.md ├── peer-to-peer-networking.md ├── planning.md ├── pollingstreaming.md ├── pub-sub.md ├── rate-limiting.md ├── replication-sharding.md ├── security-https.md ├── special-storages.md ├── web-sockets.md └── web-workers.md /KeyValueStore.md: -------------------------------------------------------------------------------- 1 | # Key Value Store 2 | 3 | Relational databases have strict structures and schemas. That rigidity can be good, but sometimes we need more flexibility. Key-Value Stores are a type of NoSQL database that lets you store data without a strict schema. This means you can store any type of data in any format, and you can change the format at any time. This flexibility makes Key-Value Stores a good choice for certain types of data, such as user preferences, session data, and other unstructured data. 4 | 5 | Key-Value Stores are one of the most popular types of NoSQL databases. They are used by many large companies, such as Amazon, Facebook, and Google, to store and manage large amounts of data. 6 | 7 | One use case is caching: you can store the results of expensive operations in a Key-Value Store so you don't have to perform the operation again. 8 | 9 | Feature flagging is another use case: you can store feature flags in a Key-Value Store so you can turn features on and off without having to redeploy your application. 
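The ideas above (schemaless values, caching, feature flags) can be sketched with a minimal in-memory key-value store. This `KeyValueStore` class is a hypothetical illustration, not a real product's API; systems like Redis add persistence, eviction policies, and networking on top of the same core idea:

```python
import time
from typing import Any, Optional

class KeyValueStore:
    """Minimal in-memory key-value store with optional per-key TTL."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        # Values can be any shape -- no schema is enforced.
        expires_at = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() > expires_at:
            del self._data[key]  # lazily expire stale entries
            return default
        return value

# Caching an expensive result and storing a feature flag side by side:
store = KeyValueStore()
store.set("user:42:recommendations", ["a", "b", "c"], ttl_seconds=60)
store.set("feature:dark-mode", True)  # flip without redeploying
```

The key names (`user:42:recommendations`, `feature:dark-mode`) are made up for the example; the point is that cached results and feature flags live in the same schemaless store.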
10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Design Fundamentals 2 | 3 | Four categories: 4 | 5 | - Foundational knowledge: Basic concepts to even begin tackling a design. 6 | - Key characteristics: Latency, throughput, consistency, availability, partition tolerance. 7 | - Actual components: Load balancers, databases, caches, proxies, etc. 8 | - Actual Technologies: Redis, S3, Memcached, Cassandra, etc. 9 | 10 | # Client-Server Model 11 | 12 | Clients send requests to servers; servers process the requests and send back responses. 13 | 14 | Let's take an example: a web browser and the YouTube server. 15 | 16 | At first, the client knows nothing about the server. That's where DNS comes in. It resolves the domain name to an IP address. 17 | 18 | An IP address is a unique identifier for the server. It's like a phone number. On the internet, communication happens through IP addresses. 19 | 20 | Once the client knows the IP address, it can send an HTTP request to the server. 21 | 22 | HTTP stands for HyperText Transfer Protocol. It's a protocol that defines how the client and server should communicate. 23 | 24 | When you make a request to a server, you don't just specify the IP address; you also specify the port number. A port number is like an extension number. It's a way to specify which service you want to talk to on the server. 25 | 26 | This is because every server can run multiple services. For example, a web server can also run an email server. So, you have to specify which service you want to talk to. 27 | 28 | # Network Protocols 29 | 30 | A protocol is just a set of rules that define how two entities should communicate. 31 | 32 | ## IP 33 | 34 | IP stands for Internet Protocol. It's a protocol that defines how data should be sent across the internet. 35 | 36 | The modern internet runs on IP. 
Machines communicate with each other using IP. They send IP packets to each other. 37 | 38 | There are multiple versions of IP; the main ones are IPv4 and IPv6. 39 | 40 | An IP packet is a small chunk of data. It contains: 41 | 42 | - header: metadata about the packet, e.g. source, destination, total size, version of IP. Between 20 and 60 bytes. 43 | - the rest is the actual data. The maximum total packet size is 2^16 bytes (about 65 KB). That's a problem because it's too small for modern applications, which is why we need to send multiple packets to transfer a single file. 44 | 45 | ## TCP 46 | 47 | This is where protocols like TCP come in. 48 | 49 | TCP stands for Transmission Control Protocol. It's a protocol that defines how to send data across the internet. It solves the problem of IP packets being too small. It allows us to send large files across the internet and ensures that the data is delivered in the correct order. 50 | 51 | In the data part of the IP packet, we have a TCP header and the actual data. The TCP header contains metadata about the data, e.g. source port, destination port, sequence number, acknowledgment number, etc. 52 | 53 | TCP handshake: Before sending data, the client and server have to establish a connection. This is done through a three-way handshake: 54 | 55 | 1. Client sends a SYN packet to the server. SYN stands for synchronize. It's a request to establish a connection. 56 | 2. Server sends a SYN-ACK packet to the client. It acknowledges the client's request and also sends a request to establish a connection in the other direction. 57 | 3. Client sends an ACK packet to the server. It acknowledges the server's request. 58 | 59 | We still need HTTP, however, because TCP is just a protocol for sending data. It doesn't define how the data should be structured. That's where HTTP comes in. 60 | 61 | ## HTTP 62 | 63 | HTTP stands for HyperText Transfer Protocol. It's a protocol that defines how the client and server should communicate. 
64 | 65 | It's needed on top of TCP because TCP is just a protocol for sending data. It doesn't define how the data should be structured. 66 | 67 | Requests typically look like this: 68 | 69 | ``` 70 | host: www.youtube.com 71 | port: 80 72 | method: GET 73 | path: /watch?v=123 74 | headers: pair list 75 | body: sequence of bytes 76 | ``` 77 | 78 | Responses typically look like this: 79 | 80 | ``` 81 | status code: 200 82 | headers: pair list 83 | body: sequence of bytes 84 | ``` 85 | 86 | # Storage 87 | 88 | A database is just a server. It's a server that stores data. Your computer itself can be a database server. 89 | 90 | If there's a power outage and the database comes back up, will the data still be there? Not always, and assuming it will be is a mistake. 91 | 92 | This leads to two fundamental concepts in databases: 93 | 94 | - Disk 95 | - Memory 96 | 97 | Writing data to disk guarantees that the data will be there even if the power goes out. But writing to disk is slow. Reading from disk is also slow. 98 | 99 | Writing to memory is fast. Reading from memory is also fast. But memory is volatile. If the power goes out, the data is lost. 100 | 101 | # Latency and Throughput 102 | 103 | ## Latency 104 | 105 | How long it takes for data to travel from one point to another, e.g. how long it takes for a request to travel from the client to the server. 106 | 107 | Reading 1 MB sequentially from memory takes about 250 microseconds. Reading 1 MB sequentially from SSD takes about 1 millisecond (1,000 microseconds), roughly 4 times slower than memory. 108 | 109 | Sending 1 MB over a 1 Gbps link takes about 10,000 microseconds. 110 | 111 | A packet round trip from California to the Netherlands and back takes about 150 milliseconds, i.e. 150,000 microseconds. 112 | 113 | Video games want low latency, i.e. a fast response time. Lag in games is the result of high latency. 
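As a sanity check, the 1 Gbps figure follows from simple arithmetic. This is a back-of-the-envelope sketch of pure transmission time; the commonly cited ~10,000 microsecond figure is the same order of magnitude, with the gap explained by rounding and protocol overhead on real links:

```python
MB_IN_BITS = 10**6 * 8    # 1 MB expressed in bits (using 10^6 bytes per MB)
LINK_BPS = 10**9          # 1 Gbps link capacity in bits per second

seconds = MB_IN_BITS / LINK_BPS         # pure transmission time, no overhead
microseconds = seconds * 1_000_000
print(microseconds)  # 8000.0 -- same order as the ~10,000 us figure above
```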
114 | 115 | ## Throughput 116 | 117 | How much data can be transferred from one point to another in a given amount of time, e.g. how many requests can be processed per second. 118 | 119 | This is measured in bits per second. A 1 Gbps link can transfer 1 billion bits per second. 120 | 121 | # Availability 122 | 123 | Availability: How much of the time is the system operational? 124 | 125 | If you purchased a membership on a platform, you expect it to be available 24/7. If it's not, you're not getting your money's worth, and the business itself is losing money. 126 | 127 | ## High Availability 128 | 129 | Some systems require stronger availability guarantees. For example, a hospital's database. If the database goes down, the hospital can't access patient records. This can be life-threatening. Another example is a plane's control system. If the control system goes down, the plane can crash. 130 | 131 | ## Nines 132 | 133 | Availability is often measured in nines. For example, 99.9% availability means that the system is operational 99.9% of the time, which allows for at most 8.76 hours of downtime per year. 134 | 135 | These levels are called two nines, three nines, four nines, five nines, and so on. Each translates into a maximum allowed downtime per year, month, week, and day. 136 | 137 | ## SLA 138 | 139 | Service Level Agreement. It's a contract between a service provider and a customer. It defines the level of service that the customer should expect. SLAs often include availability guarantees. SLAs are made of multiple SLOs. 140 | 141 | ## SLO 142 | 143 | SLO and SLA are often used interchangeably; however, they are different. SLO stands for Service Level Objective. It's a target value for a service level, and it's a part of the SLA. 144 | 145 | ## How to achieve high availability? 146 | 147 | One important principle is not having a single point of failure: a component whose failure brings down the entire system. For example, if your only server goes down, the entire system goes down. 
This is a single point of failure. 148 | 149 | To counter this, redundancy is used. Redundancy means having multiple components that can do the same thing. For example, having multiple servers that can serve the same data. Though in this case we'd need a load balancer to distribute the load between the servers. 150 | 151 | But wait, isn't the load balancer now a single point of failure? Yes, it is. So, we need to have multiple load balancers. 152 | 153 | This type of redundancy is called passive redundancy. It's called passive because the redundant components are not doing anything. They're just sitting there, waiting for the main component to fail. 154 | 155 | Active redundancy is when the redundant components are doing something. For example, having multiple servers that can all serve the same data and are all serving it at the same time. 156 | 157 | # Caching 158 | 159 | Caching is a technique that stores data in memory. It's a way to speed up the response time of a request. 160 | 161 | Caching is used to improve the latency of a system. 162 | 163 | You can do caching at multiple levels: client, server, DB, etc. 164 | 165 | An example where it's useful: the client makes a lot of requests to the server, and the server makes a lot of requests to the database. If the server caches the data, it doesn't have to make requests to the database. This speeds up the response time of the server. Or you could cache between the client and the server. 166 | 167 | Static files are often cached. For example, images, CSS, JavaScript, etc. These files don't change often. So, there's no need to request them from the server every time. They can be cached in the client's browser. 168 | 169 | At the server level, you can cache the results of a database query. If the same query is made again, the server can just return the cached result. This speeds up the response time of the server. 170 | 171 | ## Write-through cache 172 | 173 | When you write to the cache, you also write to the database. 
This ensures that the cache and the database are always in sync. 174 | 175 | Pros: 176 | 177 | - Data is always in sync. 178 | - If the cache goes down, the data is still in the database. 179 | 180 | Cons: 181 | 182 | - Writing to the database is slow, so every write pays that cost. 183 | - If the database goes down, the cache is useless. 184 | 185 | ## Write-back cache 186 | 187 | When you write to the cache, you don't write to the database immediately. You just write to the cache, and the cache is responsible for writing to the database later. 188 | 189 | Pros: 190 | 191 | - Faster writes than a write-through cache. 192 | - If the database goes down, the data is still in the cache. 193 | 194 | Cons: 195 | 196 | - If the cache goes down before the data has been flushed to the database, the data is lost. 197 | 198 | ## Consistency 199 | 200 | Caching can cause consistency issues. For example, if the server caches the results of a database query, and the database is updated, the cache is now out of date. 201 | 202 | You have to ask yourself: how important is consistency? If you're building a social media platform, consistency is not that important. If a user posts a photo and it takes a few seconds for the photo to show up, it's not a big deal. But if you're building a banking system, consistency is very important. If a user transfers money from one account to another, the money should be transferred immediately. 203 | 204 | ## General rules 205 | 206 | - Cache data that doesn't change often. 207 | - Cache data that's accessed often. 208 | - Cache data that's expensive to compute. 209 | - Cache data that's far away. 210 | - Cache data that's large. 211 | 212 | ## Cache eviction 213 | 214 | When the cache is full and you want to add new data to it, you have to remove some data first. This is called cache eviction. 215 | 216 | There are multiple ways to do cache eviction. The most common is LRU (Least Recently Used). 217 | 218 | LRU evicts the data that was accessed the least recently. 
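A minimal LRU cache can be sketched with Python's `OrderedDict`, which remembers insertion order and lets us move a key to the end on each access. This is a sketch, not a production cache (no thread safety, no TTLs):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now most recently used
cache.put("c", 3)     # capacity exceeded: evicts "b", the least recently used
```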
219 | 220 | LRU eviction is a simple way to do cache eviction. It's easy to implement and it works well. 221 | 222 | # Proxies 223 | 224 | A forward proxy is a server that sits between the client and the server. The client sends requests to the proxy, the proxy forwards them to the server, and responses travel back the same way. The forward proxy acts on behalf of the client. This is useful for e.g. privacy and security, serving as a way to hide the client's IP address. 225 | 226 | A reverse proxy also sits between the client and the server, but it acts on behalf of the server: the client sends requests to the reverse proxy, the reverse proxy forwards them to the server, and responses travel back through it. This is useful for e.g. load balancing, caching, security, etc. 227 | 228 | Forward proxy -> Acts on behalf of the client. 229 | Reverse proxy -> Acts on behalf of the server. 230 | 231 | They serve different roles. 232 | 233 | A reverse proxy is often used for load balancing. Load balancing is a technique that distributes the load between multiple servers. This is useful for improving the throughput of a system. 234 | 235 | A forward proxy is often used for caching. Caching is a technique that stores data in memory, which is useful for improving the response time of a system. Forward proxies are also famously used for VPNs. 236 | 237 | # Load Balancers 238 | 239 | Let's say we start with a client and a server. Multiple clients now make requests to the server. The server can't handle all the requests. It's overloaded. 240 | 241 | We need to scale, vertically or horizontally. Horizontal scaling is adding more servers. Vertical scaling is adding more resources to the server. 
242 | 243 | If we add more servers (horizontal scaling), we need a way to distribute the load between the servers. This is where load balancers come in. 244 | 245 | Load balancers sit between the client and the servers. The client sends requests to the load balancer, and the load balancer looks at all the servers and decides which server to send each request to. The server sends responses to the load balancer, and the load balancer forwards the responses to the client. 246 | 247 | You can think of load balancers as reverse proxies. They act on behalf of the servers and distribute the load between them. 248 | 249 | When adding a new server, you configure the load balancer to distribute the load between the new server and the old servers. 250 | 251 | Load balancers can distribute the load in multiple ways. For example, round robin, least connections, IP hash, etc. 252 | 253 | What happens if the load balancer itself becomes overloaded? You can add more load balancers. You can also use DNS load balancing, a technique that distributes the load between multiple load balancers. 254 | 255 | # Hashing 256 | 257 | Hashing is a technique that maps data of arbitrary size to a fixed-size value. 258 | 259 | For example, take 4 clients and 4 servers. The load balancer uses hashing to distribute the load between the servers: it hashes the client's IP address and maps it to a server. This lets us take advantage of caching, because we know which server to go to for a specific client. Each server would have an in-memory cache. When a client makes a request, the server checks the cache first. If the data is in the cache, it returns it. If not, it goes to the database and stores the result in the cache. 260 | 261 | If we add a new server, we'd have to re-hash and re-map (re-compute the modulo for) the clients to the servers. This is why this approach is a problem. 
If a server goes down or a new server is added, we'd have to re-map the clients to the servers, and the in-memory caches each server had built up would become useless. 262 | 263 | This is where algorithms like consistent hashing come in. 264 | 265 | ## Consistent Hashing 266 | 267 | First, we have a circle. The circle represents the hash space. Each server is represented by a point on the circle. Each client is hashed to a point on the circle and mapped to the server that's closest to it in the clockwise direction. 268 | 269 | So each client looks at the circle and finds the server that's closest to it in the clockwise direction. That's the server the client is mapped to. 270 | 271 | If a server goes down, the clients that were mapped to that server are remapped to the next server in the clockwise direction. 272 | 273 | If a new server is added, some of the clients that were mapped to the next server in the clockwise direction (those whose hash now falls before the new server) are remapped to the new server. 274 | 275 | Let's say you want to distribute the load between the servers even more evenly. You can add virtual nodes: each server is represented by multiple points on the circle. This spreads the servers more evenly around the circle, which distributes the load more evenly between them. 276 | 277 | For example, if you have multiple servers, but Server A is much more powerful than Server B, you can add more virtual nodes for Server A. This gives Server A more points on the circle, and thus more load. 278 | 279 | ## Rendezvous Hashing 280 | 281 | Rendezvous hashing is different from consistent hashing. In consistent hashing, the client is mapped to the server that's closest to it in the clockwise direction. In rendezvous hashing, the client is mapped to the server that has the highest score. 282 | 283 | A server's score is calculated by hashing the server's name together with the client's name. 
The server with the highest score is the one the client is mapped to. 284 | 285 | Every client calculates the score for every server and chooses the server with the highest score. 286 | 287 | ## When to choose a hashing strategy? 288 | 289 | - Consistent hashing: when you want to distribute the load evenly while minimizing re-mapping as servers are added or removed. 290 | - Rendezvous hashing: an alternative with the same goal; when a server is removed, only the clients that were mapped to it get redistributed. 291 | - Either strategy pays off when servers keep an in-memory cache and you want the same clients to keep hitting the same servers. 292 | 293 | # Relational Databases 294 | 295 | Relational databases are databases that store data in tables. Each table has rows and columns. Each row is a record. Each column is a field. A record is a collection of fields, and a field is a piece of data. 296 | 297 | For example: 298 | 299 | ``` 300 | +----+-------+-----+ 301 | | id | name | age | 302 | +----+-------+-----+ 303 | | 1 | John | 25 | 304 | | 2 | Alice | 30 | 305 | | 3 | Bob | 35 | 306 | +----+-------+-----+ 307 | ``` 308 | 309 | Relational databases are very structured. Each table has a schema. A schema is a set of rules that define the structure of the table. For example, the schema of the table above is: 310 | 311 | ``` 312 | id: integer 313 | name: string 314 | age: integer 315 | ``` 316 | 317 | The schema defines the structure of the table. It defines the data types of the fields. It also defines the relationships between the fields. 318 | 319 | For example, when it comes to relationships, you can have a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship. 320 | 321 | Relational databases impose strict rules on the data. For example, you can't have a record in a table that's missing a value for a field, and you can't have a record with a value for a field that's not in the schema. 322 | 323 | ## SQL 324 | 325 | SQL stands for Structured Query Language. It's a language that's used to interact with relational databases. 
You use it to create, read, update, and delete data in a relational database. 326 | 327 | SQL is a declarative language. That means you tell the database what you want to do, and the database figures out how to do it. 328 | 329 | When dealing with relational databases, you can perform powerful SQL queries. For example, joins, unions, intersections, etc. 330 | 331 | ## ACID 332 | 333 | ACID stands for Atomicity, Consistency, Isolation, Durability. It's a set of properties that guarantee that database transactions are processed reliably. 334 | 335 | - Atomicity: A transaction is atomic if it's either completely processed or not processed at all. 336 | - Consistency: A transaction is consistent if it takes the database from one consistent state to another consistent state. 337 | - Isolation: A transaction is isolated if it's processed independently of other transactions. 338 | - Durability: A transaction is durable if its effects are permanent. 339 | 340 | If memorizing ACID is hard, think of it as the naive, intuitive view of transactions. When you make a transfer from bank account A to bank account B, you expect it to either happen completely or not happen at all, and you don't care about the underlying logic. Similarly, you expect it to be permanent after it happens. 341 | 342 | This naive, intuitive way of thinking about DB operations is exactly what ACID formalizes. 343 | 344 | ## Database Index 345 | 346 | What if you have to search through a large table? A full scan is slow. This is where database indexes come in. 347 | 348 | An index can, for example, turn an O(n) scan into an O(log n) lookup. An index is a data structure that's used to speed up searches over a table. 349 | 350 | You can imagine it as a book index. You look up a word in the index, and the index tells you which page the word is on. This is much faster than looking through the entire book. 
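The book-index analogy translates directly into code. A sketch comparing a full table scan (O(n)) with a sorted index searched by binary search (O(log n)), using Python's `bisect`; the table and column names are made up for illustration:

```python
import bisect

records = [("Bob", 35), ("Alice", 30), ("John", 25)]  # unsorted table rows

def scan_for_name(name):
    # Full table scan: O(n), checks every row.
    return [r for r in records if r[0] == name]

# Build an "index" on the name column: sorted (key, row_position) pairs.
name_index = sorted((name, i) for i, (name, _) in enumerate(records))
keys = [k for k, _ in name_index]

def index_lookup(name):
    # Binary search the sorted keys: O(log n) to find a match.
    pos = bisect.bisect_left(keys, name)
    if pos < len(keys) and keys[pos] == name:
        return records[name_index[pos][1]]
    return None

print(index_lookup("Alice"))  # ('Alice', 30)
```

Just like a real database index, the sorted structure has to be maintained on every write, which is the cost discussed below.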
351 | 352 | However, indexes should be used with caution. They can speed up reads, but they can also slow down writes. This is because when you write to a table, you also have to update the index. For example, if you add a record to a table, you also have to add an entry to the index. 353 | 354 | So, when is it beneficial to add an index? 355 | 356 | - When you have a large table and you want to speed up searches over it. 357 | - When you perform JOIN operations on a table. These are operations that combine records from two tables. For example, if you have a table of employees and a table of departments, and you want to find the employees that work in a specific department, you'd perform a JOIN. This is a slow operation, and an index lets the database look up the matching records much faster. 358 | - When you run range queries on a column. For example, if you have a table of employees and you want to find the employees that are older than 30, an index on the age column keeps the values in sorted order, so the database can locate the range much faster. 359 | 360 | ## SQL Indexes 361 | 362 | How to create a database index in SQL? 363 | 364 | ```sql 365 | CREATE INDEX index_name ON table_name (column_name); 366 | ``` 367 | 368 | This creates an index on the column_name of the table_name. 369 | 370 | Let's look at another example: 371 | 372 | ```sql 373 | CREATE INDEX name_index ON employees (name); 374 | ``` 375 | 376 | This helps if you're searching for employees by name. If you want to find all the employees whose name starts with "A", you can use the following query: 377 | 378 | ```sql 379 | SELECT * FROM employees WHERE name LIKE 'A%'; 380 | ``` 381 | 382 | This would be much faster than searching through the entire table. 
383 | 384 | What about a JOIN index? 385 | 386 | ```sql 387 | CREATE INDEX department_index ON employees (department_id); 388 | ``` 389 | 390 | This helps if you're performing JOIN operations on the employees table and the departments table. For example, if you want to find all the employees that work in a specific department, you can use the following query: 391 | 392 | ```sql 393 | SELECT * FROM employees JOIN departments ON employees.department_id = departments.id WHERE departments.name = 'Engineering'; 394 | ``` 395 | 396 | This query finds all the employees that work in the Engineering department. Without an index it's slow, because the database has to scan the employees table to match department_id values. Adding the index above speeds it up. 397 | 398 | ## Transactions in SQL 399 | 400 | An example of a transaction in SQL: 401 | 402 | ```sql 403 | BEGIN TRANSACTION; 404 | UPDATE employees SET salary = salary + 1000 WHERE age > 30; 405 | UPDATE employees SET salary = salary + 2000 WHERE age > 40; 406 | COMMIT; 407 | ``` 408 | 409 | In this example, we're updating salaries: employees older than 30 get a 1000 raise, and employees older than 40 get an additional 2000, since they match both conditions. This is a transaction. If we don't commit the transaction, the changes won't be permanent. 410 | 411 | While the transaction is running, changes made by other transactions aren't visible to it (at sufficiently strict isolation levels). This is isolation, one of the ACID properties. 412 | 413 | Atomicity and durability show up here too: if the transaction fails, the changes are rolled back; if it succeeds, the changes are permanent. Consistency means the database moves from one valid state to another. 414 | 415 | ## Strong vs Eventual Consistency 416 | 417 | Strong consistency: once a transaction is processed, its changes are immediately visible to other transactions. 418 | 419 | Eventual consistency is another consistency model. 
Unlike strong consistency, eventual consistency doesn't guarantee that changes are immediately visible to other transactions. It guarantees that the changes eventually become visible. 420 | -------------------------------------------------------------------------------- /api-design-neetcode.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Imagine the Twitter API. You can use it to post tweets, get tweets, and even delete tweets. The Twitter API is an example of an external API. It's provided by a third-party service, Twitter, that allows us to interact with their service programmatically. 4 | 5 | We need a way to interact with the Twitter API. How we do that is where API design comes in. 6 | 7 | In this post, we'll focus on the basics of API design. 8 | 9 | # CRUD 10 | 11 | When designing an API, we often talk about CRUD operations. CRUD stands for Create, Read, Update, and Delete. These are the basic operations that we can perform on a resource. 12 | 13 | Oftentimes, it helps to think about the operations as functions, e.g. createTweet, getTweet, updateTweet, deleteTweet. 14 | 15 | However, in REST APIs, the HTTP methods dictate the operation. `POST` is used to create a resource, `GET` is used to read a resource, `PUT` is used to update a resource, and `DELETE` is used to delete a resource. 16 | 17 | # Resource 18 | 19 | A resource is an object or representation of something, which has data associated with it. For example, a user is a resource, and a user's name, email, and address are data associated with that resource. 20 | 21 | Referring back to the Twitter example, a Tweet is a resource. A Tweet has data associated with it, like the tweet's content, the user who posted the tweet, and the timestamp when the tweet was posted. 22 | 23 | # Why API Design Matters 24 | 25 | When designing an API, we need to think about how the API will be used and evolve over time. 
We can't simply change the API whenever we want. Once an API is released and used, we need to be careful about making changes. We need to maintain backward compatibility so that existing clients can continue to use the API without any issues. 26 | 27 | # Example of introducing a new feature 28 | 29 | Let's say we have `createTweet(userId, content)` API for creating a tweet. 30 | 31 | In REST, it'd look something like `https://api.twitter.com/tweets` with a `POST` request. 32 | 33 | We want to add a feature where users can reply to a tweet. We can't simply add a required field like `parentId` to the `createTweet` API; that wouldn't be backward compatible. We could make the field optional to maintain backward compatibility, but here that's not a good design, because it's not clear that a `parentId` field on `createTweet` means the tweet is a reply. 34 | 35 | That said, you can imagine other scenarios where introducing an optional field is a good way to avoid breaking existing clients. 36 | 37 | Instead, we could create a new API like `createReply(userId, content, parentId)`. 38 | 39 | In REST, it'd look something like `https://api.twitter.com/replies` with a `POST` request. 40 | 41 | # Versioning 42 | 43 | If we have to introduce breaking changes, we can version our API. We can have different versions of our API running at the same time. This way, existing clients can continue to use the old version, and new clients can use the new version. 44 | 45 | For example, the Twitter API might look like `https://api.twitter.com/v1/tweets` for version 1 and `https://api.twitter.com/v2/tweets` for version 2. Existing clients can continue to use `v1`, and new clients can use `v2`. 
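Running two versions side by side can be as simple as routing on the version prefix. A minimal sketch of that idea; the handler names, routes, and response shapes here are invented for illustration, not Twitter's actual API:

```python
def create_tweet_v1(payload: dict) -> dict:
    # v1 behavior: replies are not supported.
    return {"content": payload["content"]}

def create_tweet_v2(payload: dict) -> dict:
    # v2 behavior: an optional parent_id enables replies,
    # without changing anything v1 clients depend on.
    return {"content": payload["content"], "parent_id": payload.get("parent_id")}

# The version prefix in the path selects the handler.
ROUTES = {
    ("POST", "/v1/tweets"): create_tweet_v1,
    ("POST", "/v2/tweets"): create_tweet_v2,
}

def dispatch(method: str, path: str, payload: dict) -> dict:
    return ROUTES[(method, path)](payload)

# Old clients keep working; new clients opt into v2 features.
v1_response = dispatch("POST", "/v1/tweets", {"content": "hi"})
v2_response = dispatch("POST", "/v2/tweets", {"content": "hi", "parent_id": "123"})
```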
46 | 47 | # What to pass in the request 48 | 49 | Let's take a look at what a Tweet may look like: 50 | 51 | ``` 52 | { 53 | id: string 54 | userId: string 55 | content: string 56 | createdAt: Date 57 | likes: int 58 | } 59 | ``` 60 | 61 | If the user wants to create a tweet, they'd send a `POST` request to `https://api.twitter.com/tweets` with the following payload: 62 | 63 | ``` 64 | { 65 | userId: "123", 66 | content: "Hello, world!" 67 | } 68 | ``` 69 | 70 | They would only need to pass the `userId` and `content` fields. The `id`, `createdAt`, and `likes` fields would be generated by the server. 71 | 72 | It's important to understand that not everything in a resource needs to be passed in the request. Some fields should be generated by the server. 73 | 74 | # Get a specific resource 75 | 76 | What if we wanted to get a specific tweet? We could make a `GET` request to `https://api.twitter.com/tweets/:id`, where `:id` is the ID of the tweet. It would look something like `https://api.twitter.com/tweets/123`. 77 | 78 | To the same endpoint, we could make a `DELETE` request to delete the tweet. 79 | 80 | # Get a list of resources by a related resource 81 | 82 | What if we wanted to get a list of tweets by a specific user? We could make a `GET` request to `https://api.twitter.com/users/:userId/tweets`, where `:userId` is the ID of the user. It would look something like `https://api.twitter.com/users/123/tweets`. 83 | 84 | ## Pagination 85 | 86 | What happens if the user has thousands of tweets? It would be inefficient to return all the tweets at once. Instead, we can use pagination. We can limit the number of tweets returned per page and provide a way to get the next page of tweets. 87 | 88 | This is where query parameters like `limit` and `offset` come in. We could make a `GET` request to `https://api.twitter.com/users/123/tweets?limit=10&offset=0` to get the first 10 tweets, and `https://api.twitter.com/users/123/tweets?limit=10&offset=10` to get the next 10 tweets. 
89 | 90 | `limit` specifies how many tweets to return, and `offset` specifies where to start fetching the tweets. 91 | 92 | Internally, we may order the tweets by `createdAt` in descending order to get the latest tweets first. 93 | 94 | # Idempotence 95 | 96 | `GET` requests are supposed to be safe and idempotent, meaning they should not have any side effects. They should only retrieve data, not modify or create data. 97 | 98 | `PUT` and `DELETE` requests do have side effects, but they are still idempotent: making the same request multiple times leaves the server in the same state as making it once. `POST` requests are not idempotent. For example, making the same `POST` request multiple times will create multiple resources. 99 | 100 | # Conclusion 101 | 102 | The most important points to remember when designing an API are: 103 | 104 | - Keep your API backward compatible. 105 | - Make sure your API is idempotent where it should be. 106 | - Endpoints should be intuitive and easy to understand. 107 | - Use nouns for resources and HTTP methods for operations. 108 | - If you have to introduce breaking changes, version your API. 109 | - Use query parameters for filtering, sorting, and pagination. 110 | -------------------------------------------------------------------------------- /api-design.md: -------------------------------------------------------------------------------- 1 | # API Design 2 | 3 | ## What is an API? 4 | 5 | API stands for Application Programming Interface. It is a set of rules that allows one software application to talk to another. 6 | 7 | ## Why do we need APIs? 8 | 9 | APIs are important because they allow different software applications to communicate with each other, work together, and share data. 10 | 11 | ## Why APIs are important for companies 12 | 13 | You can imagine a company like Stripe, which provides a payment processing service.
Stripe has an API that allows other software applications to communicate with it. This means that other software applications can use Stripe's payment processing service, and they don't have to build their own payment processing service from scratch. 14 | 15 | This makes APIs a core part of the modern software ecosystem. They allow companies to build on top of each other's services, and they allow software applications to work together. 16 | 17 | ## Types of APIs 18 | 19 | There are different types of APIs, such as: 20 | 21 | - RESTful APIs: These are APIs that follow the principles of REST (Representational State Transfer). They use HTTP methods (GET, POST, PUT, DELETE) to perform operations on resources. 22 | - SOAP APIs: These are APIs that use the Simple Object Access Protocol (SOAP) to send and receive messages. 23 | - GraphQL APIs: These are APIs that use the GraphQL query language to send and receive messages. 24 | - RPC APIs: These are APIs that use Remote Procedure Calls (RPC) to send and receive messages. 25 | - WebSockets: These are APIs that allow for full-duplex communication between the client and the server. 26 | 27 | ## Changes to an API 28 | 29 | If your customers are using your API, then you have to be careful about making changes to it. If you make a breaking change to your API, then your customers will have to update their software applications to work with the new version of your API. 30 | 31 | When you make changes to your API, you have to think about how they will affect your customers, their software applications, and their users. 32 | 33 | When making changes, typically you would version your API. This means that you would have different versions of your API, and your customers can choose which version they want to use. 34 | 35 | This allows you to make changes to your API without breaking your customers' software applications.
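As a sketch of URL-based versioning (the route paths and response shapes below are hypothetical, made up for illustration): a `/v2` endpoint can introduce a breaking change, such as renaming a field, while `/v1` keeps serving existing customers unchanged.

```javascript
// Hypothetical route table: v1 returns a user with a `mail` field,
// v2 renames it to `email` -- a breaking change, so it lives under a new version.
const routes = {
  "/v1/users/123": () => ({ name: "John Doe", mail: "john@doe.com" }),
  "/v2/users/123": () => ({ name: "John Doe", email: "john@doe.com" }),
};

function handle(path) {
  const handler = routes[path];
  return handler ? handler() : { error: "404 Not Found" };
}

// Old clients keep working against /v1 while new clients opt into /v2.
console.log(handle("/v1/users/123")); // { name: 'John Doe', mail: 'john@doe.com' }
console.log(handle("/v2/users/123")); // { name: 'John Doe', email: 'john@doe.com' }
```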
36 | 37 | ## Swagger 38 | 39 | Swagger is a set of tools for documenting your API. It lets you describe your API in a machine-readable format and generate client libraries from that description. This makes it easier for your customers to use your API, and easier for you to evolve it, because the documented contract shows how your API is meant to be used. That, in turn, helps you make changes without breaking your customers' software applications. 40 | 41 | OpenAPI is the newer name for the specification behind Swagger: the Swagger Specification was renamed the OpenAPI Specification. It's a standard format for describing RESTful APIs, and the tooling around it can generate documentation and client libraries from an API description. 42 | -------------------------------------------------------------------------------- /api-paradigms.md: -------------------------------------------------------------------------------- 1 | # API 2 | 3 | API stands for Application Programming Interface. It is simply a way to access the functionality of a program or a service. APIs are used to interact with other software, similar to how user interfaces are used to interact with humans. 4 | 5 | When we talk about APIs, we often refer to working with external APIs. These are APIs provided by third-party services that allow us to interact with their services programmatically. For example, we can use the Twitter API to post tweets, or the Google Maps API to get directions. 6 | 7 | However, APIs can also be "internal". Consider arrays in programming languages like JavaScript. Arrays have methods like `push`, `pop`, `shift`, and `unshift` that allow us to interact with them. These methods are part of the array's API. 8 | 9 | APIs are just a way to interact with a "thing". This "thing" can be a service, a library, a framework, or even a programming language itself. 10 | 11 | In this post, we'll focus on external APIs. We'll dive into REST, GraphQL and gRPC. 12 | 13 | # REST 14 | 15 | ## Introduction 16 | 17 | REST stands for Representational State Transfer. REST is built on top of HTTP.
This doesn't mean REST is a new protocol, but rather a set of rules to follow when building APIs. People often use REST and HTTP interchangeably, but they are not the same thing. You've likely worked with REST APIs before, even if you didn't know it. 18 | 19 | ## Stateless 20 | 21 | The main thing about REST is that it's stateless. This means that each request from a client to a server must contain all the information needed to understand the request. The server cannot store any information about the client between requests. Because each request is independent of the others, REST APIs are easy to cache and scale. 22 | 23 | ## Resources 24 | 25 | REST APIs are built around resources. A resource is an object or representation of something, which has data associated with it. For example, a user is a resource, and a user's name, email, and address are data associated with that resource. 26 | 27 | Take a look at the following example: `GET https://youtube.com/videos`. In this example, `videos` is a resource. The `GET` method is used to retrieve the list of videos. The response will contain a list of videos. 28 | 29 | ## Example 30 | 31 | Let's say, for example, the response from Youtube contains 10 videos. What if we want to get the next 10 videos? 32 | 33 | The stateful approach would be for the server to keep track of the last video we saw, and return the next 10 videos. 34 | 35 | The stateless approach would be to include a query parameter in the request, like `GET https://youtube.com/videos?start=10`. `start` is the query parameter that tells the server where to start fetching the videos. You might also want `limit` to decide how many videos you want to fetch and not be limited to 10. This way, the server doesn't need to keep track of the last video we saw. And that's what we want in REST! 36 | 37 | ## Status Codes 38 | 39 | REST APIs use status codes to indicate the result of a request. These are from the HTTP protocol.
For example, `200 OK` means the request was successful, `404 Not Found` means the resource was not found, and `500 Internal Server Error` means something went wrong on the server. 40 | 41 | ## Every endpoint related to some resource 42 | 43 | Let's look at the example of Youtube again: `GET https://youtube.com/videos`. This endpoint is related to the `videos` resource, as mentioned. But you might wonder: why don't we have `/getVideos` instead of `/videos`? This is because REST APIs use nouns instead of verbs. The HTTP methods like `GET`, `POST`, `PUT`, `DELETE` are the verbs. The endpoints are the nouns. 44 | 45 | ## JSON 46 | 47 | By far, the most popular format for data exchange in REST APIs is JSON. JSON is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It looks like a JavaScript object, but it's a string. 48 | 49 | Example of user data in JSON: 50 | 51 | ```json 52 | { 53 | "name": "John Doe", 54 | "email": "john@doe.com", 55 | "address": "123 Main St" 56 | } 57 | ``` 58 | 59 | # GraphQL 60 | 61 | ## Introduction 62 | 63 | GraphQL is a query language for APIs and a runtime for executing those queries by using a type system you define for your data. It was developed by Facebook in 2012 and released as an open-source project in 2015. 64 | 65 | ## The idea 66 | 67 | GraphQL is built around the idea of asking for what you need, and getting exactly that. With REST, you might need to make multiple requests to different endpoints to get the data you need. With GraphQL, you can make a single request to get all the data you need. 68 | 69 | ## Example of REST's problem 70 | 71 | Let's say we have a REST API for a blog. For each blog post, we need to make a request to get the post, then another request to get the author, and another request to get the comments. This is inefficient because we are making multiple requests to get the data we need. This is where GraphQL shines.
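The multi-request pattern just described might look like this sketch. The data and accessor functions are made up for illustration; each function stands in for a separate REST round trip:

```javascript
// Fake data standing in for three separate REST endpoints.
const posts = { 1: { id: 1, title: "Hello", authorId: 7 } };
const users = { 7: { id: 7, name: "Jane" } };
const comments = { 1: [{ postId: 1, text: "Nice post!" }] };

// With REST, assembling one blog page takes three round trips:
const getPost = (id) => posts[id];                 // GET /posts/:id
const getAuthor = (id) => users[id];               // GET /users/:id
const getComments = (postId) => comments[postId];  // GET /posts/:id/comments

const post = getPost(1);
const author = getAuthor(post.authorId);
const postComments = getComments(post.id);

console.log(post.title, author.name, postComments.length); // Hello Jane 1
```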
72 | 73 | What if we could make a single request to get the blog post, the author, and the comments? Additionally, what if we could specify exactly what fields we need for each of these resources? 74 | 75 | ## Only POST requests 76 | 77 | It's built on top of HTTP. However, you send a POST request to a single endpoint, usually `/graphql`, and you send a query in the body of the request. The query is a string that describes the data you want to get back. 78 | 79 | ## Two types of operations 80 | 81 | In GraphQL, there are two types of operations: queries and mutations. 82 | 83 | Queries are used to read data. For example, you might want to get a list of blog posts. 84 | 85 | Mutations are used to write data. For example, you might want to create a new blog post, update a blog post, or delete a blog post. 86 | 87 | ## Caching 88 | 89 | GraphQL is built on top of HTTP POST requests. Standard HTTP caches don't cache POST responses, so GraphQL responses are harder to cache than REST `GET` responses. 90 | 91 | However, GraphQL has a solution for this: a technique called persisted queries. Instead of sending the full query, the client sends a hash of the query. The server looks up the full query based on the hash and executes it. Because the request is now a small, stable identifier, it can even be sent as a `GET` request, which makes the response cacheable. 92 | 93 | ## Schema 94 | 95 | In GraphQL, you define a schema that describes the data you can query. It defines the types of data you can query, and the relationships between those types. For example, you might have a `Post` type that has a `title` field and an `author` field. The `author` field is a `User` type that has a `name` field. 96 | 97 | # gRPC 98 | 99 | ## Introduction 100 | 101 | gRPC is a high-performance, open-source universal RPC framework. It was developed by Google and released as an open-source project in 2015. gRPC is built on top of HTTP/2, which is a major revision of the HTTP protocol.
102 | 103 | ## Problem with REST 104 | 105 | REST APIs are great for many use cases, but they have some limitations. 106 | 107 | One of the limitations is that they are text-based. This means that the data is sent over the network as text, which can be inefficient because text takes up more space than binary data. gRPC solves this problem by using Protocol Buffers. 108 | 109 | Another limitation is that REST follows a strict request-response model: the client sends one request and waits for one response before it can continue. gRPC addresses this by using HTTP/2, which allows for bidirectional streaming. 110 | 111 | ## HTTP/2 112 | 113 | HTTP/2 is the second major version of the HTTP protocol. It comes with several improvements over HTTP/1.1, to name a few: 114 | 115 | - Multiplexing: allows multiple requests and responses to be sent and received at the same time. 116 | - Header compression: reduces the size of the headers, which reduces the amount of data that needs to be sent over the network. 117 | - Server push: allows the server to send resources to the client before the client requests them. 118 | - Bidirectional streaming: allows the client and server to send a stream of messages to each other at the same time. 119 | 120 | It's important to mention HTTP/2 because gRPC needs HTTP/2 to work. The reason gRPC needs HTTP/2 is because it uses bidirectional streaming. 121 | 122 | ### Web Sockets become unnecessary 123 | 124 | With HTTP/2, we can use bidirectional streaming. This means that we can send a stream of messages to the server and receive a stream of messages from the server at the same time. This can make Web Sockets unnecessary for many use cases. 125 | 126 | A "stream" is a sequence of messages. For example, you might have a stream of chat messages, where each message is sent as a separate message in the stream. A stream is different from request-response, where you send one message and get one response back. With a stream, you can keep sending messages over a single connection without waiting for a response to each one.
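A stream can be sketched with a JavaScript generator: the consumer pulls messages one at a time from a single open sequence instead of issuing a new request for each message. This is only an analogy for the concept, not the actual gRPC streaming API:

```javascript
// A "stream" modeled as a generator: one open sequence of messages.
function* chatStream() {
  yield { user: "alice", text: "hi" };
  yield { user: "bob", text: "hello" };
  yield { user: "alice", text: "how are you?" };
}

// The consumer reads messages as they arrive over one logical connection,
// rather than making a separate request-response round trip per message.
const received = [];
for (const message of chatStream()) {
  received.push(message.text);
}

console.log(received); // [ 'hi', 'hello', 'how are you?' ]
```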
127 | 128 | ## gRPC-Web 129 | 130 | gRPC needs detailed control over the HTTP/2 connection, which browsers don't expose. This is why you need a proxy server to convert gRPC-Web requests from the browser into gRPC requests. A commonly used proxy server for this is `Envoy`. 131 | 132 | ## Protocol Buffers 133 | 134 | Protocol Buffers is a method of serializing structured data. It's similar to JSON, but it's more efficient because it's binary. This means it takes up less space on the network. Protocol Buffers is used to define the messages that are sent and received in gRPC. 135 | 136 | The messages are defined in a `.proto` file. Here's an example of a `.proto` file: 137 | 138 | ```proto 139 | syntax = "proto3"; 140 | 141 | package example; 142 | 143 | message User { 144 | string name = 1; 145 | string email = 2; 146 | } 147 | ``` 148 | 149 | This defines a `User` message with two fields: `name` and `email`. 150 | 151 | If we want to send a `User` message in a gRPC request, we define a service in the `.proto` file. Here's an example of a service: 152 | 153 | ```proto 154 | service UserService { 155 | rpc GetUser(UserRequest) returns (UserResponse) {} 156 | } 157 | 158 | message UserRequest { 159 | string id = 1; 160 | } 161 | 162 | message UserResponse { 163 | User user = 1; 164 | } 165 | ``` 166 | 167 | The downside of Protocol Buffers is that it's not human-readable like JSON. However, it's more efficient because it's in binary format. 168 | 169 | ## Error handling 170 | 171 | In REST, we have HTTP status codes to indicate the result of a request. gRPC has its own set of status codes (such as `OK`, `NOT_FOUND`, and `INTERNAL`) along with error messages, and we handle what went wrong based on those. 172 | -------------------------------------------------------------------------------- /browser-storages.md: -------------------------------------------------------------------------------- 1 | # Browser Storages 2 | 3 | In this post, we'll go over the different types of browser storages and how to use them.
4 | 5 | # Cookies 6 | 7 | Cookies are small pieces of data stored in the browser. They are sent with every request to the server, which can be used to e.g. identify a user. 8 | 9 | Cookies have a size limit of 4KB and can be set to expire after a certain time. If you don't set an expiration date, the cookie will be deleted when the current session ends, which happens when the browser is closed. 10 | 11 | Let's take a look at how to set a cookie in JavaScript: 12 | 13 | ```javascript 14 | document.cookie = "name=John"; 15 | ``` 16 | 17 | This will call the cookie's setter function and set the cookie "name" to "John". Because we haven't set an expiration date, this cookie will be deleted when the browser is closed. 18 | 19 | Let's set another cookie: 20 | 21 | ```javascript 22 | document.cookie = "age=30"; 23 | ``` 24 | 25 | When we set a new cookie, it won't be deleted, but rather appended to the existing cookie string. So now, the cookie string will look like this: `name=John; age=30`. 26 | 27 | ## One string, different cookies 28 | 29 | It may appear that we have only one cookie because `document.cookie` returns a single string. However, this string contains multiple cookies separated by a semicolon. 30 | 31 | ## Overriding cookies 32 | 33 | To override a cookie, you can set it again with the same name. For example: 34 | 35 | ```javascript 36 | document.cookie = "name=Jane"; 37 | ``` 38 | 39 | The cookie "name" will now be set to "Jane". 40 | 41 | ## Expiration date 42 | 43 | For each cookie, you can set an expiration date. The date must be in UTC format. For example, if we want the cookie "name" to expire right away: 44 | 45 | ```javascript 46 | document.cookie = `name=John; expires=${new Date().toUTCString()}`; 47 | ``` 48 | 49 | This will set the expiration date to the current time in UTC which will cause the cookie to be deleted immediately. 50 | 51 | ## Max age 52 | 53 | Another way to set an expiration date is by using the `max-age` attribute. 
This attribute specifies the number of seconds until the cookie expires. For example, if we want the cookie "name" to expire in 1 hour: 54 | 55 | ```javascript 56 | document.cookie = `name=John; max-age=${60 * 60}`; 57 | ``` 58 | 59 | 60 \* 60 seconds equals 1 hour. The name cookie will expire in 1 hour. 60 | 61 | ## Path 62 | 63 | `path` specifies the URL path for which the cookie is valid. For example, if we want the cookie "name" to be valid only for the `/blog` path: 64 | 65 | ```javascript 66 | document.cookie = `name=John; path=/blog`; 67 | ``` 68 | 69 | This can be useful if you want to restrict the cookie to a specific part of your website. But if you want the cookie to be valid for the entire website, you can set the path to `/`. 70 | 71 | ## Secure 72 | 73 | `secure` ensures the cookie will only be sent over HTTPS connections. For example, if we want the cookie "name" to be secure: 74 | 75 | ```javascript 76 | document.cookie = `name=John; secure`; 77 | ``` 78 | 79 | As you can see, it doesn't take any value. The presence of the attribute is enough to make the cookie secure. 80 | 81 | ## SameSite 82 | 83 | `SameSite` helps prevent CSRF attacks by restricting when the cookie is sent. 84 | 85 | It takes three possible values: 86 | 87 | - `Strict`: The cookie will only be sent in a first-party context. This means the cookie will only be sent if the site is directly accessed. 88 | - `Lax`: The cookie will be sent in a first-party context and in a cross-site context if the user navigates to the site from a link. 89 | - `None`: The cookie will be sent in all contexts (never use this value unless you know what you're doing). 90 | 91 | Many times, you'll want to set the `SameSite` attribute to `Lax`. This lets the cookie be sent in a cross-site context if the user navigates to the site from a link. However, if the external link or button is a malicious one making a POST request, the cookie won't be sent. 
92 | 93 | CSRF attacks are a big security risk, so it's important to set the `SameSite` attribute. They happen when a malicious website or element on a website sends a POST request to a legitimate website where the user is authenticated. If the user is logged in, the browser will send the cookies along with the request, which can lead to unauthorized actions. 94 | 95 | For example, if you get an email with what appears to be a link to your bank's website, but it's actually a button that sends a POST request to your bank's website, the browser will send the cookies along with the request. If you're logged in, the bank's website will think you're making the request and will execute it. 96 | 97 | Here is how to set the `SameSite` attribute: 98 | 99 | ```javascript 100 | document.cookie = `name=John; SameSite=Lax`; 101 | ``` 102 | 103 | ## Get a cookie 104 | 105 | Let's say we want to get the "name" cookie: 106 | 107 | ```javascript 108 | // Get all cookies -> ["name=John", "age=30"] 109 | const cookies = document.cookie.split("; "); 110 | 111 | // Find the "name" cookie 112 | const nameCookie = cookies.find((cookie) => cookie.startsWith("name=")); 113 | 114 | // Get the value of the "name" cookie 115 | // nameCookie is "name=John", so we split it by "=" and get the second element 116 | const name = nameCookie.split("=")[1]; 117 | ``` 118 | 119 | ## HttpOnly 120 | 121 | If you set the `HttpOnly` attribute, the cookie will still be sent with HTTP requests, but JavaScript won't be able to access it through `document.cookie`. This is useful for limiting the damage of XSS attacks. Note that `HttpOnly` can only be set by the server via the `Set-Cookie` response header, not from JavaScript. 122 | 123 | You'll want to do this especially if you're sending sensitive information in the cookie from the server to the client. 124 | 125 | ## Recap 126 | 127 | - Cookies have a size limit of 4KB. 128 | - Cookies are sent with every request to the server. 129 | - Cookies are small pieces of data stored in the browser.
130 | - Cookies can be set to expire after a certain time; if not, they will be deleted when the browser is closed. 131 | 132 | # Web Storage 133 | 134 | `localStorage` and `sessionStorage` are both a part of the Web Storage API, which provides a way to store data in the browser. The data is stored as key-value pairs. The difference between them is that localStorage persists even after the browser is closed, while sessionStorage only lasts for the current tab's session and is cleared when the tab is closed. 135 | 136 | Besides that, localStorage and sessionStorage have the same methods and properties. 137 | 138 | An upside compared to cookies is that they can store more data. The limit is around 5MB, which is much more than the 4KB limit of cookies. 139 | 140 | Some downsides to be aware of: 141 | 142 | - Lack of expiration control. 143 | - Not automatically sent with every request to the server. 144 | - Potential for XSS attacks because the data is accessible through JavaScript. Therefore, you should never store sensitive information in them. 145 | 146 | ## The API 147 | 148 | Let's take a look at how to use `localStorage`. That should be enough to understand `sessionStorage` as well. 149 | 150 | ```javascript 151 | // Set an item 152 | localStorage.setItem("name", "John"); 153 | 154 | // Get an item 155 | const name = localStorage.getItem("name"); // "John" 156 | 157 | // Remove an item 158 | localStorage.removeItem("name"); 159 | 160 | // Clear all items 161 | localStorage.clear(); 162 | ``` 163 | 164 | One important thing to mention is that the data is stored as strings. So if you want to store objects or arrays, you'll need to convert them to strings first.
165 | 166 | ```javascript 167 | const person = { 168 | name: "John", 169 | age: 30, 170 | }; 171 | 172 | // Convert the object to a string 173 | localStorage.setItem("person", JSON.stringify(person)); 174 | 175 | // Get the string and convert it back to an object 176 | const personString = localStorage.getItem("person"); 177 | 178 | const storedPerson = JSON.parse(personString); 179 | ``` 180 | 181 | ## Recap 182 | 183 | - Web Storage API provides a way to store data in the browser. 184 | - `localStorage` and `sessionStorage` are both part of the Web Storage API. 185 | - `localStorage` persists even after the browser is closed, while `sessionStorage` is cleared when the tab's session ends. 186 | -------------------------------------------------------------------------------- /cdns.md: -------------------------------------------------------------------------------- 1 | # The problem 2 | 3 | Say you have your origin server in the US and you have users in Europe. When a user in Europe tries to access your website, the request has to travel all the way to the US and back. This can cause latency and slow down your website. 4 | 5 | This is where CDNs come in. 6 | 7 | # The solution 8 | 9 | A CDN is a network of servers distributed across the globe. You can only cache static assets like images, CSS, and JavaScript files on a CDN. When a user in Europe tries to access your website, the request is routed to the nearest server in the CDN. This reduces latency and speeds up your website. 10 | 11 | You can't run code on CDNs. They are only used to cache static assets. If you need to run code close to your users, you can use edge computing services like Cloudflare Workers. 12 | 13 | # Pull CDNs vs Push CDNs 14 | 15 | ## Pull CDNs 16 | 17 | In a pull CDN, the CDN server will fetch the content from your origin server the first time a user requests it. The CDN server will then cache the content and serve it to subsequent users.
18 | 19 | For example, if a user in Europe tries to access your website in the US, the request will be routed to the nearest CDN server. The CDN server will then fetch the content from your origin server and cache it. The next time a user in Europe tries to access your website, the content will be served from the CDN server instead of the origin server. 20 | 21 | ## Push CDNs 22 | 23 | In a push CDN, the origin server pushes the content to the CDN servers. This is useful when you have a large number of users accessing the same content because the content would be pushed to all CDN servers. 24 | 25 | For example, if a user uploads a video to YouTube, the video will first be uploaded to the origin server. The origin server will then push the video to the CDN servers. This way, when a user tries to access the video, it is served from the CDN servers instead of the origin server. 26 | -------------------------------------------------------------------------------- /communication.md: -------------------------------------------------------------------------------- 1 | # Communication in system design interviews 2 | 3 | Communication is key in system design interviews. You need to communicate with the interviewer to gather the requirements and constraints of the system. You need to communicate your plan to the interviewer before you start designing the system. You need to communicate your thoughts as you're designing the system. You need to communicate your trade-offs. You need to communicate your assumptions. You need to communicate your reasoning. 4 | 5 | A bad question is a vague one, e.g. how big is the system we're designing? 6 | 7 | A better question is a specific one, e.g. how many users are we expecting to have and where are they located? 
8 | -------------------------------------------------------------------------------- /configuration.md: -------------------------------------------------------------------------------- 1 | # Configuration 2 | 3 | When working in a distributed systems environment, you will often need to configure your system. Distributed systems are complex and have many moving parts: you might have a database, a cache, a load balancer, and a web server. Each of these parts needs to be configured, because each part needs to know how to talk to the others. For example, the web server needs to know how to reach the database, and the load balancer needs to know which web servers to route traffic to. This is why configuration is important. 4 | 5 | Configuration files are often written in a format that is easy for humans to read and write, such as JSON, YAML, or TOML. This matters because you will often need to change your configuration: you might need to change your database password, your cache size, or your load balancer algorithm. 6 | 7 | ## Static configuration 8 | 9 | Static configuration is when your configuration is hard coded into your system. Changes here are slow because you have to recompile and redeploy.
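A minimal sketch of the static approach (all values here are made up): the configuration is baked into the code itself, so changing any of them means editing the source and shipping a new build.

```javascript
// Static configuration: hard coded into the application.
// Changing dbHost or cacheSizeMb requires editing code and redeploying.
const config = {
  dbHost: "db.internal.example.com",
  dbPort: 5432,
  cacheSizeMb: 512,
  loadBalancerAlgorithm: "round-robin",
};

// A small helper to show what the running system was built with.
function describeConfig(c) {
  return `db=${c.dbHost}:${c.dbPort} cache=${c.cacheSizeMb}MB lb=${c.loadBalancerAlgorithm}`;
}

console.log(describeConfig(config));
// db=db.internal.example.com:5432 cache=512MB lb=round-robin
```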
10 | 11 | ## Dynamic configuration 12 | 13 | Dynamic configuration lives outside of your system, and can be changed without recompiling and redeploying your system. 14 | -------------------------------------------------------------------------------- /design-requirements-neetcode.md: -------------------------------------------------------------------------------- 1 | # Design Requirements 2 | 3 | At the fundamental level of a system, we are: 4 | 5 | 1. Moving data: Moving data from one place to another, whether it's from a user to a server, from a server to a database, or from a database to a user. 6 | 2. Storing data: Storing data in a way that makes it easy to retrieve and use, e.g. using a database, blob storage, etc. 7 | 3. Transforming data: Transforming data into something more useful, e.g. transforming raw data into a graph, or a graph into a list. 8 | 9 | ## What's a good design? 10 | 11 | We have to think in terms of trade-offs, but even then, how do we know what's a good design? 12 | 13 | How do we measure the trade-offs? 14 | 15 | ### Availability 16 | 17 | One way to do so is to measure the availability of the system. 18 | 19 | Availability is the probability that a system is operational at a given point in time. 20 | 21 | For example, if our system was up for 23 out of every 24 hours, it would be up about 96% of the time. 100% availability is not possible; e.g. a disaster could always happen. 22 | 23 | If your system is available 99% of the time, it means you have a downtime of 1%. Improving your availability to 99.99% would mean a downtime of only 0.01%. That may look like a small change, but it's a big improvement: downtime drops by a factor of 100. 24 | 25 | Availability is measured in nines, e.g. 99.9% is three nines, 99.99% is four nines, etc. These nines are yearly, so 99.9% availability means your system is down for 8.76 hours a year. 99.999%, 5 nines, means your system is down for 5.26 minutes a year.
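The yearly downtime numbers above can be checked with a few lines of arithmetic, assuming a 365-day year of 8,760 hours:

```javascript
// Yearly downtime implied by an availability percentage.
const HOURS_PER_YEAR = 24 * 365; // 8760

function downtimeHoursPerYear(availabilityPercent) {
  return ((100 - availabilityPercent) / 100) * HOURS_PER_YEAR;
}

// Three nines: 0.1% of a year.
console.log(downtimeHoursPerYear(99.9).toFixed(2)); // 8.76 (hours)

// Five nines: 0.001% of a year, converted to minutes.
console.log((downtimeHoursPerYear(99.999) * 60).toFixed(2)); // 5.26 (minutes)
```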
26 | 27 | ### SLO 28 | 29 | Availability is an SLO, a Service Level Objective. It's a target that you set for your system. It's a promise you make to your users. 30 | 31 | SLOs are important because they help you measure how well your system is doing. They help you measure how well you are meeting your users' expectations. 32 | 33 | SLOs are a part of SLAs, Service Level Agreements. SLAs are contracts between you and your users. They define what your users can expect from your system. 34 | 35 | e.g. AWS S3 has an SLA around availability. If AWS falls below the promised availability, it gives you service credits, effectively a partial refund. 36 | 37 | An SLA is not a goal, it's an agreement. An SLO is a goal. 38 | 39 | ## Reliability 40 | 41 | Reliability is the ability of a system to perform its functions under normal conditions. It's the probability that a system will work as expected. 42 | 43 | Reliability is closely related to availability. If a system is reliable, it's available. But if a system is available, it doesn't mean it's reliable. 44 | 45 | For example, a system may be available, meaning it is up and running, but it may not be reliable, meaning it's not performing as expected. 46 | 47 | If you have a single server, a way to increase reliability is to add more servers. This is called redundancy. If a server goes down, the system can still work, because the other servers are still running. 48 | 49 | ## Throughput 50 | 51 | Throughput is the number of requests a system can handle in a given amount of time. For example, if a system can handle 1000 requests per second, its throughput is 1000 requests per second. 52 | 53 | You can measure throughput in requests per second, or in bytes per second, etc. 54 | 55 | You can increase throughput by adding more servers. 56 | 57 | We could scale the system vertically, but that still has a limit and a single point of failure. We could scale the system horizontally, by adding more servers.
This introduces complexity, because we need e.g. a load balancer, but it's a good trade-off. 58 | 59 | By scaling the system horizontally, we can increase throughput, availability, and reliability. 60 | 61 | Throughput is measured in bytes/second, requests/second, or queries/second, also known as QPS. 62 | 63 | ## Latency 64 | 65 | Latency is the time it takes for a request to be processed. It's the time it takes for a request to go from the client to the server, and for the server to process the request and send a response back to the client. 66 | 67 | You can reduce latency by e.g. having servers distributed across the world closer to the users. 68 | 69 | Edge locations can help reduce latency; those are locations not in the main data center, but closer to the users. 70 | -------------------------------------------------------------------------------- /diagramming.md: -------------------------------------------------------------------------------- 1 | # Diagramming 2 | 3 | Diagramming is an important part of system design interviews. 4 | 5 | It helps both the interviewer and the interviewee to understand the flow of the system and the components involved. 6 | 7 | If you take notes, do them at the side, to not clutter the diagram. 8 | 9 | Try to use different shapes and colors. This helps to differentiate between different components. 10 | -------------------------------------------------------------------------------- /estimation.md: -------------------------------------------------------------------------------- 1 | # Estimation in system design interviews 2 | 3 | By estimation, we mean estimating the needs of the system. This can be the number of servers, the amount of storage, the amount of bandwidth, etc. 4 | 5 | Example: Imagine you have a system where low latency matters a lot and you decide to cache the API's data in memory. You need to estimate how much memory you need to cache the data. 6 | 7 | You'll have to learn about the units.
8 | 9 | 1000 bytes is a kilobyte (KB). 10 | 11 | 1000 KB is a megabyte (MB). 12 | 13 | 1000 MB is a gigabyte (GB). 14 | 15 | 1000 GB is a terabyte (TB). 16 | 17 | 1000 TB is a petabyte (PB). 18 | 19 | General values good to have in mind: 20 | 21 | - A character -> 1 byte 22 | - Typical metadata (excluding images) -> 1-10 KB 23 | - A high quality image -> 2 MB, can be compressed by 10-20x 24 | - 20 minutes of HD video -> 1 GB 25 | 26 | - Storage: 10 TB of disk space 27 | - RAM: 256 GB - 1 TB 28 | 29 | - How long does it take for a regular HTTP request to make a round trip, not bound by bandwidth? 30 | - Within a country 50ms - 150ms 31 | - Across continents 200ms - 500ms 32 | 33 | - Bandwidth cheat sheet
15 | 16 | Let's take a look at a simple example to understand how the call stack works: 17 | 18 | ```javascript 19 | function bark() { 20 | console.log("Woof!"); 21 | } 22 | 23 | function meow() { 24 | console.log("Meow!"); 25 | } 26 | 27 | function speak() { 28 | console.log("Speaking"); 29 | bark(); 30 | meow(); 31 | console.log("Done speaking"); 32 | } 33 | 34 | speak(); 35 | ``` 36 | 37 | 1. `speak` function is called, and it's added to the call stack. 38 | 2. `speak`'s first console log is added to the call stack and executed. When the console log is executed, it's removed from the call stack. 39 | 3. `bark` function is called and added to the call stack. 40 | 4. `bark`'s console log is added to the call stack and executed. 41 | 5. `bark` is removed from the call stack. 42 | 6. `meow` function is called and added to the call stack. 43 | 7. `meow`'s console log is added to the call stack and executed. 44 | 8. `meow` is removed from the call stack. 45 | 9. `speak`'s last console log is added to the call stack and executed. 46 | 10. `speak` is removed from the call stack. 47 | 48 | That's the entire process when calling the `speak` function. 49 | 50 | # JavaScript Runtime Environment 51 | 52 | Assume we want to run `setTimeout(foo, 500)`, what would happen if we pushed it to the call stack? 53 | 54 | ```javascript 55 | function foo() { 56 | console.log("Hello"); 57 | } 58 | 59 | setTimeout(foo, 500); 60 | ``` 61 | 62 | If we pushed `setTimeout(foo, 500)` to the call stack, it would block the call stack for 500 milliseconds. This is not what we want. Instead, we want to run `foo` after 500 milliseconds. But how do we keep track of when to run `foo` if we can't use the call stack? 63 | 64 | The JavaScript Engine isn't running code in complete isolation. It's running it in what we call a JavaScript Runtime Environment. This environment provides a set of extra functionality on top of JavaScript called Web APIs. 
These APIs include: 65 | 66 | - Timers (setTimeout, setInterval) 67 | - HTTP requests (fetch) 68 | - DOM manipulation functions 69 | 70 | When we call `setTimeout(foo, 500)`, the `setTimeout` function is pushed to the call stack. The `setTimeout` function is then removed from the call stack and sent to the Web API environment to handle the timer. But how does the JavaScript Engine know when to run `foo`? 71 | 72 | This is where the callback queue comes in. The callback queue is a FIFO (First In, First Out) data structure that stores callback functions. Once the timer is complete, the Web API environment pushes the callback function (`foo`) to the callback queue. 73 | 74 | The event loop is responsible for checking the call stack and callback queue. If the call stack is empty, it pushes the first function in the callback queue to the call stack. Once that's done, it checks the callback queue again for the next function. This process continues until the callback queue is empty. 75 | 76 | The process of the Event Loop: 77 | 78 | 1. Dequeue the first function in the callback queue. 79 | 2. Push the function onto the call stack. 80 | 3. Execute the function. 81 | 4. Render any changes to the DOM. 82 | 5. Remove the function from the call stack. 83 | 6. Repeat the process until the callback queue is empty. 84 | 85 | # setTimeout(func, 0) 86 | 87 | You might think that `setTimeout(func, 0)` runs the function immediately, but that's not the case. 88 | 89 | As we mentioned earlier, `setTimeout` is a part of the Web API environment. When you call `setTimeout(func, 0)`, the function is sent to the Web API environment to handle the timer. The timer is set to 0 milliseconds, but it doesn't mean the function will run immediately. The function is still sent to the callback queue, and the event loop will run it when the call stack is empty.
90 | 91 | Let's look at some code: 92 | 93 | ```javascript 94 | console.log("Start"); 95 | 96 | setTimeout(() => { 97 | console.log("Inside setTimeout"); 98 | }, 0); 99 | 100 | console.log("End"); 101 | ``` 102 | 103 | 1. `console.log("Start")` is added to the call stack and executed. 104 | 2. `setTimeout` is added to the call stack and sent to the Web API environment to handle the timer. 105 | 3. `console.log("End")` is added to the call stack and executed. 106 | 4. The event loop checks the call stack and callback queue. Since the call stack is empty, it dequeues the function from the callback queue and adds it to the call stack. 107 | 5. `console.log("Inside setTimeout")` is added to the call stack and executed. 108 | 109 | # Microtask Queue 110 | 111 | When promises were added to JavaScript, they introduced a new queue called the microtask queue. The microtask queue has a higher priority than the callback queue. When a promise is resolved or rejected, the callback function is added to the microtask queue. 112 | 113 | Let's look at some code to understand how everything works together: 114 | 115 | ```javascript 116 | console.log("Start"); 117 | 118 | setTimeout(() => { 119 | console.log("Inside setTimeout"); 120 | }, 0); 121 | 122 | Promise.resolve().then(() => { 123 | console.log("Inside Promise"); 124 | }); 125 | 126 | console.log("End"); 127 | ``` 128 | 129 | Just to remind you, whenever something is "executed" in the call stack, it's removed from the call stack. 130 | 131 | 1. `console.log("Start")` is added to the call stack and executed. 132 | 2. `setTimeout` is added to the call stack and sent to the Web API environment to handle the timer. 133 | 3. `Promise.resolve().then` is added to the call stack and executed. The callback function (`console.log("Inside Promise")`) is added to the microtask queue. 134 | 4. `console.log("End")` is added to the call stack and executed. 135 | 5. The event loop checks the call stack, microtask queue, and callback queue. 
Since the call stack is empty, it dequeues the function from the microtask queue and adds it to the call stack. 136 | 6. `console.log("Inside Promise")` is added to the call stack and executed. 137 | 7. The event loop does its job again and sees that the microtask queue is empty. It then dequeues the function from the callback queue and adds it to the call stack. 138 | 8. `console.log("Inside setTimeout")` is added to the call stack and executed. 139 | 140 | Pseudo code of the event loop may look something like: 141 | 142 | ```javascript 143 | while (true) { 144 | if (callStack.isEmpty()) { 145 | if (!microtaskQueue.isEmpty()) { 146 | callStack.push(microtaskQueue.dequeue()); 147 | } else if (!callbackQueue.isEmpty()) { 148 | callStack.push(callbackQueue.dequeue()); 149 | } 150 | } else { 151 | // This would execute the last function added to the call stack 152 | // And then remove it from the call stack 153 | callStack.execute(); 154 | } 155 | } 156 | ``` 157 | -------------------------------------------------------------------------------- /gathering-system-requirements.md: -------------------------------------------------------------------------------- 1 | # Gathering System Requirements in System Design Interviews 2 | 3 | System design interviews are intentionally vague. The interviewer will not give you a clear problem statement. Instead, they will ask you to design a system that can handle a certain load or a certain set of requirements. It is your job to ask the right questions to gather the requirements and constraints of the system. 4 | 5 | They might ask you to design "Youtube". You need to ask questions like: 6 | 7 | - What is the expected number of videos uploaded per day? 8 | - What is the expected number of videos watched per day? 9 | - What is the expected number of concurrent users? 10 | - What is the expected number of comments per video? 11 | - What is the expected number of likes per video? 12 | - What part of Youtube are we designing?
The video upload part, the video serving part, the comment serving part, etc. 13 | 14 | The interviewer will not give you a clear answer to these questions. They will give you a range. For example, they might say "The number of videos uploaded per day can range from 1000 to 1000000". It is your job to design a system that can handle the upper limit of this range. 15 | 16 | The scale may not have to do with users, but it can have to do with the data itself, like the number of videos, the number of comments, the number of likes, etc. 17 | 18 | ## System characteristics 19 | 20 | You also wanna ask about the characteristics of the system such as latency, throughput, consistency, availability, durability, etc. 21 | 22 | If asked to design Youtube, you might ask: 23 | 24 | - What is the expected latency for video upload? 25 | - What is the expected latency for video serving? 26 | - What is the expected throughput for video upload? 27 | - What is the expected throughput for video serving? 28 | - What is the expected consistency level for video serving? 29 | - What is the expected availability for video serving? 30 | -------------------------------------------------------------------------------- /leader-election.md: -------------------------------------------------------------------------------- 1 | # Leader Election 2 | 3 | Let's say you have a product that lets users subscribe, e.g. Netflix. Then you have a database that stores the user's subscription information. 4 | 5 | Imagine you have a third party service such as Stripe, Paypal, etc. that charges the user's credit card every month. However, you don't want to have the third party service directly charge the user's credit card. Instead, you want to have a service that acts as a middleman between your database and the third party service. This middleman service is responsible for charging the user's credit card every month. 6 | 7 | Now, if you only have one middleman service, what happens if it goes down?
We have to imagine when building distributed systems that servers and nodes can fail. If the middleman service goes down, then no one is charging the user's credit card. This is a problem. So we can introduce redundancy by having multiple middleman services. But then we have to decide which middleman service is responsible for charging the user's credit card, because we don't want to charge the user's credit card multiple times. 8 | 9 | So, we're now gonna have a leader. The leader is responsible for charging the user's credit card. If the leader goes down, then we have to elect a new leader. This is called leader election. 10 | 11 | Leader election may sound simple: we just pick a new leader when the current leader goes down. But it's not that simple. We have to consider a lot of things, such as network partitions, split brain, etc. 12 | 13 | The real difficulty is getting all the nodes to agree on who the leader is. This agreement is called consensus. 14 | 15 | This is where consensus algorithms come in. Consensus algorithms allow a group of nodes to agree on a value. Leader election is a special case of consensus. Popular consensus algorithms include Paxos, Raft, and Zab. 16 | 17 | ## Tools for consensus 18 | 19 | Now, you won't implement consensus algorithms from scratch. Instead, you'll use systems that implement consensus algorithms for you. These are called distributed consensus systems. Examples include ZooKeeper, etcd, and Consul. 20 | 21 | ## Etcd 22 | 23 | Etcd is a distributed key value store that's highly available and strongly consistent. It's used for shared configuration and service discovery. It's written in Go and uses the Raft consensus algorithm. 24 | 25 | Strong consistency here means writing and reading from Etcd will always return the latest value. This is important for leader election.
If we have multiple middleman services, we want them all to see the current leader, not a stale one. 26 | -------------------------------------------------------------------------------- /logging-and-monitoring.md: -------------------------------------------------------------------------------- 1 | # Logging and Monitoring 2 | 3 | What is logging and monitoring? 4 | 5 | Imagine you're running a website like Youtube, and you have millions of users. You want to know how many users are using your website, how many videos are being watched, and how many users are uploading videos. You also want to know how many users are getting errors, and what those errors are. This is where logging and monitoring come in. 6 | 7 | Let's say for Youtube Premium, someone may subscribe and be told it succeeded, but they can't access the premium content. They reach out to your support team, who can't figure out what's wrong. You need to know what's happening in your system, and this is where logging and monitoring come in. 8 | 9 | ## Logging 10 | 11 | Let's imagine this is happening locally and you're debugging; as a developer you would then print out the error message to the console. But when you're running a distributed system, you can't just print out to the console, you need to log it to a file, a database, or a log management system. This is because the system is deployed and the error is happening on a server, and you can't just go to the server and look at the console. 12 | 13 | Logging is the process of recording events that happen in your system. These events can be anything from a user logging in, to a server error. Logging is important because it lets you see what's happening in your system, e.g. how many users are getting errors and what those errors are. 14 | 15 | When adding logs to your system that will be visible in e.g.
a log management system, you'll typically use a logging library. This library will allow you to log messages at different levels, such as `info`, `warning`, `error`, etc. For example, you might log a recoverable problem at the `warning` level and a failure at the `error` level, so you can filter for what matters. 16 | 17 | The format of logs will typically be syslog or JSON. This is because these formats are easy to read and write, for both humans and machines. 18 | 19 | ## Monitoring 20 | 21 | Monitoring is related to logging: it's a tool or system that watches your system to see if it's working as expected. 22 | 23 | This includes things like: 24 | 25 | - Health checks 26 | - Uptime checks 27 | - Error rate 28 | - Latency 29 | - Traffic 30 | 31 | You can imagine if you're building a system like Youtube, you may want to know the latency of your system, how long it takes for a user to upload a video, how long it takes for a user to start watching a video, etc. 32 | 33 | You will wanna configure alerts, so if the error rate goes above a certain threshold, you will get an alert. This way you can quickly respond to issues in your system. 34 | 35 | You can also hook up your monitoring system in a place where you can see it and get the alerts, e.g. if you're using Slack, you can hook it up to Slack. 36 | 37 | ## Time series databases 38 | 39 | To gather metrics, you will often use a time series database. These are optimized for storing and querying data points indexed by time. For example, you might need to store the number of users that are using your system at a given time.
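As a sketch of the logging ideas above, levels plus a machine-readable JSON format, here's a tiny structured logger. The record shape and function names are my own assumptions for illustration; real logging libraries define their own conventions:

```javascript
// Minimal structured logger: each entry is one JSON line containing a
// level, a timestamp, a message, and arbitrary context fields.
const LEVELS = ["debug", "info", "warning", "error"];

function makeLogger(minLevel = "info", write = console.log) {
  const threshold = LEVELS.indexOf(minLevel);
  const log = (level, message, context = {}) => {
    // Drop anything below the configured minimum level.
    if (LEVELS.indexOf(level) < threshold) return;
    write(JSON.stringify({ level, time: new Date().toISOString(), message, ...context }));
  };
  return {
    debug: (msg, ctx) => log("debug", msg, ctx),
    info: (msg, ctx) => log("info", msg, ctx),
    warning: (msg, ctx) => log("warning", msg, ctx),
    error: (msg, ctx) => log("error", msg, ctx),
  };
}

const logger = makeLogger("info");
logger.error("premium flag not set after successful payment", { userId: "u123" });
```

Because each line is JSON, a log management system can filter by `level` or by a context field like `userId` without parsing free-form text, which is exactly what you'd need to debug the Youtube Premium example above.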
40 | -------------------------------------------------------------------------------- /only-cloud-services-neetocde.md: -------------------------------------------------------------------------------- 1 | # The only cloud services you actually need to know 2 | 3 | ## VM 4 | 5 | You can think of a VM as renting a computer in the cloud. You can install whatever you want on it, and it's yours to use as you please. You can use it to run a web server, a database, or anything else you can think of. 6 | 7 | While you might not actually rent a physical computer, that's the perspective when working with VMs. 8 | 9 | VM stands for "Virtual Machine". 10 | 11 | Every other cloud service is built on top of VMs. They're the foundation of cloud computing. 12 | 13 | This is an example of an unmanaged service. The only thing the cloud provider does for you is provision the resource to use, not manage it for you. 14 | 15 | You will likely connect to the VM e.g. via SSH. 16 | 17 | Examples: AWS EC2, Google Compute Engine, Azure Virtual Machines. 18 | 19 | ## Object stores 20 | 21 | What if you want to store files? 22 | 23 | That's where object stores come in. Your only concern is reading and writing files; you don't need to worry about the underlying infrastructure, as compared to VMs, where you have to specify the size of the VM, the amount of memory, the number of CPUs, etc. 24 | 25 | With an object store, you just upload and download files. The cloud provider takes care of the rest. 26 | 27 | Under the hood you're still gonna need memory, CPU, etc., but you don't have to worry about it. This is an example of a managed service. The cloud provider takes care of the infrastructure for you, and you just use it. 28 | 29 | This is also sometimes called serverless, because you don't need to worry about servers. Under the hood, files are replicated across multiple servers, so you don't need to worry about losing your data, as you would with a single point of failure.
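To illustrate how small that interface really is, here's a toy in-memory stand-in with just `put` and `get`. The class and method names are mine, not a real SDK's; actual object store clients differ, and the real services add replication and durability behind a similarly small API:

```javascript
// Toy object store: buckets map string keys to stored values.
// This only models the interface; a real service replicates each
// object across many servers so a single disk failure loses nothing.
class ToyObjectStore {
  constructor() {
    this.buckets = new Map();
  }

  put(bucket, key, data) {
    if (!this.buckets.has(bucket)) this.buckets.set(bucket, new Map());
    this.buckets.get(bucket).set(key, data);
  }

  get(bucket, key) {
    const objects = this.buckets.get(bucket);
    if (!objects || !objects.has(key)) {
      throw new Error(`NoSuchKey: ${bucket}/${key}`);
    }
    return objects.get(key);
  }
}

const store = new ToyObjectStore();
store.put("videos", "cat.mp4", "video bytes here");
console.log(store.get("videos", "cat.mp4"));
```

Notice what's absent: no instance sizes, no CPU counts, no disks. That absence is the whole point of a managed service.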
30 | 31 | Examples: AWS S3, Google Cloud Storage, Azure Blob Storage. 32 | 33 | ## Database services 34 | 35 | What if you had actual application data? 36 | 37 | Databases are complex. You will wanna run something like MySQL, PostgreSQL, or MongoDB. Databases are essentially a really complex wrapper around disk space. You could run a database on a VM yourself: specify the size etc., replicate it across multiple VMs, handle sharding, but that's a lot of manual and complex work. 38 | 39 | That's why cloud providers offer you a managed database service. 40 | 41 | In the world of AWS, the relational database service would be RDS, and the non-relational one would be DynamoDB. 42 | 43 | ### Proprietary services 44 | 45 | In many cases, AWS isn't doing anything creative, they're just taking open source software and running it for you. But in some cases, they have proprietary services. 46 | 47 | For example, DynamoDB is a proprietary service. It's not open source, and it's not something you could run on your own. If you decided to migrate from DynamoDB to a different non-relational database because it's e.g. too expensive, you would have to migrate your data, and you would have to change your application code. 48 | 49 | This is also referred to as vendor lock-in. Generally, you don't want to be locked into a single vendor. However, in some cases, the proprietary service is so good that it's worth the trade-off. 50 | 51 | ## Knowing one cloud sets you off 52 | 53 | Every cloud provider offers similar services. 54 | 55 | Google Cloud, AWS, Azure, they all offer VMs, object stores, databases, functions, etc. 56 | 57 | So the foundation of cloud computing is the same across all cloud providers. 58 | 59 | Differences however may be: 60 | 61 | - Pricing 62 | - Performance 63 | - Features 64 | - How you interact with the service 65 | - Etc. 66 | 67 | ## Functions as services 68 | 69 | Cloud providers also offer you the ability to run code without having to worry about the underlying infrastructure.
This is known as Functions as a Service (FaaS). 70 | 71 | They let you run compute without worrying about the servers underneath. So if you're not concerned about disk and just want to run code, this is what you'd use, e.g. AWS Lambda, Google Cloud Functions, Azure Functions. 72 | 73 | This is also an example of a managed service. When people refer to serverless, this is what they're talking about. It's the highest level of abstraction. You just write code, and the cloud provider takes care of the rest. 74 | 75 | ## That's actually enough 76 | 77 | You will typically not even need to deal with VMs. An object store for files, a database, and functions are enough for most applications. With these three services, you can build a lot of things already. 78 | 79 | ## Observability 80 | 81 | When you're running your application in the cloud, you will want to know what's going on. You will want to know how many requests are coming in, how long they're taking, how much memory you're using, etc. That's where observability comes in. 82 | 83 | Logging: You will want to log what's happening in your application. This can help you debug issues in production and understand what's going on, e.g. if a user complains that something isn't working. 84 | 85 | Monitoring: You will want to monitor your application. You will want to know if your application is up or down, if it's slow (how long queries are taking), if it's using too much memory, etc. 86 | 87 | Alerts: You will want to be alerted if something goes wrong, e.g. if you're hitting more 400 errors than usual, latency is too high, etc. 88 | 89 | Cloud providers offer you services to help you with observability. For example, AWS CloudWatch, Google Cloud Monitoring, Azure Monitor. 90 | 91 | However, there are popular services that fully focus on observability. Datadog is a very popular one. 92 | 93 | ## Data warehouses 94 | 95 | Data warehouses are a type of database optimized for analytics.
They're not for transactional data, but for running complex queries over large amounts of data. They're often used by data analysts and data scientists to run queries and generate reports. 96 | 97 | For example, a company might want to know how many users are using a certain feature, or how many users have bought a certain product. 98 | 99 | Cloud providers offer you managed data warehouses. For example, AWS Redshift, Google BigQuery, Azure Synapse Analytics. 100 | 101 | However, here external services are also popular. For example, Snowflake, Databricks, etc. 102 | 103 | ## Cloud wrappers 104 | 105 | Cloud wrappers are services that make it easier to work with cloud providers. They're often used by developers to deploy their applications to the cloud. People still often complain about how hard it is to work with cloud providers, and that's where cloud wrappers come in. 106 | 107 | Popular ones include Vercel and Netlify. They make it easy to deploy your application to the cloud. You just run a command, and your application is deployed. 108 | 109 | ## Regional vs Global services 110 | 111 | Some services are regional, and some are global. Regional services are usually cheaper, but they're not always available in all regions. Global services are available in all regions, but they're usually more expensive. 112 | 113 | A CDN is inherently a global service, because you've got servers all over the world where you want to cache your static content. So you can't really have a regional CDN. 114 | 115 | VMs are regional. You can't just have a VM that's split across multiple regions. You have to specify the region when you create the VM. 116 | 117 | It gets a bit fuzzy when talking about services like DynamoDB, Google Cloud Spanner, etc. 118 | 119 | RDS for example is primarily a regional service. This means when you deploy an RDS instance, it is deployed in a single region.
However, you can create read replicas in other regions. This is a way to make it more highly available, but it's not the same as a global service. 120 | 121 | DynamoDB can be considered a global service. You can create a table in a single region, and then enable global tables. DynamoDB Global Tables provide a fully managed, multi-region, and multi-active database option that delivers fast, localized read and write performance for massively scaled, global applications. 122 | -------------------------------------------------------------------------------- /peer-to-peer-networking.md: -------------------------------------------------------------------------------- 1 | # Peer-to-Peer Networking 2 | 3 | Imagine you want to deploy and transfer large files to thousands of machines at once. For example, let's say you get videos from a security camera and you want to transfer these videos to thousands of machines. 4 | 5 | A 5GB file on a 40Gbps network takes about 1 second to transfer. However, if you have 1000 machines, sending the file to them one after another would take about 17 minutes. This is because the source machine's network link is the bottleneck. 6 | 7 | To recap throughput: Gbps is gigabits per second, and GB is gigabytes. 1 byte is 8 bits. So 40Gbps is 5GB/s. Sending 5GB to 1000 machines means pushing 5GB \* 1000 = 5TB through the source's link. 5TB is 5000GB. So 5000GB / 5GB/s = 1000s, roughly 17 minutes. 8 | 9 | Now imagine the security camera produces a new video every 5 minutes. Distribution can never keep up: each video takes about 17 minutes to reach all 1000 machines, so we fall further and further behind. 10 | 11 | ## First step 12 | 13 | So as a first step, we know redundancy is a good idea. Instead of having a single server, we decide to have multiple servers. This way, if one server goes down, then we can still transfer the file from the other servers. On top of that, we can transfer files concurrently from all servers. 14 | 15 | Let's say we add 10 servers, so we have 10 servers.
Now the total outbound throughput is 5GB/s \* 10 = 50GB/s. This means 1000 machines would take 5TB / 50GB/s = 100s = 1 minute and 40 seconds. This is a huge improvement. 16 | 17 | However, we run into trouble: it's more complex, we have to replicate files across servers, and each individual server's network link is still a bottleneck. 18 | 19 | ## Peer to Peer Networking comes to play 20 | 21 | So, let's go back to the beginning. We have one server. The idea behind peer to peer networking is to have the machines that are receiving the file also send the file to other machines. This way, we can leverage the network of all the machines to transfer the file. 22 | 23 | You recall we had 5GB files and each machine's network throughput was 40Gbps, so one machine can send 5GB to one other machine in about a second. 24 | 25 | What if we split the 5GB file into 1000 pieces of 5MB each, and had the machines pass the pieces around among themselves? 26 | 27 | Then each machine can send its piece to another machine. Every one of the 1000 machines would be talking to another machine. While they're sending their piece, they're also receiving a piece from another machine. This way, we can take advantage of every machine's network capacity, not just the source's. 28 | 29 | This means at every moment, the source is not just sending out a new 5MB piece (one of the 1000 pieces we split the file into); the machines that have already received pieces are also forwarding them to other machines. 30 | 31 | ## How to know which machine to send to? 32 | 33 | This gets complicated very quickly. 34 | 35 | There are two main ways to do this: 36 | 37 | 1. Centralized: We have a central server that keeps track of which machine has which piece.
This way, when a machine wants a piece, it can ask the central server. This is how BitTorrent works: it has a central server called a tracker (the usual name for this central server in peer to peer networking) that keeps track of which machine has which piece. When a machine wants a piece, it asks the tracker, the tracker tells it which machine has the piece, and it then fetches the piece from that machine. 38 | 39 | 2. Gossip: We can use a gossip protocol, a protocol in which machines spread information by talking to each other directly. When a machine wants a piece, it asks another machine, which can point it to the machine that has the piece. The Cassandra database uses a gossip protocol to spread cluster state between its nodes. It's named gossip because it spreads information the way a rumor spreads between people. 40 | 41 | DHT: Peers often operate by using a distributed hash table (DHT), a distributed key-value store that provides a lookup service similar to a hash table. The key is the hash of the file (or piece), and the value is the machine that has it. When a machine wants a piece, it looks it up in the DHT, which tells it which machine to fetch the piece from. Kademlia is a well-known DHT used in peer to peer networking. 42 | -------------------------------------------------------------------------------- /planning.md: -------------------------------------------------------------------------------- 1 | # Planning in system design interviews 2 | 3 | When asked e.g. to design Youtube, you want to start off by asking questions to gather the requirements and constraints of the system.
4 | 5 | Then before you even dive into designing the actual system, you want to write a high level plan of what you're going to do. This is important because it shows the interviewer that you have a plan and that you're not just going to start designing without thinking. 6 | 7 | It also helps you to organize your thoughts and to make sure that you're not missing anything. 8 | 9 | It's also easier for the interviewer to follow along with your thought process if you have a plan. And if you're wrong somewhere, the interviewer can correct you before you've spent a lot of time designing the system. 10 | -------------------------------------------------------------------------------- /pollingstreaming.md: -------------------------------------------------------------------------------- 1 | # Polling and Streaming 2 | 3 | What if we have to build a system where our clients need the updated data frequently, e.g. monitoring temperature, stock prices, etc.? 4 | 5 | We're dealing with data clients need to see regularly updated. 6 | 7 | This is where both polling and streaming come in. 8 | 9 | ## Polling 10 | 11 | Polling is when the client asks the server for data at regular intervals. For example, the client can ask the server for data every 5 seconds. 12 | 13 | Polling is good for when the data doesn't change often. For example, if the data changes every 5 minutes, then polling every 5 seconds is a waste of resources. 14 | 15 | If you're building a real time app like a chat app, polling may not be the best choice. The data changes frequently, so to feel real time you'd have to poll very often, and that's a trade-off: frequent polling puts a lot of load on the server. 16 | 17 | This is where streaming comes in. 18 | 19 | ## Streaming 20 | 21 | You can have your client open a long lived connection to the server. This way, the server can send the data to the client as soon as it's available. This is typically done using websockets. 
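The difference between polling (pull) and streaming (push) can be sketched without any real networking. Here the "server" is just an in-memory object, and all names are made up:

```javascript
// A toy "server" holding a value that changes over time.
function createServer() {
  let value = 0;
  const subscribers = [];
  return {
    // Pull model (polling): the client asks whenever it wants.
    read: () => value,
    // Push model (streaming): the server notifies subscribers on every change.
    subscribe: (cb) => subscribers.push(cb),
    write: (v) => {
      value = v;
      subscribers.forEach((cb) => cb(v)); // push to every open "connection"
    },
  };
}

const server = createServer();

// Streaming client: sees every update the moment it happens.
const received = [];
server.subscribe((v) => received.push(v));

server.write(1);
server.write(2);

// Polling client: only sees whatever the value is when it happens to ask.
const polled = server.read(); // → 2 (it missed the intermediate value 1)
```

The streaming client ends up with both updates, while the polling client only observes the latest value; how much the poller misses depends entirely on its interval.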
You can imagine it like a portal from the server to the client. The server can send the data to the client as soon as it's available. 22 | 23 | Servers proactively sending data to clients is often referred to as "push" or "server push". 24 | -------------------------------------------------------------------------------- /pub-sub.md: -------------------------------------------------------------------------------- 1 | # Publish/Subscribe Pattern 2 | 3 | Imagine you have this problem: you have a system with a lot of users, and you want to send them notifications. You could have every part of the system send notifications directly, but that couples everything together and scales poorly. Instead, you can use the publish/subscribe pattern. 4 | 5 | The pub sub pattern consists of 4 main components: 6 | 7 | - Publisher: This is the component that sends messages. In our example, this would be the component that sends notifications to users. 8 | - Subscriber: This is the component that receives messages. In our example, this would be the component that receives notifications from the publisher. 9 | - Topic: This is the channel that the publisher sends messages to. In our example, this would be the channel that the publisher sends notifications to. 10 | - Message: This is the data that the publisher sends to the subscriber. In our example, this would be the notification that the publisher sends to the subscriber. 11 | 12 | The pub sub pattern is useful because it allows you to decouple the publisher from the subscriber. This means that the publisher doesn't need to know who the subscriber is, and the subscriber doesn't need to know who the publisher is. This is useful because it allows you to add new publishers and subscribers without changing the existing ones. 13 | 14 | Publishers publish messages to a specific topic. 
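Here's a minimal in-memory sketch of the pattern. This is a toy, not a real message broker, and all names are made up:

```javascript
// A minimal in-memory pub/sub broker.
function createBroker() {
  const topics = new Map(); // topic name -> array of subscriber callbacks
  return {
    subscribe(topic, handler) {
      if (!topics.has(topic)) topics.set(topic, []);
      topics.get(topic).push(handler);
    },
    publish(topic, message) {
      (topics.get(topic) || []).forEach((handler) => handler(message));
    },
  };
}

const broker = createBroker();
const inbox = [];

// The subscriber only knows the topic name, not who publishes to it.
broker.subscribe("notifications", (msg) => inbox.push(msg));

// The publisher only knows the topic name, not who is subscribed.
broker.publish("notifications", "new video uploaded");
broker.publish("comments", "nobody is subscribed here, so this goes nowhere");
```

Note how the publisher and the subscriber never reference each other, only the topic; that's the decoupling.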
Each topic can be seen as a channel for a specific type of message. For example, a topic could be "notifications", and the messages could be "new video uploaded", "new comment", etc. 15 | 16 | Subscribers can subscribe to a specific topic, and they will receive all messages published to that topic. For example, a user could subscribe to the "notifications" topic, and they would receive all notifications published to that topic. 17 | 18 | ## Message Delivery Guarantees 19 | 20 | We have different types of delivery guarantees: 21 | 22 | - At most once: The message is delivered once or not at all; it can be lost, but it's never duplicated. 23 | - At least once: The message is always delivered, but it may be delivered more than once. 24 | - Exactly once: The message is always delivered exactly once. 25 | 26 | Let's say a subscriber is offline, and a message is published to a topic. The system holds on to the message and redelivers it until it's acknowledged, so when the subscriber comes back online, it will receive the message, possibly more than once. This is an example of at least once delivery. 27 | 28 | Let's say a message is published to a topic and sent to the subscriber once, with no retries. If the subscriber is offline, or the message is lost on the network, it's simply gone. This is an example of at most once delivery. 29 | 30 | Let's say a message is published to a topic, and the subscriber receives the message. The subscriber then sends an acknowledgment to the publisher. If the publisher doesn't receive the acknowledgment, it will send the message again. The subscriber will then check if it has already processed the message, and if it has, it will ignore the message. This is an example of exactly once delivery (in practice: at least once delivery plus deduplication). 31 | 32 | ## Idempotent 33 | 34 | When a message can be delivered more than once, it's important that processing the message is idempotent. This means that the message can be processed multiple times without changing the result beyond the first time. 
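A common way to get idempotent processing is to track the IDs of messages you've already handled and skip duplicates. A sketch (the message shape and all names are assumptions for illustration):

```javascript
// Idempotent consumer: processing the same message twice has no extra effect.
const processedIds = new Set(); // in production this would live in durable storage
let balance = 0;

function handleDeposit(message) {
  // message: { id, amount } — shape assumed for this sketch
  if (processedIds.has(message.id)) return; // duplicate delivery: ignore
  processedIds.add(message.id);
  balance += message.amount;
}

const msg = { id: "tx-42", amount: 100 };
handleDeposit(msg); // balance becomes 100
handleDeposit(msg); // redelivered duplicate: balance stays 100
```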
For example, a message like "set the user's balance to 100" is idempotent: processing it twice leaves the same result. A message like "add 10 to the user's balance" is not, because processing it twice changes the balance by 20. 35 | 36 | ## Ordering 37 | 38 | Many pub/sub systems deliver the messages published to a topic to subscribers in the order they were published. This matters when later messages only make sense after earlier ones: if a message is published before another message, it should be delivered before the other message. 39 | 40 | Some pub/sub solutions out there will give you the ability to replay/rewind messages, so if you have a new subscriber, it can go back and read all the messages that were published to a topic before it subscribed. 41 | 42 | ## Why multiple topics? 43 | 44 | You might be wondering why we need multiple topics. Separate topics let you separate different types of messages and give each type its own subscribers. For example, you might have a "notifications" topic for notifications, and a "comments" topic for comments. 45 | 46 | ## Popular tools 47 | 48 | Some popular pub/sub tools are: 49 | 50 | - Google Cloud Pub/Sub 51 | - Amazon SNS (often paired with SQS) 52 | - RabbitMQ 53 | - Redis 54 | - Kafka 55 | 56 | Some of them offer nice extras: Kafka, for example, supports replaying messages, and Google Cloud Pub/Sub offers scalability out of the box, so you don't need to worry about scaling topics or making them redundant yourself. 57 | -------------------------------------------------------------------------------- /rate-limiting.md: -------------------------------------------------------------------------------- 1 | # Rate Limiting 2 | 3 | Rate limiting is a technique that limits the number of requests that can be made to a server in a given time period. 
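A simple way to implement this is a fixed-window counter per client. A sketch, with made-up names and toy numbers:

```javascript
// Fixed-window rate limiter: allow at most `limit` requests per client per window.
function createRateLimiter(limit, windowMs) {
  const counters = new Map(); // clientId -> { windowStart, count }
  return function isAllowed(clientId, now = Date.now()) {
    const entry = counters.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      counters.set(clientId, { windowStart: now, count: 1 }); // new window
      return true;
    }
    entry.count += 1;
    return entry.count <= limit; // over the limit: caller responds with 429
  };
}

const isAllowed = createRateLimiter(3, 1000); // 3 requests per second
const results = [1, 2, 3, 4].map(() => isAllowed("client-a", 0));
// → [true, true, true, false]: the fourth request in the window is rejected
```

Real systems often prefer sliding windows or token buckets instead, which smooth out the burst a fixed window allows right at a window boundary.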
It's a way to prevent abuse and ensure that the server can handle a large number of requests. 4 | 5 | For example, if you have a client, server and database, you may want to limit the number of requests that the client can make to the server in a given time period. This is to prevent the server from being overwhelmed by too many requests. 6 | 7 | For example, you may accept 100 requests per second from a single client. If the client exceeds this limit, you may want to return an error message to the client. The error status code `429 Too Many Requests` is often used for this purpose. 8 | 9 | Rate limiting a server protects it from e.g. DoS (Denial of Service) attacks, where an attacker tries to overwhelm the server with too many requests in order to make it unavailable to other users. From bad actors to bugs, rate limiting can help protect your server from being overwhelmed. 10 | 11 | You can rate limit based on different things: region, IP address, user, etc. You can also rate limit over different time periods, such as per second, per minute, per hour, etc. 12 | 13 | Rate limiting isn't the ultimate way of protecting your server, but it's a good start. 14 | 15 | For example, DDoS (Distributed Denial of Service) attacks are a type of DoS attack where an attacker uses multiple machines to overwhelm a server. Rate limiting can help protect your server from DoS attacks, but DDoS attacks are more difficult to protect against; for those you will need more advanced techniques such as load balancing, firewalls, etc. 16 | -------------------------------------------------------------------------------- /replication-sharding.md: -------------------------------------------------------------------------------- 1 | # Replication and Sharding 2 | 3 | ## Replication 4 | 5 | Imagine you have one database. What do you do if the database goes down? You're out of luck. 
You can't read existing data, and you can't write new data. You're stuck. 6 | 7 | That's where replication comes in. Replication is the process of copying data from one database to another. The goal of replication is to increase the availability of your data. If one database goes down, you can still access your data from another database. 8 | 9 | Another benefit of replication is having the database in different locations. This can help with latency. If you have a database in the US and another in Europe, users in Europe can access the European database, and users in the US can access the US database. This can reduce the latency for users in Europe. 10 | 11 | Replication helps you achieve: 12 | 13 | - **High availability**: If one database goes down, you can still access your data from another database. 14 | - **Disaster recovery**: If one data center goes down, you can still access your data from another data center. 15 | - **Latency reduction**: If you have a database in the US and another in Europe, users in Europe can access the European database, and users in the US can access the US database. This can reduce the latency for users in Europe. 16 | 17 | ## Sharding 18 | 19 | Okay, that's replication. So are we saved? 20 | 21 | Well, what if our database has grown so large that it doesn't fit on a single machine? What if we have so much data that we need to spread it across multiple machines? 22 | 23 | This is where sharding comes in. Sharding is the process of splitting up a large database into smaller, more manageable pieces called shards. Each shard is stored on a separate machine. 24 | 25 | Sharding is often used in combination with replication. You can shard your data across multiple machines, and then replicate each shard to increase the availability of your data. 26 | 27 | There are multiple strategies for sharding your data. A simple example is range-based sharding: 
users with ID 1-1000 go to shard 1, users with ID 1001-2000 go to shard 2, and so on. 28 | 29 | ### Hotspot 30 | 31 | You have to be careful with the sharding strategy you choose. If you shard your data in a way that causes a hotspot, you can end up with a performance bottleneck. A hotspot is a shard that receives a disproportionate amount of traffic. For example, if you shard your data by user ID and the users on one shard (say, a few celebrity accounts) get a disproportionate amount of traffic, that shard becomes a hotspot. 32 | -------------------------------------------------------------------------------- /security-https.md: -------------------------------------------------------------------------------- 1 | # Security and HTTPS 2 | 3 | ## HTTP 4 | 5 | HTTP is a protocol built on top of TCP/IP. It is a stateless protocol, meaning that each command is executed independently, without any knowledge of the commands that came before it. It's nice because it's simple, but it's not secure. 6 | 7 | HTTP is not secure. It's vulnerable to man-in-the-middle attacks, where an attacker intercepts the communication between the client and the server, even though we expect the communication to be private. This is because HTTP sends everything in plain text: there's no way to verify the identity of the server, and no way to verify the integrity of the data. 8 | 9 | How do man-in-the-middle attacks happen? When you send a request to a server, the request is sent in plain text. This means that anyone who can intercept the request can read it. They can also modify the request, and send it on to the server. The server will then send a response back to the attacker, who can modify the response, and send it on to the client. The client will then receive the modified response, and have no way of knowing that it has been tampered with. 
10 | 11 | ## HTTPS 12 | 13 | HTTPS is a secure version of HTTP. It's a combination of HTTP and SSL/TLS (Secure Sockets Layer/Transport Layer Security). It's secure because it encrypts the data that's being sent, and it verifies the identity of the server. 14 | 15 | To understand HTTPS, we first have to dive into how encryption works. 16 | 17 | The idea behind encryption is to take some data, and scramble it in such a way that only someone who has the key can unscramble it. The key is a piece of information that's used to encrypt and decrypt the data. There are two types of encryption: symmetric and asymmetric. 18 | 19 | ## Symmetric Encryption 20 | 21 | Symmetric encryption is a type of encryption where the same key is used to encrypt and decrypt the data. This means that the key has to be kept secret, because anyone who has the key can decrypt the data. The most common algorithm used for symmetric encryption is the Advanced Encryption Standard (AES). 22 | 23 | The drawback of symmetric encryption is that the key has to be shared between the client and the server. This means that the key has to be sent over the network, which is not secure. If an attacker intercepts the key, they can decrypt the data. 24 | 25 | And if the key gets compromised, the attacker can decrypt all the data that was encrypted with that key. 26 | 27 | ## Asymmetric Encryption 28 | 29 | Asymmetric encryption is a type of encryption where two keys are used: a public key and a private key. The public key is used to encrypt the data, and the private key is used to decrypt the data. This means that the public key can be shared with anyone, and the private key is kept secret. 30 | 31 | For example, if Alice wants to send a message to Bob, she can use Bob's public key to encrypt the message. Bob can then use his private key to decrypt the message. 32 | 33 | If a client wants to send a message to a server, it can use the server's public key to encrypt the message. 
The server can then use its private key to decrypt the message. 34 | 35 | When the server sends a response back to the client, it can use the client's public key to encrypt the response (this assumes the client has its own key pair). The client can then use its private key to decrypt the response. 36 | 37 | This is nice because the public key can be shared with anyone, and the private key is kept secret. This means that the public key can be sent over the network, and it's not a security risk. 38 | 39 | The downside of asymmetric encryption is that it's slower than symmetric encryption. This is because the keys are longer, and the algorithms are more complex. This means that asymmetric encryption is not suitable for large amounts of data. 40 | 41 | ## SSL/TLS 42 | 43 | SSL is a protocol that was developed by Netscape in the 1990s. It's a way to secure the communication between the client and the server. It's based on the idea of using asymmetric encryption to establish a secure connection, and then using symmetric encryption to send the data. 44 | 45 | SSL is good because it verifies the identity of the server, and it encrypts the data that's being sent. This means that it's secure against man-in-the-middle attacks. 46 | 47 | SSL Flow: 48 | 49 | 1. The client sends a request to the server, asking for a secure connection. 50 | 2. The server sends its public key to the client (in practice, as part of its certificate). 51 | 3. The client verifies the server's identity by checking the certificate, and then generates a symmetric key. 52 | 4. The client encrypts the symmetric key with the server's public key, and sends it to the server. 53 | 5. The server decrypts the symmetric key with its private key. 54 | 6. The client and the server now have a shared symmetric key, which they can use to encrypt and decrypt the data. 55 | 7. The client and the server can now communicate securely. 56 | 8. The server sends a response back to the client, and the data is encrypted with the shared symmetric key. 57 | 58 | The problem with SSL is that it has security vulnerabilities. 
This means that it's not secure against modern attacks. This is why SSL has been deprecated, and it's no longer considered secure. 59 | 60 | TLS is the successor to SSL, designed to fix SSL's drawbacks, like its security vulnerabilities. TLS is the newer and more secure version of SSL. 61 | 62 | ## TLS Flow 63 | 64 | 1. The client sends a request to the server, asking for a secure connection. 65 | 2. The server sends its public key to the client. 66 | 3. The client verifies the server's identity, and then generates a symmetric key. 67 | 4. The client encrypts the symmetric key with the server's public key, and sends it to the server. 68 | 5. The server decrypts the symmetric key with its private key. 69 | 6. The client and the server now have a shared symmetric key, which they can use to encrypt and decrypt the data. 70 | 7. The client and the server can now communicate securely. 71 | 8. The server sends a response back to the client, and the data is encrypted with the shared symmetric key. 72 | 73 | HTTPS is HTTP over TLS, which means that the data is encrypted and the server is verified. 74 | 75 | ## TLS Handshake 76 | 77 | 1. The client sends a request to the server, asking for a secure connection. 78 | 2. The server sends its public key to the client. 79 | 3. The client verifies the server's identity, and then generates a symmetric key. The symmetric key is a random value generated by the client, used to encrypt and decrypt the data. 80 | 4. The client encrypts the symmetric key with the server's public key, and sends it to the server. 81 | 5. The server decrypts the symmetric key with its private key. 82 | 6. The client and the server now have a shared symmetric key, which they can use to encrypt and decrypt the data. 
83 | -------------------------------------------------------------------------------- /special-storages.md: -------------------------------------------------------------------------------- 1 | # Specialized Storage Paradigms 2 | 3 | Databases are one type of storage system, but there are many others. Each type of storage system has its own strengths and weaknesses, and is best suited for different types of data and use cases. 4 | 5 | ## Blob Storage 6 | 7 | Blob is an acronym for Binary Large Object. A blob is an arbitrary piece of data, such as a file, an image, or a video. Blob storage is a type of storage system that is optimized for storing and retrieving blobs. It is often used to store large files, such as images, videos, and backups. 8 | 9 | A blob doesn't fit into a SQL database, because it's not structured data. It's just a file. So, you can't query a blob like you can query a SQL database. Instead, you can only retrieve the entire blob at once. 10 | 11 | Common blob storage systems include Amazon S3, Google Cloud Storage, and Azure Blob Storage. 12 | 13 | They're optimized for storing and retrieving large amounts of unstructured data, and they're often used to store files, images, videos, and backups. 14 | 15 | Usually you would access a blob via a key-value interface, where you provide a key and get back the blob associated with that key. 16 | 17 | ## Time Series Databases 18 | 19 | A time series database is a type of database that is optimized for storing and retrieving time series data. Time series data is data that is indexed by time, such as stock prices, sensor readings, and server logs. 20 | 21 | Time series databases are used to store and analyze data that is indexed by time. They're often used to store metrics, logs, and other time-based data. 22 | 23 | Common use cases: Monitoring, IoT, and financial data. 24 | 25 | Common time series databases include InfluxDB, Prometheus, and Graphite. 
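The access pattern that makes time series data special (append-only writes, reads over a time range) can be sketched naively; a real time series database stores and indexes this far more efficiently:

```javascript
// Naive time series store: append-only points, queried by time range.
const points = []; // { ts, value } pairs, e.g. temperature readings

function record(ts, value) {
  points.push({ ts, value }); // writes only ever append
}

function queryRange(from, to) {
  // reads are almost always "give me everything in this time window"
  return points.filter((p) => p.ts >= from && p.ts < to);
}

record(1000, 21.5);
record(2000, 21.7);
record(3000, 22.1);

const result = queryRange(1000, 3000); // → the readings at ts 1000 and 2000
```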
26 | 27 | ## Graph Databases 28 | 29 | A graph database relies at its core on graph structures (nodes, edges, and properties) to represent, store, and query data. It's optimized for storing and retrieving graph data, i.e. data that is naturally represented as a graph of nodes connected by edges. 30 | 31 | If you're storing data where there are many relationships between different items, a graph database might be a good choice. For example, social networks, recommendation engines, and fraud detection. 32 | 33 | Many-to-many relationships are a good fit for graph databases, but only if you need to query the relationships between the items. 34 | 35 | If relations are the core of your data, a graph database might be a good choice. 36 | 37 | The most popular graph databases are Neo4j, Amazon Neptune, and Azure Cosmos DB. 38 | 39 | For example, in Neo4j, you have a special query language called Cypher, which is designed to work with graph data. You can use Cypher to query and manipulate graph data. 40 | 41 | ## Spatial Databases 42 | 43 | A spatial database is a type of database that is optimized for storing and retrieving spatial data. Spatial data is data that is indexed by location, such as maps, GPS coordinates, and geographic information. 44 | 45 | If you're making queries like "find all the restaurants within 5 miles of my current location", a spatial database might be a good choice. SQL databases can do this, but spatial databases are optimized for these types of queries, and SQL databases may not be a good choice when you have a lot of spatial data. 46 | 47 | The quadtree is a common data structure used in spatial databases. It's a tree-like data structure that is used to efficiently store and retrieve data that is indexed by location. 48 | 49 | How quadtrees work: You start with a single square that represents the entire area. 
Then you divide the square into four smaller squares. Then you divide each of those squares into four smaller squares, and so on. Each square is called a "node" in the quadtree. Each node represents a region of space, and each of its children represents a quarter of that region. 50 | 51 | You keep dividing the squares until each square contains a small number of data points. Then you store the data points in the squares. When you want to find all the data points within a certain region of space, you can use the quadtree to efficiently find the data points that are in that region. 52 | 53 | You can find a location in O(log n) time, where n is the number of data points in the quadtree (assuming the points are reasonably evenly distributed). This is because the quadtree works like a binary search tree, but with four children instead of two. 54 | -------------------------------------------------------------------------------- /web-sockets.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Imagine we want to build Twitch chat. We want to see users' messages in real time. One approach we could take is to make HTTP requests every second to get the latest messages. This approach is not efficient because we are making a lot of requests to the server. HTTP is built on top of TCP, which is a connection-oriented protocol. This means that every time we make a request, we need to establish a connection with the server. TCP establishes a connection by performing a three-way handshake. For real-time applications, this overhead adds up. 4 | 5 | # WebSockets 6 | 7 | WebSockets is a protocol that provides a two-way communication channel between a client and a server over a single, long-lived connection. The client first makes an HTTP request to the server, then the server upgrades the connection to a WebSocket connection. The status code `101 Switching Protocols` is used to indicate that the server is switching protocols. 
Once the connection is established, the client and server can send messages to each other in real time. 8 | -------------------------------------------------------------------------------- /web-workers.md: -------------------------------------------------------------------------------- 1 | # JavaScript is Single-Threaded 2 | 3 | JavaScript is a single-threaded programming language, which means it has a single call stack and can only execute one task at a time. When a script is running, it blocks other scripts from running until it completes. 4 | 5 | This can lead to performance issues, especially when dealing with long-running or computationally intensive tasks, as they can freeze the user interface and make the application unresponsive. 6 | 7 | # Example of problematic code 8 | 9 | Here's an example of problematic code that can freeze the user interface: 10 | 11 | ```javascript 12 | function longRunningTask() { 13 | let sum = 0; 14 | for (let i = 0; i < 1000000000; i++) { 15 | sum += i; 16 | } 17 | console.log(sum); 18 | } 19 | 20 | longRunningTask(); 21 | 22 | function otherTask() { 23 | console.log("This is another task"); 24 | } 25 | 26 | otherTask(); 27 | ``` 28 | 29 | `longRunningTask` is a computationally intensive task. For the sake of simplicity, we're just summing numbers in a loop. In practice it could be anything CPU-heavy, such as parsing a large file. 30 | 31 | The problem we encounter here: We can't execute `otherTask` until `longRunningTask` completes. This can cause the user interface to freeze, and the application becomes unresponsive. 32 | 33 | # Multi-Threaded Execution with Web Workers 34 | 35 | The Web Workers API allows you to run JavaScript code in the background, in a separate thread, so you can run multiple scripts concurrently. This is called multi-threaded execution: multiple threads run simultaneously, allowing you to perform multiple tasks at the same time. 
36 | 37 | # Speed up the slow example with Web Workers 38 | 39 | Let's speed up the slow example with Web Workers. We'll move the `longRunningTask` to a Web Worker, so it doesn't block the main thread: 40 | 41 | ```javascript 42 | function longRunningTask() { 43 | const worker = new Worker("worker.js"); 44 | worker.postMessage("start"); 45 | worker.onmessage = (event) => { 46 | console.log(event.data); 47 | worker.terminate(); 48 | }; 49 | } 50 | 51 | longRunningTask(); 52 | 53 | function otherTask() { 54 | console.log("This is another task"); 55 | } 56 | 57 | otherTask(); 58 | ``` 59 | 60 | In the updated code, we create a worker with `new Worker("worker.js")`. The worker runs the code in a separate file called `worker.js`. This could be any file name. The code will run in a separate thread, so it doesn't block the main thread. 61 | 62 | `worker.postMessage("start")` sends a message to the worker to start the task. The worker will perform the task and send the result back to the main thread with `worker.onmessage`. 63 | 64 | This will make more sense when we look at the `worker.js` file: 65 | 66 | ```javascript 67 | self.onmessage = (event) => { 68 | if (event.data === "start") { 69 | let sum = 0; 70 | for (let i = 0; i < 1000000000; i++) { 71 | sum += i; 72 | } 73 | self.postMessage(sum); 74 | } 75 | }; 76 | ``` 77 | 78 | The message you send with `worker.postMessage` is received in the worker with `self.onmessage`. `event.data` contains the message you sent. In this case, we're checking if the message is "start". 79 | 80 | If you wanted to, you could send more complex data to the worker. For example, you could send an object with multiple properties. 81 | 82 | When the worker completes the task, it sends the result back to the main thread with `self.postMessage`. The main thread receives the result with `worker.onmessage` and calls `worker.terminate()` to stop the worker. 
83 | 84 | This lets the main thread continue executing other tasks, like `otherTask`, without being blocked by `longRunningTask`. 85 | 86 | # Downsides 87 | 88 | Web Workers are great for offloading heavy tasks to a separate thread, but they come with some downsides: 89 | 90 | - Web Workers can't access the DOM directly. They run in a separate global context, so they can't interact with the main thread's DOM. 91 | - Web Workers can't access global variables or functions from the main thread. You need to pass data between the main thread and the worker using messages. This increases complexity. 92 | - Debugging Web Workers can be more challenging compared to regular JavaScript code, as they run in a separate thread with limited access to the browser's developer tools. 93 | 94 | # Conclusion 95 | 96 | If you need to run computationally intensive tasks in the background without blocking the main thread, Web Workers are a great solution. They allow you to run JavaScript code concurrently in separate threads, improving performance and user experience. 97 | --------------------------------------------------------------------------------