├── README.md ├── SUMMARY.md ├── basics ├── caching.md ├── cap-theorem.md ├── client-server-communication.md ├── consistent-hashing.md ├── indexes.md ├── key-characteristics.md ├── load-balancing.md ├── proxies.md ├── queues.md ├── redundancy.md ├── sharding.md ├── sql-vs-nosql.md └── system-design-basics.md ├── bin └── publish-gh-pages.sh ├── designs ├── additional-resources.md ├── dropbox.md ├── facebook-messenger.md ├── facebook-newsfeed.md ├── instagram.md ├── pastebin.md ├── rate-limiter.md ├── short-url.md ├── step-by-step-guide.md ├── ticketmaster.md ├── twitter-search.md ├── twitter.md ├── typeahead.md ├── uber-backend.md ├── web-crawler.md ├── yelp.md └── youtube.md └── img ├── .keep ├── basics ├── .keep ├── RedundancyAndReplication.png ├── ajax-polling.svg ├── cap-theorem.png ├── consistent-hashing-1.png ├── consistent-hashing-2.png ├── consistent-hashing-3.png ├── consistent-hashing-4.png ├── consistent-hashing-5.png ├── http-request.svg ├── index-table.svg ├── load-balancer-1.svg ├── load-balancer-2.png ├── load-balancer-3.svg ├── long-polling.svg ├── proxy.png ├── server-sent-events.svg ├── vertical-horizontal-scaling.svg └── web-sockets.svg ├── dropbox-1.png ├── dropbox-2.png ├── dropbox-3.png ├── dropbox-4.png ├── facebook-messenger-1.png ├── facebook-messenger-10.png ├── facebook-messenger-2.png ├── facebook-messenger-3.png ├── facebook-messenger-4.png ├── facebook-messenger-5.png ├── facebook-messenger-6.png ├── facebook-messenger-7.png ├── facebook-messenger-8.png ├── facebook-messenger-9.png ├── high-level-design.png ├── instagram-1.png ├── instagram-2.png ├── instagram-3.png ├── instagram-4.png ├── newsfeed-1.png ├── newsfeed-2.png ├── pastebin-1.png ├── pastebin-2.svg ├── pastebin-3.png ├── rate-limiter-1.png ├── rate-limiter-2.png ├── rate-limiter-3.png ├── rate-limiter-4.png ├── rate-limiter-5.png ├── rate-limiter-6.png ├── rate-limiter-7.png ├── rate-limiter-8.png ├── rate-limiter-9.png ├── ticketmaster-1.png ├── ticketmaster-10.png ├── ticketmaster-11.png ├── ticketmaster-2.png ├── ticketmaster-3.png ├── ticketmaster-4.png ├── ticketmaster-5.png ├── ticketmaster-6.png ├── ticketmaster-7.png ├── ticketmaster-8.png ├── ticketmaster-9.png ├── twitter-1.png ├── twitter-2.png ├── twitter-3.png ├── twitter-4.png ├── twitter-search-1.png ├── twitter-search-2.png ├── typeahead-1.png ├── typeahead-2.png ├── typeahead-3.png ├── uber-1.png ├── url-shortener-1.svg ├── url-shortener-10.png ├── url-shortener-11.png ├── url-shortener-12.png ├── url-shortener-13.png ├── url-shortener-14.png ├── url-shortener-15.png ├── url-shortener-16.png ├── url-shortener-17.png ├── url-shortener-18.png ├── url-shortener-19.png ├── url-shortener-2.png ├── url-shortener-20.png ├── url-shortener-21.png ├── url-shortener-22.png ├── url-shortener-23.png ├── url-shortener-3.png ├── url-shortener-4.png ├── url-shortener-5.png ├── url-shortener-6.png ├── url-shortener-7.png ├── url-shortener-8.png ├── url-shortener-9.png ├── web-crawler-1.png ├── web-crawler-2.png ├── yelp-1.png ├── yelp-2.png ├── yelp-3.png ├── youtube-1.png └── youtube-2.png /README.md: -------------------------------------------------------------------------------- 1 | Navigating the System Design Interview 2 | ==== 3 | 4 | ## Interview Process 5 | - Scope the problem 6 | - Don’t make assumptions. 7 | - Ask clarifying questions to understand the constraints and use cases. 
8 | - Steps 9 | - Requirements clarifications 10 | - System interface definition 11 | - Sketch up an abstract design 12 | - Building blocks of the system 13 | - Relationships between them 14 | - Steps 15 | - Back-of-the-envelope estimation 16 | - Defining data model 17 | - High-level design 18 | - Identify and address the bottlenecks 19 | - Use the fundamental principles of scalable system design 20 | - Steps 21 | - Detailed design 22 | - Identifying and resolving bottlenecks 23 | 24 | ## Distributed System Design Basics 25 | - [System Design Basics](basics/system-design-basics.md) 26 | - [Key Characteristics](basics/key-characteristics.md) 27 | - [Load balancing](basics/load-balancing.md) 28 | - [Caching](basics/caching.md) 29 | - [Sharding](basics/sharding.md) 30 | - [Indexes](basics/indexes.md) 31 | - [Proxies](basics/proxies.md) 32 | - [Queues](basics/queues.md) 33 | - [Redundancy](basics/redundancy.md) 34 | - [SQL vs. NoSQL](basics/sql-vs-nosql.md) 35 | - [CAP Theorem](basics/cap-theorem.md) 36 | - [Consistent Hashing](basics/consistent-hashing.md) 37 | - [Client Server Communication](basics/client-server-communication.md) 38 | 39 | ## System Designs 40 | - [Short URL Service](designs/short-url.md) 41 | - [Pastebin](designs/pastebin.md) 42 | - [Instagram](designs/instagram.md) 43 | - [Dropbox](designs/dropbox.md) 44 | - [Facebook Messenger](designs/facebook-messenger.md) 45 | - [Twitter](designs/twitter.md) 46 | - [YouTube](designs/youtube.md) 47 | - [Typeahead Suggestion](designs/typeahead.md) 48 | - [API Rate Limiter](designs/rate-limiter.md) 49 | - [Twitter Search](designs/twitter-search.md) 50 | - [Web Crawler](designs/web-crawler.md) 51 | - [Facebook Newsfeed](designs/facebook-newsfeed.md) 52 | - [Yelp](designs/yelp.md) 53 | - [Uber Backend](designs/uber-backend.md) 54 | - [Ticketmaster](designs/ticketmaster.md) 55 | - [Additional Resources](designs/additional-resources.md) 56 | -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | * [Contents](README.md) 2 | 3 | * Basics 4 | - [System Design Basics](basics/system-design-basics.md) 5 | - [Key Characteristics](basics/key-characteristics.md) 6 | - [Load balancing](basics/load-balancing.md) 7 | - [Caching](basics/caching.md) 8 | - [Sharding](basics/sharding.md) 9 | - [Indexes](basics/indexes.md) 10 | - [Proxies](basics/proxies.md) 11 | - [Queues](basics/queues.md) 12 | - [Redundancy](basics/redundancy.md) 13 | - [SQL vs.
NoSQL](basics/sql-vs-nosql.md) 14 | - [CAP Theorem](basics/cap-theorem.md) 15 | - [Consistent Hashing](basics/consistent-hashing.md) 16 | - [Client Server Communication](basics/client-server-communication.md) 17 | 18 | * Designs 19 | - [System Design Interviews: A step by step guide](designs/step-by-step-guide.md) 20 | - [Short URL Service](designs/short-url.md) 21 | - [Pastebin](designs/pastebin.md) 22 | - [Instagram](designs/instagram.md) 23 | - [Dropbox](designs/dropbox.md) 24 | - [Facebook Messenger](designs/facebook-messenger.md) 25 | - [Twitter](designs/twitter.md) 26 | - [YouTube](designs/youtube.md) 27 | - [Typeahead Suggestion](designs/typeahead.md) 28 | - [API Rate Limiter](designs/rate-limiter.md) 29 | - [Twitter Search](designs/twitter-search.md) 30 | - [Web Crawler](designs/web-crawler.md) 31 | - [Facebook Newsfeed](designs/facebook-newsfeed.md) 32 | - [Yelp](designs/yelp.md) 33 | - [Uber Backend](designs/uber-backend.md) 34 | - [Ticketmaster](designs/ticketmaster.md) 35 | - [Additional Resources](designs/additional-resources.md) 36 | -------------------------------------------------------------------------------- /basics/caching.md: -------------------------------------------------------------------------------- 1 | Caching 2 | ==== 3 | Load balancing helps you scale horizontally across an ever-increasing number of servers, but caching will enable you to make vastly better use of the resources you already have as well as making otherwise unattainable product requirements feasible. Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: it has a limited amount of space, but is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels in architecture, but are often found at the level nearest to the front end, where they are implemented to return data quickly without taxing downstream levels. 4 | 5 | ## Application server cache 6 | Placing a cache directly on a request layer node enables the local storage of response data. Each time a request is made to the service, the node will quickly return local cached data if it exists. If it is not in the cache, the requesting node will query the data from disk. The cache on one request layer node could also be located both in memory (which is very fast) and on the node’s local disk (faster than going to network storage). 7 | 8 | What happens when you expand this to many nodes? If the request layer is expanded to multiple nodes, it’s still quite possible to have each node host its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request will go to different nodes, thus increasing cache misses. Two choices for overcoming this hurdle are global caches and distributed caches. 9 | 10 | ## Content Distribution Network (CDN) 11 | CDNs are a kind of cache that comes into play for sites serving large amounts of static media. In a typical CDN setup, a request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally available. If it isn’t available, the CDN will query the back-end servers for the file, cache it locally, and serve it to the requesting user.
12 | 13 | If the system we are building isn’t yet large enough to have its own CDN, we can ease a future transition by serving the static media off a separate subdomain (e.g. static.yourservice.com) using a lightweight HTTP server like Nginx, and cut over the DNS from your servers to a CDN later. 14 | 15 | ## Cache Invalidation 16 | While caching is fantastic, it does require some maintenance to keep the cache coherent with the source of truth (e.g., the database). If the data is modified in the database, it should be invalidated in the cache; if not, this can cause inconsistent application behavior. 17 | 18 | Solving this problem is known as cache invalidation; there are three main schemes that are used: 19 | 20 | **Write-through cache**: Under this scheme, data is written into the cache and the corresponding database at the same time. The cached data allows for fast retrieval and, since the same data gets written in the permanent storage, we will have complete data consistency between the cache and the storage. Also, this scheme ensures that nothing will get lost in case of a crash, power failure, or other system disruptions. 21 | 22 | Although write-through minimizes the risk of data loss, every write operation must be done twice before returning success to the client, so this scheme has the disadvantage of higher latency for write operations. 23 | 24 | **Write-around cache**: This technique is similar to write-through cache, but data is written directly to permanent storage, bypassing the cache. This can keep the cache from being flooded with write operations that will not subsequently be re-read, but it has the disadvantage that a read request for recently written data will create a “cache miss”; the data must then be read from slower back-end storage, with higher latency. 25 | 26 | **Write-back cache**: Under this scheme, data is written to the cache alone and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low latency and high throughput for write-intensive applications; however, this speed comes with the risk of data loss in case of a crash or other adverse event, because the only copy of the written data is in the cache. 27 | 28 | ## Cache eviction policies 29 | Following are some of the most common cache eviction policies (see the LRU sketch after this list): 30 | 31 | 1. First In First Out (FIFO): The cache evicts the block that was added first, without any regard to how often or how many times it was accessed before. 32 | 2. Last In First Out (LIFO): The cache evicts the block that was added most recently, without any regard to how often or how many times it was accessed before. 33 | 3. Least Recently Used (LRU): Discards the least recently used items first. 34 | 4. Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first. 35 | 5. Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first. 36 | 6. Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
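To make the LRU policy above concrete, here is a minimal read-through cache sketch (not the notes' own code) built on Python's standard-library `OrderedDict`; the capacity and the `loader` callback standing in for a slower data source are illustrative assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, key, loader):
        if key in self.entries:
            self.entries.move_to_end(key)      # cache hit: mark as most recently used
            return self.entries[key]
        value = loader(key)                    # cache miss: fetch from the slower source
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value

cache = LRUCache(capacity=2)
cache.get("a", lambda k: k.upper())  # miss, loads "A"
cache.get("b", lambda k: k.upper())  # miss, loads "B"
cache.get("a", lambda k: k.upper())  # hit
cache.get("c", lambda k: k.upper())  # evicts "b", the least recently used key
```

Swapping the eviction rule (for example, discarding a random key for RR) changes only the eviction step; the read-through structure stays the same.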
37 | 38 | Following links have some good discussion about caching: 39 | * [1] [Cache](https://en.wikipedia.org/wiki/Cache_(computing)) 40 | * [2] [Introduction to architecting systems](https://lethain.com/introduction-to-architecting-systems-for-scale/) 41 | -------------------------------------------------------------------------------- /basics/cap-theorem.md: -------------------------------------------------------------------------------- 1 | [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem) 2 | ==== 3 | CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance. When we design a distributed system, trading off among CAP is almost the first thing we want to consider. CAP theorem says while designing a distributed system we can pick only two of the following three options: 4 | 5 | **Consistency**: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads. 6 | 7 | **Availability**: Every request gets a response on success/failure. Availability is achieved by replicating the data across different servers. 8 | 9 | **Partition tolerance**: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages. 10 | 11 | ![](../img/basics/cap-theorem.png) 12 | 13 | We cannot build a general data store that is continually available, sequentially consistent, and tolerant to any partition failures. We can only build a system that has any two of these three properties. Because, to be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partitions before a client reads from the out-of-date partition after having read from the up-to-date one. The only thing that can be done to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available. 14 | 15 | - CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability 16 | - CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made. 17 | - [ACID](https://en.wikipedia.org/wiki/ACID) databases choose consistency over availability. 18 | - [BASE](https://en.wikipedia.org/wiki/Eventual_consistency) systems choose availability over consistency. 19 | -------------------------------------------------------------------------------- /basics/client-server-communication.md: -------------------------------------------------------------------------------- 1 | Client-Server Communication 2 | ==== 3 | # Long-Polling vs WebSockets vs Server-Sent Events 4 | What is the difference between Long-Polling, WebSockets, and Server-Sent Events? 5 | 6 | Long-Polling, WebSockets, and Server-Sent Events are popular communication protocols between a client like a web browser and a web server. 
First, let’s start with understanding what a standard HTTP web request looks like. Following are a sequence of events for regular HTTP request: 7 | 8 | 1. The client opens a connection and requests data from the server. 9 | 2. The server calculates the response. 10 | 3. The server sends the response back to the client on the opened request. 11 | 12 | ![](../img/basics/http-request.svg) 13 | 14 | ## Ajax Polling 15 | Polling is a standard technique used by the vast majority of AJAX applications. The basic idea is that the client repeatedly polls (or requests) a server for data. The client makes a request and waits for the server to respond with data. If no data is available, an empty response is returned. 16 | 17 | 1. The client opens a connection and requests data from the server using regular HTTP. 18 | 2. The requested webpage sends requests to the server at regular intervals (e.g., 0.5 seconds). 19 | 3. The server calculates the response and sends it back, just like regular HTTP traffic. 20 | 4. The client repeats the above three steps periodically to get updates from the server. 21 | 22 | The problem with Polling is that the client has to keep asking the server for any new data. As a result, a lot of responses are empty, creating HTTP overhead. 23 | 24 | ![](../img/basics/ajax-polling.svg) 25 | 26 | ## HTTP Long-Polling 27 | This is a variation of the traditional polling technique that allows the server to push information to a client whenever the data is available. With Long-Polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. That’s why this technique is sometimes referred to as a “Hanging GET”. 28 | 29 | * If the server does not have any data available for the client, instead of sending an empty response, the server holds the request and waits until some data becomes available. 30 | * Once the data becomes available, a full response is sent to the client. The client then immediately re-request information from the server so that the server will almost always have an available waiting request that it can use to deliver data in response to an event. 31 | 32 | The basic life cycle of an application using HTTP Long-Polling is as follows: 33 | 34 | 1. The client makes an initial request using regular HTTP and then waits for a response. 35 | 2. The server delays its response until an update is available or a timeout has occurred. 36 | 3. When an update is available, the server sends a full response to the client. 37 | 4. The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period. 38 | 5. Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed due to timeouts. 39 | 40 | ![](../img/basics/long-polling.svg) 41 | 42 | ## WebSockets 43 | WebSocket provides [Full duplex](https://en.wikipedia.org/wiki/Duplex_(telecommunications)#Full_duplex) communication channels over a single TCP connection. It provides a persistent connection between a client and a server that both parties can use to start sending data at any time. The client establishes a WebSocket connection through a process known as the WebSocket handshake. If the process succeeds, then the server and client can exchange data in both directions at any time. 
The WebSocket protocol enables communication between a client and a server with lower overheads, facilitating real-time data transfer from and to the server. This is made possible by providing a standardized way for the server to send content to the browser without being asked by the client and allowing for messages to be passed back and forth while keeping the connection open. In this way, a two-way (bi-directional) ongoing conversation can take place between a client and a server. 44 | 45 | ![](../img/basics/web-sockets.svg) 46 | 47 | ## Server-Sent Events (SSEs) 48 | Under SSEs the client establishes a persistent and long-term connection with the server. The server uses this connection to send data to a client. If the client wants to send data to the server, it would require the use of another technology/protocol to do so. 49 | 50 | 1. Client requests data from a server using regular HTTP. 51 | 2. The requested webpage opens a connection to the server. 52 | 3. The server sends the data to the client whenever there’s new information available. 53 | 54 | SSEs are best when we need real-time traffic from the server to the client or if the server is generating data in a loop and will be sending multiple events to the client. 55 | 56 | ![](../img/basics/server-sent-events.svg) 57 | 58 | # Summary 59 | ## Standard HTTP Web Request 60 | 1. Client opens a connection and requests data from server. 61 | 2. Server calculates the response. 62 | 3. Server sends the response back to the client on the opened request. 63 | 64 | ## Ajax Polling 65 | The client repeatedly polls (or requests) a server for data, and waits for the server to respond with data. If no data is available, an empty response is returned. 66 | 67 | 1. Client opens a connection and requests data from the server using regular HTTP. 68 | 2. The requested webpage sends requests to the server at regular intervals (e.g., 0.5 seconds). 69 | 3. The server calculates the response and sends it back, like regular HTTP traffic. 70 | 4. Client repeats the above three steps periodically to get updates from the server. 71 | 72 | Problems 73 | - Client has to keep asking the server for any new data. 74 | - A lot of responses are empty, creating HTTP overhead. 75 | 76 | ## HTTP Long-Polling 77 | The client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. 78 | 79 | 1. The client makes an initial request using regular HTTP and then waits for a response. 80 | 2. The server delays its response until an update is available, or until a timeout has occurred. 81 | 3. When an update is available, the server sends a full response to the client. 82 | 4. The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period. 83 | 84 | Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed, due to timeouts. 85 | 86 | ## WebSockets 87 | - A persistent full duplex communication channels over a single TCP connection. Both server and client can send data at any time. 88 | - A connection is established through WebSocket handshake. 89 | - Low communication overhead. 90 | - Real-time data transfer. 91 | 92 | ## Server-Sent Event (SSE) 93 | 1. Client requests data from a server using regular HTTP. 94 | 2. The requested webpage opens a connection to the server. 95 | 3. Server sends the data to the client whenever there’s new information available. 
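As a rough sketch of the SSE steps just listed (an illustration added here, not part of the original notes), the standard-library-only server below keeps the client's connection open and pushes `data:` events as they become available; the port, path, and five-event loop are arbitrary choices.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 2: keep the client-opened connection alive as an event stream.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Step 3: push an event whenever new information is available
        # (here, once a second, with a timestamp standing in for real data).
        for i in range(5):
            self.wfile.write(f"data: update {i} at {time.time()}\n\n".encode("utf-8"))
            self.wfile.flush()
            time.sleep(1)

if __name__ == "__main__":
    # A browser would consume this with: new EventSource("http://localhost:8000/")
    HTTPServer(("localhost", 8000), SSEHandler).serve_forever()
```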
96 | 97 | - Use case: 98 | - When real-time traffic from server to client is needed. 99 | - When server generates data in a loop and sends multiple events to client. 100 | -------------------------------------------------------------------------------- /basics/consistent-hashing.md: -------------------------------------------------------------------------------- 1 | Consistent Hashing 2 | ==== 3 | Distributed Hash Table (DHT) is one of the fundamental components used in distributed scalable systems. Hash Tables need a key, a value, and a hash function where hash function maps the key to a location where the value is stored. 4 | 5 | ###
index = hash_function(key)
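Read concretely (and simplified for illustration), the formula above just hashes the key and maps the result onto the available servers; the server names below are hypothetical.

```python
import hashlib

servers = ["cache-0", "cache-1", "cache-2"]  # hypothetical cache hosts

def server_for(key):
    # index = hash_function(key): hash the key, then map it onto a server slot
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return servers[digest % len(servers)]

print(server_for("user:42"))  # stable, as long as the server list never changes
```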
6 | 7 | Suppose we are designing a distributed caching system. Given ‘n’ cache servers, an intuitive hash function would be ‘key % n’. It is simple and commonly used. But it has two major drawbacks: 8 | 9 | 1. It is NOT horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. It will be a pain point in maintenance if the caching system contains lots of data. Practically, it becomes difficult to schedule a downtime to update all caching mappings. 10 | 2. It may NOT be load balanced, especially for non-uniformly distributed data. In practice, it can be easily assumed that the data will not be distributed uniformly. For the caching system, it translates into some caches becoming hot and saturated while the others idle and are almost empty. 11 | 12 | In those situations, consistent hashing is a good way to improve the caching system. 13 | 14 | ## What is Consistent Hashing? 15 | Consistent hashing is a very useful strategy for distributed caching system and DHTs. It allows us to distribute data across a cluster in such a way that will minimize reorganization when nodes are added or removed. Hence, the caching system will be easier to scale up or scale down. 16 | 17 | In Consistent Hashing, when the hash table is resized (e.g. a new cache host is added to the system), only ‘k/n’ keys need to be remapped where ‘k’ is the total number of keys and ‘n’ is the total number of servers. Recall that in a caching system using the ‘mod’ as the hash function, all keys need to be remapped. 18 | 19 | In Consistent Hashing, objects are mapped to the same host if possible. When a host is removed from the system, the objects on that host are shared by other hosts; when a new host is added, it takes its share from a few hosts without touching other’s shares. 20 | 21 | ## How does it work? 22 | As a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range of [0, 256). Imagine that the integers in the range are placed on a ring such that the values are wrapped around. 23 | 24 | Here’s how consistent hashing works: 25 | 26 | 1. Given a list of cache servers, hash them to integers in the range. 27 | 2. To map a key to a server, 28 | - Hash it to a single integer. 29 | - Move clockwise on the ring until finding the first cache it encounters. 30 | - That cache is the one that contains the key. See animation below as an example: key1 maps to cache A; key2 maps to cache C. 31 | 32 |
33 | Phase 1 34 | 35 | ![](../img/basics/consistent-hashing-1.png) 36 | 37 |
38 |
39 | Phase 2 40 | 41 | ![](../img/basics/consistent-hashing-2.png) 42 | 43 |
44 |
45 | Phase 3 46 | 47 | ![](../img/basics/consistent-hashing-3.png) 48 | 49 |
50 |
51 | Phase 4 52 | 53 | ![](../img/basics/consistent-hashing-4.png) 54 | 55 |
56 |
57 | Phase 5 58 | 59 | ![](../img/basics/consistent-hashing-5.png) 60 | 61 |
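The ring walk-through illustrated in the phases above can be sketched in a few lines; this toy version (an assumption-laden illustration, not the notes' own implementation) uses MD5 for ring positions, `bisect` for the clockwise search, and a handful of virtual points per cache, as the text below discusses.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with a few virtual points per cache server."""

    def __init__(self, servers, points_per_server=3):
        self.ring = []  # sorted (position, server) pairs around the ring
        for server in servers:
            for i in range(points_per_server):
                self.ring.append((self._position(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _position(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # Hash the key, then move clockwise to the first cache point at or after it.
        index = bisect.bisect(self.ring, (self._position(key),))
        if index == len(self.ring):  # wrap around past the largest position
            index = 0
        return self.ring[index][1]

ring = HashRing(["cache-A", "cache-B", "cache-C"])
print(ring.server_for("key1"), ring.server_for("key2"))
# Adding a new server D only remaps the keys that fall just before D's points on the ring.
```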
62 | 63 | To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched. 64 | 65 | To remove a cache or, if a cache fails, say A, all keys that were originally mapped to A will fall into B, and only those keys need to be moved to B; other keys will not be affected. 66 | 67 | For load balancing, as we discussed in the beginning, the real data is essentially randomly distributed and thus may not be uniform. It may make the keys on caches unbalanced. 68 | 69 | To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring. 70 | 71 | If the hash function “mixes well,” as the number of replicas increases, the keys will be more balanced. 72 | 73 | # Summary 74 | ## Simple hashing 75 | Problems of simple hashing function `key % n` (`n` is the number of servers): 76 | - It is not horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. 77 | - It may not be load balanced, especially for non-uniformly distributed data. Some servers will become hot spots. 78 | 79 | ## Consistent Hashing 80 | - Consistent hashing maps a key to an integer. 81 | - Imagine that the integers in the range are placed on a ring such that the values are wrapped around. 82 | - Given a list of servers, hash them to integers in the range. 83 | - To map a key to a server: 84 | - Hash it to a single integer. 85 | - Move clockwise on the ring until finding the first cache it encounters. 86 | - When the hash table is resized (a server is added or deleted), only `k/n` keys need to be remapped (`k` is the total number of keys, and `n` is the total number of servers). 87 | - To handle hot spots, add “virtual replicas” for caches. 88 | - Instead of mapping each cache to a single point on the ring, map it to multiple points on the ring (replicas). This way, each cache is associated with multiple portions of the ring. 89 | - If the hash function is “mixes well,” as the number of replicas increases, the keys will be more balanced. 90 | -------------------------------------------------------------------------------- /basics/indexes.md: -------------------------------------------------------------------------------- 1 | Indexes 2 | ==== 3 | Indexes are well known when it comes to databases. Sooner or later there comes a time when database performance is no longer satisfactory. One of the very first things you should turn to when that happens is database indexing. 4 | 5 | The goal of creating an index on a particular table in a database is to make it faster to search through the table and find the row or rows that we want. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. 6 | 7 | ## Example: A library catalog 8 | A library catalog is a register that contains the list of books found in a library. The catalog is organized like a database table generally with four columns: book title, writer, subject, and date of publication. There are usually two such catalogs: one sorted by the book title and one sorted by the writer name. 
That way, you can either think of a writer you want to read and then look through their books or look up a specific book title you know you want to read in case you don’t know the writer’s name. These catalogs are like indexes for the database of books. They provide a sorted list of data that is easily searchable by relevant information. 9 | 10 | Simply saying, an index is a data structure that can be perceived as a table of contents that points us to the location where actual data lives. So when we create an index on a column of a table, we store that column and a pointer to the whole row in the index. Let’s assume a table containing a list of books, the following diagram shows how an index on the ‘Title’ column looks like: 11 | 12 | ![](../img/basics/index-table.svg) 13 | 14 | Just like a traditional relational data store, we can also apply this concept to larger datasets. The trick with indexes is that we must carefully consider how users will access the data. In the case of data sets that are many terabytes in size, but have very small payloads (e.g., 1 KB), indexes are a necessity for optimizing data access. Finding a small payload in such a large dataset can be a real challenge, since we can’t possibly iterate over that much data in any reasonable time. Furthermore, it is very likely that such a large data set is spread over several physical devices—this means we need some way to find the correct physical location of the desired data. Indexes are the best way to do this. 15 | 16 | ## How do Indexes decrease write performance? 17 | An index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slow down data insertion & update. 18 | 19 | When adding rows or making updates to existing rows for a table with an active index, we not only have to write the data but also have to update the index. This will decrease the write performance. This performance degradation applies to all insert, update, and delete operations for the table. For this reason, adding unnecessary indexes on tables should be avoided and indexes that are no longer used should be removed. To reiterate, adding indexes is about improving the performance of search queries. If the goal of the database is to provide a data store that is often written to and rarely read from, in that case, decreasing the performance of the more common operation, which is writing, is probably not worth the increase in performance we get from reading. 20 | 21 | For more details, see [Database Indexes](https://en.wikipedia.org/wiki/Database_index). 22 | -------------------------------------------------------------------------------- /basics/key-characteristics.md: -------------------------------------------------------------------------------- 1 | # Key Characteristics of Distributed Systems 2 | Key characteristics of a distributed system include: 3 | * Scalability 4 | * Reliability 5 | * Availability 6 | * Efficiency 7 | * Manageability 8 | 9 | Let’s briefly review them. 10 | 11 | ## Scalability 12 |
13 | Click to expand summary. 14 | 15 | - The capability of a system to grow and manage increased demand. 16 | - A system that can continuously evolve to support a growing amount of work is scalable. 17 | - Horizontal scaling: by adding more servers into the pool of resources. 18 | - Vertical scaling: by adding more resources (CPU, RAM, storage, etc.) to an existing server. This approach comes with downtime and an upper limit. 19 | 20 |
21 | 22 | Scalability is the capability of a system, process, or a network to grow and manage increased demand. Any distributed system that can continuously evolve in order to support the growing amount of work is considered to be scalable. 23 | 24 | A system may have to scale because of many reasons like increased data volume or increased amount of work, e.g., number of transactions. A scalable system would like to achieve this scaling without performance loss. 25 | 26 | Generally, the performance of a system, although designed (or claimed) to be scalable, declines with the system size due to the management or environment cost. For instance, network speed may become slower because machines tend to be far apart from one another. More generally, some tasks may not be distributed, either because of their inherent atomic nature or because of some flaw in the system design. At some point, such tasks would limit the speed-up obtained by distribution. A scalable architecture avoids this situation and attempts to balance the load on all the participating nodes evenly. 27 | 28 | **Horizontal vs. Vertical Scaling**: Horizontal scaling means that you scale by adding more servers into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM, Storage, etc.) to an existing server. 29 | 30 | With horizontal-scaling it is often easier to scale dynamically by adding more machines into the existing pool; Vertical-scaling is usually limited to the capacity of a single server and scaling beyond that capacity often involves downtime and comes with an upper limit. 31 | 32 | Good examples of horizontal scaling are [Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra) and [MongoDB](https://en.wikipedia.org/wiki/MongoDB) as they both provide an easy way to scale horizontally by adding more machines to meet growing needs. Similarly, a good example of vertical scaling is MySQL as it allows for an easy way to scale vertically by switching from smaller to bigger machines. However, this process often involves downtime. 33 | 34 | ![](../img/basics/vertical-horizontal-scaling.svg) 35 | 36 | ## Reliability 37 |
38 | Click to expand summary. 39 | 40 | - Reliability is the probability that a system performs its intended function without failure over a given period. 41 | - A distributed system is reliable if it keeps delivering its service even when one or multiple components fail. 42 | - Reliability is achieved through redundancy of components and data (remove every single point of failure). 43 | 44 |
45 | 46 | By definition, reliability is the probability that a system will perform its required function without failure over a given period. In simple terms, a distributed system is considered reliable if it keeps delivering its services even when one or several of its software or hardware components fail. Reliability represents one of the main characteristics of any distributed system, since in such systems any failing machine can always be replaced by another healthy one, ensuring the completion of the requested task. 47 | 48 | Take the example of a large electronic commerce store (like [Amazon](https://en.wikipedia.org/wiki/Amazon_(company))), where one of the primary requirements is that any user transaction should never be canceled due to a failure of the machine that is running that transaction. For instance, if a user has added an item to their shopping cart, the system is expected not to lose it. A reliable distributed system achieves this through redundancy of both the software components and data. If the server carrying the user’s shopping cart fails, another server that has an exact replica of the shopping cart should replace it. 49 | 50 | Obviously, redundancy has a cost and a reliable system has to pay that to achieve such resilience for services by eliminating every single point of failure. 51 | 52 | ## Availability 53 |
54 | Click to expand summary. 55 | 56 | - Availability is the time a system remains operational to perform its required function in a specific period. 57 | - Measured by the percentage of time that a system remains operational under normal conditions. 58 | - A reliable system is available. 59 | - An available system is not necessarily reliable. 60 | - A system with a security hole is available when there is no security attack. 61 | 62 |
63 | 64 | By definition, availability is the time a system remains operational to perform its required function in a specific period. It is a simple measure of the percentage of time that a system, service, or a machine remains operational under normal conditions. An aircraft that can be flown for many hours a month without much downtime can be said to have a high availability. Availability takes into account maintainability, repair time, spares availability, and other logistics considerations. If an aircraft is down for maintenance, it is considered not available during that time. 65 | 66 | Reliability is availability over time considering the full range of possible real-world conditions that can occur. An aircraft that can make it through any possible weather safely is more reliable than one that has vulnerabilities to possible conditions. 67 | 68 | #### Reliability Vs. Availability 69 | If a system is reliable, it is available. However, if it is available, it is not necessarily reliable. In other words, high reliability contributes to high availability, but it is possible to achieve a high availability even with an unreliable product by minimizing repair time and ensuring that spares are always available when they are needed. Let’s take the example of an online retail store that has 99.99% availability for the first two years after its launch. However, the system was launched without any information security testing. The customers are happy with the system, but they don’t realize that it isn’t very reliable as it is vulnerable to likely risks. In the third year, the system experiences a series of information security incidents that suddenly result in extremely low availability for extended periods of time. This results in reputational and financial damage to the customers. 70 | 71 | ## Efficiency 72 |
73 | Click to expand summary. 74 | 75 | - Latency: response time, the delay to obtain the first piece of data. 76 | - Bandwidth: throughput, amount of data delivered in a given time. 77 | 78 |
79 | 80 | To understand how to measure the efficiency of a distributed system, let’s assume we have an operation that runs in a distributed manner and delivers a set of items as result. Two standard measures of its efficiency are the response time (or latency) that denotes the delay to obtain the first item and the throughput (or bandwidth) which denotes the number of items delivered in a given time unit (e.g., a second). The two measures correspond to the following unit costs: 81 | 82 | Number of messages globally sent by the nodes of the system regardless of the message size. 83 | Size of messages representing the volume of data exchanges. 84 | The complexity of operations supported by distributed data structures (e.g., searching for a specific key in a distributed index) can be characterized as a function of one of these cost units. Generally speaking, the analysis of a distributed structure in terms of ‘number of messages’ is over-simplistic. It ignores the impact of many aspects, including the network topology, the network load, and its variation, the possible heterogeneity of the software and hardware components involved in data processing and routing, etc. However, it is quite difficult to develop a precise cost model that would accurately take into account all these performance factors; therefore, we have to live with rough but robust estimates of the system behavior. 85 | 86 | ## Serviceability / Manageability 87 |
88 | Click to expand summary. 89 | 90 | - Ease of operating and maintaining the system. 91 | - The simplicity and speed with which a system can be repaired or maintained. 92 | 93 |
94 | 95 | Another important consideration while designing a distributed system is how easy it is to operate and maintain. Serviceability or manageability is the simplicity and speed with which a system can be repaired or maintained; if the time to fix a failed system increases, then availability will decrease. Things to consider for manageability are the ease of diagnosing and understanding problems when they occur, ease of making updates or modifications, and how simple the system is to operate (i.e., does it routinely operate without failure or exceptions?). 96 | 97 | Early detection of faults can decrease or avoid system downtime. For example, some enterprise systems can automatically call a service center (without human intervention) when the system experiences a system fault. 98 | -------------------------------------------------------------------------------- /basics/load-balancing.md: -------------------------------------------------------------------------------- 1 | Load Balancing (LB) 2 | ==== 3 | 4 | Help scale horizontally across an ever-increasing number of servers. 5 | 6 | Load Balancer (LB) is another critical component of any distributed system. It helps to spread the traffic across a cluster of servers to improve responsiveness and availability of applications, websites or databases. LB also keeps track of the status of all the resources while distributing requests. If a server is not available to take new requests or is not responding or has elevated error rate, LB will stop sending traffic to such a server. 7 | 8 | Typically a load balancer sits between the client and the server accepting incoming network and application traffic and distributing the traffic across multiple backend servers using various algorithms. By balancing application requests across multiple servers, a load balancer reduces individual server load and prevents any one application server from becoming a single point of failure, thus improving overall application availability and responsiveness. 9 | 10 | ![](../img/basics/load-balancer-1.svg) 11 | 12 | To utilize full scalability and redundancy, we can try to balance the load at each layer of the system. We can add LBs at three places: 13 | 14 | * Between the user and the web server 15 | * Between web servers and an internal platform layer, like application servers or cache servers 16 | * Between internal platform layer and database. 17 | 18 | ![](../img/basics/load-balancer-2.png) 19 | 20 | ## Benefits of Load Balancing 21 | * Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource. 22 | * Service providers experience less downtime and higher throughput. Even a full server failure won’t affect the end user experience as the load balancer will simply route around it to a healthy server. 23 | * Load balancing makes it easier for system administrators to handle incoming requests while decreasing wait time for users. 24 | * Smart load balancers provide benefits like predictive analytics that determine traffic bottlenecks before they happen. As a result, the smart load balancer gives an organization actionable insights. These are key to automation and can help drive business decisions. 25 | * System administrators experience fewer failed or stressed components. Instead of a single device performing a lot of work, load balancing has several devices perform a little bit of work. 
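The "route around unhealthy servers" behavior described in the benefits above can be sketched as follows (the backend addresses and health-check callback are placeholders); the selection algorithms themselves are covered in the next section.

```python
import itertools

class RoundRobinBalancer:
    """Toy balancer: cycle through backends, skipping any that fail a health check."""

    def __init__(self, backends, health_check):
        self.backends = backends
        self.health_check = health_check          # callable: backend -> bool
        self._cycle = itertools.cycle(backends)

    def pick_backend(self):
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.health_check(backend):        # forward traffic only to healthy servers
                return backend
        raise RuntimeError("no healthy backend available")

healthy = {"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True}  # pretend health state
lb = RoundRobinBalancer(list(healthy), health_check=lambda b: healthy[b])
print([lb.pick_backend() for _ in range(4)])  # 10.0.0.2 is skipped on every pass
```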
26 | 27 | ## Load Balancing Algorithms 28 | **How does the load balancer choose the backend server?** 29 | Load balancers consider two factors before forwarding a request to a backend server. They will first ensure that the server they choose is actually responding appropriately to requests and then use a pre-configured algorithm to select one from the set of healthy servers. We will discuss these algorithms shortly. 30 | 31 | **Health Checks** - Load balancers should only forward traffic to “healthy” backend servers. To monitor the health of a backend server, “health checks” regularly attempt to connect to backend servers to ensure that servers are listening. If a server fails a health check, it is automatically removed from the pool, and traffic will not be forwarded to it until it responds to the health checks again. 32 | 33 | There is a variety of load balancing methods, which use different algorithms for different needs. 34 | 35 | * **Least Connection Method** — This method directs traffic to the server with the fewest active connections. This approach is quite useful when there are a large number of persistent client connections which are unevenly distributed between the servers. 36 | * **Least Response Time Method** — This algorithm directs traffic to the server with the fewest active connections and the lowest average response time. 37 | * **Least Bandwidth Method** - This method selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps). 38 | * **Round Robin Method** — This method cycles through a list of servers and sends each new request to the next server. When it reaches the end of the list, it starts over at the beginning. It is most useful when the servers are of equal specification and there are not many persistent connections. 39 | * **Weighted Round Robin Method** — The weighted round-robin scheduling is designed to better handle servers with different processing capacities. Each server is assigned a weight (an integer value that indicates the processing capacity). Servers with higher weights receive new connections before those with less weights and servers with higher weights get more connections than those with less weights. 40 | * **IP Hash** — Under this method, a hash of the IP address of the client is calculated to redirect the request to a server. 41 | 42 | ## Redundant Load Balancers 43 | The load balancer can be a single point of failure; to overcome this, a second load balancer can be connected to the first to form a cluster. Each LB monitors the health of the other and, since both of them are equally capable of serving traffic and failure detection, in the event the main load balancer fails, the second load balancer takes over. 44 | 45 | ![](../img/basics/load-balancer-3.svg) 46 | 47 | Following links have some good discussion about load balancers: 48 | * [1] [What is load balancing](https://avinetworks.com/what-is-load-balancing/) 49 | * [2] [Introduction to architecting systems](https://lethain.com/introduction-to-architecting-systems-for-scale/) 50 | * [3] [Load balancing](https://en.wikipedia.org/wiki/Load_balancing_(computing)) 51 | -------------------------------------------------------------------------------- /basics/proxies.md: -------------------------------------------------------------------------------- 1 | Proxies 2 | === 3 | 4 | A proxy server is an intermediate server between the client and the back-end server. 
Clients connect to [proxy servers](https://en.wikipedia.org/wiki/Proxy_server) to make a request for a service like a web page, file, connection, etc. In short, a proxy server is a piece of software or hardware that acts as an intermediary for requests from clients seeking resources from other servers. 5 | 6 | Typically, proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compressing a resource). Another advantage of a proxy server is that its cache can serve a lot of requests. If multiple clients access a particular resource, the proxy server can cache it and serve it to all the clients without going to the remote server. 7 | 8 | ![](../img/basics/proxy.png) 9 | 10 | ## Proxy Server Types 11 | Proxies can reside on the client’s local server or anywhere between the client and the remote servers. Here are a few famous types of proxy servers: 12 | 13 | ### Open Proxy 14 | An open proxy is a proxy server that is accessible by any Internet user. Generally, a proxy server only allows users within a network group (i.e. a closed proxy) to store and forward Internet services such as DNS or web pages to reduce and control the bandwidth used by the group. With an open proxy, however, any user on the Internet is able to use this forwarding service. There are two famous open proxy types: 15 | 16 | 1. **Anonymous Proxy** - This proxy reveals its identity as a server but does not disclose the initial IP address. Though this proxy server can be discovered easily, it can be beneficial for some users as it hides their IP address. 17 | 2. **Transparent Proxy** – This proxy server again identifies itself, and with the support of HTTP headers, the first IP address can be viewed. The main benefit of using this sort of server is its ability to cache websites. 18 | 19 | ### Reverse Proxy 20 | A [reverse proxy](https://en.wikipedia.org/wiki/Reverse_proxy) retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client, appearing as if they originated from the proxy server itself. 21 | -------------------------------------------------------------------------------- /basics/queues.md: -------------------------------------------------------------------------------- 1 | Queues 2 | ==== 3 | 4 | - Queues are used to effectively manage requests in a large-scale distributed system, in which different components of the system may need to work in an asynchronous way. 5 | - A queue is an abstraction between the client’s request and the actual work performed to service it. 6 | - Queues are built on asynchronous communication protocols. When a client submits a task to a queue, it is no longer required to wait for the results. 7 | - Queues can provide protection from service outages and failures. 8 | -------------------------------------------------------------------------------- /basics/redundancy.md: -------------------------------------------------------------------------------- 1 | Redundancy 2 | ==== 3 | 4 |
5 | Click to expand summary. 6 | 7 | - Redundancy: **duplication of critical data or services** with the intention of increasing the reliability of the system. 8 | - Server failover 9 | - Remove single points of failure and provide backups (e.g. server failover). 10 | - Shared-nothing architecture 11 | - Each node can operate independently of the others. 12 | - No central service managing state or orchestrating activities. 13 | - New servers can be added without special conditions or knowledge. 14 | - No single point of failure. 15 | 16 |
17 | 18 | [Redundancy](https://en.wikipedia.org/wiki/Redundancy_(engineering)) is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance. For example, if there is only one copy of a file stored on a single server, then losing that server means losing the file. Since losing data is seldom a good thing, we can create duplicate or redundant copies of the file to solve this problem. 19 | 20 | Redundancy plays a key role in removing the single points of failure in the system and provides backups if needed in a crisis. For example, if we have two instances of a service running in production and one fails, the system can fail over to the other one. 21 | 22 | ![](../img/basics/RedundancyAndReplication.png) 23 | 24 | [Replication](https://en.wikipedia.org/wiki/Replication_(computing)) means sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, [fault-tolerance](https://en.wikipedia.org/wiki/Fault_tolerance), or accessibility. 25 | 26 | Replication is widely used in many database management systems (DBMS), usually with a master-slave relationship between the original and the copies. The master gets all the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates. 27 | -------------------------------------------------------------------------------- /basics/sharding.md: -------------------------------------------------------------------------------- 1 | Sharding / Data Partitioning 2 | ==== 3 | 4 | Data partitioning is a technique to break up a big database (DB) into many smaller parts. It is the process of splitting up a DB/table across multiple machines to improve the manageability, performance, availability, and load balancing of an application. The justification for data partitioning is that, after a certain scale point, it is cheaper and more feasible to scale horizontally by adding more machines than to grow it vertically by adding beefier servers. 5 | 6 | ## Partitioning Methods 7 | There are many different schemes one could use to decide how to break up an application database into multiple smaller DBs. Below are three of the most popular schemes used by various large scale applications. 8 | 9 | **Horizontal partitioning**: In this scheme, we put different rows into different tables. For example, if we are storing different places in a table, we can decide that locations with ZIP codes less than 10000 are stored in one table and places with ZIP codes greater than 10000 are stored in a separate table. This is also called range-based partitioning as we are storing different ranges of data in separate tables. Horizontal partitioning is also called Data Sharding. 10 | 11 | The key problem with this approach is that if the value whose range is used for partitioning isn’t chosen carefully, then the partitioning scheme will lead to unbalanced servers. In the previous example, splitting locations based on their ZIP codes assumes that places will be evenly distributed across the different ZIP codes. This assumption is not valid, as there will be a lot more places in a thickly populated area like Manhattan than in its suburbs.
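A bare-bones version of the range-based (horizontal) scheme above might look like the sketch below; the ZIP-code boundaries and shard names are invented for illustration, and a skewed choice of ranges runs straight into the imbalance problem just described (and the rebalancing issues covered later on this page).

```python
# Upper bounds (exclusive) on the ZIP code, each mapped to a shard.
# Boundaries are illustrative; densely populated ranges can still overload one shard.
ZIP_RANGES = [
    (10000, "places_shard_1"),
    (50000, "places_shard_2"),
    (100000, "places_shard_3"),
]

def shard_for_zip(zip_code):
    for upper_bound, shard in ZIP_RANGES:
        if zip_code < upper_bound:
            return shard
    raise ValueError(f"no shard configured for ZIP code {zip_code}")

print(shard_for_zip(9999))   # places_shard_1
print(shard_for_zip(10001))  # places_shard_2
```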
12 | 13 | **Vertical Partitioning**: In this scheme, we divide our data so that tables related to a specific feature are stored on their own server. For example, if we are building an Instagram-like application - where we need to store data related to users, photos they upload, and people they follow - we can decide to place user profile information on one DB server, friend lists on another, and photos on a third server. 14 | 15 | Vertical partitioning is straightforward to implement and has a low impact on the application. The main problem with this approach is that if our application experiences additional growth, then it may be necessary to further partition a feature-specific DB across various servers (e.g. it would not be possible for a single server to handle all the metadata queries for 10 billion photos by 140 million users). 16 | 17 | **Directory Based Partitioning**: A loosely coupled approach to work around issues mentioned in the above schemes is to create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code. So, to find out where a particular data entity resides, we query the directory server that holds the mapping from each tuple key to its DB server. This loosely coupled approach means we can perform tasks like adding servers to the DB pool or changing our partitioning scheme without having an impact on the application. 18 | 19 | ## Partitioning Criteria 20 | **Key or Hash-based partitioning**: Under this scheme, we apply a hash function to some key attributes of the entity we are storing; that yields the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that gets incremented by one each time a new record is inserted. In this case, the hash function could be ‘ID % 100’, which will give us the server number where we can store/read that record. This approach should ensure a uniform allocation of data among servers. The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function, which would require redistribution of data and downtime for the service. A workaround for this problem is to use [Consistent Hashing](consistent-hashing.md). 21 | 22 | **List partitioning**: In this scheme, each partition is assigned a list of values, so whenever we want to insert a new record, we will see which partition contains our key and then store it there. For example, we can decide that all users living in Iceland, Norway, Sweden, Finland, or Denmark will be stored in a partition for the Nordic countries. 23 | 24 | **Round-robin partitioning**: This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the ‘i’th tuple is assigned to partition (i mod n). 25 | 26 | **Composite partitioning**: Under this scheme, we combine any of the above partitioning schemes to devise a new scheme. For example, first applying a list partitioning scheme and then a hash-based partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed. 27 | 28 | ## Common Problems of Data Partitioning 29 | On a partitioned database, there are certain extra constraints on the different operations that can be performed. Most of these constraints are due to the fact that operations across multiple tables or multiple rows in the same table will no longer run on the same server.
Below are some of the constraints and additional complexities introduced by partitioning: 30 | 31 | **Joins and Denormalization**: Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database partitions. Such joins will not be performance efficient since data has to be compiled from multiple servers. A common workaround for this problem is to denormalize the database so that queries that previously required joins can be performed from a single table. Of course, the service now has to deal with all the perils of denormalization such as data inconsistency. 32 | 33 | **Referential Integrity**: As we saw that performing a cross-partition query on a partitioned database is not feasible, similarly, trying to enforce data integrity constraints such as foreign keys in a partitioned database can be extremely difficult. 34 | 35 | Most of RDBMS do not support foreign keys constraints across databases on different database servers. Which means that applications that require referential integrity on partitioned databases often have to enforce it in application code. Often in such cases, applications have to run regular SQL jobs to clean up dangling references. 36 | 37 | **Rebalancing**: There could be many reasons we have to change our partitioning scheme: 38 | 39 | 1. The data distribution is not uniform, e.g., there are a lot of places for a particular ZIP code that cannot fit into one database partition. 40 | 2. There is a lot of load on a partition, e.g., there are too many requests being handled by the DB partition dedicated to user photos. 41 | 42 | In such cases, either we have to create more DB partitions or have to rebalance existing partitions, which means the partitioning scheme changed and all existing data moved to new locations. Doing this without incurring downtime is extremely difficult. Using a scheme like directory based partitioning does make rebalancing a more palatable experience at the cost of increasing the complexity of the system and creating a new single point of failure (i.e. the lookup service/database). 43 | -------------------------------------------------------------------------------- /basics/sql-vs-nosql.md: -------------------------------------------------------------------------------- 1 | SQL vs. NoSQL 2 | ==== 3 | 4 | In the world of databases, there are two main types of solutions: SQL and NoSQL (or relational databases and non-relational databases). Both of them differ in the way they were built, the kind of information they store, and the storage method they use. 5 | 6 | Relational databases are structured and have predefined schemas like phone books that store phone numbers and addresses. Non-relational databases are unstructured, distributed, and have a dynamic schema like file folders that hold everything from a person’s address and phone number to their Facebook ‘likes’ and online shopping preferences. 7 | 8 | ## SQL 9 | Relational databases store data in rows and columns. Each row contains all the information about one entity and each column contains all the separate data points. Some of the most popular relational databases are MySQL, Oracle, MS SQL Server, SQLite, Postgres, and MariaDB. 10 | 11 | ## NoSQL 12 | Following are the most common types of NoSQL: 13 | 14 | **Key-Value Stores**: Data is stored in an array of key-value pairs. The ‘key’ is an attribute name which is linked to a ‘value’. 
Well-known key-value stores include Redis, Voldemort, and Dynamo. 15 | 16 | **Document Databases**: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections. Each document can have an entirely different structure. Document databases include the CouchDB and MongoDB. 17 | 18 | **Wide-Column Databases**: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns up front and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets - big names include Cassandra and HBase. 19 | 20 | **Graph Databases**: These databases are used to store data whose relations are best represented in a graph. Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities). Examples of graph database include Neo4J and InfiniteGraph. 21 | 22 | ## High level differences between SQL and NoSQL 23 | **Storages**: SQL stores data in tables where each row represents an entity and each column represents a data point about that entity; for example, if we are storing a car entity in a table, different columns could be ‘Color’, ‘Make’, ‘Model’, and so on. 24 | 25 | NoSQL databases have different data storage models. The main ones are key-value, document, graph, and columnar. We will discuss differences between these databases below. 26 | 27 | **Schema**: In SQL, each record conforms to a fixed schema, meaning the columns must be decided and chosen before data entry and each row must have data for each column. The schema can be altered later, but it involves modifying the whole database and going offline. 28 | 29 | In NoSQL, schemas are dynamic. Columns can be added on the fly and each ‘row’ (or equivalent) doesn’t have to contain data for each ‘column.’ 30 | 31 | **Querying**: SQL databases use SQL (structured query language) for defining and manipulating the data, which is very powerful. In a NoSQL database, queries are focused on a collection of documents. Sometimes it is also called UnQL (Unstructured Query Language). Different databases have different syntax for using UnQL. 32 | 33 | **Scalability**: In most common situations, SQL databases are vertically scalable, i.e., by increasing the horsepower (higher Memory, CPU, etc.) of the hardware, which can get very expensive. It is possible to scale a relational database across multiple servers, but this is a challenging and time-consuming process. 34 | 35 | On the other hand, NoSQL databases are horizontally scalable, meaning we can add more servers easily in our NoSQL database infrastructure to handle a lot of traffic. Any cheap commodity hardware or cloud instances can host NoSQL databases, thus making it a lot more cost-effective than vertical scaling. A lot of NoSQL technologies also distribute data across servers automatically. 36 | 37 | **Reliability or ACID Compliancy (Atomicity, Consistency, Isolation, Durability)**: The vast majority of relational databases are ACID compliant. So, when it comes to data reliability and safe guarantee of performing transactions, SQL databases are still the better bet. 38 | 39 | Most of the NoSQL solutions sacrifice ACID compliance for performance and scalability. 40 | 41 | ## SQL VS. NoSQL - Which one to use? 42 | When it comes to database technology, there’s no one-size-fits-all solution. 
That’s why many businesses rely on both relational and non-relational databases for different needs. Even as NoSQL databases are gaining popularity for their speed and scalability, there are still situations where a highly structured SQL database may perform better; choosing the right technology hinges on the use case. 43 | 44 | ### Reasons to use SQL database 45 | Here are a few reasons to choose a SQL database: 46 | 47 | 1. We need to ensure ACID compliance. ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliance for scalability and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option. 48 | 2. Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and if you’re only working with data that is consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume. 49 | 50 | ### Reasons to use NoSQL database 51 | When all the other components of our application are fast and seamless, NoSQL databases prevent data from being the bottleneck. Big data is contributing to a large success for NoSQL databases, mainly because it handles data differently than the traditional relational databases. A few popular examples of NoSQL databases are MongoDB, CouchDB, Cassandra, and HBase. 52 | 53 | 1. Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data we can store together and allows us to add new types as the need changes. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance. 54 | 2. Making the most of cloud computing and storage. Cloud-based storage is an excellent cost-saving solution but requires data to be easily spread across multiple servers to scale up. Using commodity (affordable, smaller) hardware on-site or in the cloud saves you the hassle of additional software and NoSQL databases like Cassandra are designed to be scaled across multiple data centers out of the box, without a lot of headaches. 55 | 3. Rapid development. NoSQL is extremely useful for rapid development as it doesn’t need to be prepped ahead of time. If you’re working on quick iterations of your system which require making frequent updates to the data structure without a lot of downtime between versions, a relational database will slow you down. 56 | 57 | # Summary 58 | ## Common types of NoSQL 59 | ### Key-value stores 60 | - Array of key-value pairs. The "key" is an attribute name. 61 | - Redis, Vodemort, Dynamo. 62 | 63 | ### Document databases 64 | - Data is stored in documents. 65 | - Documents are grouped in collections. 66 | - Each document can have an entirely different structure. 67 | - CouchDB, MongoDB. 68 | 69 | ### Wide-column / columnar databases 70 | - Column families - containers for rows. 71 | - No need to know all the columns up front. 72 | - Each row can have different number of columns. 73 | - Cassandra, HBase. 74 | 75 | ### Graph database 76 | - Data is stored in graph structures 77 | - Nodes: entities 78 | - Properties: information about the entities 79 | - Lines: connections between the entities 80 | - Neo4J, InfiniteGraph 81 | 82 | ## Differences between SQL and NoSQL 83 | ### Storage 84 | - SQL: store data in tables. 
85 | - NoSQL: have different data storage models. 86 | 87 | ### Schema 88 | - SQL 89 | - Each record conforms to a fixed schema. 90 | - Schema can be altered, but it requires modifying the whole database. 91 | - NoSQL: 92 | - Schemas are dynamic. 93 | 94 | ### Querying 95 | - SQL 96 | - Use SQL (structured query language) for defining and manipulating the data. 97 | - NoSQL 98 | - Queries are focused on a collection of documents. 99 | - UnQL (unstructured query language). 100 | - Different databases have different syntax. 101 | 102 | ### Scalability 103 | - SQL 104 | - Vertically scalable (by increasing the horsepower: memory, CPU, etc) and expensive. 105 | - Horizontally scalable (across multiple servers); but it can be challenging and time-consuming. 106 | - NoSQL 107 | - Horizontablly scalable (by adding more servers) and cheap. 108 | 109 | ### ACID 110 | - Atomicity, consistency, isolation, durability 111 | - SQL 112 | - ACID compliant 113 | - Data reliability 114 | - Gurantee of transactions 115 | - NoSQL 116 | - Most sacrifice ACID compliance for performance and scalability. 117 | 118 | ## Which one to use? 119 | ### SQL 120 | - Ensure ACID compliance. 121 | - Reduce anomalies. 122 | - Protect database integrity. 123 | - Data is structured and unchanging. 124 | 125 | ### NoSQL 126 | - Data has little or no structure. 127 | - Make the most of cloud computing and storage. 128 | - Cloud-based storage requires data to be easily spread across multiple servers to scale up. 129 | - Rapid development. 130 | - Frequent updates to the data structure. 131 | -------------------------------------------------------------------------------- /basics/system-design-basics.md: -------------------------------------------------------------------------------- 1 | # System Design Basics 2 | 3 | Whenever we are designing a large system, we need to consider a few things: 4 | 5 | 1. What are the different architectural pieces that can be used? 6 | 2. How do these pieces work with each other? 7 | 3. How can we best utilize these pieces: what are the right tradeoffs? 8 | 9 | Investing in scaling before it is needed is generally not a smart business proposition; however, some forethought into the design can save valuable time and resources in the future. In the following chapters, we will try to define some of the core building blocks of scalable systems. Familiarizing these concepts would greatly benefit in understanding distributed system concepts. In the next section, we will go through Consistent Hashing, CAP Theorem, Load Balancing, Caching, Data Partitioning, Indexes, Proxies, Queues, Replication, and choosing between SQL vs. NoSQL. 10 | 11 | Let’s start with the Key Characteristics of Distributed Systems. 12 | -------------------------------------------------------------------------------- /bin/publish-gh-pages.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # install the plugins and build the static site 4 | gitbook install && gitbook build 5 | 6 | # checkout to the gh-pages branch 7 | git checkout gh-pages || git checkout -b gh-pages 8 | 9 | # pull the latest updates 10 | git pull origin gh-pages --rebase 11 | 12 | # copy the static site files into the current directory. 13 | cp -R _book/* . 14 | 15 | # remove 'node_modules' and '_book' directory 16 | git clean -fx node_modules 17 | git clean -fx _book 18 | 19 | # add all files 20 | git add . 
21 | 22 | # commit 23 | git commit -a -m "Update docs" 24 | 25 | # push to the origin 26 | git push -u origin gh-pages 27 | 28 | # checkout to the master branch 29 | git checkout master 30 | -------------------------------------------------------------------------------- /designs/additional-resources.md: -------------------------------------------------------------------------------- 1 | Additional Resources 2 | === 3 | 4 | Here are some useful links for further reading: 5 | 6 | 1. [Dynamo](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) - Highly Available Key-value Store 7 | 8 | 2. [Kafka](http://notes.stephenholiday.com/Kafka.pdf) - A Distributed Messaging System for Log Processing 9 | 10 | 3. [Consistent Hashing](https://www.akamai.com/es/es/multimedia/documents/technical-publication/consistent-hashing-and-random-trees-distributed-caching-protocols-for-relieving-hot-spots-on-the-world-wide-web-technical-publication.pdf) - Original paper 11 | 12 | 4. [Paxos](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/paxos-simple-Copy.pdf) - Protocol for distributed consensus 13 | 14 | 5. [Concurrency Controls](http://sites.fas.harvard.edu/~cs265/papers/kung-1981.pdf) - Optimistic methods for concurrency controls 15 | 16 | 6. [Gossip protocol](http://highscalability.com/blog/2011/11/14/using-gossip-protocols-for-failure-detection-monitoring-mess.html) - For failure detection and more. 17 | 18 | 7. [Chubby](http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf) - Lock service for loosely-coupled distributed systems 19 | 20 | 8. [ZooKeeper](https://www.usenix.org/legacy/event/usenix10/tech/full_papers/Hunt.pdf) - Wait-free coordination for Internet-scale systems 21 | 22 | 9. [MapReduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) - Simplified Data Processing on Large Clusters 23 | 24 | 10. [Hadoop](http://storageconference.us/2010/Papers/MSST/Shvachko.pdf) - A Distributed File System 25 | -------------------------------------------------------------------------------- /designs/facebook-newsfeed.md: -------------------------------------------------------------------------------- 1 | # Facebook Newsfeed 2 | ### Let's design Facebook's Newsfeed, which would contain posts, photos, videos, and status updates from all the people and pages a user follows. 3 | 4 | Similar Services: Twitter Newsfeed, Instagram Newsfeed, Quora Newsfeed 5 | 6 | Difficulty Level: Hard 7 | 8 | ## 1. What is Facebook’s newsfeed? 9 | A Newsfeed is the constantly updating list of stories in the middle of Facebook’s homepage. It includes status updates, photos, videos, links, app activity, and ‘likes’ from people, pages, and groups that a user follows on Facebook. In other words, it is a compilation of a complete scrollable version of your friends’ and your life story from photos, videos, locations, status updates, and other activities. 10 | 11 | For any social media site you design - Twitter, Instagram, or Facebook - you will need some newsfeed system to display updates from friends and followers. 12 | 13 | ## 2. Requirements and Goals of the System 14 | Let’s design a newsfeed for Facebook with the following requirements: 15 | 16 | ### Functional requirements: 17 | 18 | 1. Newsfeed will be generated based on the posts from the people, pages, and groups that a user follows. 19 | 2. A user may have many friends and follow a large number of pages/groups. 20 | 3. 
Feeds may contain images, videos, or just text. 21 | 4. Our service should support appending new posts as they arrive to the newsfeed for all active users. 22 | 23 | ### Non-functional requirements: 24 | 25 | 1. Our system should be able to generate any user’s newsfeed in real-time - maximum latency seen by the end user would be 2s. 26 | 2. A post shouldn’t take more than 5s to make it to a user’s feed assuming a new newsfeed request comes in. 27 | 28 | ## 3. Capacity Estimation and Constraints 29 | Let’s assume on average a user has 300 friends and follows 200 pages. 30 | 31 | **Traffic estimates**: Let’s assume 300M daily active users with each user fetching their timeline an average of five times a day. This will result in 1.5B newsfeed requests per day or approximately 17,500 requests per second. 32 | 33 | **Storage estimates**: On average, let’s assume we need to have around 500 posts in every user’s feed that we want to keep in memory for a quick fetch. Let’s also assume that on average each post would be 1KB in size. This would mean that we need to store roughly 500KB of data per user. To store all this data for all the active users we would need 150TB of memory. If a server can hold 100GB we would need around 1500 machines to keep the top 500 posts in memory for all active users. 34 | 35 | ## 4. System APIs 36 | ### 💡 Once we have finalized the requirements, it’s always a good idea to define the system APIs. This should explicitly state what is expected from the system. 37 | 38 | We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definition of the API for getting the newsfeed: 39 | 40 | ####
getUserFeed(api_dev_key, user_id, since_id, count, max_id, exclude_replies)
41 | 42 | **Parameters**: 43 | * api_dev_key (string): The API developer key of a registered can be used to, among other things, throttle users based on their allocated quota. 44 | * user_id (number): The ID of the user for whom the system will generate the newsfeed. 45 | * since_id (number): Optional; returns results with an ID higher than (that is, more recent than) the specified ID. 46 | * count (number): Optional; specifies the number of feed items to try and retrieve up to a maximum of 200 per distinct request. 47 | * max_id (number): Optional; returns results with an ID less than (that is, older than) or equal to the specified ID. 48 | * exclude_replies(boolean): Optional; this parameter will prevent replies from appearing in the returned timeline. 49 | 50 | **Returns**: (JSON) 51 | 52 | Returns a JSON object containing a list of feed items. 53 | 54 | ## 5. Database Design 55 | There are three primary objects: User, Entity (e.g. page, group, etc.), and FeedItem (or Post). Here are some observations about the relationships between these entities: 56 | 57 | * A User can follow other entities and can become friends with other users. 58 | * Both users and entities can post FeedItems which can contain text, images, or videos. 59 | * Each FeedItem will have a UserID which will point to the User who created it. For simplicity, let’s assume that only users can create feed items, although, on Facebook Pages can post feed item too. 60 | * Each FeedItem can optionally have an EntityID pointing to the page or the group where that post was created. 61 | 62 | If we are using a relational database, we would need to model two relations: User-Entity relation and FeedItem-Media relation. Since each user can be friends with many people and follow a lot of entities, we can store this relation in a separate table. The “Type” column in “UserFollow” identifies if the entity being followed is a User or Entity. Similarly, we can have a table for FeedMedia relation. 63 | 64 | ![](../img/newsfeed-1.png) 65 | 66 | ## 6. High Level System Design 67 | At a high level this problem can be divided into two parts: 68 | 69 | **Feed generation**: Newsfeed is generated from the posts (or feed items) of users and entities (pages and groups) that a user follows. So, whenever our system receives a request to generate the feed for a user (say Jane), we will perform the following steps: 70 | 71 | 1. Retrieve IDs of all users and entities that Jane follows. 72 | 2. Retrieve latest, most popular and relevant posts for those IDs. These are the potential posts that we can show in Jane’s newsfeed. 73 | 3. Rank these posts based on the relevance to Jane. This represents Jane’s current feed. 74 | 4. Store this feed in the cache and return top posts (say 20) to be rendered on Jane’s feed. 75 | 5. On the front-end, when Jane reaches the end of her current feed, she can fetch the next 20 posts from the server and so on. 76 | 77 | One thing to notice here is that we generated the feed once and stored it in the cache. What about new incoming posts from people that Jane follows? If Jane is online, we should have a mechanism to rank and add those new posts to her feed. We can periodically (say every five minutes) perform the above steps to rank and add the newer posts to her feed. Jane can then be notified that there are newer items in her feed that she can fetch. 78 | 79 | **Feed publishing**: Whenever Jane loads her newsfeed page, she has to request and pull feed items from the server. 
When she reaches the end of her current feed, she can pull more data from the server. For newer items either the server can notify Jane and then she can pull, or the server can push, these new posts. We will discuss these options in detail later. 80 | 81 | At a high level, we will need following components in our Newsfeed service: 82 | 83 | 1. Web servers: To maintain a connection with the user. This connection will be used to transfer data between the user and the server. 84 | 2. Application server: To execute the workflows of storing new posts in the database servers. We will also need some application servers to retrieve and to push the newsfeed to the end user. 85 | 3. Metadata database and cache: To store the metadata about Users, Pages, and Groups. 86 | 4. Posts database and cache: To store metadata about posts and their contents. 87 | 5. Video and photo storage, and cache: Blob storage, to store all the media included in the posts. 88 | 6. Newsfeed generation service: To gather and rank all the relevant posts for a user to generate newsfeed and store in the cache. This service will also receive live updates and will add these newer feed items to any user’s timeline. 89 | 7. Feed notification service: To notify the user that there are newer items available for their newsfeed. 90 | 91 | Following is the high-level architecture diagram of our system. User B and C are following User A. 92 | 93 | ![](../img/newsfeed-2.png) 94 | 95 | ####
Facebook Newsfeed Architecture
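To make the feed-generation steps listed above concrete, here is a minimal Python sketch. The helper callables (`get_followed_ids`, `get_recent_posts`, `rank`) and the in-memory `FEED_CACHE` are hypothetical stand-ins for the metadata store, posts store, and ranking logic shown in the diagram, not an actual implementation.

```python
# A minimal sketch of the pull-based feed-generation steps outlined above.
# All helpers are hypothetical placeholders for the services in the diagram.

FEED_CACHE = {}     # user_id -> ranked list of post IDs kept in memory
FEED_SIZE = 500     # posts cached per user (see the storage estimates above)
PAGE_SIZE = 20      # posts returned to the client per request

def generate_feed(user_id, get_followed_ids, get_recent_posts, rank):
    """Build and cache a user's newsfeed, returning the first page."""
    followed = get_followed_ids(user_id)        # step 1: followed users/entities
    candidates = get_recent_posts(followed)     # step 2: candidate posts
    ranked = rank(user_id, candidates)          # step 3: relevance ranking
    FEED_CACHE[user_id] = ranked[:FEED_SIZE]    # step 4: cache the feed
    return FEED_CACHE[user_id][:PAGE_SIZE]      # step 5: first page to render

def next_page(user_id, last_seen_post_id):
    """Return the next page after the last post the client has already seen."""
    feed = FEED_CACHE.get(user_id, [])
    try:
        start = feed.index(last_seen_post_id) + 1
    except ValueError:
        start = 0
    return feed[start:start + PAGE_SIZE]
```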
96 | 97 | ## 7. Detailed Component Design 98 | Let’s discuss different components of our system in detail. 99 | 100 | #### a. Feed generation 101 | Let’s take the simple case of the newsfeed generation service fetching most recent posts from all the users and entities that Jane follows; the query would look like this: 102 | 103 | ```sql 104 | (SELECT FeedItemID FROM FeedItem WHERE UserID in ( 105 | SELECT EntityOrFriendID FROM UserFollow WHERE UserID = and type = 0(user)) 106 | ) 107 | UNION 108 | (SELECT FeedItemID FROM FeedItem WHERE EntityID in ( 109 | SELECT EntityOrFriendID FROM UserFollow WHERE UserID = and type = 1(entity)) 110 | ) 111 | ORDER BY CreationDate DESC 112 | LIMIT 100 113 | ``` 114 | 115 | Here are issues with this design for the feed generation service: 116 | 117 | 1. Crazy slow for users with a lot of friends/follows as we have to perform sorting/merging/ranking of a huge number of posts. 118 | 2. We generate the timeline when a user loads their page. This would be quite slow and have a high latency. 119 | 3. For live updates, each status update will result in feed updates for all followers. This could result in high backlogs in our Newsfeed Generation Service. 120 | 4. For live updates, the server pushing (or notifying about) newer posts to users could lead to very heavy loads, especially for people or pages that have a lot of followers. To improve the efficiency, we can pre-generate the timeline and store it in a memory. 121 | 122 | **Offline generation for newsfeed**: We can have dedicated servers that are continuously generating users’ newsfeed and storing them in memory. So, whenever a user requests for the new posts for their feed, we can simply serve it from the pre-generated, stored location. Using this scheme, user’s newsfeed is not compiled on load, but rather on a regular basis and returned to users whenever they request for it. 123 | 124 | Whenever these servers need to generate the feed for a user, they will first query to see what was the last time the feed was generated for that user. Then, new feed data would be generated from that time onwards. We can store this data in a hash table where the “key” would be UserID and “value” would be a STRUCT like this: 125 | 126 | ```java 127 | Struct { 128 | LinkedHashMap feedItems; 129 | DateTime lastGenerated; 130 | } 131 | ``` 132 | 133 | We can store FeedItemIDs in a data structure similar to [Linked HashMap](https://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html) or [TreeMap](https://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html), which can allow us to not only jump to any feed item but also iterate through the map easily. Whenever users want to fetch more feed items, they can send the last FeedItemID they currently see in their newsfeed, we can then jump to that FeedItemID in our hash-map and return next batch/page of feed items from there. 134 | 135 | **How many feed items should we store in memory for a user’s feed?** Initially, we can decide to store 500 feed items per user, but this number can be adjusted later based on the usage pattern. For example, if we assume that one page of a user’s feed has 20 posts and most of the users never browse more than ten pages of their feed, we can decide to store only 200 posts per user. For any user who wants to see more posts (more than what is stored in memory), we can always query backend servers. 136 | 137 | **Should we generate (and keep in memory) newsfeeds for all users?** There will be a lot of users that don’t log-in frequently. 
Here are a few things we can do to handle this; 1) a more straightforward approach could be, to use an LRU based cache that can remove users from memory that haven’t accessed their newsfeed for a long time 2) a smarter solution can figure out the login pattern of users to pre-generate their newsfeed, e.g., at what time of the day a user is active and which days of the week does a user access their newsfeed? etc. 138 | 139 | Let’s now discuss some solutions to our “live updates” problems in the following section. 140 | 141 | #### b. Feed publishing 142 | The process of pushing a post to all the followers is called fanout. By analogy, the push approach is called fanout-on-write, while the pull approach is called fanout-on-load. Let’s discuss different options for publishing feed data to users. 143 | 144 | 1. **"Pull" model or Fan-out-on-load**: This method involves keeping all the recent feed data in memory so that users can pull it from the server whenever they need it. Clients can pull the feed data on a regular basis or manually whenever they need it. Possible problems with this approach are a) New data might not be shown to the users until they issue a pull request, b) It’s hard to find the right pull cadence, as most of the time pull requests will result in an empty response if there is no new data, causing waste of resources. 145 | 146 | 2. **"Push" model or Fan-out-on-write**: For a push system, once a user has published a post, we can immediately push this post to all the followers. The advantage is that when fetching feed you don’t need to go through your friend’s list and get feeds for each of them. It significantly reduces read operations. To efficiently handle this, users have to maintain a [Long Poll](https://en.wikipedia.org/wiki/Push_technology#Long_polling) request with the server for receiving the updates. A possible problem with this approach is that when a user has millions of followers (a celebrity-user) the server has to push updates to a lot of people. 147 | 148 | 3. **Hybrid**: An alternate method to handle feed data could be to use a hybrid approach, i.e., to do a combination of fan-out-on-write and fan-out-on-load. Specifically, we can stop pushing posts from users with a high number of followers (a celebrity user) and only push data for those users who have a few hundred (or thousand) followers. For celebrity users, we can let the followers pull the updates. Since the push operation can be extremely costly for users who have a lot of friends or followers, by disabling fanout for them, we can save a huge number of resources. Another alternate approach could be that, once a user publishes a post, we can limit the fanout to only her online friends. Also, to get benefits from both the approaches, a combination of ‘push to notify’ and ‘pull for serving’ end-users is a great way to go. Purely a push or pull model is less versatile. 149 | 150 | **How many feed items can we return to the client in each request?** We should have a maximum limit for the number of items a user can fetch in one request (say 20). But, we should let the client specify how many feed items they want with each request as the user may like to fetch a different number of posts depending on the device (mobile vs. desktop). 151 | 152 | **Should we always notify users if there are new posts available for their newsfeed?** It could be useful for users to get notified whenever new data is available. However, on mobile devices, where data usage is relatively expensive, it can consume unnecessary bandwidth. 
Hence, at least for mobile devices, we can choose not to push data, instead, let users “Pull to Refresh” to get new posts. 153 | 154 | ## 8. Feed Ranking 155 | The most straightforward way to rank posts in a newsfeed is by the creation time of the posts, but today’s ranking algorithms are doing a lot more than that to ensure “important” posts are ranked higher. The high-level idea of ranking is first to select key “signals” that make a post important and then to find out how to combine them to calculate a final ranking score. 156 | 157 | More specifically, we can select features that are relevant to the importance of any feed item, e.g., number of likes, comments, shares, time of the update, whether the post has images/videos, etc., and then, a score can be calculated using these features. This is generally enough for a simple ranking system. A better ranking system can significantly improve itself by constantly evaluating if we are making progress in user stickiness, retention, ads revenue, etc. 158 | 159 | ## 9. Data Partitioning 160 | 161 | #### a. Sharding posts and metadata 162 | Since we have a huge number of new posts every day and our read load is extremely high too, we need to distribute our data onto multiple machines such that we can read/write it efficiently. For sharding our databases that are storing posts and their metadata, we can have a similar design as discussed under [Designing Twitter](twitter.md). 163 | 164 | #### b. Sharding feed data 165 | For feed data, which is being stored in memory, we can partition it based on UserID. We can try storing all the data of a user on one server. When storing, we can pass the UserID to our hash function that will map the user to a cache server where we will store the user’s feed objects. Also, for any given user, since we don’t expect to store more than 500 FeedItemIDs, we will not run into a scenario where feed data for a user doesn’t fit on a single server. To get the feed of a user, we would always have to query only one server. For future growth and replication, we must use [Consistent Hashing](../basics/consistent-hashing.md). 166 | -------------------------------------------------------------------------------- /designs/instagram.md: -------------------------------------------------------------------------------- 1 | # Instagram 2 | ### Let's design a photo-sharing service like Instagram, where users can upload photos to share them with other users. 3 | 4 | Similar Services: Flickr, Picasa 5 | 6 | Difficulty Level: Medium 7 | 8 | ## 1. What is Instagram? 9 | Instagram is a social networking service which enables its users to upload and share their photos and videos with other users. Instagram users can choose to share information either publicly or privately. Anything shared publicly can be seen by any other user, whereas privately shared content can only be accessed by a specified set of people. Instagram also enables its users to share through many other social networking platforms, such as Facebook, Twitter, Flickr, and Tumblr. 10 | 11 | For the sake of this exercise, we plan to design a simpler version of Instagram, where a user can share photos and can also follow other users. The ‘News Feed’ for each user will consist of top photos of all the people the user follows. 12 | 13 | ## 2. Requirements and Goals of the System 14 | We’ll focus on the following set of requirements while designing the Instagram: 15 | 16 | ### Functional Requirements 17 | 1. Users should be able to upload/download/view photos. 18 | 2. 
Users can perform searches based on photo/video titles. 19 | 3. Users can follow other users. 20 | 4. The system should be able to generate and display a user’s News Feed consisting of top photos from all the people the user follows. 21 | 22 | ### Non-functional Requirements 23 | 1. Our service needs to be highly available. 24 | 2. The acceptable latency of the system is 200ms for News Feed generation. 25 | 3. Consistency can take a hit (in the interest of availability), if a user doesn’t see a photo for a while; it should be fine. 26 | 4. The system should be highly reliable; any uploaded photo or video should never be lost. 27 | 28 | **Not in scope:** Adding tags to photos, searching photos on tags, commenting on photos, tagging users to photos, who to follow, etc. 29 | 30 | ## 3. Some Design Considerations 31 | The system would be read-heavy, so we will focus on building a system that can retrieve photos quickly. 32 | 33 | 1. Practically, users can upload as many photos as they like. Efficient management of storage should be a crucial factor while designing this system. 34 | 2. Low latency is expected while viewing photos. 35 | 3. Data should be 100% reliable. If a user uploads a photo, the system will guarantee that it will never be lost. 36 | 37 | ## 4. Capacity Estimation and Constraints 38 | * Let’s assume we have 500M total users, with 1M daily active users. 39 | * 2M new photos every day, 23 new photos every second. 40 | * Average photo file size => 200KB 41 | * Total space required for 1 day of photos 42 | ####
2M * 200KB => 400 GB
43 | * Total space required for 10 years: 44 | ####
400GB * 365 (days a year) * 10 (years) ~= 1425TB
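As a quick sanity check, the estimates above can be reproduced with a few lines of Python; the unit conversions mirror the rounding used in the text.

```python
# Back-of-the-envelope check mirroring the estimates above.
new_photos_per_day = 2_000_000
avg_photo_size_kb = 200

photos_per_second = new_photos_per_day / (24 * 3600)              # ~23 photos/sec
daily_storage_gb = new_photos_per_day * avg_photo_size_kb / 1e6   # ~400 GB/day
ten_year_storage_tb = daily_storage_gb * 365 * 10 / 1024          # ~1425 TB

print(f"{photos_per_second:.0f} photos/sec, {daily_storage_gb:.0f} GB/day, "
      f"{ten_year_storage_tb:.0f} TB over 10 years")
```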
45 | 46 | ## 5. High Level System Design 47 | At a high-level, we need to support two scenarios, one to upload photos and the other to view/search photos. Our service would need some [object storage](https://en.wikipedia.org/wiki/Object_storage) servers to store photos and also some database servers to store metadata information about the photos. 48 | 49 | ![](../img/instagram-1.png) 50 | 51 | ## 6. Database Schema 52 | ### 💡Defining the DB schema in the early stages of the interview would help to understand the data flow among various components and later would guide towards data partitioning. 53 | We need to store data about users, their uploaded photos, and people they follow. Photo table will store all data related to a photo; we need to have an index on (PhotoID, CreationDate) since we need to fetch recent photos first. 54 | 55 | ![](../img/instagram-2.png) 56 | 57 | A straightforward approach for storing the above schema would be to use an RDBMS like MySQL since we require joins. But relational databases come with their challenges, especially when we need to scale them. For details, please take a look at [SQL vs. NoSQL](../basics/sql-vs-nosql.md). 58 | 59 | We can store photos in a distributed file storage like [HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop) or [S3](https://en.wikipedia.org/wiki/Amazon_S3). 60 | 61 | We can store the above schema in a distributed key-value store to enjoy the benefits offered by NoSQL. All the metadata related to photos can go to a table where the ‘key’ would be the ‘PhotoID’ and the ‘value’ would be an object containing PhotoLocation, UserLocation, CreationTimestamp, etc. 62 | 63 | We need to store relationships between users and photos, to know who owns which photo. We also need to store the list of people a user follows. For both of these tables, we can use a wide-column datastore like [Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra). For the ‘UserPhoto’ table, the ‘key’ would be ‘UserID’ and the ‘value’ would be the list of ‘PhotoIDs’ the user owns, stored in different columns. We will have a similar scheme for the ‘UserFollow’ table. 64 | 65 | Cassandra or key-value stores in general, always maintain a certain number of replicas to offer reliability. Also, in such data stores, deletes don’t get applied instantly, data is retained for certain days (to support undeleting) before getting removed from the system permanently. 66 | 67 | ## 68 | 69 | ![](../img/instagram-3.png) 70 | ![](../img/instagram-4.png) 71 | -------------------------------------------------------------------------------- /designs/pastebin.md: -------------------------------------------------------------------------------- 1 | # Pastebin 2 | ### Let's design a Pastebin like web service, where users can store plain text. Users of the service will enter a piece of text and get a randomly generated URL to access it. 3 | 4 | Similar Services: pastebin.com, pasted.co, chopapp.com 5 | 6 | Difficulty Level: Easy 7 | 8 | ## 1. What is Pastebin? 9 | Pastebin like services enable users to store plain text or images over the network (typically the Internet) and generate unique URLs to access the uploaded data. Such services are also used to share data over the network quickly, as users would just need to pass the URL to let other users see it. 10 | 11 | If you haven’t used [pastebin.com](http://pastebin.com/) before, please try creating a new ‘Paste’ there and spend some time going through the different options their service offers. 
This will help you a lot in understanding this chapter. 12 | 13 | ## 2. Requirements and Goals of the System 14 | Our Pastebin service should meet the following requirements: 15 | 16 | ### Functional Requirements: 17 | 18 | 1. Users should be able to upload or “paste” their data and get a unique URL to access it. 19 | 2. Users will only be able to upload text. 20 | 3. Data and links will expire after a specific timespan automatically; users should also be able to specify expiration time. 21 | 4. Users should optionally be able to pick a custom alias for their paste. 22 | 23 | ### Non-Functional Requirements: 24 | 25 | 1. The system should be highly reliable, any data uploaded should not be lost. 26 | 2. The system should be highly available. This is required because if our service is down, users will not be able to access their Pastes. 27 | 3. Users should be able to access their Pastes in real-time with minimum latency. 28 | 4. Paste links should not be guessable (not predictable). 29 | 30 | ### Extended Requirements: 31 | 32 | 1. Analytics, e.g., how many times a paste was accessed? 33 | 2. Our service should also be accessible through REST APIs by other services. 34 | 35 | ## 3. Some Design Considerations 36 | Pastebin shares some requirements with [URL Shortening service](./short-url.md), but there are some additional design considerations we should keep in mind. 37 | 38 | What should be the limit on the amount of text user can paste at a time? We can limit users not to have Pastes bigger than 10MB to stop the abuse of the service. 39 | 40 | Should we impose size limits on custom URLs? Since our service supports custom URLs, users can pick any URL that they like, but providing a custom URL is not mandatory. However, it is reasonable (and often desirable) to impose a size limit on custom URLs, so that we have a consistent URL database. 41 | 42 | ## 4. Capacity Estimation and Constraints 43 | Our services will be read-heavy; there will be more read requests compared to new Pastes creation. We can assume a 5:1 ratio between read and write. 44 | 45 | **Traffic estimates:** Pastebin services are not expected to have traffic similar to Twitter or Facebook, let’s assume here that we get one million new pastes added to our system every day. This leaves us with five million reads per day. 46 | 47 | New Pastes per second: 48 | 49 | ####
1M / (24 hours * 3600 seconds) ~= 12 pastes/sec
50 | 51 | Paste reads per second: 52 | 53 | ####
5M / (24 hours * 3600 seconds) ~= 58 reads/sec
54 | 55 | **Storage estimates:** Users can upload maximum 10MB of data; commonly Pastebin like services are used to share source code, configs or logs. Such texts are not huge, so let’s assume that each paste on average contains 10KB. 56 | 57 | At this rate, we will be storing 10GB of data per day. 58 | 59 | ####
1M * 10KB => 10 GB/day
60 | 61 | If we want to store this data for ten years we would need the total storage capacity of 36TB. 62 | 63 | With 1M pastes every day we will have 3.6 billion Pastes in 10 years. We need to generate and store keys to uniquely identify these pastes. If we use base64 encoding ([A-Z, a-z, 0-9, ., -]) we would need six letters strings: 64 | 65 | ####
64^6 ~= 68.7 billion unique strings
66 | 67 | If it takes one byte to store one character, total size required to store 3.6B keys would be: 68 | 69 | ####
3.6B * 6 => 22 GB
70 | 71 | 22GB is negligible compared to 36TB. To keep some margin, we will assume a 70% capacity model (meaning we don’t want to use more than 70% of our total storage capacity at any point), which raises our storage needs to 51.4TB. 72 | 73 | **Bandwidth estimates:** For write requests, we expect 12 new pastes per second, resulting in 120KB of ingress per second. 74 | 75 | ####
12 * 10KB => 120 KB/s
76 | 77 | As for the read request, we expect 58 requests per second. Therefore, total data egress (sent to users) will be 0.6 MB/s. 78 | 79 | ####
58 * 10KB => 0.6 MB/s
80 | 81 | Although total ingress and egress are not big, we should keep these numbers in mind while designing our service. 82 | 83 | **Memory estimates:** We can cache some of the hot pastes that are frequently accessed. Following the 80-20 rule, meaning 20% of hot pastes generate 80% of traffic, we would like to cache these 20% pastes 84 | 85 | Since we have 5M read requests per day, to cache 20% of these requests, we would need: 86 | 87 | ####
0.2 * 5M * 10KB ~= 10 GB
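A short script can double-check all of the estimates above; it only restates the assumptions already made (1M new pastes per day, a 5:1 read/write ratio, 10KB average paste size, and 6-character base64 keys).

```python
# Quick check of the Pastebin capacity estimates above.
new_pastes_per_day = 1_000_000
read_write_ratio = 5
avg_paste_size_kb = 10

writes_per_sec = new_pastes_per_day / (24 * 3600)                   # ~12 pastes/sec
reads_per_sec = writes_per_sec * read_write_ratio                   # ~58 reads/sec
storage_per_day_gb = new_pastes_per_day * avg_paste_size_kb / 1e6   # 10 GB/day
ten_year_storage_tb = storage_per_day_gb * 365 * 10 / 1000          # ~36 TB
unique_keys = 64 ** 6                                               # ~68.7 billion
hot_cache_gb = 0.2 * reads_per_sec * 24 * 3600 * avg_paste_size_kb / 1e6  # ~10 GB

print(writes_per_sec, reads_per_sec, storage_per_day_gb,
      ten_year_storage_tb, unique_keys, hot_cache_gb)
```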
88 | 89 | ## 5. System APIs 90 | We can have SOAP or REST APIs to expose the functionality of our service. Following could be the definitions of the APIs to create/retrieve/delete Pastes: 91 | 92 | ####
addPaste(api_dev_key, paste_data, custom_url=None, user_name=None, paste_name=None, expire_date=None)
93 | 94 | **Parameters:** 95 | * api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota. 96 | * paste_data (string): Textual data of the paste. 97 | * custom_url (string): Optional custom URL. 98 | * user_name (string): Optional user name to be used to generate URL. 99 | * paste_name (string): Optional name of the paste 100 | * expire_date (string): Optional expiration date for the paste. 101 | 102 | **Returns:** (string) 103 | A successful insertion returns the URL through which the paste can be accessed, otherwise, it will return an error code. 104 | 105 | Similarly, we can have retrieve and delete Paste APIs: 106 | 107 | ####
getPaste(api_dev_key, api_paste_key)
108 | 109 | Where “api_paste_key” is a string representing the Paste Key of the paste to be retrieved. This API will return the textual data of the paste. 110 | 111 | ####
deletePaste(api_dev_key, api_paste_key)
112 | 113 | A successful deletion returns ‘true’, otherwise returns ‘false’. 114 | 115 | ## 6. Database Design 116 | A few observations about the nature of the data we are storing: 117 | 118 | We need to store billions of records. 119 | Each metadata object we are storing would be small (less than 1KB). 120 | Each paste object we are storing can be of medium size (it can be a few MB). 121 | There are no relationships between records, except if we want to store which user created what Paste. 122 | Our service is read-heavy. 123 | Database Schema: 124 | We would need two tables, one for storing information about the Pastes and the other for users’ data. 125 | 126 | ![](../img/pastebin-1.png) 127 | 128 | Here, ‘URlHash’ is the URL equivalent of the TinyURL and ‘ContentKey’ is a reference to an external object storing the contents of the paste; we’ll discuss the external storage of the paste contents later in the chapter. 129 | 130 | ## 7. High Level Design 131 | At a high level, we need an application layer that will serve all the read and write requests. Application layer will talk to a storage layer to store and retrieve data. We can segregate our storage layer with one database storing metadata related to each paste, users, etc., while the other storing the paste contents in some object storage (like [Amazon S3](https://en.wikipedia.org/wiki/Amazon_S3)). This division of data will also allow us to scale them individually. 132 | 133 | ![](../img/pastebin-2.svg) 134 | 135 | ## 8. Component Design 136 | ### a. Application layer 137 | Our application layer will process all incoming and outgoing requests. The application servers will be talking to the backend data store components to serve the requests. 138 | 139 | **How to handle a write request?** Upon receiving a write request, our application server will generate a six-letter random string, which would serve as the key of the paste (if the user has not provided a custom key). The application server will then store the contents of the paste and the generated key in the database. After the successful insertion, the server can return the key to the user. One possible problem here could be that the insertion fails because of a duplicate key. Since we are generating a random key, there is a possibility that the newly generated key could match an existing one. In that case, we should regenerate a new key and try again. We should keep retrying until we don’t see failure due to the duplicate key. We should return an error to the user if the custom key they have provided is already present in our database. 140 | 141 | Another solution of the above problem could be to run a standalone Key Generation Service (KGS) that generates random six letters strings beforehand and stores them in a database (let’s call it key-DB). Whenever we want to store a new paste, we will just take one of the already generated keys and use it. This approach will make things quite simple and fast since we will not be worrying about duplications or collisions. KGS will make sure all the keys inserted in key-DB are unique. KGS can use two tables to store keys, one for keys that are not used yet and one for all the used keys. As soon as KGS gives some keys to an application server, it can move these to the used keys table. KGS can always keep some keys in memory so that whenever a server needs them, it can quickly provide them. As soon as KGS loads some keys in memory, it can move them to the used keys table, this way we can make sure each server gets unique keys. 
If KGS dies before using all the keys loaded in memory, we will be wasting those keys. We can ignore these keys given that we have a huge number of them. 142 | 143 | **Isn’t KGS a single point of failure?** Yes, it is. To solve this, we can have a standby replica of KGS and whenever the primary server dies it can take over to generate and provide keys. 144 | 145 | **Can each app server cache some keys from key-DB?** Yes, this can surely speed things up. Although in this case, if the application server dies before consuming all the keys, we will end up losing those keys. This could be acceptable since we have 68B unique six letters keys, which are a lot more than we require. 146 | 147 | **How does it handle a paste read request?** Upon receiving a read paste request, the application service layer contacts the datastore. The datastore searches for the key, and if it is found, returns the paste’s contents. Otherwise, an error code is returned. 148 | 149 | ### b. Datastore layer 150 | We can divide our datastore layer into two: 151 | 152 | 1. Metadata database: We can use a relational database like MySQL or a Distributed Key-Value store like Dynamo or Cassandra. 153 | 2. Object storage: We can store our contents in an Object Storage like Amazon’s S3. Whenever we feel like hitting our full capacity on content storage, we can easily increase it by adding more servers. 154 | 155 | ![](../img/pastebin-3.png) 156 | 157 | ## 9. Purging or DB Cleanup 158 | Please see [Designing a URL Shortening service](./short-url.md). 159 | 160 | ## 10. Data Partitioning and Replication 161 | Please see [Designing a URL Shortening service](./short-url.md). 162 | 163 | ## 11. Cache and Load Balancer 164 | Please see [Designing a URL Shortening service](./short-url.md). 165 | 166 | ## 12. Security and Permissions 167 | Please see [Designing a URL Shortening service](./short-url.md). 168 | -------------------------------------------------------------------------------- /designs/step-by-step-guide.md: -------------------------------------------------------------------------------- 1 | # System Design Interviews: A step by step guide 2 | A lot of software engineers struggle with system design interviews (SDIs) primarily because of three reasons: 3 | 4 | * The unstructured nature of SDIs, where the candidates are asked to work on an open-ended design problem that doesn’t have a standard answer. 5 | * Candidates lack of experience in developing complex and large scale systems. 6 | * Candidates did not spend enough time to prepare for SDIs. 7 | 8 | Like coding interviews, candidates who haven’t put a deliberate effort to prepare for SDIs, mostly perform poorly specially at top companies like Google, Facebook, Amazon, Microsoft, etc. In these companies, candidates who do not perform above average have a limited chance to get an offer. On the other hand, a good performance always results in a better offer (higher position and salary), since it shows the candidate’s ability to handle a complex system. 9 | 10 | In this course, we’ll follow a step by step approach to solve multiple design problems. First, let’s go through these steps: 11 | 12 | ## Step 1: Requirements clarifications 13 | It is always a good idea to ask questions about the exact scope of the problem we are solving. Design questions are mostly open-ended, and they don’t have ONE correct answer, that’s why clarifying ambiguities early in the interview becomes critical. 
Candidates who spend enough time to define the end goals of the system always have a better chance to be successful in the interview. Also, since we only have 35-40 minutes to design a (supposedly) large system, we should clarify what parts of the system we will be focusing on. 14 | 15 | Let’s expand this with an actual example of designing a Twitter-like service. Here are some questions for designing Twitter that should be answered before moving on to the next steps: 16 | 17 | * Will users of our service be able to post tweets and follow other people? 18 | * Should we also design to create and display the user’s timeline? 19 | * Will tweets contain photos and videos? 20 | * Are we focusing on the backend only or are we developing the front-end too? 21 | * Will users be able to search tweets? 22 | * Do we need to display hot trending topics? 23 | * Will there be any push notification for new (or important) tweets? 24 | 25 | All such questions will determine how our end design will look like. 26 | 27 | ## Step 2: Back-of-the-envelope estimation 28 | It is always a good idea to estimate the scale of the system we’re going to design. This will also help later when we will be focusing on scaling, partitioning, load balancing and caching. 29 | 30 | * What scale is expected from the system (e.g., number of new tweets, number of tweet views, number of timeline generations per sec., etc.)? 31 | * How much storage will we need? We will have different storage requirements if users can have photos and videos in their tweets. 32 | * What network bandwidth usage are we expecting? This will be crucial in deciding how we will manage traffic and balance load between servers. 33 | 34 | ## Step 3: System interface definition 35 | Define what APIs are expected from the system. This will not only establish the exact contract expected from the system but will also ensure if we haven’t gotten any requirements wrong. Some examples of APIs for our Twitter-like service will be: 36 | 37 | ```python 38 | postTweet(user_id, tweet_data, tweet_location, user_location, timestamp, …) 39 | ``` 40 | ```python 41 | generateTimeline(user_id, current_time, user_location, …) 42 | ``` 43 | ```python 44 | markTweetFavorite(user_id, tweet_id, timestamp, …) 45 | ``` 46 | 47 | ## Step 4: Defining data model 48 | Defining the data model in the early part of the interview will clarify how data will flow between different components of the system. Later, it will guide for data partitioning and management. The candidate should be able to identify various entities of the system, how they will interact with each other, and different aspects of data management like storage, transportation, encryption, etc. Here are some entities for our Twitter-like service: 49 | 50 | **User**: UserID, Name, Email, DoB, CreationData, LastLogin, etc. 51 | **Tweet**: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc. 52 | **UserFollow**: UserdID1, UserID2 53 | **FavoriteTweets**: UserID, TweetID, TimeStamp 54 | 55 | Which database system should we use? Will NoSQL like [Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra) best fit our needs, or should we use a MySQL-like solution? What kind of block storage should we use to store photos and videos? 56 | 57 | ## Step 5: High-level design 58 | Draw a block diagram with 5-6 boxes representing the core components of our system. We should identify enough components that are needed to solve the actual problem from end-to-end. 
59 | 60 | For Twitter, at a high-level, we will need multiple application servers to serve all the read/write requests with load balancers in front of them for traffic distributions. If we’re assuming that we will have a lot more read traffic (as compared to write), we can decide to have separate servers for handling these scenarios. On the backend, we need an efficient database that can store all the tweets and can support a huge number of reads. We will also need a distributed file storage system for storing photos and videos. 61 | 62 | ![](../img/high-level-design.png) 63 | 64 | ## Step 6: Detailed design 65 | Dig deeper into two or three major components; interviewer’s feedback should always guide us to what parts of the system need further discussion. We should be able to present different approaches, their pros and cons, and explain why we will prefer one approach on the other. Remember there is no single answer; the only important thing is to consider tradeoffs between different options while keeping system constraints in mind. 66 | 67 | * Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? 68 | * Should we try to store all the data of a user on the same database? What issue could it cause? 69 | * How will we handle hot users who tweet a lot or follow lots of people? 70 | * Since users’ timeline will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets? 71 | * How much and at which layer should we introduce cache to speed things up? 72 | * What components need better load balancing? 73 | 74 | ## Step 7: Identifying and resolving bottlenecks 75 | Try to discuss as many bottlenecks as possible and different approaches to mitigate them. 76 | 77 | * Is there any single point of failure in our system? What are we doing to mitigate it? 78 | * Do we have enough replicas of the data so that if we lose a few servers, we can still serve our users? 79 | * Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown? 80 | * How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades? 81 | 82 | ## Summary 83 | In short, preparation and being organized during the interview are the keys to be successful in system design interviews. The steps mentioned above should guide you to remain on track and cover all the different aspects while designing a system. 84 | 85 | Let’s apply the above guidelines to design a few systems that are asked in SDIs. 86 | -------------------------------------------------------------------------------- /designs/twitter-search.md: -------------------------------------------------------------------------------- 1 | # Twitter Search 2 | ### Twitter is one of the largest social networking service where users can share photos, news, and text-based messages. In this chapter, we will design a service that can store and search user tweets. 3 | 4 | Difficulty Level: Medium 5 | 6 | ## 1. What is Twitter Search? 7 | 8 | Twitter users can update their status whenever they like. Each status (called tweet) consists of plain text and our goal is to design a system that allows searching over all the user tweets. 9 | 10 | ## 2. Requirements and Goals of the System 11 | 12 | * Let’s assume Twitter has 1.5 billion total users with 800 million daily active users. 
13 | * On average Twitter gets 400 million tweets every day. 14 | * The average size of a tweet is 300 bytes. 15 | * Let’s assume there will be 500M searches every day. 16 | * The search query will consist of multiple words combined with AND/OR. 17 | 18 | We need to design a system that can efficiently store and query tweets. 19 | 20 | ## 3. Capacity Estimation and Constraints 21 | 22 | **Storage Capacity**: Since we have 400 million new tweets every day and each tweet on average is 300 bytes, the total storage we need will be: 23 | 24 | 400M * 300 => 120GB/day 25 | 26 | Total incoming data per second: 27 | 28 | 120GB / 24hours / 3600sec ~= 1.38MB/second 29 | 30 | ## 4. System APIs 31 | 32 | We can have SOAP or REST APIs to expose the functionality of our service; the following could be the definition of the search API: 33 | 34 | ####
search(api_dev_key, search_terms, maximum_results_to_return, sort, page_token)
35 | 36 | **Parameters**: 37 | * api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota. 38 | * search_terms (string): A string containing the search terms. 39 | * maximum_results_to_return (number): Number of tweets to return. 40 | * sort (number): Optional sort mode: Latest first (0 - default), Best matched (1), Most liked (2). 41 | * page_token (string): This token will specify a page in the result set that should be returned. 42 | 43 | **Returns**: (JSON) 44 | 45 | A JSON containing information about a list of tweets matching the search query. Each result entry can have the user ID & name, tweet text, tweet ID, creation time, number of likes, etc. 46 | 47 | ## 5. High Level Design 48 | 49 | At the high level, we need to store all the statuses in a database and also build an index that can keep track of which word appears in which tweet. This index will help us quickly find the tweets that users are searching for. 50 | 51 | ![](../img/twitter-search-1.png) 52 | 53 | ####
High level design for Twitter search
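To make the flow above concrete, here is a minimal sketch of how an application server could resolve an AND/OR query against an in-memory inverted index (everything here is illustrative: the index is a plain dictionary and the query grammar is deliberately simplistic):

```python
# Minimal sketch: resolve an AND/OR query against an in-memory inverted index.
# The index maps word -> set of TweetIDs; in the real design it is sharded (see section 6).
index = {
    "design": {101, 102, 105},
    "system": {101, 103},
    "interview": {102, 105},
}

def search(query: str) -> set[int]:
    """Supports simple queries like 'system AND design' or 'design OR interview'."""
    tokens = query.lower().split()
    result = index.get(tokens[0], set())
    i = 1
    while i < len(tokens) - 1:
        op, word = tokens[i], tokens[i + 1]
        tweet_ids = index.get(word, set())
        result = result & tweet_ids if op == "and" else result | tweet_ids
        i += 2
    return result

print(search("system AND design"))    # {101}
print(search("design OR interview"))  # {101, 102, 105}
```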
54 | 55 | ## 6. Detailed Component Design 56 | 57 | **1. Storage**: We need to store 120GB of new data every day. Given this huge amount of data, we need to come up with a data partitioning scheme that will efficiently distribute the data onto multiple servers. If we plan for the next five years, we will need the following storage: 58 | 59 | ####
120GB * 365days * 5years ~= 200TB
60 | 61 | If we never want to be more than 80% full at any time, we will need approximately 250TB of total storage. Let’s assume that we want to keep an extra copy of all tweets for fault tolerance; then, our total storage requirement will be 500TB. If we assume a modern server can store up to 4TB of data, we would need 125 such servers to hold all of the required data for the next five years. 62 | 63 | Let’s start with a simplistic design where we store the tweets in a MySQL database. We can assume that we store the tweets in a table having two columns, TweetID and TweetText. Let’s assume we partition our data based on TweetID. If our TweetIDs are unique system-wide, we can define a hash function that can map a TweetID to a storage server where we can store that tweet object. 64 | 65 | How can we create system-wide unique TweetIDs? If we are getting 400M new tweets each day, then how many tweet objects can we expect in five years? 66 | 67 | ####
400M * 365 days * 5 years => 730 billion
68 | 69 | This means we would need a five-byte number to identify TweetIDs uniquely. Let’s assume we have a service that can generate a unique TweetID whenever we need to store an object (the TweetID discussed here is similar to the TweetID discussed in [Designing Twitter](twitter.md)). We can feed the TweetID to our hash function to find the storage server and store our tweet object there. 70 | 71 | **2. Index**: What should our index look like? Since our tweet queries will consist of words, let’s build an index that can tell us which word appears in which tweet object. Let’s first estimate how big our index will be. If we want to build an index for all the English words and some famous nouns like people’s names, city names, etc., and if we assume that we have around 300K English words and 200K nouns, then we will have 500K total words in our index. Let’s assume that the average length of a word is five characters. If we are keeping our index in memory, we need 2.5MB of memory to store all the words: 72 | 73 | ####
500K * 5 => 2.5 MB
74 | 75 | Let’s assume that we want to keep the index in memory for all the tweets from only past two years. Since we will be getting 730B tweets in 5 years, this will give us 292B tweets in two years. Given that each TweetID will be 5 bytes, how much memory will we need to store all the TweetIDs? 76 | 77 | ####
292B * 5 => 1460 GB
78 | 79 | So our index would be like a big distributed hash table, where ‘key’ would be the word and ‘value’ will be a list of TweetIDs of all those tweets which contain that word. Assuming on average we have 40 words in each tweet and since we will not be indexing prepositions and other small words like ‘the’, ‘an’, ‘and’ etc., let’s assume we will have around 15 words in each tweet that need to be indexed. This means each TweetID will be stored 15 times in our index. So total memory we will need to store our index: 80 | 81 | ####
(1460 * 15) + 2.5MB ~= 21 TB
82 | 83 | Assuming a high-end server has 144GB of memory, we would need 152 such servers to hold our index. 84 | 85 | We can partition our data based on two criteria: 86 | 87 | **Sharding based on Words**: While building our index, we will iterate through all the words of a tweet and calculate the hash of each word to find the server where it would be indexed. To find all tweets containing a specific word we have to query only the server which contains this word. 88 | 89 | We have a couple of issues with this approach: 90 | 91 | 1. What if a word becomes hot? Then there will be a lot of queries on the server holding that word. This high load will affect the performance of our service. 92 | 2. Over time, some words can end up storing a lot of TweetIDs compared to others, therefore, maintaining a uniform distribution of words while tweets are growing is quite tricky. 93 | 94 | To recover from these situations we either have to repartition our data or use [Consistent Hashing](../basics/consistent-hashing.md). 95 | 96 | **Sharding based on the tweet object**: While storing, we will pass the TweetID to our hash function to find the server and index all the words of the tweet on that server. While querying for a particular word, we have to query all the servers, and each server will return a set of TweetIDs. A centralized server will aggregate these results to return them to the user. 97 | 98 | ![](../img/twitter-search-2.png) 99 | 100 | ####
Detailed component design
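A toy sketch of the two sharding options above (purely illustrative: the server count is an assumption, and a stable hash like MD5 is used here because Python’s built-in `hash` is not stable across processes):

```python
# Toy sketch of the two index-sharding schemes discussed above.
import hashlib

NUM_SERVERS = 4  # assumption for illustration

def shard_by_word(word: str) -> int:
    # All TweetIDs for a word live on one server; a hot word can overload that server.
    digest = hashlib.md5(word.encode()).hexdigest()
    return int(digest, 16) % NUM_SERVERS

def shard_by_tweet(tweet_id: int) -> int:
    # All words of one tweet are indexed on one server; every query fans out to all servers.
    return tweet_id % NUM_SERVERS

def index_assignments(tweet_id: int, text: str, mode: str = "by_tweet") -> dict[int, list[str]]:
    """Return {index server -> words to index there} for a single tweet, under either scheme."""
    assignments: dict[int, list[str]] = {}
    for word in text.lower().split():
        server = shard_by_tweet(tweet_id) if mode == "by_tweet" else shard_by_word(word)
        assignments.setdefault(server, []).append(word)
    return assignments

print(index_assignments(730_000_000_001, "system design interview", mode="by_word"))
```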
101 | 102 | ## 7. Fault Tolerance 103 | 104 | What will happen when an index server dies? We can have a secondary replica of each server, and if the primary server dies, the secondary can take control after the failover. Both primary and secondary servers will have the same copy of the index. 105 | 106 | What if both primary and secondary servers die at the same time? We have to allocate a new server and rebuild the same index on it. How can we do that? We don’t know what words/tweets were kept on this server. If we were using ‘Sharding based on the tweet object’, the brute-force solution would be to iterate through the whole database and filter TweetIDs using our hash function to figure out all the required tweets that would be stored on this server. This would be inefficient and also during the time when the server was being rebuilt we would not be able to serve any query from it, thus missing some tweets that should have been seen by the user. 107 | 108 | How can we efficiently retrieve a mapping between tweets and the index server? We have to build a reverse index that will map all the TweetIDs to their index server. Our Index-Builder server can hold this information. We will need to build a Hashtable where the ‘key’ will be the index server number and the ‘value’ will be a HashSet containing all the TweetIDs being kept at that index server. Notice that we are keeping all the TweetIDs in a HashSet; this will enable us to add/remove tweets from our index quickly. So now, whenever an index server has to rebuild itself, it can simply ask the Index-Builder server for all the tweets it needs to store and then fetch those tweets to build the index. This approach will surely be fast. We should also have a replica of the Index-Builder server for fault tolerance. 109 | 110 | ## 8. Cache 111 | 112 | To deal with hot tweets we can introduce a cache in front of our database. We can use [Memcached](https://en.wikipedia.org/wiki/Memcached), which can store all such hot tweets in memory. Application servers, before hitting the backend database, can quickly check if the cache has that tweet. Based on clients’ usage patterns, we can adjust how many cache servers we need. For the cache eviction policy, Least Recently Used (LRU) seems suitable for our system. 113 | 114 | ## 9. Load Balancing 115 | 116 | We can add a load balancing layer at two places in our system: 117 | 118 | 1. Between Clients and Application servers and 119 | 2. Between Application servers and Backend servers. 120 | 121 | Initially, a simple Round Robin approach can be adopted that distributes incoming requests equally among backend servers. This LB is simple to implement and does not introduce any overhead. Another benefit of this approach is that the LB will take dead servers out of the rotation and will stop sending any traffic to them. A problem with Round Robin LB is that it won’t take server load into consideration. If a server is overloaded or slow, the LB will not stop sending new requests to that server. To handle this, a more intelligent LB solution can be placed that periodically queries backend servers about their load and adjusts traffic based on that. 122 | 123 | ## 10. Ranking 124 | 125 | How about if we want to rank the search results by social graph distance, popularity, relevance, etc.? 126 | 127 | Let’s assume we want to rank tweets by popularity, like how many likes or comments a tweet is getting, etc. In such a case, our ranking algorithm can calculate a ‘popularity number’ (based on the number of likes, etc.) and store it with the index.
Each partition can sort the results based on this popularity number before returning results to the aggregator server. The aggregator server combines all these results, sorts them based on the popularity number, and sends the top results to the user. 128 | -------------------------------------------------------------------------------- /designs/typeahead.md: -------------------------------------------------------------------------------- 1 | # Designing Typeahead Suggestion 2 | ### Let's design a real-time suggestion service, which will recommend terms to users as they enter text for searching. 3 | 4 | Similar Services: Auto-suggestions, Typeahead search 5 | 6 | Difficulty: Medium 7 | 8 | ## 1. What is Typeahead Suggestion? 9 | 10 | Typeahead suggestions enable users to search for known and frequently searched terms. As the user types into the search box, it tries to predict the query based on the characters the user has entered and gives a list of suggestions to complete the query. Typeahead suggestions help the user to articulate their search queries better. It’s not about speeding up the search process but rather about guiding the users and lending them a helping hand in constructing their search query. 11 | 12 | ## 2. Requirements and Goals of the System 13 | 14 | **Functional Requirements**: As the user types in their query, our service should suggest top 10 terms starting with whatever the user has typed. 15 | 16 | **Non-function Requirements**: The suggestions should appear in real-time. The user should be able to see the suggestions within 200ms. 17 | 18 | ## 3. Basic System Design and Algorithm 19 | 20 | The problem we are solving is that we have a lot of ‘strings’ that we need to store in such a way that users can search with any prefix. Our service will suggest next terms that will match the given prefix. For example, if our database contains the following terms: cap, cat, captain, or capital and the user has typed in ‘cap’, our system should suggest ‘cap’, ‘captain’ and ‘capital’. 21 | 22 | Since we’ve got to serve a lot of queries with minimum latency, we need to come up with a scheme that can efficiently store our data such that it can be queried quickly. We can’t depend upon some database for this; we need to store our index in memory in a highly efficient data structure. 23 | 24 | One of the most appropriate data structures that can serve our purpose is the Trie (pronounced “try”). A trie is a tree-like data structure used to store phrases where each node stores a character of the phrase in a sequential manner. For example, if we need to store ‘cap, cat, caption, captain, capital’ in the trie, it would look like: 25 | 26 | ![](../img/typeahead-1.png) 27 | 28 | Now if the user has typed ‘cap’, our service can traverse the trie to go to the node ‘P’ to find all the terms that start with this prefix (e.g., cap-tion, cap-ital etc). 29 | 30 | We can merge nodes that have only one branch to save storage space. The above trie can be stored like this: 31 | 32 | ![](../img/typeahead-2.png) 33 | 34 | **Should we have case insensitive trie?** For simplicity and search use-case, let’s assume our data is case insensitive. 35 | 36 | **How to find top suggestion?** Now that we can find all the terms for a given prefix, how can we find the top 10 terms for the given prefix? 
One simple solution could be to store the count of searches that terminated at each node, e.g., if users have searched about ‘CAPTAIN’ 100 times and ‘CAPTION’ 500 times, we can store this number with the last character of the phrase. Now if the user types ‘CAP’ we know the top most searched word under the prefix ‘CAP’ is ‘CAPTION’. So, to find the top suggestions for a given prefix, we can traverse the sub-tree under it. 37 | 38 | **Given a prefix, how much time will it take to traverse its sub-tree?** Given the amount of data we need to index, we should expect a huge tree. Even traversing a sub-tree would take really long, e.g., the phrase ‘system design interview questions’ is 30 levels deep. Since we have very strict latency requirements we do need to improve the efficiency of our solution. 39 | 40 | **Can we store top suggestions with each node?** This can surely speed up our searches but will require a lot of extra storage. We can store top 10 suggestions at each node that we can return to the user. We have to bear the big increase in our storage capacity to achieve the required efficiency. 41 | 42 | We can optimize our storage by storing only references of the terminal nodes rather than storing the entire phrase. To find the suggested terms we need to traverse back using the parent reference from the terminal node. We will also need to store the frequency with each reference to keep track of top suggestions. 43 | 44 | **How would we build this trie?** We can efficiently build our trie bottom up. Each parent node will recursively call all the child nodes to calculate their top suggestions and their counts. Parent nodes will combine top suggestions from all of their children to determine their top suggestions. 45 | 46 | **How to update the trie?** Assuming five billion searches every day, which would give us approximately 60K queries per second. If we try to update our trie for every query it’ll be extremely resource intensive and this can hamper our read requests, too. One solution to handle this could be to update our trie offline after a certain interval. 47 | 48 | As the new queries come in we can log them and also track their frequencies. Either we can log every query or do sampling and log every 1000th query. For example, if we don’t want to show a term which is searched for less than 1000 times, it’s safe to log every 1000th searched term. 49 | 50 | We can have a [Map-Reduce (MR)](https://en.wikipedia.org/wiki/MapReduce) set-up to process all the logging data periodically say every hour. These MR jobs will calculate frequencies of all searched terms in the past hour. We can then update our trie with this new data. We can take the current snapshot of the trie and update it with all the new terms and their frequencies. We should do this offline as we don’t want our read queries to be blocked by update trie requests. We can have two options: 51 | 52 | 1. We can make a copy of the trie on each server to update it offline. Once done we can switch to start using it and discard the old one. 53 | 2. Another option is we can have a master-slave configuration for each trie server. We can update slave while the master is serving traffic. Once the update is complete, we can make the slave our new master. We can later update our old master, which can then start serving traffic, too. 54 | 55 | **How can we update the frequencies of typeahead suggestions?** Since we are storing frequencies of our typeahead suggestions with each node, we need to update them too! 
We can update only differences in frequencies rather than recounting all search terms from scratch. If we’re keeping count of all the terms searched in last 10 days, we’ll need to subtract the counts from the time period no longer included and add the counts for the new time period being included. We can add and subtract frequencies based on [Exponential Moving Average (EMA)](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) of each term. In EMA, we give more weight to the latest data. It’s also known as the exponentially weighted moving average. 56 | 57 | After inserting a new term in the trie, we’ll go to the terminal node of the phrase and increase its frequency. Since we’re storing the top 10 queries in each node, it is possible that this particular search term jumped into the top 10 queries of a few other nodes. So, we need to update the top 10 queries of those nodes then. We have to traverse back from the node to all the way up to the root. For every parent, we check if the current query is part of the top 10. If so, we update the corresponding frequency. If not, we check if the current query’s frequency is high enough to be a part of the top 10. If so, we insert this new term and remove the term with the lowest frequency. 58 | 59 | **How can we remove a term from the trie?** Let’s say we have to remove a term from the trie because of some legal issue or hate or piracy etc. We can completely remove such terms from the trie when the regular update happens, meanwhile, we can add a filtering layer on each server which will remove any such term before sending them to users. 60 | 61 | **What could be different ranking criteria for suggestions?** In addition to a simple count, for terms ranking, we have to consider other factors too, e.g., freshness, user location, language, demographics, personal history etc. 62 | 63 | ## 4. Permanent Storage of the Trie 64 | 65 | **How to store trie in a file so that we can rebuild our trie easily - this will be needed when a machine restarts?** We can take a snapshot of our trie periodically and store it in a file. This will enable us to rebuild a trie if the server goes down. To store, we can start with the root node and save the trie level-by-level. With each node, we can store what character it contains and how many children it has. Right after each node, we should put all of its children. Let’s assume we have the following trie: 66 | 67 | ![](../img/typeahead-3.png) 68 | 69 | If we store this trie in a file with the above-mentioned scheme, we will have: “C2,A2,R1,T,P,O1,D”. From this, we can easily rebuild our trie. 70 | 71 | If you’ve noticed, we are not storing top suggestions and their counts with each node. It is hard to store this information; as our trie is being stored top down, we don’t have child nodes created before the parent, so there is no easy way to store their references. For this, we have to recalculate all the top terms with counts. This can be done while we are building the trie. Each node will calculate its top suggestions and pass it to its parent. Each parent node will merge results from all of its children to figure out its top suggestions. 72 | 73 | ## 5. Scale Estimation 74 | 75 | If we are building a service that has the same scale as that of Google we can expect 5 billion searches every day, which would give us approximately 60K queries per second. 76 | 77 | Since there will be a lot of duplicates in 5 billion queries, we can assume that only 20% of these will be unique. 
If we only want to index the top 50% of the search terms, we can get rid of a lot of less frequently searched queries. Let’s assume we will have 100 million unique terms for which we want to build an index. 78 | 79 | **Storage Estimation**: If on the average each query consists of 3 words and if the average length of a word is 5 characters, this will give us 15 characters of average query size. Assuming we need 2 bytes to store a character, we will need 30 bytes to store an average query. So total storage we will need: 80 | 81 | ####
100 million * 30 bytes => 3 GB
82 | 83 | We can expect some growth in this data every day, but we should also be removing some terms that are not searched anymore. If we assume we have 2% new queries every day and if we are maintaining our index for the last one year, total storage we should expect: 84 | 85 | ####
3GB + (0.02 * 3 GB * 365 days) => 25 GB
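Before partitioning the data, here is a minimal in-memory sketch of the trie with per-node top suggestions described in section 3 (simplified: it keeps full counts at every prefix node and does not merge single-child nodes, which the real design would do to save space):

```python
# Minimal trie with per-node top suggestions (a simplified version of section 3's idea).
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.counts = {}     # full term -> frequency (a real node keeps only its top 10)

class Trie:
    def __init__(self, k: int = 10):
        self.root = TrieNode()
        self.k = k

    def insert(self, term: str, freq: int = 1) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            node.counts[term] = node.counts.get(term, 0) + freq  # update every prefix node

    def suggest(self, prefix: str) -> list[str]:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return sorted(node.counts, key=node.counts.get, reverse=True)[: self.k]

trie = Trie()
trie.insert("caption", 500)
trie.insert("captain", 100)
trie.insert("capital", 70)
print(trie.suggest("cap"))   # ['caption', 'captain', 'capital']
```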
86 | 87 | ## 6. Data Partition 88 | 89 | Although our index can easily fit on one server, we can still partition it in order to meet our requirements of higher efficiency and lower latencies. How can we efficiently partition our data to distribute it onto multiple servers? 90 | 91 | **a. Range Based Partitioning**: What if we store our phrases in separate partitions based on their first letter? So we save all the terms starting with the letter ‘A’ in one partition and those that start with the letter ‘B’ into another partition and so on. We can even combine certain less frequently occurring letters into one partition. We should come up with this partitioning scheme statically so that we can always store and search terms in a predictable manner. 92 | 93 | The main problem with this approach is that it can lead to unbalanced servers. For instance, if we decide to put all terms starting with the letter ‘E’ into one partition, we may later realize that we have too many terms starting with the letter ‘E’ to fit into one partition. 94 | 95 | We can see that the above problem will happen with every statically defined scheme. It is not possible to calculate statically whether each of our partitions will fit on one server. 96 | 97 | **b. Partition based on the maximum capacity of the server**: Let’s say we partition our trie based on the maximum memory capacity of the servers. We can keep storing data on a server as long as it has memory available. Whenever a sub-tree cannot fit into a server, we break our partition there to assign that range to this server and move on to the next server to repeat this process. Let’s say our first trie server can store all terms from ‘A’ to ‘AABC’, which means our next server will store from ‘AABD’ onwards. If our second server could store up to ‘BXA’, the next server will start from ‘BXB’, and so on. We can keep a hash table to quickly access this partitioning scheme: 98 | 99 | Server 1, A-AABC 100 | Server 2, AABD-BXA 101 | Server 3, BXB-CDA 102 | 103 | For querying, if the user has typed ‘A’ we have to query both servers 1 and 2 to find the top suggestions. When the user has typed ‘AA’, we still have to query servers 1 and 2, but when the user has typed ‘AAA’ we only need to query server 1. 104 | 105 | We can have a load balancer in front of our trie servers which can store this mapping and redirect traffic. Also, if we are querying from multiple servers, either we need to merge the results on the server side to calculate the overall top results or make our clients do that. If we prefer to do this on the server side, we need to introduce another layer of servers between the load balancers and trie servers (let’s call them aggregators). These servers will aggregate results from multiple trie servers and return the top results to the client. 106 | 107 | Partitioning based on the maximum capacity can still lead us to hotspots, e.g., if there are a lot of queries for terms starting with ‘cap’, the server holding it will have a high load compared to others. 108 | 109 | **c. Partition based on the hash of the term**: Each term will be passed to a hash function, which will generate a server number and we will store the term on that server. This will make our term distribution random and hence minimize hotspots. The disadvantage of this scheme is that, to find typeahead suggestions for a term, we have to ask all the servers and then aggregate the results. 110 | 111 | ## 7.
Cache 112 | 113 | We should realize that caching the top searched terms will be extremely helpful in our service. There will be a small percentage of queries that will be responsible for most of the traffic. We can have separate cache servers in front of the trie servers holding the most frequently searched terms and their typeahead suggestions. Application servers should check these cache servers before hitting the trie servers to see if they have the desired searched terms. This will save us the time needed to traverse the trie. 114 | 115 | We can also build a simple Machine Learning (ML) model that can try to predict the engagement on each suggestion based on simple counting, personalization, or trending data, and cache these terms beforehand. 116 | 117 | ## 8. Replication and Load Balancer 118 | 119 | We should have replicas for our trie servers both for load balancing and also for fault tolerance. We also need a load balancer that keeps track of our data partitioning scheme and redirects traffic based on the prefixes. 120 | 121 | ## 9. Fault Tolerance 122 | 123 | **What will happen when a trie server goes down?** As discussed above we can have a master-slave configuration; if the master dies, the slave can take over after failover. Any server that comes back up can rebuild the trie based on the last snapshot. 124 | 125 | ## 10. Typeahead Client 126 | 127 | We can perform the following optimizations on the client side to improve the user’s experience: 128 | 129 | 1. The client should only try hitting the server if the user has not pressed any key for 50ms. 130 | 2. If the user is constantly typing, the client can cancel the in-progress requests. 131 | 3. Initially, the client can wait until the user enters a couple of characters. 132 | 4. Clients can pre-fetch some data from the server to save future requests. 133 | 5. Clients can store the recent history of suggestions locally. Recent history has a very high rate of being reused. 134 | 6. Establishing an early connection with the server turns out to be one of the most important factors. As soon as the user opens the search engine website, the client can open a connection with the server. So when a user types in the first character, the client doesn’t waste time in establishing the connection. 135 | 7. The server can push some part of its cache to CDNs and Internet Service Providers (ISPs) for efficiency. 136 | 137 | ## 11. Personalization 138 | 139 | Users will receive some typeahead suggestions based on their historical searches, location, language, etc. We can store the personal history of each user separately on the server and also cache it on the client. The server can add these personalized terms in the final set before sending it to the user. Personalized searches should always come before others. 140 | -------------------------------------------------------------------------------- /designs/uber-backend.md: -------------------------------------------------------------------------------- 1 | # Uber Backend 2 | ### Let’s design a ride-sharing service like Uber, which connects passengers who need a ride with drivers who have a car. 3 | 4 | Similar Services: Lyft, Didi, Via, Sidecar, etc. 5 | 6 | Difficulty level: Hard 7 | 8 | Prerequisite: [Designing Yelp](yelp.md) 9 | 10 | ## 1. What is Uber? 11 | 12 | Uber enables its customers to book drivers for taxi rides. Uber drivers use their personal cars to drive customers around. Both customers and drivers communicate with each other through their smartphones using the Uber app. 13 | 14 | ## 2.
Requirements and Goals of the System 15 | 16 | Let’s start with building a simpler version of Uber. 17 | 18 | There are two types of users in our system: 1) Drivers 2) Customers. 19 | 20 | * Drivers need to regularly notify the service about their current location and their availability to pick passengers. 21 | * Passengers get to see all the nearby available drivers. 22 | * Customer can request a ride; nearby drivers are notified that a customer is ready to be picked up. 23 | * Once a driver and a customer accept a ride, they can constantly see each other’s current location until the trip finishes. 24 | * Upon reaching the destination, the driver marks the journey complete to become available for the next ride. 25 | 26 | ## 3. Capacity Estimation and Constraints 27 | 28 | * Let’s assume we have 300M customers and 1M drivers with 1M daily active customers and 500K daily active drivers. 29 | * Let’s assume 1M daily rides. 30 | * Let’s assume that all active drivers notify their current location every three seconds. 31 | * Once a customer puts in a request for a ride, the system should be able to contact drivers in real-time. 32 | 33 | ## 4. Basic System Design and Algorithm 34 | 35 | We will take the solution discussed in [Designing Yelp](yelp.md) and modify it to make it work for the above-mentioned “Uber” use cases. The biggest difference we have is that our QuadTree was not built keeping in mind that there would be frequent updates to it. So, we have two issues with our Dynamic Grid solution: 36 | 37 | * Since all active drivers are reporting their locations every three seconds, we need to update our data structures to reflect that. If we have to update the QuadTree for every change in the driver’s position, it will take a lot of time and resources. To update a driver to its new location, we must find the right grid based on the driver’s previous location. If the new position does not belong to the current grid, we have to remove the driver from the current grid and move/reinsert the user to the correct grid. After this move, if the new grid reaches the maximum limit of drivers, we have to repartition it. 38 | * We need to have a quick mechanism to propagate the current location of all the nearby drivers to any active customer in that area. Also, when a ride is in progress, our system needs to notify both the driver and passenger about the current location of the car. 39 | 40 | Although our QuadTree helps us find nearby drivers quickly, a fast update in the tree is not guaranteed. 41 | 42 | **Do we need to modify our QuadTree every time a driver reports their location?** If we don’t update our QuadTree with every update from the driver, it will have some old data and will not reflect the current location of drivers correctly. If you recall, our purpose of building the QuadTree was to find nearby drivers (or places) efficiently. Since all active drivers report their location every three seconds, therefore there will be a lot more updates happening to our tree than querying for nearby drivers. So, what if we keep the latest position reported by all drivers in a hash table and update our QuadTree a little less frequently? Let’s assume we guarantee that a driver’s current location will be reflected in the QuadTree within 15 seconds. Meanwhile, we will maintain a hash table that will store the current location reported by drivers; let’s call this DriverLocationHT. 
43 | 44 | **How much memory do we need for DriverLocationHT?** We need to store the DriverID, along with the driver’s present and old location, in the hash table. So, we need a total of 35 bytes to store one record: 45 | 46 | 1. DriverID (3 bytes - 1 million drivers) 47 | 2. Old latitude (8 bytes) 48 | 3. Old longitude (8 bytes) 49 | 4. New latitude (8 bytes) 50 | 5. New longitude (8 bytes); Total = 35 bytes 51 | 52 | If we have 1 million total drivers, we need the following memory (ignoring hash table overhead): 53 | 54 | ####
1 million * 35 bytes => 35 MB
55 | 56 | **How much bandwidth will our service consume to receive location updates from all drivers?** If we get the DriverID and their location, it will be (3+16 => 19 bytes). If we receive this information every three seconds from 500K daily active drivers, we will be getting 9.5MB per three seconds. 57 | 58 | **Do we need to distribute DriverLocationHT onto multiple servers?** Although our memory and bandwidth requirements don’t demand it (all this information can easily be stored on one server), for scalability, performance, and fault tolerance we should still distribute DriverLocationHT onto multiple servers. We can distribute based on the DriverID to make the distribution completely random. Let’s call the machines holding DriverLocationHT the Driver Location servers. Other than storing the driver’s location, each of these servers will do two things: 59 | 60 | 1. As soon as the server receives an update for a driver’s location, it will broadcast that information to all the interested customers. 61 | 2. The server needs to notify the respective QuadTree server to refresh the driver’s location. As discussed above, this can happen every 15 seconds. 62 | 63 | **How can we efficiently broadcast the driver’s location to customers?** We can have a *Push Model* where the server will push the positions to all the relevant users. We can have a dedicated Notification Service that can broadcast the current location of drivers to all the interested customers. We can build our Notification service on a publisher/subscriber model. When a customer opens the Uber app on their cell phone, they query the server to find nearby drivers. On the server side, before returning the list of drivers to the customer, we will subscribe the customer to all the updates from those drivers. We can maintain a list of customers (subscribers) interested in knowing the location of a driver and, whenever we have an update in DriverLocationHT for that driver, we can broadcast the current location of the driver to all subscribed customers. This way, our system makes sure that we always show the driver’s current position to the customer. 64 | 65 | **How much memory will we need to store all these subscriptions?** As we have estimated above, we will have 1M daily active customers and 500K daily active drivers. On average, let’s assume that five customers subscribe to one driver. Let’s assume we store all this information in a hash table so that we can update it efficiently. We need to store driver and customer IDs to maintain the subscriptions. Assuming we will need 3 bytes for DriverID and 8 bytes for CustomerID, we will need 21MB of memory. 66 | 67 | ####
(500K * 3) + (500K * 5 * 8 ) ~= 21 MB
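A rough sketch of DriverLocationHT plus this subscription bookkeeping (all names are illustrative, and a real Notification Service would push updates over long polling or push notifications rather than printing):

```python
# Rough sketch of DriverLocationHT and subscriber broadcast (names are illustrative).
driver_location_ht = {}   # driver_id -> (old_lat, old_lng, new_lat, new_lng)
subscribers = {}          # driver_id -> set of customer_ids interested in that driver

def subscribe(customer_id: int, driver_id: int) -> None:
    subscribers.setdefault(driver_id, set()).add(customer_id)

def update_driver_location(driver_id: int, lat: float, lng: float) -> None:
    old = driver_location_ht.get(driver_id)
    old_lat, old_lng = (old[2], old[3]) if old else (lat, lng)
    driver_location_ht[driver_id] = (old_lat, old_lng, lat, lng)
    # Broadcast the new position to every subscribed customer; the QuadTree server
    # itself is refreshed far less frequently (roughly every 15 seconds).
    for customer_id in subscribers.get(driver_id, set()):
        print(f"notify customer {customer_id}: driver {driver_id} is at ({lat}, {lng})")

subscribe(customer_id=42, driver_id=7)
update_driver_location(driver_id=7, lat=37.77, lng=-122.41)
```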
68 | 69 | **How much bandwidth will we need to broadcast the driver’s location to customers?** For every active driver, we have five subscribers, so the total subscribers we have: 70 | 71 | ####
5 * 500K => 2.5M
72 | 73 | To all these customers we need to send DriverID (3 bytes) and their location (16 bytes) every second, so, we need the following bandwidth: 74 | 75 | ####
2.5M * 19 bytes => 47.5 MB/s
76 | 77 | **How can we efficiently implement Notification service?** We can either use HTTP long polling or push notifications. 78 | 79 | **How will the new publishers/drivers get added for a current customer?** As we have proposed above, customers will be subscribed to nearby drivers when they open the Uber app for the first time, what will happen when a new driver enters the area the customer is looking at? To add a new customer/driver subscription dynamically, we need to keep track of the area the customer is watching. This will make our solution complicated; how about if instead of pushing this information, clients pull it from the server? 80 | 81 | **How about if clients pull information about nearby drivers from the server?** Clients can send their current location, and the server will find all the nearby drivers from the QuadTree to return them to the client. Upon receiving this information, the client can update their screen to reflect the current positions of the drivers. Clients can query every five seconds to limit the number of round trips to the server. This solution looks simpler compared to the push model described above. 82 | 83 | **Do we need to repartition a grid as soon as it reaches the maximum limit?** We can have a cushion to let each grid grow a little bigger beyond the limit before we decide to partition it. Let’s say our grids can grow/shrink an extra 10% before we partition/merge them. This should decrease the load for a grid partition or merge on high traffic grids. 84 | 85 | ![](../img/uber-1.png) 86 | 87 | #### How would “Request Ride” use case work? 88 | 89 | 1. The customer will put a request for a ride. 90 | 2. One of the Aggregator servers will take the request and asks QuadTree servers to return nearby drivers. 91 | 3. The Aggregator server collects all the results and sorts them by ratings. 92 | 4. The Aggregator server will send a notification to the top (say three) drivers simultaneously, whichever driver accepts the request first will be assigned the ride. The other drivers will receive a cancellation request. If none of the three drivers respond, the Aggregator will request a ride from the next three drivers from the list. 93 | 5. Once a driver accepts a request, the customer is notified. 94 | 95 | ## 5. Fault Tolerance and Replication 96 | 97 | **What if a Driver Location server or Notification server dies?** We would need replicas of these servers, so that if the primary dies the secondary can take control. Also, we can store this data in some persistent storage like SSDs that can provide fast IOs; this will ensure that if both primary and secondary servers die we can recover the data from the persistent storage. 98 | 99 | ## 6. Ranking 100 | 101 | How about if we want to rank the search results not just by proximity but also by popularity or relevance? 102 | 103 | **How can we return top rated drivers within a given radius?** Let’s assume we keep track of the overall ratings of each driver in our database and QuadTree. An aggregated number can represent this popularity in our system, e.g., how many stars does a driver get out of ten? While searching for the top 10 drivers within a given radius, we can ask each partition of the QuadTree to return the top 10 drivers with a maximum rating. The aggregator server can then determine the top 10 drivers among all the drivers returned by different partitions. 104 | 105 | ## 7. Advanced Issues 106 | 107 | 1. How will we handle clients on slow and disconnecting networks? 108 | 2. 
What if a client gets disconnected when they are a part of a ride? How will we handle billing in such a scenario? 109 | 3. How about if clients pull all the information, compared to servers always pushing it? 110 | -------------------------------------------------------------------------------- /designs/youtube.md: -------------------------------------------------------------------------------- 1 | # Youtube 2 | Let's design a video sharing service like Youtube, where users will be able to upload/view/search videos. 3 | 4 | Similar Services: netflix.com, vimeo.com, dailymotion.com, veoh.com 5 | 6 | Difficulty Level: Medium 7 | 8 | ## 1. Why Youtube? 9 | Youtube is one of the most popular video sharing websites in the world. Users of the service can upload, view, share, rate, and report videos as well as add comments on videos. 10 | 11 | ## 2. Requirements and Goals of the System 12 | For the sake of this exercise, we plan to design a simpler version of Youtube with following requirements: 13 | 14 | ### Functional Requirements: 15 | 16 | 1. Users should be able to upload videos. 17 | 2. Users should be able to share and view videos. 18 | 3. Users should be able to perform searches based on video titles. 19 | 4. Our services should be able to record stats of videos, e.g., likes/dislikes, total number of views, etc. 20 | 5. Users should be able to add and view comments on videos. 21 | 22 | ### Non-Functional Requirements: 23 | 24 | 1. The system should be highly reliable, any video uploaded should not be lost. 25 | 2. The system should be highly available. Consistency can take a hit (in the interest of availability); if a user doesn’t see a video for a while, it should be fine. 26 | 3. Users should have a real time experience while watching videos and should not feel any lag. 27 | 28 | ### Not in scope: 29 | Video recommendations, most popular videos, channels, subscriptions, watch later, favorites, etc. 30 | 31 | ## 3. Capacity Estimation and Constraints 32 | Let’s assume we have 1.5 billion total users, 800 million of whom are daily active users. If, on average, a user views five videos per day then the total video-views per second would be: 33 | 34 | ####
800M * 5 / 86400 sec => 46K videos/sec
35 | 36 | Let’s assume our upload:view ratio is 1:200, i.e., for every video upload we have 200 videos viewed, giving us 230 videos uploaded per second. 37 | 38 | ####
46K / 200 => 230 videos/sec
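These figures are easy to sanity-check with a few lines of arithmetic; the inputs are simply the assumptions stated above:

```python
# Back-of-the-envelope check of the traffic numbers above.
dau = 800e6                      # daily active users
views_per_user_per_day = 5
upload_view_ratio = 1 / 200      # one upload for every 200 views

views_per_sec = dau * views_per_user_per_day / 86_400
uploads_per_sec = views_per_sec * upload_view_ratio

print(f"{views_per_sec:,.0f} video views/sec")   # ~46,000
print(f"{uploads_per_sec:,.0f} uploads/sec")     # ~230
```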
39 | 40 | **Storage Estimates**: Let’s assume that every minute 500 hours worth of videos are uploaded to Youtube. If on average, one minute of video needs 50MB of storage (videos need to be stored in multiple formats), the total storage needed for videos uploaded in a minute would be: 41 | 42 | ####
500 hours * 60 min * 50MB => 1500 GB/min (25 GB/sec)
43 | 44 | These numbers ignore video compression and replication, which would change our estimates. 45 | 46 | **Bandwidth estimates**: With 500 hours of video uploads per minute and assuming each video upload takes a bandwidth of 10MB/min, we would be getting 300GB of uploads every minute. 47 | 48 | ####
500 hours * 60 mins * 10MB => 300GB/min (5GB/sec)
49 | 50 | Assuming an upload:view ratio of 1:200, we would need 1TB/s outgoing bandwidth. 51 | 52 | ## 4. System APIs 53 | We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definitions of the APIs for uploading and searching videos: 54 | 55 | ####
uploadVideo(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents)
56 | 57 | **Parameters**: 58 | * api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota. 59 | * video_title (string): Title of the video. 60 | * video_description (string): Optional description of the video. 61 | * tags (string[]): Optional tags for the video. 62 | * category_id (string): Category of the video, e.g., Film, Song, People, etc. 63 | * default_language (string): For example English, Mandarin, Hindi, etc. 64 | * recording_details (string): Location where the video was recorded. 65 | * video_contents (stream): Video to be uploaded. 66 | 67 | **Returns**: (string) 68 | 69 | A successful upload will return HTTP 202 (request accepted) and once the video encoding is completed the user is notified through email with a link to access the video. We can also expose a queryable API to let users know the current status of their uploaded video. 70 | 71 | ####
searchVideo(api_dev_key, search_query, user_location, maximum_videos_to_return, page_token)
72 | 73 | **Parameters**: 74 | * api_dev_key (string): The API developer key of a registered account of our service. 75 | * search_query (string): A string containing the search terms. 76 | * user_location (string): Optional location of the user performing the search. 77 | * maximum_videos_to_return (number): Maximum number of results returned in one request. 78 | * page_token (string): This token will specify a page in the result set that should be returned. 79 | 80 | **Returns**: (JSON) 81 | 82 | A JSON containing information about the list of video resources matching the search query. Each video resource will have a video title, a thumbnail, a video creation date, and a view count. 83 | 84 | ####
streamVideo(api_dev_key, video_id, offset, codec, resolution)
85 | 86 | **Parameters**: 87 | * api_dev_key (string): The API developer key of a registered account of our service. 88 | * video_id (string): A string to identify the video. 89 | * offset (number): We should be able to stream video from any offset; this offset would be a time in seconds from the beginning of the video. If we support playing/pausing a video from multiple devices, we will need to store the offset on the server. This will enable the users to start watching a video on any device from the same point where they left off. 90 | * codec (string) & resolution(string): We should send the codec and resolution info in the API from the client to support play/pause from multiple devices. Imagine you are watching a video on your TV’s Netflix app, paused it, and started watching it on your phone’s Netflix app. In this case, you would need codec and resolution, as both these devices have a different resolution and use a different codec. 91 | 92 | **Returns**: (STREAM) 93 | 94 | A media stream (a video chunk) from the given offset. 95 | 96 | ## 5. High Level Design 97 | At a high-level we would need the following components: 98 | 99 | 1. **Processing Queue**: Each uploaded video will be pushed to a processing queue to be de-queued later for encoding, thumbnail generation, and storage. 100 | 2. **Encoder**: To encode each uploaded video into multiple formats. 101 | 3. **Thumbnails generator**: To generate a few thumbnails for each video. 102 | 4. **Video and Thumbnail storage**: To store video and thumbnail files in some distributed file storage. 103 | 5. **User Database**: To store user’s information, e.g., name, email, address, etc. 104 | 6. **Video metadata storage**: A metadata database to store all the information about videos like title, file path in the system, uploading user, total views, likes, dislikes, etc. It will also be used to store all the video comments. 105 | 106 | ![](../img/youtube-1.png) 107 | 108 | ####
High level design of Youtube
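A highly simplified sketch of the upload path through these components (illustrative only: the queue, encoder, and thumbnail generator here are in-process stand-ins, whereas the real design would use a distributed queue, dedicated encoding workers, and distributed file storage):

```python
import queue

# Simplified upload pipeline: uploads land on a processing queue and are then
# encoded and thumbnailed by a worker; all components here are in-process stand-ins.
processing_queue: "queue.Queue[dict]" = queue.Queue()
video_metadata = {}   # video_id -> metadata row (stand-in for the metadata database)

def upload_video(video_id: str, title: str, raw_bytes: bytes) -> None:
    video_metadata[video_id] = {"title": title, "status": "processing"}
    processing_queue.put({"video_id": video_id, "raw": raw_bytes})

def encoding_worker() -> None:
    while not processing_queue.empty():
        task = processing_queue.get()
        encoded = {fmt: b"..." for fmt in ("240p", "480p", "1080p")}   # encoding stub
        thumbnails = [b"thumb-1", b"thumb-2"]                          # thumbnail stub
        # In the real design, encoded files and thumbnails go to distributed storage.
        video_metadata[task["video_id"]].update(
            status="ready", formats=list(encoded), thumbnails=len(thumbnails)
        )

upload_video("v1", "My first video", b"\x00\x01")
encoding_worker()
print(video_metadata["v1"])   # {'title': 'My first video', 'status': 'ready', ...}
```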
109 | 110 | ## 6. Database Schema 111 | #### Video metadata storage - MySql 112 | 113 | Video metadata can be stored in a SQL database. The following information should be stored with each video: 114 | 115 | * VideoID 116 | * Title 117 | * Description 118 | * Size 119 | * Thumbnail 120 | * Uploader/User 121 | * Total number of likes 122 | * Total number of dislikes 123 | * Total number of views 124 | 125 | For each video comment, we need to store the following information: 126 | 127 | * CommentID 128 | * VideoID 129 | * UserID 130 | * Comment 131 | * TimeOfCreation 132 | 133 | #### User data storage - MySql 134 | 135 | * UserID 136 | * Name 137 | * Email 138 | * Address 139 | * Age 140 | * Registration details 141 | * etc. 142 | 143 | ## 7. Detailed Component Design 144 | The service would be read-heavy, so we will focus on building a system that can retrieve videos quickly. We can expect our read:write ratio to be 200:1, which means for every video upload there are 200 video views. 145 | 146 | **Where would videos be stored?** Videos can be stored in a distributed file storage system like [HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS) or [GlusterFS](https://en.wikipedia.org/wiki/GlusterFS). 147 | 148 | **How should we efficiently manage read traffic?** We should segregate our read traffic from write traffic. Since we will have multiple copies of each video, we can distribute our read traffic on different servers. For metadata, we can have master-slave configurations where writes will go to the master first and then get applied at all the slaves. Such configurations can cause some staleness in data, e.g., when a new video is added, its metadata will be inserted in the master first; before it gets applied at the slaves, they will not be able to see it and will therefore return stale results to the user. This staleness might be acceptable in our system as it would be very short-lived and the user would be able to see the new videos after a few milliseconds. 149 | 150 | **Where would thumbnails be stored?** There will be a lot more thumbnails than videos. If we assume that every video will have five thumbnails, we need to have a very efficient storage system that can serve huge read traffic. There are two considerations before deciding which storage system should be used for thumbnails: 151 | 152 | 1. Thumbnails are small files with, say, a maximum of 5KB each. 153 | 2. Read traffic for thumbnails will be huge compared to videos. Users will be watching one video at a time, but they might be looking at a page that has 20 thumbnails of other videos. 154 | 155 | Let’s evaluate storing all the thumbnails on a disk. Given that we have a huge number of files, we have to perform a lot of seeks to different locations on the disk to read these files. This is quite inefficient and will result in higher latencies. 156 | 157 | [Bigtable](https://en.wikipedia.org/wiki/Bigtable) can be a reasonable choice here as it combines multiple files into one block to store on the disk and is very efficient in reading a small amount of data. Both of these capabilities match the two most significant requirements of our service. Keeping hot thumbnails in the cache will also help in improving the latencies and, given that thumbnail files are small in size, we can easily cache a large number of such files in memory. 158 | 159 | **Video Uploads**: Since videos can be huge, if the connection drops while uploading, we should support resuming from the same point.
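One common way to support this (a sketch under the assumption of a chunked, offset-based upload protocol; none of these names or calls come from the original design) is for the client to ask the server how many bytes it has already received and resume from that offset:

```python
# Sketch of offset-based resumable upload (protocol and names are assumptions).
uploaded_chunks = {}   # video_id -> bytearray of bytes received so far

def bytes_received(video_id: str) -> int:
    """The client calls this after a dropped connection to learn where to resume."""
    return len(uploaded_chunks.get(video_id, b""))

def upload_chunk(video_id: str, offset: int, chunk: bytes) -> int:
    buf = uploaded_chunks.setdefault(video_id, bytearray())
    if offset != len(buf):        # client and server are out of sync
        return len(buf)           # tell the client the correct offset to resume from
    buf.extend(chunk)
    return len(buf)

upload_chunk("v1", 0, b"AAAA")
# ...connection drops...
resume_at = bytes_received("v1")      # 4
upload_chunk("v1", resume_at, b"BBBB")
print(bytes_received("v1"))           # 8
```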
160 | 161 | **Video Encoding**: Newly uploaded videos are stored on the server and a new task is added to the processing queue to encode the video into multiple formats. Once all the encoding is completed, the uploader will be notified and the video is made available for viewing/sharing. 162 | 163 | ![](../img/youtube-2.png) 164 | 165 | ####
Detailed component design of Youtube
166 | 167 | ## 8. Metadata Sharding 168 | Since we have a huge number of new videos every day and our read load is extremely high, we need to distribute our data onto multiple machines so that we can perform read/write operations efficiently. We have many options to shard our data. Let’s go through different strategies of sharding this data one by one: 169 | 170 | **Sharding based on UserID**: We can try storing all the data for a particular user on one server. While storing, we can pass the UserID to our hash function which will map the user to a database server where we will store all the metadata for that user’s videos. While querying for videos of a user, we can ask our hash function to find the server holding the user’s data and then read it from there. To search videos by title, we will have to query all servers, and each server will return a set of videos. A centralized server will then aggregate and rank these results before returning them to the user. 171 | 172 | This approach has a couple of issues: 173 | 174 | 1. What if a user becomes popular? There could be a lot of queries on the server holding that user; this could create a performance bottleneck. This will also affect the overall performance of our service. 175 | 2. Over time, some users can end up storing a lot of videos compared to others. Maintaining a uniform distribution of growing user data is quite tricky. 176 | 177 | To recover from these situations, we either have to repartition/redistribute our data or use consistent hashing to balance the load between servers. 178 | 179 | **Sharding based on VideoID**: Our hash function will map each VideoID to a random server where we will store that video’s metadata. To find videos of a user we will query all servers and each server will return a set of videos. A centralized server will aggregate and rank these results before returning them to the user. This approach solves our problem of popular users but shifts it to popular videos. 180 | 181 | We can further improve our performance by introducing a cache to store hot videos in front of the database servers. 182 | 183 | ## 9. Video Deduplication 184 | With a huge number of users uploading a massive amount of video data, our service will have to deal with widespread video duplication. Duplicate videos often differ in aspect ratios or encodings, can contain overlays or additional borders, or can be excerpts from a longer original video. The proliferation of duplicate videos can have an impact on many levels: 185 | 186 | 1. Data Storage: We could be wasting storage space by keeping multiple copies of the same video. 187 | 2. Caching: Duplicate videos would result in degraded cache efficiency by taking up space that could be used for unique content. 188 | 3. Network usage: Duplicate videos will also increase the amount of data that must be sent over the network to in-network caching systems. 189 | 4. Energy consumption: Higher storage, inefficient cache, and network usage could result in energy wastage. 190 | 191 | For the end user, these inefficiencies will be realized in the form of duplicate search results, longer video startup times, and interrupted streaming. 192 | 193 | For our service, deduplication makes the most sense early, when a user is uploading a video, as compared to post-processing it later to find duplicate videos. Inline deduplication will save us a lot of resources that would otherwise be used to encode, transfer, and store the duplicate copy of the video.
As soon as any user starts uploading a video, our service can run video matching algorithms (e.g., [Block Matching](https://en.wikipedia.org/wiki/Block-matching_algorithm), [Phase Correlation](https://en.wikipedia.org/wiki/Phase_correlation), etc.) to find duplications. If we already have a copy of the video being uploaded, we can either stop the upload and use the existing copy or continue the upload and use the newly uploaded video if it is of higher quality. If the newly uploaded video is a subpart of an existing video, or vice versa, we can intelligently divide the video into smaller chunks so that we only upload the parts that are missing. 194 | 195 | ## 10. Load Balancing 196 | We should use [Consistent Hashing](../basics/consistent-hashing.md) among our cache servers, which will also help in balancing the load between cache servers. Since we will be using a static hash-based scheme to map videos to hostnames, it can lead to an uneven load on the logical replicas due to the different popularity of each video. For instance, if a video becomes popular, the logical replica corresponding to that video will experience more traffic than other servers. These uneven loads for logical replicas can then translate into uneven load distribution on corresponding physical servers. To resolve this issue, any busy server in one location can redirect a client to a less busy server in the same cache location. We can use dynamic HTTP redirections for this scenario. 197 | 198 | However, the use of redirections also has its drawbacks. First, since our service tries to load balance locally, it leads to multiple redirections if the host that receives the redirection can’t serve the video. Also, each redirection requires a client to make an additional HTTP request; it also leads to higher delays before the video starts playing back. Moreover, inter-tier (or cross data-center) redirections lead a client to a distant cache location because the higher tier caches are only present at a small number of locations. 199 | 200 | ## 11. Cache 201 | To serve globally distributed users, our service needs a massive-scale video delivery system. Our service should push its content closer to the user using a large number of geographically distributed video cache servers. We need to have a strategy that will maximize user performance and also evenly distribute the load on its cache servers. 202 | 203 | We can introduce a cache for metadata servers to cache hot database rows. We can use Memcache to cache the data, and application servers can quickly check the cache before hitting the database to see if it has the desired rows. Least Recently Used (LRU) can be a reasonable cache eviction policy for our system. Under this policy, we discard the least recently viewed row first. 204 | 205 | **How can we build a more intelligent cache?** If we go with the 80-20 rule, i.e., 20% of daily read volume for videos is generating 80% of the traffic, meaning that certain videos are so popular that the majority of people view them, it follows that we can try caching 20% of the daily read volume of videos and metadata. 206 | 207 | ## 12. Content Delivery Network (CDN) 208 | A CDN is a system of distributed servers that deliver web content to a user based on the geographic location of the user, the origin of the web page, and a content delivery server. Take a look at the ‘CDN’ section in our [Caching](../basics/caching.md) chapter. 209 | 210 | Our service can move popular videos to CDNs: 211 | 212 | * CDNs replicate content in multiple places.
## 12. Content Delivery Network (CDN)
A CDN is a system of distributed servers that deliver web content to a user based on the geographic location of the user, the origin of the web page, and a content delivery server. Take a look at the 'CDN' section in our [Caching](../basics/caching.md) chapter.

Our service can move popular videos to CDNs:

* CDNs replicate content in multiple places. There's a better chance of videos being closer to the user and, with fewer hops, videos will stream from a friendlier network.
* CDN machines make heavy use of caching and can mostly serve videos out of memory.

Less popular videos (1-20 views per day) that are not cached by CDNs can be served by our servers in various data centers.

## 13. Fault Tolerance
We should use [Consistent Hashing](../basics/consistent-hashing.md) for distribution among database servers. Consistent hashing will not only help in replacing a dead server but also help in distributing load among servers.
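To make the consistent-hashing idea from the Load Balancing and Fault Tolerance sections concrete, here is a minimal hash-ring sketch with virtual nodes; the server names and vnode count are illustrative, not part of the original design.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes: keys map to the first server point clockwise."""

    def __init__(self, servers, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first server point at or after the key's hash."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db1", "db2", "db3"])
print(ring.node_for("video:abc123"))  # e.g., 'db2'
```

If a server dies, only the keys that hashed onto its virtual-node segments move to the next server clockwise, which is why replacing a dead server does not reshuffle the whole dataset.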
--------------------------------------------------------------------------------