├── designs
    ├── yelp.md
    ├── ticketmaster.md
    ├── uber-backend.md
    ├── facebook-newsfeed.md
    ├── pastebin.md
    ├── youtube.md
    ├── dropbox.md
    ├── twitter.md
    ├── instagram.md
    ├── web-crawler.md
    ├── twitter-search.md
    ├── facebook-messenger.md
    └── short-url.md
├── img
    ├── yelp-overview.png
    ├── dropbox-detail.png
    ├── dropbox-overview.png
    ├── instagram-detail.png
    ├── pastebin-detail.png
    ├── short-url-detail.png
    ├── twitter-detail.png
    ├── twitter-overview.png
    ├── youtube-detail.png
    ├── youtube-overview.png
    ├── instagram-overview.png
    ├── pastebin-overview.png
    ├── short-url-overview.png
    ├── web-crawler-detail.png
    ├── ticketmaster-overview.png
    ├── twitter-search-detail.png
    ├── uber-backend-overview.png
    ├── web-crawler-overview.png
    ├── twitter-search-overview.png
    ├── facebook-messenger-detail.png
    ├── facebook-newsfeed-overview.png
    └── facebook-messenger-overview.png
├── basics
    ├── indexes.md
    ├── queues.md
    ├── proxies.md
    ├── redundancy.md
    ├── load-balancing.md
    ├── cap-theorem.md
    ├── consistent-hashing.md
    ├── key-characteristics.md
    ├── client-server-communication.md
    ├── sql-vs-nosql.md
    ├── sharding.md
    └── caching.md
├── book.json
├── bin
    └── publish-gh-pages.sh
├── SUMMARY.md
└── README.md


/designs/yelp.md:
--------------------------------------------------------------------------------
1 | # Yelp
2 | 
3 | ## Summary
4 | ![overview](../img/yelp-overview.png)
5 | 


--------------------------------------------------------------------------------
/designs/ticketmaster.md:
--------------------------------------------------------------------------------
1 | # Ticketmaster
2 | 
3 | ## Summary
4 | ![overview](../img/ticketmaster-overview.png)
5 | 


--------------------------------------------------------------------------------
/designs/uber-backend.md:
--------------------------------------------------------------------------------
1 | # Uber Backend
2 | 
3 | ## Summary
4 | ![overview](../img/uber-backend-overview.png)
5 | 


--------------------------------------------------------------------------------
/img/yelp-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/yelp-overview.png


--------------------------------------------------------------------------------
/img/dropbox-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/dropbox-detail.png


--------------------------------------------------------------------------------
/img/dropbox-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/dropbox-overview.png


--------------------------------------------------------------------------------
/img/instagram-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/instagram-detail.png


--------------------------------------------------------------------------------
/img/pastebin-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/pastebin-detail.png


--------------------------------------------------------------------------------
/img/short-url-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/short-url-detail.png


--------------------------------------------------------------------------------
/img/twitter-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/twitter-detail.png


--------------------------------------------------------------------------------
/img/twitter-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/twitter-overview.png


--------------------------------------------------------------------------------
/img/youtube-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/youtube-detail.png


--------------------------------------------------------------------------------
/img/youtube-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/youtube-overview.png


--------------------------------------------------------------------------------
/img/instagram-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/instagram-overview.png


--------------------------------------------------------------------------------
/img/pastebin-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/pastebin-overview.png


--------------------------------------------------------------------------------
/img/short-url-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/short-url-overview.png


--------------------------------------------------------------------------------
/img/web-crawler-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/web-crawler-detail.png


--------------------------------------------------------------------------------
/designs/facebook-newsfeed.md:
--------------------------------------------------------------------------------
1 | # Facebook Newsfeed
2 | 
3 | ## Summary
4 | ![overview](../img/facebook-newsfeed-overview.png)
5 | 


--------------------------------------------------------------------------------
/img/ticketmaster-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/ticketmaster-overview.png


--------------------------------------------------------------------------------
/img/twitter-search-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/twitter-search-detail.png


--------------------------------------------------------------------------------
/img/uber-backend-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/uber-backend-overview.png


--------------------------------------------------------------------------------
/img/web-crawler-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/web-crawler-overview.png


--------------------------------------------------------------------------------
/img/twitter-search-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/twitter-search-overview.png


--------------------------------------------------------------------------------
/designs/pastebin.md:
--------------------------------------------------------------------------------
1 | # Pastebin
2 | 
3 | ## Summary
4 | ![overview](../img/pastebin-overview.png)
5 | ![summary](../img/pastebin-detail.png)


--------------------------------------------------------------------------------
/designs/youtube.md:
--------------------------------------------------------------------------------
1 | # Youtube
2 | 
3 | ## Summary
4 | ![overview](../img/youtube-overview.png)
5 | ![detail](../img/youtube-detail.png)
6 | 


--------------------------------------------------------------------------------
/img/facebook-messenger-detail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/facebook-messenger-detail.png


--------------------------------------------------------------------------------
/img/facebook-newsfeed-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/facebook-newsfeed-overview.png


--------------------------------------------------------------------------------
/designs/dropbox.md:
--------------------------------------------------------------------------------
1 | # Dropbox
2 | 
3 | ## Summary
4 | ![overview](../img/dropbox-overview.png)
5 | ![summary](../img/dropbox-detail.png)
6 | 


--------------------------------------------------------------------------------
/designs/twitter.md:
--------------------------------------------------------------------------------
1 | # Twitter
2 | 
3 | ## Summary
4 | ![overview](../img/twitter-overview.png)
5 | ![summary](../img/twitter-detail.png)
6 | 


--------------------------------------------------------------------------------
/img/facebook-messenger-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/valsamovich/grokking-system-design/HEAD/img/facebook-messenger-overview.png


--------------------------------------------------------------------------------
/designs/instagram.md:
--------------------------------------------------------------------------------
1 | # Instagram
2 | 
3 | ## Summary
4 | ![overview](../img/instagram-overview.png)
5 | ![summary](../img/instagram-detail.png)
6 | 


--------------------------------------------------------------------------------
/designs/web-crawler.md:
--------------------------------------------------------------------------------
1 | # Web Crawler
2 | 
3 | ## Summary
4 | ![overview](../img/web-crawler-overview.png)
5 | ![detail](../img/web-crawler-detail.png)
6 | 


--------------------------------------------------------------------------------
/designs/twitter-search.md:
--------------------------------------------------------------------------------
1 | # Twitter Search
2 | 
3 | ## Summary
4 | ![overview](../img/twitter-search-overview.png)
5 | ![detail](../img/twitter-search-detail.png)
6 | 


--------------------------------------------------------------------------------
/designs/facebook-messenger.md:
--------------------------------------------------------------------------------
1 | # Facebook Messenger
2 | 
3 | ## Summary
4 | ![overview](../img/facebook-messenger-overview.png)
5 | ![summary](../img/facebook-messenger-detail.png)
6 | 


--------------------------------------------------------------------------------
/basics/indexes.md:
--------------------------------------------------------------------------------
1 | Indexes
2 | ====
3 | 
4 | - Improve the performance of search queries.
5 | - Decrease the write performance. This performance degradation applies to all insert, update, and delete operations.
6 | 


--------------------------------------------------------------------------------
/book.json:
--------------------------------------------------------------------------------
 1 | {
 2 |   "plugins": [
 3 |     "-sharing",
 4 |     "github",
 5 |     "anchors",
 6 |     "splitter"
 7 |   ],
 8 | 
 9 |   "pluginsConfig": {
10 |     "github": {
11 |       "url": "https://github.com/tuliren/grokking-system-design"
12 |     }
13 |   }
14 | }
15 | 


--------------------------------------------------------------------------------
/basics/queues.md:
--------------------------------------------------------------------------------
1 | Queues
2 | ====
3 | 
4 | - Queues are used to effectively manage requests in a large-scale distributed system, in which different components of the system may need to work in an asynchronous way.
5 | - It is an abstraction between the client’s request and the actual work performed to service it.
6 | - Queues are implemented on top of an asynchronous communication protocol. When a client submits a task to a queue, it no longer has to wait for the result.
7 | - Queues can provide protection from service outages and failures.
8 | 
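
The decoupling described above can be sketched with Python's standard `queue` and `threading` modules. This is a minimal single-process illustration, not a distributed message broker; the `run_demo` function, the squaring "work", and the `None` shutdown sentinel are all assumptions made for the example.

```python
import queue
import threading


def run_demo(numbers):
    """Submit tasks to a queue and let a background worker service them."""
    tasks = queue.Queue()
    results = []

    def worker():
        while True:
            task = tasks.get()
            if task is None:           # hypothetical shutdown sentinel
                tasks.task_done()
                break
            results.append(task * task)  # stand-in for the real work
            tasks.task_done()

    threading.Thread(target=worker, daemon=True).start()

    for n in numbers:
        tasks.put(n)    # returns immediately; the client does not wait here
    tasks.put(None)
    tasks.join()        # only the demo waits, so it can return the results
    return results
```

The producer loop finishes as soon as the tasks are enqueued; the `join()` call exists only so the demo can hand back the results.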


--------------------------------------------------------------------------------
/basics/proxies.md:
--------------------------------------------------------------------------------
 1 | Proxies
 2 | ====
 3 | 
 4 | - A proxy server is an intermediary piece of hardware / software sitting between the client and the backend server.
 5 |   - Filter requests
 6 |   - Log requests
 7 |   - Transform requests (encryption, compression, etc)
 8 |   - [Cache](caching.md)
 9 |   - Batch requests
10 |     - Collapsed forwarding: enable multiple client requests for the same URI to be processed as one request to the backend server
11 |     - Collapse requests for data that is spatially close together in the storage to minimize the reads
12 | 


--------------------------------------------------------------------------------
/basics/redundancy.md:
--------------------------------------------------------------------------------
 1 | Redundancy
 2 | ====
 3 | 
 4 | - Redundancy: **duplication of critical data or services** with the intention of increased reliability of the system.
 5 | - Server failover
 6 |   - Remove single points of failure and provide backups (e.g. server failover).
 7 | - Shared-nothing architecture
 8 |   - Each node can operate independently of one another.
 9 |   - No central service managing state or orchestrating activities.
10 |   - New servers can be added without special conditions or knowledge.
11 |   - No single point of failure.
12 | 


--------------------------------------------------------------------------------
/basics/load-balancing.md:
--------------------------------------------------------------------------------
 1 | Load Balancing (LB)
 2 | ====
 3 | 
 4 | Help scale horizontally across an ever-increasing number of servers.
 5 | 
 6 | ## LB locations
 7 | - Between user and web server
 8 | - Between web servers and an internal platform layer (application servers, cache servers)
 9 | - Between internal platform layer and database
10 | 
11 | ## Algorithms
12 | - Least connection
13 | - Least response time
14 | - Least bandwidth
15 | - Round robin
16 | - Weighted round robin
17 | - IP hash
18 | 
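
Several of the algorithms listed above can be sketched in a few lines of Python. The function names, the `weights` and `active` dictionaries, and the use of CRC32 for IP hashing are assumptions made for illustration; real load balancers track this state differently.

```python
import itertools
import zlib


def round_robin(servers):
    """Yield servers in order, wrapping around forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Like round robin, but a server with weight w appears w times per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connection(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client keeps hitting the same one."""
    return servers[zlib.crc32(client_ip.encode("utf-8")) % len(servers)]
```

For example, `least_connection({"a": 3, "b": 1, "c": 2})` picks `"b"`, while `ip_hash` returns the same server for a given IP as long as the server list is unchanged.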
19 | ## Implementation
20 | - Smart clients
21 | - Hardware load balancers
22 | - Software load balancers
23 | 


--------------------------------------------------------------------------------
/bin/publish-gh-pages.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/sh
 2 | 
 3 | # install the plugins and build the static site
 4 | gitbook install && gitbook build
 5 | 
 6 | # checkout to the gh-pages branch
 7 | git checkout gh-pages || git checkout -b gh-pages
 8 | 
 9 | # pull the latest updates
10 | git pull origin gh-pages --rebase
11 | 
12 | # copy the static site files into the current directory.
13 | cp -R _book/* .
14 | 
15 | # remove 'node_modules' and '_book' directory
16 | git clean -fx node_modules
17 | git clean -fx _book
18 | 
19 | # add all files
20 | git add .
21 | 
22 | # commit
23 | git commit -a -m "Update docs"
24 | 
25 | # push to the origin
26 | git push -u origin gh-pages
27 | 
28 | # checkout to the master branch
29 | git checkout master
30 | 


--------------------------------------------------------------------------------
/basics/cap-theorem.md:
--------------------------------------------------------------------------------
 1 | [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem)
 2 | ====
 3 | 
 4 | - Consistency: every read receives the most recent write or an error.
 5 | - Availability: every request receives a response that is not an error.
 6 | - Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
 7 | - CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability
 8 | - CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made.
 9 | - [ACID](https://en.wikipedia.org/wiki/ACID) databases choose consistency over availability.
10 | - [BASE](https://en.wikipedia.org/wiki/Eventual_consistency) systems choose availability over consistency.
11 | 


--------------------------------------------------------------------------------
/SUMMARY.md:
--------------------------------------------------------------------------------
 1 | * [Contents](README.md)
 2 | 
 3 | * Basics
 4 |   - [Key Characteristics](basics/key-characteristics.md)
 5 |   - [Load balancing](basics/load-balancing.md)
 6 |   - [Caching](basics/caching.md)
 7 |   - [Sharding](basics/sharding.md)
 8 |   - [Indexes](basics/indexes.md)
 9 |   - [Proxies](basics/proxies.md)
10 |   - [Queues](basics/queues.md)
11 |   - [Redundancy](basics/redundancy.md)
12 |   - [SQL vs. NoSQL](basics/sql-vs-nosql.md)
13 |   - [CAP Theorem](basics/cap-theorem.md)
14 |   - [Consistent Hashing](basics/consistent-hashing.md)
15 |   - [Client Server Communication](basics/client-server-communication.md)
16 | 
17 | * Designs
18 |   - [Short URL Service](designs/short-url.md)
19 |   - [Pastebin](designs/pastebin.md)
20 |   - [Instagram](designs/instagram.md)
21 |   - [Dropbox](designs/dropbox.md)
22 |   - [Twitter](designs/twitter.md)
23 |   - [Youtube](designs/youtube.md)
24 |   - [Twitter Search](designs/twitter-search.md)
25 |   - [Web Crawler](designs/web-crawler.md)
26 |   - [Facebook Newsfeed](designs/facebook-newsfeed.md)
27 |   - [Yelp](designs/yelp.md)
28 |   - [Uber Backend](designs/uber-backend.md)
29 |   - [Ticketmaster](designs/ticketmaster.md)
30 | 


--------------------------------------------------------------------------------
/basics/consistent-hashing.md:
--------------------------------------------------------------------------------
 1 | Consistent Hashing
 2 | ====
 3 | 
 4 | ## Simple hashing
 5 | Problems with the simple hashing function `key % n` (`n` is the number of servers):
 6 | - It is not horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken.
 7 | - It may not be load balanced, especially for non-uniformly distributed data. Some servers will become hot spots.
 8 | 
 9 | ## Consistent Hashing
10 | - Consistent hashing maps a key to an integer.
11 | - Imagine that the integers in the range are placed on a ring such that the values are wrapped around.
12 | - Given a list of servers, hash them to integers in the range.
13 | - To map a key to a server:
14 |   - Hash it to a single integer.
15 |   - Move clockwise on the ring until finding the first cache it encounters.
16 | - When the hash table is resized (a server is added or removed), only about `k/n` keys need to be remapped on average (`k` is the total number of keys, and `n` is the total number of servers).
17 | - To handle hot spots, add “virtual replicas” for caches.
18 |   - Instead of mapping each cache to a single point on the ring, map it to multiple points on the ring (replicas). This way, each cache is associated with multiple portions of the ring.
19 |   - If the hash function “mixes well,” the keys become more evenly balanced as the number of replicas increases.
20 | 
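
The ring described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `HashRing` class, the MD5-based `_hash` helper, and the default of 100 virtual replicas are choices made for the example, not part of the original notes.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to an integer position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual replicas per server
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        """Place `replicas` points for this server on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        """Drop every point that belongs to this server."""
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Walk clockwise from the key's position to the first server point."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a server is added, the only keys that change owner are those now claimed by the new server's points; every other key keeps its mapping, which is the property that `key % n` lacks.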


--------------------------------------------------------------------------------
/basics/key-characteristics.md:
--------------------------------------------------------------------------------
 1 | # Key Characteristics of Distributed Systems
 2 | 
 3 | ## Scalability
 4 | - The capability of a system to grow and manage increased demand.
 5 | - A system that can continuously evolve to support a growing amount of work is scalable.
 6 | - Horizontal scaling: by adding more servers into the pool of resources.
 7 | - Vertical scaling: by adding more resources (CPU, RAM, storage, etc.) to an existing server. This approach often comes with downtime and an upper limit.
 8 | 
 9 | ## Reliability
10 | - Reliability is the probability that a system will not fail in a given period.
11 | - A distributed system is reliable if it keeps delivering its service even when one or multiple components fail.
12 | - Reliability is achieved through redundancy of components and data (remove every single point of failure).
13 | 
14 | ## Availability
15 | - Availability is the time a system remains operational to perform its required function in a specific period.
16 | - Measured by the percentage of time that a system remains operational under normal conditions.
17 | - A reliable system is available.
18 | - An available system is not necessarily reliable.
19 |   - A system with a security hole is available when there is no security attack.
20 | 
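
Because availability is measured as a percentage of time, it converts directly into a downtime budget. A small sketch of the arithmetic, assuming a 365-day year (the function and constant names are made up for this example):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a system may be down at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR
```

For instance, 99.9% availability ("three nines") leaves roughly 525.6 minutes, or a little under nine hours, of downtime per year.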
21 | ## Efficiency
22 | - Latency: response time, the delay to obtain the first piece of data.
23 | - Bandwidth: throughput, amount of data delivered in a given time.
24 | 
25 | ## Serviceability / Manageability
26 | - Ease of operating and maintaining the system.
27 | - Simplicity and speed with which a system can be repaired or maintained.
28 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [Grokking System Design](https://www.educative.io/collection/5668639101419520/5649050225344512)
 2 | ====
 3 | Source: [educative](https://www.educative.io)
 4 | 
 5 | ## Interview Process
 6 | - Scope the problem
 7 |   - Don’t make assumptions.
 8 |   - Ask clarifying questions to understand the constraints and use cases.
 9 |   - Steps
10 |     - Requirements clarifications
11 |     - System interface definition
12 | - Sketch up an abstract design
13 |   - Building blocks of the system
14 |   - Relationships between them
15 |   - Steps
16 |     - Back-of-the-envelope estimation
17 |     - Defining data model
18 |     - High-level design
19 | - Identify and address the bottlenecks
20 |   - Use the fundamental principles of scalable system design
21 |   - Steps
22 |     - Detailed design
23 |     - Identifying and resolving bottlenecks
24 | 
25 | ## Distributed System Design Basics
26 | - [Key Characteristics](basics/key-characteristics.md)
27 | - [Load balancing](basics/load-balancing.md)
28 | - [Caching](basics/caching.md)
29 | - [Sharding](basics/sharding.md)
30 | - [Indexes](basics/indexes.md)
31 | - [Proxies](basics/proxies.md)
32 | - [Queues](basics/queues.md)
33 | - [Redundancy](basics/redundancy.md)
34 | - [SQL vs. NoSQL](basics/sql-vs-nosql.md)
35 | - [CAP Theorem](basics/cap-theorem.md)
36 | - [Consistent Hashing](basics/consistent-hashing.md)
37 | - [Client Server Communication](basics/client-server-communication.md)
38 | 
39 | ## System Designs
40 | - [Short URL Service](designs/short-url.md)
41 | - [Pastebin](designs/pastebin.md)
42 | - [Instagram](designs/instagram.md)
43 | - [Dropbox](designs/dropbox.md)
44 | - [Twitter](designs/twitter.md)
45 | - [Youtube](designs/youtube.md)
46 | - [Twitter Search](designs/twitter-search.md)
47 | - [Web Crawler](designs/web-crawler.md)
48 | - [Facebook Newsfeed](designs/facebook-newsfeed.md)
49 | - [Yelp](designs/yelp.md)
50 | - [Uber Backend](designs/uber-backend.md)
51 | - [Ticketmaster](designs/ticketmaster.md)
52 | 


--------------------------------------------------------------------------------
/basics/client-server-communication.md:
--------------------------------------------------------------------------------
 1 | Client-Server Communication
 2 | ====
 3 | 
 4 | ## Standard HTTP Web Request
 5 | 1. Client opens a connection and requests data from server.
 6 | 2. Server calculates the response.
 7 | 3. Server sends the response back to the client on the opened request.
 8 | 
 9 | ## Ajax Polling
10 | The client repeatedly polls (or requests) a server for data, and waits for the server to respond with data. If no data is available, an empty response is returned.
11 | 
12 | 1. Client opens a connection and requests data from the server using regular HTTP.
13 | 2. The requested webpage sends requests to the server at regular intervals (e.g., 0.5 seconds).
14 | 3. The server calculates the response and sends it back, like regular HTTP traffic.
15 | 4. Client repeats the above three steps periodically to get updates from the server.
16 | 
17 | Problems
18 | - Client has to keep asking the server for any new data.
19 | - A lot of responses are empty, creating HTTP overhead.
20 | 
21 | ## HTTP Long-Polling
22 | The client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately.
23 | 
24 | 1. The client makes an initial request using regular HTTP and then waits for a response.
25 | 2. The server delays its response until an update is available, or until a timeout has occurred.
26 | 3. When an update is available, the server sends a full response to the client.
27 | 4. The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period.
28 | 
29 | Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed, due to timeouts.
30 | 
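
The long-polling steps above can be simulated in-process. This sketch replaces the HTTP layer with a blocking queue so that the timeout behaviour is easy to see; `LongPollServer`, `long_poll_client`, and their parameters are hypothetical names for this example only.

```python
import queue


class LongPollServer:
    """Holds each poll open until an update arrives or the timeout expires."""

    def __init__(self):
        self._updates = queue.Queue()

    def publish(self, message):
        self._updates.put(message)

    def poll(self, timeout):
        try:
            # Block (delay the response) until an update is available.
            return self._updates.get(timeout=timeout)
        except queue.Empty:
            return None  # timed out: the client is expected to reconnect


def long_poll_client(server, timeout, max_requests):
    """Issue a new long-poll request after every response or timeout."""
    received = []
    for _ in range(max_requests):
        update = server.poll(timeout)
        if update is not None:
            received.append(update)
    return received
```

Unlike regular polling, an empty result here only occurs after the full timeout has elapsed, so far fewer empty responses are exchanged.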
31 | ## WebSockets
32 | - A persistent, full-duplex communication channel over a single TCP connection. Both server and client can send data at any time.
33 | - A connection is established through WebSocket handshake.
34 | - Low communication overhead.
35 | - Real-time data transfer.
36 | 
37 | ## Server-Sent Event (SSE)
38 | 1. Client requests data from a server using regular HTTP.
39 | 2. The requested webpage opens a connection to the server.
40 | 3. Server sends the data to the client whenever there’s new information available.
41 | 
42 | - Use case:
43 |   - When real-time traffic from server to client is needed.
44 |   - When server generates data in a loop and sends multiple events to client.
45 | 


--------------------------------------------------------------------------------
/basics/sql-vs-nosql.md:
--------------------------------------------------------------------------------
 1 | SQL vs. NoSQL
 2 | ====
 3 | 
 4 | ## Common types of NoSQL
 5 | ### Key-value stores
 6 | - Array of key-value pairs. The "key" is an attribute name.
 7 | - Redis, Voldemort, Dynamo.
 8 | 
 9 | ### Document databases
10 | - Data is stored in documents.
11 | - Documents are grouped in collections.
12 | - Each document can have an entirely different structure.
13 | - CouchDB, MongoDB.
14 | 
15 | ### Wide-column / columnar databases
16 | - Column families - containers for rows.
17 | - No need to know all the columns up front.
18 | - Each row can have different number of columns.
19 | - Cassandra, HBase.
20 | 
21 | ### Graph database
22 | - Data is stored in graph structures
23 |   - Nodes: entities
24 |   - Properties: information about the entities
25 |   - Lines (edges): connections between the entities
26 | - Neo4J, InfiniteGraph
27 | 
28 | ## Differences between SQL and NoSQL
29 | ### Storage
30 | - SQL: store data in tables.
31 | - NoSQL: have different data storage models.
32 | 
33 | ### Schema
34 | - SQL
35 |   - Each record conforms to a fixed schema.
36 |   - Schema can be altered, but it requires modifying the whole database.
37 | - NoSQL:
38 |   - Schemas are dynamic.
39 | 
40 | ### Querying
41 | - SQL
42 |   - Use SQL (structured query language) for defining and manipulating the data.
43 | - NoSQL
44 |   - Queries are focused on a collection of documents.
45 |   - UnQL (unstructured query language).
46 |   - Different databases have different syntax.
47 | 
48 | ### Scalability
49 | - SQL
50 |   - Vertically scalable (by increasing the horsepower: memory, CPU, etc) and expensive.
51 |   - Horizontally scalable (across multiple servers); but it can be challenging and time-consuming.
52 | - NoSQL
53 |   - Horizontally scalable (by adding more servers) and cheap.
54 | 
55 | ### ACID
56 | - Atomicity, consistency, isolation, durability
57 | - SQL
58 |   - ACID compliant
59 |   - Data reliability
60 |   - Guarantee of transactions
61 | - NoSQL
62 |   - Most sacrifice ACID compliance for performance and scalability.
63 | 
64 | ## Which one to use?
65 | ### SQL
66 | - Ensure ACID compliance.
67 |   - Reduce anomalies.
68 |   - Protect database integrity.
69 | - Data is structured and unchanging.
70 | 
71 | ### NoSQL
72 | - Data has little or no structure.
73 | - Make the most of cloud computing and storage.
74 |   - Cloud-based storage requires data to be easily spread across multiple servers to scale up.
75 | - Rapid development.
76 |   - Frequent updates to the data structure.
77 | 


--------------------------------------------------------------------------------
/basics/sharding.md:
--------------------------------------------------------------------------------
 1 | Sharding / Data Partitioning
 2 | ====
 3 | 
 4 | ## Partitioning methods
 5 | - Horizontal partitioning
 6 |   - Range based sharding.
 7 |   - Put different rows into different tables.
 8 |   - Con
 9 |     - If the value whose range is used for sharding isn’t chosen carefully, the partitioning scheme will lead to unbalanced servers.
10 | - Vertical partitioning
11 |   - Store the data for each specific feature on its own server.
12 |   - Pro
13 |     - Straightforward to implement.
14 |     - Low impact on the application.
15 |   - Con
16 |     - To support growth of the application, a database may need further partitioning.
17 | - Directory-based partitioning
18 |   - A lookup service that knows the partitioning scheme and abstracts it away from the database access code.
19 |   - Allow addition of db servers or change of partitioning schema without impacting application.
20 |   - Con
21 |     - Can be a single point of failure.
22 | 
23 | ## Partitioning criteria
24 | - Key or hash-based partitioning
25 |   - Apply a hash function to some key attribute of the entry to get the partition number.
26 |   - Problem
27 |     - Adding new servers may require changing the hash function, which would need redistribution of data and downtime for the service.
28 |     - Workaround: [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing).
29 | - List partitioning
30 |   - Each partition is assigned a list of values.
31 | - Round-robin partitioning
32 |   - With `n` partitions, the `i`-th tuple is assigned to partition `i % n`.
33 | - Composite partitioning
34 |   - Combine any of above partitioning schemes to devise a new scheme.
35 |   - Consistent hashing is a composite of hash and list partitioning.
36 |     - Key -> reduced key space through hash -> list -> partition.
37 | 
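
The redistribution problem of plain hash-based partitioning can be shown with a few lines of arithmetic. In this sketch the keys are already integers and the "hash function" is simply `key % n`; both are simplifying assumptions for illustration, and the function names are made up.

```python
def hash_partition(key, n):
    """Assign an integer key to one of n partitions."""
    return key % n


def moved_keys(keys, old_n, new_n):
    """Keys whose partition changes when the partition count changes."""
    return [k for k in keys
            if hash_partition(k, old_n) != hash_partition(k, new_n)]
```

Going from 4 to 5 partitions with keys `0..999`, a key stays put only when `key % 4 == key % 5`, which holds for 4 out of every 20 consecutive keys. The remaining 800 keys (80%) have to move, which is why consistent hashing is the usual workaround.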
38 | ## Common problems of sharding
39 | Most of the constraints are due to the fact that operations across multiple tables or multiple rows in the same table will no longer run on the same server.
40 | 
41 | - Joins and denormalization
42 |   - Joins will not be performance efficient since data has to be compiled from multiple servers.
43 |   - Workaround: denormalize the database so that queries can be performed from a single table. But this can lead to data inconsistency.
44 | - Referential integrity
45 |   - Difficult to enforce data integrity constraints (e.g. foreign keys).
46 |   - Workaround
47 |     - Referential integrity is enforced by application code.
48 |     - Applications can run SQL jobs to clean up dangling references.
49 | - Rebalancing
50 |   - Necessity of rebalancing
51 |     - Data distribution is not uniform.
52 |     - A lot of load on one shard.
 53 |   - Creating more DB shards or rebalancing existing shards changes the partitioning scheme and requires data movement.
54 | 


--------------------------------------------------------------------------------
/basics/caching.md:
--------------------------------------------------------------------------------
 1 | Caching
 2 | ====
 3 | 
 4 | - Take advantage of the locality of reference principle: recently requested data is likely to be requested again.
  5 | - Caches exist at all levels of the architecture, but are most often found at the level nearest to the front end.
 6 | 
 7 | ## Application server cache
 8 | - Cache placed on a request layer node.
 9 | - When a request layer node is expanded to many nodes
10 |   - Load balancer randomly distributes requests across the nodes.
11 |   - The same request can go to different nodes.
 12 |   - This increases cache misses.
13 |   - Solutions:
14 |     - Global caches
15 |     - Distributed caches
16 | 
17 | ## Distributed cache
18 | - Each request layer node owns part of the cached data.
19 | - Entire cache is divided up using a consistent hashing function.
20 | - Pro
21 |   - Cache space can be increased easily by adding more nodes to the request pool.
22 | - Con
 23 |   - A missing node loses its part of the cache.
24 | 
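A distributed cache divided by consistent hashing can be sketched as a hash ring. This is a minimal sketch, not a production implementation: the class name `HashRing`, the use of MD5, the node names, and the choice of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Place a string on the ring using MD5 (any stable hash would do)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes, virtual_nodes: int = 100):
        self._virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Insert the node at several points on the ring."""
        for i in range(self._virtual_nodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop every point that belongs to the node."""
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first point at or after the key's point."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (_point(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
sample_keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in sample_keys}
ring.remove("cache-b")
after = {k: ring.node_for(k) for k in sample_keys}
```

Only the keys that lived on the removed node are reassigned. Every other key keeps its owner, which is why a missing node costs just that node's share of the cache.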
25 | ## Global cache
26 | - A server or file store that is faster than original store, and accessible by all request layer nodes.
27 | - Two common forms
28 |   - Cache server handles cache miss.
29 |     - Used by most applications.
30 |   - Request nodes handle cache miss.
31 |     - Have a large percentage of the hot data set in the cache.
32 |     - An architecture where the files stored in the cache are static and shouldn’t be evicted.
 33 |     - The application logic understands the eviction strategy or hot spots better than the cache.
34 | 
 35 | ## Content delivery network (CDN)
36 | - For sites serving large amounts of static media.
37 | - Process
38 |   - A request first asks the CDN for a piece of static media.
39 |   - CDN serves that content if it has it locally available.
40 |   - If content isn’t available, CDN will query back-end servers for the file, cache it locally and serve it to the requesting user.
41 | - If the system is not large enough for CDN, it can be built like this:
42 |   - Serving static media off a separate subdomain using lightweight HTTP server (e.g. Nginx).
 43 |   - Cut over the DNS from this subdomain to a CDN later.
44 | 
45 | ## Cache invalidation
46 | - Keep cache coherent with the source of truth. Invalidate cache when source of truth has changed.
47 | - Write-through cache
48 |   - Data is written into the cache and permanent storage at the same time.
49 |   - Pro
50 |     - Fast retrieval, complete data consistency, robust to system disruptions.
51 |   - Con
52 |     - Higher latency for write operations.
53 | - Write-around cache
54 |   - Data is written to permanent storage, not cache.
55 |   - Pro
 56 |     - Avoids filling the cache with written data that is not read again.
57 |   - Con
58 |     - Query for recently written data creates a cache miss and higher latency.
59 | - Write-back cache
60 |   - Data is only written to cache.
61 |   - Write to the permanent storage is done later on.
62 |   - Pro
63 |     - Low latency, high throughput for write-intensive applications.
64 |   - Con
65 |     - Risk of data loss in case of system disruptions.
66 | 
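The three write policies above differ only in where a write lands first. The sketch below uses plain dictionaries as stand-ins for the cache and the permanent storage; the class names and the explicit `flush` step are assumptions chosen for illustration.

```python
class WriteThroughCache:
    """Writes go to the cache and the permanent store together."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value
        self.cache[key] = value


class WriteAroundCache:
    """Writes go to the permanent store only; the cache fills on reads."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key not in self.cache:  # cache miss: load from the store
            self.cache[key] = self.store[key]
        return self.cache[key]


class WriteBackCache:
    """Writes go to the cache only; the store is updated later by flush()."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

In the write-back variant, anything still in `dirty` when the cache node fails is lost, which is the data-loss risk listed above.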
67 | ## Cache eviction policies
68 | - FIFO: first in first out
 69 | - LIFO: last in first out
70 | - LRU: least recently used
71 | - MRU: most recently used
72 | - LFU: least frequently used
73 | - RR: random replacement
74 | 
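As one concrete example of the policies listed above, LRU can be sketched with `collections.OrderedDict` from the Python standard library. The class name `LRUCache` and its `get`/`put` methods are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # evicts "b", the least recently used entry
```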


--------------------------------------------------------------------------------
/designs/short-url.md:
--------------------------------------------------------------------------------
  1 | # URL Shortening Service
  2 | 
  3 | ## Summary
  4 | ![overview](../img/short-url-overview.png)
  5 | ![summary](../img/short-url-detail.png)
  6 | 
  7 | ## Requirements
  8 | - Functional Requirements
  9 |   - Given a URL, generate a shorter and unique alias (short link).
 10 |   - When users access a short link, redirect to the original link.
 11 |   - Users should optionally be able to pick a custom short link for their URL.
 12 |   - Links will expire after a standard default timespan. Users should also be able to specify the expiration time.
 13 | 
 14 | - Non-Functional Requirements
 15 |   - The system should be highly available. This is required because, if our service is down, all the URL redirections will start failing.
 16 |   - URL redirection should happen in real-time with minimal latency.
 17 |   - Shortened links should not be guessable (not predictable).
 18 | 
 19 | - Extended Requirements
 20 |   - Analytics; e.g., how many times a redirection happened?
 21 |   - Be accessible through REST APIs by other services.
 22 | 
 23 | ## Capacity Estimation and Constraints
 24 | - Assumption
 25 |   - Read-heavy. More redirection requests compared to new URL shortenings.
 26 |   - Assume **100:1** ratio between read and write.
 27 | 
 28 | - Traffic estimates
 29 |   - **500M** new URL shortenings per month, 100 * 500M => 50B redirections per month.
 30 |   - New URL shortenings per second
 31 |     - 500 million / (30 days * 24 hours * 3600 seconds) = **~200 URLs/s**
 32 |   - URL redirections per second
 33 |     - 50 billion / (30 days * 24 hours * 3600 sec) = **~19K/s**
 34 | 
 35 | - Storage estimates
 36 |   - Assume storing every URL shortening request for 5 years, each object takes **500 bytes**
 37 |   - Total objects: 500 million * 5 years * 12 months = **30 billion**
 38 |   - Total storage: 30 billion * 500 bytes = **15 TB**
 39 | 
 40 | - Bandwidth estimates
 41 |   - Write: 200 URL/s * 500 bytes/URL = **100 KB/s**
 42 |   - Read: 19K URL/s * 500 bytes/URL = **~9 MB/s**
 43 | 
 44 | - Cache memory estimates
 45 |   - Follow the 80-20 rule: assuming 20% of URLs generate 80% of traffic, cache the hottest 20% of URLs
 46 |   - Requests per day: 19K * 3600 seconds * 24 hours = **~1.7 billion/day**
 47 |   - Cache 20%: 0.2 * 1.7 billion * 500 bytes = **~170GB**
 48 | 
 49 | - Summary
 50 |   - Assuming 500 million new URLs per month and 100:1 read:write ratio
 51 |   Category | Calculation | Estimate
 52 |   ---- | ---- | ----
 53 |   New URLs | 500 million / (30 days * 24 hours * 3600 seconds) | 200 /s
 54 |   URL redirections | 500 million * 100 / (30 days * 24 hours * 3600 seconds) | 19 K/s
 55 |   Incoming data | 500 bytes/URL * 200 URL/s | 100 KB/s
 56 |   Outgoing data | 500 bytes/URL * 19K URL/s | 9 MB/s
 57 |   Storage for 5 years | 500 bytes/URL * 500 million * 60 months | 15 TB
 58 |   Memory for cache | 19K URL * 3600 seconds * 24 hours * 500 bytes * 20% | 170 GB
 59 | 
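The estimates above can be reproduced with a short calculation. The variable names are chosen for illustration; the inputs are the assumptions stated in this section (500 million new URLs per month, 100:1 read:write ratio, 500 bytes per object, 5 years of retention, 20% cached).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

new_urls_per_month = 500_000_000
read_write_ratio = 100
object_size_bytes = 500
retention_months = 5 * 12

writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH    # ~193, rounded to 200 above
reads_per_sec = writes_per_sec * read_write_ratio          # ~19.3K
storage_bytes = new_urls_per_month * retention_months * object_size_bytes  # 15 TB
incoming_bytes_per_sec = writes_per_sec * object_size_bytes  # ~96 KB/s
outgoing_bytes_per_sec = reads_per_sec * object_size_bytes   # ~9.6 MB/s
requests_per_day = reads_per_sec * SECONDS_PER_DAY           # ~1.67 billion
cache_bytes = 0.2 * requests_per_day * object_size_bytes     # ~167 GB

print(f"writes/s: {writes_per_sec:.0f}, reads/s: {reads_per_sec:.0f}")
print(f"storage: {storage_bytes / 1e12:.0f} TB, cache: {cache_bytes / 1e9:.0f} GB")
```

The figures in the table are these values rounded to convenient numbers, so small differences (for example 167 GB versus 170 GB) come from rounding at intermediate steps.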
 60 | ## System APIs
 61 | ### `createUrl`
 62 | - Parameters
 63 |   Name | Type | Note
 64 |   ---- | ---- | ----
 65 |   `api_dev_key` | `string` | The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
 66 |   `original_url` | `string` | Original URL to be shortened.
 67 |   `custom_alias` | `string` | Optional custom key for the URL.
 68 |   `user_name` | `string` | Optional user name to be used in encoding.
 69 |   `expire_date` | `string` | Optional expiration date for the shortened URL.
 70 | - Return
 71 |   - `string`
 72 |   - A successful insertion returns the shortened URL; otherwise, it returns an error code.
 73 | 
 74 | ### `deleteUrl`
 75 | - Parameters
 76 |   Name | Type | Note
 77 |   ---- | ---- | ----
 78 |   `api_dev_key` | `string` | The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
 79 |   `url_key` | `string` | Short URL.
 80 | - Return
 81 |   - `string`
 82 |   - A successful deletion returns ‘URL Removed’.
 83 | 
 84 | ## Database design
 85 | - Observations
 86 |   - Need to store billions of records.
 87 |   - Each object is small (less than 1K).
 88 |   - No relationships between records—other than storing which user created a URL.
 89 |   - Read-heavy.
 90 |   - A [NoSQL](../basics/sql-vs-nosql.md) choice would also be easier to scale.
 91 |   - Comment: SQL with sharding should also work
 92 | 
 93 | - Schema
 94 |   - URL
 95 |     Column | Type
 96 |     ---- | ----
 97 |     `hash` | varchar(16)
 98 |     `original_url` | varchar(512)
 99 |     `creation_date` | datetime
100 |     `expiration_date` | datetime
101 |     `user_id` | int
102 |   - User
103 |     Column | Type
104 |     ---- | ----
105 |     `name` | varchar(20)
106 |     `email` | varchar(32)
107 |     `creation_date` | datetime
108 |     `last_login` | datetime
109 | 
110 | ## Basic System Design and Algorithm
111 | ### Encoding actual URL
112 | - Compute unique hash
113 |   - `base64`: A-Z, a-z, 0-9, `-`, `.`
114 |   - 6 letters: 64 ^ 6 = ~68.7 billion
115 |   - 8 letters: 64 ^ 8 = ~281 trillion
116 |   - Use 6 letters
117 |   - `MD5` generates 128 bit hash value
118 |   - Each `base64` character encodes 6 bits
119 |   - `base64` encoding of the 128-bit hash generates 22 characters (128 / 6, rounded up)
120 |   - Select 8 characters
121 | - Issues with this approach
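122 |   - The same URL submitted by multiple users gets the same short link
123 |   - The same URL in URL-encoded and unencoded form hashes to different short links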
124 | - Workaround
125 |   - Append an increasing sequence number to each input URL, and generate a hash for it
126 |   - Append user id to input URL
127 | 
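The hash-and-truncate approach with the sequence-number workaround can be sketched as follows. This is an illustration under assumptions: it uses Python's URL-safe `base64` alphabet (`-` and `_` as the two extra characters, rather than the `-` and `.` listed above), a hypothetical `#` separator between URL and sequence number, and a 6-character key.

```python
import base64
import hashlib


def encode_md5(text: str) -> str:
    """MD5 the text (128 bits) and base64-encode it without padding (22 characters)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def short_key(original_url: str, sequence: int, length: int = 6) -> str:
    """Append a sequence number to the URL, hash it, and keep the first characters."""
    return encode_md5(f"{original_url}#{sequence}")[:length]


# The same URL gets a different input string, and so a different hash, per request.
key_1 = short_key("https://example.com/some/long/path", sequence=1)
key_2 = short_key("https://example.com/some/long/path", sequence=2)
```

Truncating the hash means two different inputs can still produce the same key, so a real service would need to check the database for an existing key before inserting.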
128 | ### Generating keys offline
129 | - Standalone Key Generation Service (KGS)
130 |   - Generate random 6 letter strings and store them in a database (key DB)
131 |   - When a short URL is needed, take one from the key DB
132 | 
133 | - Key DB size
134 |   - 6 characters/key * 68.7B unique keys = 412 GB
135 | 
136 | - Concurrency issue
137 |   - If multiple servers read keys concurrently, two or more servers may try to read the same key from the database.
138 | 
139 | - Workaround
140 |   - Servers can use KGS to read/mark keys in the database.
141 |   - KGS can use two tables to store keys: one for keys that are not used yet, and one for all the used keys.
142 |   - KGS can always keep some keys in memory so that it can quickly provide them whenever a server needs them.
143 |   - KGS needs to make sure not to give the same key to multiple servers.
144 |   - Comment: keys are sharded. Each KGS server only serves one application server.
145 | 
146 | - Key lookup
147 |   - When a key is found, issue an "HTTP 302 Redirect" status and pass the stored URL in the `Location` header.
148 |   - When a key is not found, issue an "HTTP 404 Not Found", or redirect to homepage.
149 | 
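The KGS workaround and the key lookup above can be sketched in memory. Everything here is a simplified assumption: the two Python sets stand in for the unused-key and used-key tables, a lock stands in for the database-level guarantee that a key is handed out once, and `resolve` returns a status and headers instead of writing a real HTTP response.

```python
import secrets
import string
import threading
from http import HTTPStatus

ALPHABET = string.ascii_letters + string.digits + "-."  # 64 characters


class KeyGenerationService:
    def __init__(self, pool_size: int, key_length: int = 6):
        self._lock = threading.Lock()
        self._unused = set()  # stand-in for the table of unused keys
        self._used = set()    # stand-in for the table of used keys
        while len(self._unused) < pool_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(key_length))
            self._unused.add(key)

    def take_key(self) -> str:
        """Move one key from unused to used; the lock prevents double hand-out."""
        with self._lock:
            if not self._unused:
                raise RuntimeError("key pool exhausted")
            key = self._unused.pop()
            self._used.add(key)
            return key


def resolve(key: str, url_table: dict):
    """Return (status, headers) for a short-link lookup."""
    original_url = url_table.get(key)
    if original_url is None:
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.FOUND, {"Location": original_url}  # 302 redirect


kgs = KeyGenerationService(pool_size=200)
url_table = {}
first_key = kgs.take_key()
url_table[first_key] = "https://example.com/some/long/path"
```

Because keys are pre-generated and marked as used when handed out, no collision check is needed when a short URL is created.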
150 | ### UUID
151 | Replace KGS with UUID.
152 | 
153 | ## Data Partitioning and Replication
154 | - Range Based Partitioning
155 |   - Store URLs in separate partitions based on the first letter of the URL or the hash key.
156 |   - Combine certain less frequently occurring letters into one database partition.
157 | - Problem with this approach
158 |   - Unbalanced servers.
159 | 
160 | - Hash-Based Partitioning
161 |   - Take a hash of the short URL we are storing, and calculate which partition to use based upon the hash.
162 |   - Use [consistent hashing](../basics/consistent-hashing.md)
163 | 
164 | ## Cache
165 | - Eviction policy
166 |   - LRU: discard the least recently used URL first
167 | - Cache update
168 |   - Cache miss: hit backend database and pass new entry to all cache replicas
169 | 
170 | ## Load Balancer (LB)
171 | - LB locations
172 |   - Between Clients and Application servers
173 |   - Between Application Servers and database servers
174 |   - Between Application Servers and Cache servers
175 | 
176 | ## DB Sweeping
177 | A separate Cleanup service can run periodically to remove expired links from our storage and cache.
178 | 
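A periodic cleanup pass might look like the sketch below. It assumes in-memory dictionaries in place of the real storage and cache, and a hypothetical record layout of `(original_url, expires_at)` per key.

```python
from datetime import datetime, timedelta, timezone


def sweep_expired(urls: dict, cache: dict, now: datetime) -> list:
    """Remove links whose expiration time has passed from storage and cache."""
    expired = [key for key, (_, expires_at) in urls.items() if expires_at <= now]
    for key in expired:
        del urls[key]
        cache.pop(key, None)  # the key may not be cached at all
    return expired


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
urls = {
    "abc123": ("https://example.com/old", now - timedelta(days=1)),
    "def456": ("https://example.com/new", now + timedelta(days=30)),
}
cache = {"abc123": "https://example.com/old"}
removed = sweep_expired(urls, cache, now)
```

Running the sweep at off-peak times keeps the extra database load away from redirect traffic.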
179 | ## Telemetry
180 | Statistics about the system: how many times a short URL has been used
181 | 
182 | ## Security and Permissions
183 | - Store permission level (public/private) with each URL in the database
184 | - Send an error (HTTP 401) for unauthorized access
185 | 


--------------------------------------------------------------------------------