├── Linux
│   └── Essential Linux.pdf
├── Kaggle Course Notes
│   └── Kaggle Course Notes.pdf
├── Princeton Algorithm
│   └── Princeton Algorithm Coursera Notes Junfan Zhu.pdf
├── Coding Interview Patterns: Nail Your Next Coding Interview
│   └── Bonus_Pdf.pdf
├── Grokking the System Design Interview
│   ├── Grokking the System Design Interview.pdf
│   └── Grokking the System Design Interview.md
├── README.md
└── Beyond Cracking the Coding Interview
    └── Beyond Cracking the Coding Interview.md
/Linux/Essential Linux.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Linux/Essential Linux.pdf
--------------------------------------------------------------------------------
/Kaggle Course Notes/Kaggle Course Notes.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Kaggle Course Notes/Kaggle Course Notes.pdf
--------------------------------------------------------------------------------
/Princeton Algorithm/Princeton Algorithm Coursera Notes Junfan Zhu.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Princeton Algorithm/Princeton Algorithm Coursera Notes Junfan Zhu.pdf
--------------------------------------------------------------------------------
/Coding Interview Patterns: Nail Your Next Coding Interview/Bonus_Pdf.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Coding Interview Patterns: Nail Your Next Coding Interview/Bonus_Pdf.pdf
--------------------------------------------------------------------------------
/Grokking the System Design Interview/Grokking the System Design Interview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Grokking the System Design Interview/Grokking the System Design Interview.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Software-Engineer-Coding-Interviews
3 |
4 |
5 |
6 | - [1. System Design Interview](#1-system-design-interview)
7 | * [ByteByteGo - GenAI/ML/Modern System Design Interview](#bytebytego-genaimlmodern-system-design-interview)
8 | * [Educative - GenAI/Modern System Design Interview](#educative-genaimodern-system-design-interview)
9 | - [2. Coding Interview](#2-coding-interview)
10 | - [3. Linux, Git](#3-linux-git)
11 | - [4. Algorithms, Data Science](#4-algorithms-data-science)
12 | * [Star History](#star-history)
13 |
14 |
15 |
16 |
17 | ---
18 |
19 |
20 | # 1. System Design Interview
21 |
22 |
23 | ## ByteByteGo - GenAI/ML/Modern System Design Interview
24 |
25 |
26 | > [System Design Interview, An Insider's Guide, Second Edition - by Alex Xu, 2020](https://www.amazon.com/System-Design-Interview-insiders-Second/dp/B08CMF2CQF) | [__PDF Notes-Chinese__](https://github.com/junfanz1/Quant-Books-Notes/blob/main/System%20Design/Notes%20on%20System%20Design.pdf)
27 |
28 | > [Generative AI System Design Interview - by Ali Aminian, Hao Sheng, 2024](https://www.amazon.com/Generative-AI-System-Design-Interview/dp/1736049143) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Review/blob/main/System%20Design/GenAI%20System%20Design%20Interview.md)
29 |
30 | > [Machine Learning System Design Interview - by Ali Aminian, Alex Xu, 2023](https://www.amazon.com/Machine-Learning-System-Design-Interview/dp/1736049127) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Review/blob/main/System%20Design/ML%20System%20Design%20Interview.md)
31 |
32 |
37 |
38 |
39 | ## Educative - GenAI/Modern System Design Interview
40 |
41 | > [Educative - Grokking System Design Interview](https://www.educative.io/verify-certificate/B86jYxWPP3JhA8lAZw0B2Mhr92YjJNmG5Ty) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Grokking%20the%20System%20Design%20Interview/Grokking%20the%20System%20Design%20Interview.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Grokking%20the%20System%20Design%20Interview/Grokking%20the%20System%20Design%20Interview.md)
42 |
43 | > [Educative - Grokking the Modern System Design Interview](https://www.educative.io/courses/grokking-the-system-design-interview) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Overview/blob/main/System%20Design/Modern%20System%20Design.md)
44 |
45 | > [Educative - GenAI System Design](https://www.educative.io/verify-certificate/RgxzXQFQkKyYgKrGjTX1RQpE9J3vT6) | [__Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Readings/blob/main/System%20Design/GenAI%20System%20Design.md)
46 |
47 |
48 |
49 | # 2. Coding Interview
50 |
51 | > [Coding Interview Patterns: Nail Your Next Coding Interview - by Alex Xu, Shaun Gunawardane, 2024](https://www.amazon.com/Coding-Interview-Patterns-Nail-Your/dp/1736049135) | [__Markdown Notes__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Coding%20Interview%20Patterns:%20Nail%20Your%20Next%20Coding%20Interview/Coding%20Interview%20Patterns,%20Alex%20Xu.md) | [__Bonus PDF of the Book__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Coding%20Interview%20Patterns%3A%20Nail%20Your%20Next%20Coding%20Interview/Bonus_Pdf.pdf)
52 |
53 | > [Beyond Cracking the Coding Interview - by Gayle Laakmann McDowell, Mike Mroczka, Aline Lerner, Nil Mamano, 2025](https://www.amazon.com/Beyond-Cracking-Coding-Interview-Successfully/dp/195570600X) | [__Markdown Notes__](https://github.com/junfanz1/Software-Engineer-Coding-Interviews/blob/main/Beyond%20Cracking%20the%20Coding%20Interview/Beyond%20Cracking%20the%20Coding%20Interview.md)
54 |
55 | > [Educative - Grokking the Coding Interview Patterns in Python](https://www.educative.io/courses/grokking-coding-interview-in-python) | [__Markdown Notes__](https://github.com/junfanz1/Software-Engineer-Coding-Interviews/blob/main/Grokking%20the%20Coding%20Interview%20Patterns%20in%20Python/Grokking%20the%20Coding%20Interview%20Patterns%20in%20Python.md)
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 | # 3. Linux, Git
65 |
66 | Linux, Git CheatSheet | [__PDF Notes__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Linux/Essential%20Linux.pdf)
67 |
68 |
69 | # 4. Algorithms, Data Science
70 |
71 | Algorithms, Part I and Part II, by Robert Sedgewick and Kevin Wayne, Princeton Coursera.
72 |
73 | > [Chapters 1-6](https://www.coursera.org/learn/algorithms-part1) | [Chapters 7-12](https://www.coursera.org/learn/algorithms-part2) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Princeton%20Algorithm/Princeton%20Algorithm%20Coursera%20Notes%20Junfan%20Zhu.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Princeton%20Algorithm/Princeton%20Algorithm%20Coursera%20Notes.md)
74 |
75 | Kaggle Notes
76 |
77 | > [Kaggle Mini-courses](https://www.kaggle.com/learn) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Kaggle%20Course%20Notes/Kaggle%20Course%20Notes.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Kaggle%20Course%20Notes/Kaggle%20Course%20Notes.md)
78 |
79 |
80 | ---
81 |
82 |
97 |
98 |
99 |
100 | ## Star History
101 |
102 | [](https://star-history.com/#junfanz1/Coding-Interview-Practices&Date)
103 |
104 |
--------------------------------------------------------------------------------
/Grokking the System Design Interview/Grokking the System Design Interview.md:
--------------------------------------------------------------------------------
1 | Grokking the System Design Interview
2 | ===========
3 |
4 | 2021-04-12
5 |
6 | Junfan Zhu
7 | -----------
8 |
9 | (`junfanz@gatech.edu`; `junfanzhu@uchicago.edu`)
10 |
11 | Course Links
12 | -------------
13 |
14 | https://www.educative.io/courses/grokking-the-system-design-interview/
15 |
16 | ----
17 |
18 | Table of Contents
19 |
20 |
21 |
22 | - [Grokking the System Design Interview](#grokking-the-system-design-interview)
23 | - [Junfan Zhu](#junfan-zhu)
24 | - [Course Links](#course-links)
25 | - [1. Back-of-the-envelope estimation](#1-back-of-the-envelope-estimation)
26 | - [2. Shortening URL](#2-shortening-url)
27 | - [2.1. Encoding actual URL](#21-encoding-actual-url)
28 | - [2.2. Cache](#22-cache)
29 | - [2.3. Load Balancer (LB)](#23-load-balancer-lb)
30 | - [3. DropBox](#3-dropbox)
31 | - [3.1. Clarify Requirements and Goals of the System](#31-clarify-requirements-and-goals-of-the-system)
32 | - [4. Facebook Messenger](#4-facebook-messenger)
33 | - [4.1. Message Handling](#41-message-handling)
34 | - [4.2. Storing and retrieving the messages from the database](#42-storing-and-retrieving-the-messages-from-the-database)
35 | - [5. YouTube](#5-youtube)
36 | - [5.1. Metadata Sharding](#51-metadata-sharding)
37 | - [5.1.1. Sharding based on UserID](#511-sharding-based-on-userid)
38 | - [5.1.2. Sharding based on VideoID](#512-sharding-based-on-videoid)
39 | - [5.2. Load Balancing](#52-load-balancing)
40 | - [6. Designing Typeahead Suggestion](#6-designing-typeahead-suggestion)
41 | - [7. API Rate Limiter](#7-api-rate-limiter)
42 | - [8. Web Crawler](#8-web-crawler)
43 | - [8.1. How to crawl?](#81-how-to-crawl)
44 | - [8.2. Component Design](#82-component-design)
45 | - [9. Facebook’s Newsfeed](#9-facebooks-newsfeed)
46 | - [9.1. Feed generation](#91-feed-generation)
47 | - [9.2. Feed publishing](#92-feed-publishing)
48 | - [9.3. Data Partitioning](#93-data-partitioning)
49 | - [10. Yelp](#10-yelp)
50 | - [10.1. Dynamic size grids](#101-dynamic-size-grids)
51 | - [11. Ticket Master](#11-ticket-master)
52 | - [11.1. Active Reservations Service](#111-active-reservations-service)
53 | - [11.2. Waiting Users Service](#112-waiting-users-service)
54 | - [11.3. Concurrency](#113-concurrency)
55 | - [12. Load Balancing](#12-load-balancing)
56 | - [12.1. Benefits](#121-benefits)
57 | - [12.2. Algorithms](#122-algorithms)
58 | - [13. Caching](#13-caching)
59 | - [13.1. Cache Invalidation](#131-cache-invalidation)
60 | - [13.2. Cache eviction policies](#132-cache-eviction-policies)
61 | - [14. Data Partitioning](#14-data-partitioning)
62 | - [14.1. Partitioning Criteria](#141-partitioning-criteria)
63 | - [14.2. Common Problems of Data Partitioning](#142-common-problems-of-data-partitioning)
64 | - [15. Proxy Server](#15-proxy-server)
65 | - [15.1. Open Proxy](#151-open-proxy)
66 | - [15.2. Reverse Proxy](#152-reverse-proxy)
67 | - [16. SQL & NoSQL](#16-sql--nosql)
68 | - [16.1. SQL](#161-sql)
69 | - [16.2. NoSQL](#162-nosql)
70 | - [16.3. Differences: SQL vs. NoSQL](#163-differences-sql-vs-nosql)
71 | - [16.4. Choose which?](#164-choose-which)
72 | - [16.4.1. SQL](#1641-sql)
73 | - [16.4.2. NoSQL](#1642-nosql)
74 | - [17. CAP Theorem](#17-cap-theorem)
75 | - [18. Consistent Hashing](#18-consistent-hashing)
76 | - [18.1. Improve caching system](#181-improve-caching-system)
77 | - [18.2. Consistent Hashing](#182-consistent-hashing)
78 | - [18.3. Algorithm](#183-algorithm)
79 |
80 |
81 |
82 | ---------
83 |
84 | # 1. Back-of-the-envelope estimation
85 |
86 | Scaling, partitioning, load balancing, and caching.
87 |
88 | - What scale is expected from the system?
89 | - How much storage will we need?
90 | - What network bandwidth usage are we expecting?
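
As a worked illustration of this kind of estimate, here is a tiny sketch; every number in it (write rate, read/write ratio, object size, retention) is an assumed value chosen only for the example, not a figure from the course.

```python
# Back-of-the-envelope sketch; all inputs below are illustrative assumptions.
new_objects_per_sec = 500        # assumed write rate
read_write_ratio = 100           # assumed reads per write
object_size_bytes = 500          # assumed average object size
retention_years = 5

read_qps = new_objects_per_sec * read_write_ratio
seconds_per_year = 365 * 24 * 3600
total_objects = new_objects_per_sec * seconds_per_year * retention_years
storage_tb = total_objects * object_size_bytes / 1e12

print(f"Read QPS: {read_qps:,}")                 # 50,000
print(f"Objects kept: {total_objects:,}")
print(f"Storage needed: {storage_tb:.1f} TB")    # ~39 TB
```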
91 |
92 | # 2. Shortening URL
93 |
94 | ## 2.1. Encoding actual URL
95 |
96 | Use the MD5 algorithm as the hash function $\Rightarrow$ it produces a 128-bit hash value.
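
A minimal sketch of this step: compute the 128-bit MD5 digest of the long URL and keep the first few characters of a URL-safe base64 encoding as the short key. The 6-character length and the base64 alphabet are illustrative choices, not values prescribed by the course.

```python
import base64
import hashlib

def short_key(long_url: str, length: int = 6) -> str:
    digest = hashlib.md5(long_url.encode("utf-8")).digest()     # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")  # compact text form
    return encoded[:length]                                     # illustrative 6-char key

print(short_key("https://example.com/some/very/long/path?with=params"))
```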
97 |
98 | ## 2.2. Cache
99 |
100 | __Which cache eviction policy would best fit our needs?__
101 |
102 | When the cache is full and we want to replace a link with a newer/hotter URL, how would we choose? Least Recently Used (LRU) would be a reasonable policy for our system: under this policy, we discard the least recently used URL first. We can use a Linked Hash Map or a similar data structure to store our URLs and Hashes, which will also keep track of the URLs that have been accessed recently.
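
A minimal LRU sketch along these lines, using Python's OrderedDict as the "Linked Hash Map"; the class and method names are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Maps URL hash -> original URL and evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, url_hash: str):
        if url_hash not in self._entries:
            return None
        self._entries.move_to_end(url_hash)        # mark as most recently used
        return self._entries[url_hash]

    def put(self, url_hash: str, original_url: str) -> None:
        if url_hash in self._entries:
            self._entries.move_to_end(url_hash)
        self._entries[url_hash] = original_url
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)       # discard the least recently used URL
```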
103 |
104 | To further increase the efficiency, we can replicate our caching servers to distribute the load between them.
105 |
106 | ## 2.3. Load Balancer (LB)
107 |
108 | We can add a Load balancing layer at three places in our system:
109 |
110 | - Between Clients and Application servers
111 | - Between Application Servers and database servers
112 | - Between Application Servers and Cache servers
113 |
114 | # 3. DropBox
115 |
116 | ## 3.1. Clarify Requirements and Goals of the System
117 |
118 | What do we wish to achieve from a Cloud Storage system? Here are the top-level requirements for our system:
119 |
120 | - Users should be able to upload and download their files/photos from any device.
121 | - Users should be able to share files or folders with other users.
122 | - Our service should support automatic synchronization between devices, i.e., after updating a file on one device, it should get synchronized on all devices.
123 | - The system should support storing large files up to a GB.
124 | - ACID-ity is required. Atomicity, Consistency, Isolation and Durability of all file operations should be guaranteed.
125 | - Our system should support offline editing. Users should be able to add/delete/modify files while offline, and as soon as they come online, all their changes should be synced to the remote servers and other online devices.
126 |
127 |
128 | # 4. Facebook Messenger
129 |
130 | ## 4.1. Message Handling
131 |
132 | How does the messenger maintain the sequencing of the messages? We can store a timestamp with each message, which is the time the message is received by the server. This will still not ensure the correct ordering of messages for clients. The scenario where the server timestamp cannot determine the exact order of messages would look like this:
133 |
134 | - User-1 sends a message M1 to the server for User-2.
135 | - The server receives M1 at T1.
136 | - Meanwhile, User-2 sends a message M2 to the server for User-1.
137 | - The server receives the message M2 at T2, such that T2 > T1.
138 | - The server sends message M1 to User-2 and M2 to User-1.
139 | - So User-1 will see M1 first and then M2, whereas User-2 will see M2 first and then M1.
140 |
141 | To resolve this, we need to keep a sequence number with every message for each client. This sequence number will determine the exact ordering of messages for EACH user. With this solution, both clients will see a different view of the message sequence, but this view will be consistent for them on all devices.
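
A toy sketch of the idea: the server stamps each message with a per-recipient sequence number, and every client orders its view by its own sequence. All names here are illustrative.

```python
from collections import defaultdict
from itertools import count

class ChatServer:
    def __init__(self):
        self._next_seq = defaultdict(lambda: count(1))   # one counter per client
        self._inbox = defaultdict(list)

    def deliver(self, sender: str, recipient: str, text: str) -> None:
        seq = next(self._next_seq[recipient])            # per-client sequence number
        self._inbox[recipient].append((seq, sender, text))

    def messages_for(self, user: str):
        # Sorting by the user's own sequence numbers gives that user a consistent
        # order on all devices, even if it differs from other users' views.
        return sorted(self._inbox[user])

server = ChatServer()
server.deliver("User-1", "User-2", "M1")
server.deliver("User-2", "User-1", "M2")
print(server.messages_for("User-2"))   # [(1, 'User-1', 'M1')]
```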
142 |
143 | ## 4.2. Storing and retrieving the messages from the database
144 |
145 | Whenever the chat server receives a new message, it needs to store it in the database. To do so, we have two options:
146 |
147 | - Start a separate thread, which will work with the database to store the message.
148 | - Send an asynchronous request to the database to store the message.
149 |
150 | We have to keep certain things in mind while designing our database:
151 |
152 | - How to efficiently work with the database connection pool.
153 | - How to retry failed requests.
154 | - Where to log those requests that failed even after some retries.
155 | - How to retry these logged requests (that failed after the retry) when all the issues have been resolved.
156 |
157 | __Which storage system should we use?__
158 |
159 | We need to have a database that can support a very high rate of small updates and also fetch a range of records quickly. We cannot use RDBMS like MySQL or NoSQL like MongoDB because we cannot afford to read/write a row from the database every time a user receives/sends a message. This will not only make the basic operations of our service run with high latency but also create a huge load on databases.
160 |
161 | Both of our requirements can be easily met with a wide-column database solution like HBase. HBase is a column-oriented key-value NoSQL database that can store multiple values against one key in multiple columns. HBase is modeled after Google’s BigTable and runs on top of the Hadoop Distributed File System (HDFS). HBase groups data together: new data is stored in a memory buffer and, once the buffer is full, it dumps the data to disk. This way of storing data not only helps store a lot of small records quickly but also makes fetching rows by key or scanning ranges of rows efficient. HBase is also an efficient database for storing variable-sized data, which our service also requires.
162 |
163 | __Design Summary:__
164 |
165 | Clients will open a connection to the chat server to send a message; the server will then pass it to the requested user. All the active users will keep a connection open with the server to receive messages. Whenever a new message arrives, the chat server will push it to the receiving user on the long poll request. Messages can be stored in HBase, which supports quick small updates, and range based searches. The servers can broadcast the online status of a user to other relevant users. Clients can pull status updates for users who are visible in the client’s viewport on a less frequent basis.
166 |
167 | # 5. YouTube
168 |
169 | ## 5.1. Metadata Sharding
170 |
171 | Since we have a huge number of new videos every day and our read load is extremely high, we need to distribute our data onto multiple machines so that we can perform read/write operations efficiently.
172 |
173 | ### 5.1.1. Sharding based on UserID
174 |
175 | We can try storing all the data for a particular user on one server. While storing, we can pass the UserID to our hash function, which will map the user to a database server where we will store all the metadata for that user’s videos. While querying for videos of a user, we can ask our hash function to find the server holding the user’s data and then read it from there. To search videos by titles, we will have to query all servers, and each server will return a set of videos. A centralized server will then aggregate and rank these results before returning them to the user.
176 |
177 | ### 5.1.2. Sharding based on VideoID
178 |
179 | Our hash function will map each VideoID to a random server where we will store that Video’s metadata. To find videos of a user, we will query all servers, and each server will return a set of videos. A centralized server will aggregate and rank these results before returning them to the user. This approach solves our problem of popular users but shifts it to popular videos.
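
A small sketch of VideoID-based sharding with the scatter-gather query described above; the modulo routing, the shard count, and the ranking key are illustrative assumptions.

```python
NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]           # VideoID -> metadata, per shard

def shard_for(video_id: str) -> dict:
    return shards[hash(video_id) % NUM_SHARDS]      # hash maps a video to one shard

def store_video(video_id: str, metadata: dict) -> None:
    shard_for(video_id)[video_id] = metadata

def videos_of_user(user_id: str) -> list:
    # Scatter: ask every shard; gather: aggregate and rank on a central server.
    results = [m for shard in shards for m in shard.values() if m["user_id"] == user_id]
    return sorted(results, key=lambda m: m["views"], reverse=True)

store_video("v1", {"user_id": "u1", "title": "Cats", "views": 900})
store_video("v2", {"user_id": "u1", "title": "Dogs", "views": 120})
print([m["title"] for m in videos_of_user("u1")])   # ['Cats', 'Dogs']
```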
180 |
181 | ## 5.2. Load Balancing
182 |
183 | We should use Consistent Hashing among our cache servers, which will also help in balancing the load between cache servers. Since we will be using a static hash-based scheme to map videos to hostnames, it can lead to an uneven load on the logical replicas due to each video’s different popularity. For instance, if a video becomes popular, the logical replica corresponding to that video will experience more traffic than other servers. These uneven loads for logical replicas can then translate into uneven load distribution on corresponding physical servers. To resolve this issue, any busy server in one location can redirect a client to a less busy server in the same cache location. We can use dynamic HTTP redirections for this scenario. Consistent hashing will not only help in replacing a dead server but also help in distributing load among servers.
184 |
185 | However, the use of redirections also has its drawbacks. First, since our service tries to load balance locally, it leads to multiple redirections if the host that receives the redirection can’t serve the video. Also, each redirection requires a client to make an additional HTTP request; it also leads to higher delays before the video starts playing back. Moreover, inter-tier (or cross data-center) redirections lead a client to a distant cache location because the higher tier caches are only present at a small number of locations.
186 |
187 | # 6. Designing Typeahead Suggestion
188 |
189 | We can have a Map-Reduce (MR) setup to process all the logging data periodically, say every hour. These MR jobs will calculate the frequencies of all searched terms in the past hour. We can then update our trie with this new data: we take the current snapshot of the trie and update it with all the new terms and their frequencies. We should do this offline, as we don’t want our read queries to be blocked by trie-update requests. We have two options:
190 |
191 | - We can make a copy of the trie on each server and update it offline. Once done, we can switch to the new copy and discard the old one.
192 | - Another option is we can have a primary-secondary configuration for each trie server. We can update the secondary while the primary is serving traffic. Once the update is complete, we can make the secondary our new primary. We can later update our old primary, which can then start serving traffic, too.
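
A compact trie sketch for serving suggestions; the frequency counts are what the periodic MapReduce jobs above would refresh. The structure and names are illustrative, not the course's reference implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.frequency = 0            # searched-term frequency; 0 for non-terminal nodes

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def update(self, term: str, frequency: int) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency += frequency

    def suggest(self, prefix: str, k: int = 5):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every completion under the prefix and keep the k most frequent.
        completions, stack = [], [(node, prefix)]
        while stack:
            current, term = stack.pop()
            if current.frequency:
                completions.append((current.frequency, term))
            for ch, child in current.children.items():
                stack.append((child, term + ch))
        return [t for _, t in sorted(completions, reverse=True)[:k]]

trie = Trie()
for term, freq in [("system design", 40), ("system call", 15), ("systemd", 9)]:
    trie.update(term, freq)
print(trie.suggest("sys"))   # ['system design', 'system call', 'systemd']
```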
193 |
194 | # 7. API Rate Limiter
195 |
196 | Rate Limiting helps to protect services against abusive behaviors targeting the application layer like Denial-of-Service (DoS) attacks, brute-force password attempts, brute-force credit card transactions, etc. These attacks are usually a barrage of HTTP/S requests which may look like they are coming from real users, but are typically generated by machines (or bots). As a result, these attacks are often harder to detect and can more easily bring down a service, application, or API.
197 |
198 | Rate limiting is also used to prevent revenue loss, to reduce infrastructure costs, to stop spam, and to stop online harassment. Following is a list of scenarios that can benefit from Rate limiting by making a service (or API) more reliable:
199 |
200 | - Misbehaving clients/scripts: Either intentionally or unintentionally, some entities can overwhelm a service by sending a large number of requests. Another scenario could be when a user is sending a lot of lower-priority requests and we want to make sure that it doesn’t affect the high-priority traffic. For example, users sending a high volume of requests for analytics data should not be allowed to hamper critical transactions for other users.
201 | - Security: By limiting the number of the second-factor attempts (in 2-factor auth) that the users are allowed to perform, for example, the number of times they’re allowed to try with a wrong password.
202 | - To prevent abusive behavior and bad design practices: Without API limits, developers of client applications would use sloppy development tactics, for example, requesting the same information over and over again.
203 | - To keep costs and resource usage under control: Services are generally designed for normal input behavior, for example, a user writing a single post in a minute. Computers could easily push thousands/second through an API. Rate limiter enables controls on service APIs.
204 | - Revenue: Certain services might want to limit operations based on the tier of their customer’s service and thus create a revenue model based on rate limiting. There could be default limits for all the APIs a service offers. To go beyond that, the user has to buy higher limits.
205 | - To eliminate spikiness in traffic: Make sure the service stays up for everyone else.
206 |
207 | # 8. Web Crawler
208 |
209 | ## 8.1. How to crawl?
210 |
211 | Breadth-first or depth-first? Breadth First Search (BFS) is usually used. However, Depth First Search (DFS) is also utilized in some situations; for example, if the crawler has already established a connection with a website, it might just DFS all the URLs within that website to save some handshaking overhead.
212 |
213 | Path-ascending crawling: Path-ascending crawling can help discover a lot of isolated resources or resources for which no inbound link would have been found in regular crawling of a particular Web site. In this scheme, a crawler would ascend to every path in each URL that it intends to crawl.
214 |
215 | ## 8.2. Component Design
216 |
217 | 1. The URL frontier: The URL frontier is the data structure that contains all the URLs that remain to be downloaded. We can crawl by performing a breadth-first traversal of the Web, starting from the pages in the seed set. Such traversals are easily implemented by using a FIFO queue.
218 |
219 | 2. The fetcher module: The purpose of a fetcher module is to download the document corresponding to a given URL using the appropriate network protocol like HTTP. As discussed above, webmasters create robots.txt to make certain parts of their websites off-limits for the crawler. To avoid downloading this file on every request, our crawler’s HTTP protocol module can maintain a fixed-sized cache mapping host-names to their robots exclusion rules.
220 |
221 | 3. Document input stream: Our crawler’s design enables the same document to be processed by multiple processing modules. To avoid downloading a document multiple times, we cache the document locally using an abstraction called a Document Input Stream (DIS).
222 |
223 | 4. Document Dedupe test: Many documents on the Web are available under multiple, different URLs. There are also many cases in which documents are mirrored on various servers. Both of these effects will cause any Web crawler to download the same document multiple times. To prevent the processing of a document more than once, we perform a dedupe test on each document to remove duplication.
224 |
225 | 5. URL filters: The URL filtering mechanism provides a customizable way to control the set of URLs that are downloaded. This is used to blacklist websites so that our crawler can ignore them. Before adding each URL to the frontier, the worker thread consults the user-supplied URL filter. We can define filters to restrict URLs by domain, prefix, or protocol type.
226 |
227 | 6. Domain name resolution: Before contacting a Web server, a Web crawler must use the Domain Name Service (DNS) to map the Web server’s hostname into an IP address. DNS name resolution will be a big bottleneck for our crawler given the number of URLs we will be working with. To avoid repeated requests, we can start caching DNS results by building our own local DNS server.
228 |
229 | 7. URL dedupe test: While extracting links, any Web crawler will encounter multiple links to the same document. To avoid downloading and processing a document multiple times, a URL dedupe test must be performed on each extracted link before adding it to the URL frontier.
230 |
231 | 8. Checkpointing: A crawl of the entire Web takes weeks to complete. To guard against failures, our crawler can write regular snapshots of its state to the disk. An interrupted or aborted crawl can easily be restarted from the latest checkpoint.
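
A toy sketch of the breadth-first crawl loop with a FIFO URL frontier and a URL dedupe test; fetching and link extraction are stubbed out, so those helper functions are assumptions rather than real modules.

```python
from collections import deque
from urllib.parse import urljoin

def fetch(url: str) -> str:
    """Stub for the fetcher module; a real crawler would issue an HTTP GET
    and honor robots.txt here."""
    return ""

def extract_links(base_url: str, html: str) -> list:
    """Stub for the link extractor; a real crawler would parse the HTML."""
    return []

def crawl(seed_urls, max_pages: int = 1000):
    frontier = deque(seed_urls)          # URL frontier: FIFO queue => breadth-first
    seen = set(seed_urls)                # URL dedupe test
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        crawled.append(url)
        for link in extract_links(url, html):
            absolute = urljoin(url, link)
            if absolute not in seen:     # only enqueue URLs we have not seen
                seen.add(absolute)
                frontier.append(absolute)
    return crawled
```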
232 |
233 | # 9. Facebook’s Newsfeed
234 |
235 | Component Design
236 |
237 | ## 9.1. Feed generation
238 |
239 | Offline generation for newsfeed: We can have dedicated servers that continuously generate users’ newsfeeds and store them in memory. So, whenever a user requests new posts for their feed, we can simply serve them from the pre-generated, stored location. Using this scheme, a user’s newsfeed is not compiled on load, but rather on a regular basis, and is returned to the user whenever they request it.
240 |
241 | We can store FeedItemIDs in a data structure similar to a Linked HashMap or a TreeMap, which allows us to not only jump to any feed item but also iterate through the map easily. Whenever users want to fetch more feed items, they can send the last FeedItemID they currently see in their newsfeed; we can then jump to that FeedItemID in our hash map and return the next batch/page of feed items from there.
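
A small sketch of that cursor-style pagination over a per-user ordered map of feed items; the page size, item shape, and variable names are illustrative.

```python
from collections import OrderedDict
from itertools import islice

# Pre-generated feed per user: FeedItemID -> feed item, newest first.
feeds = {
    "user-1": OrderedDict((f"item-{i}", {"id": f"item-{i}"}) for i in range(10, 0, -1))
}

def fetch_feed(user_id: str, after_item_id: str = None, page_size: int = 3):
    items = iter(feeds[user_id].values())
    if after_item_id is not None:
        # Jump past the last item the client has already seen.
        for item in items:
            if item["id"] == after_item_id:
                break
    return list(islice(items, page_size))

first_page = fetch_feed("user-1")
next_page = fetch_feed("user-1", after_item_id=first_page[-1]["id"])
print([i["id"] for i in first_page], [i["id"] for i in next_page])
# ['item-10', 'item-9', 'item-8'] ['item-7', 'item-6', 'item-5']
```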
242 |
243 | ## 9.2. Feed publishing
244 |
245 | The process of pushing a post to all the followers is called fanout. By analogy, the push approach is called fanout-on-write, while the pull approach is called fanout-on-load. Let’s discuss different options for publishing feed data to users.
246 |
247 | 1. "Pull" model or Fan-out-on-load
248 |
249 | This method involves keeping all the recent feed data in memory so that users can pull it from the server whenever they need it. Clients can pull the feed data on a regular basis or manually whenever they need it. Possible problems with this approach are: a) new data might not be shown to users until they issue a pull request; b) it’s hard to find the right pull cadence, as most pull requests will return an empty response when there is no new data, wasting resources.
250 |
251 | 2. "Push" model or Fan-out-on-write.
252 |
253 | For a push system, once a user has published a post, we can immediately push this post to all the followers. The advantage is that when fetching feed you don’t need to go through your friend’s list and get feeds for each of them. It significantly reduces read operations. To efficiently handle this, users have to maintain a Long Poll request with the server for receiving the updates. A possible problem with this approach is that when a user has millions of followers (a celebrity-user) the server has to push updates to a lot of people.
254 |
255 | 3. Hybrid
256 |
257 | An alternate method to handle feed data could be to use a hybrid approach, i.e., to do a combination of fan-out-on-write and fan-out-on-load. Specifically, we can stop pushing posts from users with a high number of followers (a celebrity user) and only push data for those users who have a few hundred (or thousand) followers. For celebrity users, we can let the followers pull the updates. Since the push operation can be extremely costly for users who have a lot of friends or followers, by disabling fanout for them, we can save a huge number of resources. Another alternate approach could be that, once a user publishes a post, we can limit the fanout to only her online friends. Also, to get benefits from both the approaches, a combination of ‘push to notify’ and ‘pull for serving’ end-users is a great way to go. Purely a push or pull model is less versatile.
258 |
259 | ## 9.3. Data Partitioning
260 |
261 | 1. Sharding posts and metadata
262 |
263 | Since we have a huge number of new posts every day and our read load is extremely high too, we need to distribute our data onto multiple machines such that we can read/write it efficiently. For sharding our databases that are storing posts and their metadata, we can have a similar design as discussed under Designing Twitter.
264 |
265 | 2. Sharding feed data
266 |
267 | For feed data, which is being stored in memory, we can partition it based on UserID. We can try storing all the data of a user on one server. When storing, we can pass the UserID to our hash function that will map the user to a cache server where we will store the user’s feed objects. Also, for any given user, since we don’t expect to store more than 500 FeedItemIDs, we will not run into a scenario where feed data for a user doesn’t fit on a single server. To get the feed of a user, we would always have to query only one server. For future growth and replication, we must use Consistent Hashing.
268 |
269 | # 10. Yelp
270 |
271 | ## 10.1. Dynamic size grids
272 |
273 | Let’s assume we don’t want to have more than 500 places in a grid so that we can search faster. So, whenever a grid reaches this limit, we break it down into four grids of equal size and distribute the places among them. This means thickly populated areas like downtown San Francisco will have a lot of grids, while sparsely populated areas like the Pacific Ocean will have large grids with places only around the coastlines.
274 |
275 | What data-structure can hold this information? A tree in which each node has four children can serve our purpose. Each node will represent a grid and will contain information about all the places in that grid. If a node reaches our limit of 500 places, we will break it down to create four child nodes under it and distribute places among them. In this way, all the leaf nodes will represent the grids that cannot be further broken down. So leaf nodes will keep a list of places with them. This tree structure in which each node can have four children is called a QuadTree.
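
A compact QuadTree sketch along these lines; the 500-place limit matches the text, while the coordinate handling and names are illustrative.

```python
MAX_PLACES = 500

class QuadTree:
    def __init__(self, x_min, y_min, x_max, y_max):
        self.bounds = (x_min, y_min, x_max, y_max)
        self.places = []          # only leaf nodes keep places
        self.children = None      # four child nodes once this grid is split

    def insert(self, x, y, place):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, place)
            return
        self.places.append((x, y, place))
        if len(self.places) > MAX_PLACES:
            self._split()

    def _split(self):
        x_min, y_min, x_max, y_max = self.bounds
        x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
        self.children = [
            QuadTree(x_min, y_min, x_mid, y_mid), QuadTree(x_mid, y_min, x_max, y_mid),
            QuadTree(x_min, y_mid, x_mid, y_max), QuadTree(x_mid, y_mid, x_max, y_max),
        ]
        for x, y, place in self.places:            # redistribute places to the children
            self._child_for(x, y).insert(x, y, place)
        self.places = []

    def _child_for(self, x, y):
        x_min, y_min, x_max, y_max = self.bounds
        x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
        index = (1 if x >= x_mid else 0) + (2 if y >= y_mid else 0)
        return self.children[index]
```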
276 |
277 | # 11. Ticket Master
278 |
279 | How would the server keep track of all the active reservations that haven’t been booked yet? And how would the server keep track of all the waiting customers?
280 |
281 | We need two daemon services, one to keep track of all active reservations and remove any expired reservations from the system; let’s call it ActiveReservationService. The other service would keep track of all the waiting user requests and, as soon as the required number of seats becomes available, it will notify the longest-waiting user to choose the seats; let’s call it WaitingUserService.
282 |
283 | ## 11.1. Active Reservations Service
284 |
285 | We can keep all the reservations of a ‘show’ in memory in a data structure similar to Linked HashMap or a TreeMap in addition to keeping all the data in the database. We will need a linked HashMap kind of data structure that allows us to jump to any reservation to remove it when the booking is complete. Also, since we will have expiry time associated with each reservation, the head of the HashMap will always point to the oldest reservation record so that the reservation can be expired when the timeout is reached.
286 |
287 | To store every reservation for every show, we can have a HashTable where the ‘key’ would be ‘ShowID’, and the ‘value’ would be the Linked HashMap containing ‘BookingID’ and creation ‘Timestamp’.
288 |
289 | In the database, we will store the reservation in the ‘Booking’ table and the expiry time will be in the Timestamp column. The ‘Status’ field will have a value of ‘Reserved (1)’ and, as soon as a booking is complete, the system will update the ‘Status’ to ‘Booked (2)’ and remove the reservation record from the Linked HashMap of the relevant show. When the reservation is expired, we can either remove it from the Booking table or mark it ‘Expired (3)’ in addition to removing it from memory.
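
A small sketch of the in-memory side: a hash table keyed by ShowID whose value is an insertion-ordered map of BookingID to creation timestamp, so the head is always the oldest reservation. The timeout value and names are illustrative.

```python
import time
from collections import OrderedDict, defaultdict

RESERVATION_TIMEOUT_SEC = 600          # illustrative expiry window

# ShowID -> OrderedDict(BookingID -> creation timestamp); head = oldest reservation.
active_reservations = defaultdict(OrderedDict)

def reserve(show_id, booking_id):
    active_reservations[show_id][booking_id] = time.time()

def complete_booking(show_id, booking_id):
    # Jump directly to the reservation and remove it once the booking is complete.
    active_reservations[show_id].pop(booking_id, None)

def expire_old_reservations(show_id):
    """Daemon step: pop expired reservations from the head of the map."""
    reservations = active_reservations[show_id]
    now = time.time()
    while reservations:
        booking_id, created_at = next(iter(reservations.items()))
        if now - created_at < RESERVATION_TIMEOUT_SEC:
            break                          # the head is still fresh; the rest are newer
        reservations.popitem(last=False)   # also mark 'Expired' in the Booking table
```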
290 |
291 | ActiveReservationsService will also work with the external financial service to process user payments. Whenever a booking is completed, or a reservation gets expired, WaitingUsersService will get a signal so that any waiting customer can be served.
292 |
293 | ## 11.2. Waiting Users Service
294 |
295 | Just like ActiveReservationsService, we can keep all the waiting users of a show in memory in a Linked HashMap or a TreeMap. We need a data structure similar to Linked HashMap so that we can jump to any user to remove them from the HashMap when the user cancels their request. Also, since we are serving in a first-come-first-serve manner, the head of the Linked HashMap would always be pointing to the longest waiting user, so that whenever seats become available, we can serve users in a fair manner.
296 |
297 | We will have a HashTable to store all the waiting users for every Show. The ‘key’ would be ‘ShowID’, and the ‘value’ would be a Linked HashMap containing ‘UserIDs’ and their wait-start-time.
298 |
299 | Clients can use Long Polling for keeping themselves updated for their reservation status. Whenever seats become available, the server can use this request to notify the user.
300 |
301 | __Reservation Expiration__
302 | On the server, ActiveReservationsService keeps track of the expiry (based on reservation time) of active reservations. As the client will be shown a timer (for the expiration time) which could be a little out of sync with the server, we can add a buffer of five seconds on the server to safeguard against a broken experience, so that the client never times out after the server does (which could prevent a successful purchase).
303 |
304 | ## 11.3. Concurrency
305 |
306 | How do we handle concurrency so that no two users are able to book the same seat? We can use transactions in SQL databases to avoid any clashes. For example, if we are using an SQL server, we can utilize Transaction Isolation Levels to lock the rows before we update them. Here is the sample code:
307 |
308 | ```sql
309 | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
310 |
311 | BEGIN TRANSACTION;
312 |
313 | -- Suppose we intend to reserve three seats (IDs: 54, 55, 56) for ShowID=99
314 | SELECT * FROM Show_Seat WHERE ShowID=99 AND ShowSeatID IN (54, 55, 56) AND Status=0; -- 0 = free
315 |
316 | -- If the number of rows returned by the above statement is three, update the seats and the booking
317 | -- and return success; otherwise return failure to the user.
318 | update Show_Seat ...
319 | update Booking ...
320 |
321 | COMMIT TRANSACTION;
322 | ```
323 |
324 | ‘Serializable’ is the highest isolation level and guarantees safety from dirty, non-repeatable, and phantom reads. One thing to note here: within a transaction, if we read rows, we get a write lock on them so that they can’t be updated by anyone else.
325 |
326 | Once the above database transaction is successful, we can start tracking the reservation in ActiveReservationService.
327 |
328 | # 12. Load Balancing
329 |
330 | ## 12.1. Benefits
331 |
332 | - Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource.
333 | - Service providers experience less downtime and higher throughput. Even a full server failure won’t affect the end user experience as the load balancer will simply route around it to a healthy server.
334 | - Load balancing makes it easier for system administrators to handle incoming requests while decreasing wait time for users.
335 | - Smart load balancers provide benefits like predictive analytics that determine traffic bottlenecks before they happen. As a result, the smart load balancer gives an organization actionable insights. These are key to automation and can help drive business decisions.
336 | - System administrators experience fewer failed or stressed components. Instead of a single device performing a lot of work, load balancing has several devices perform a little bit of work.
337 |
338 | ## 12.2. Algorithms
339 |
340 | How does the load balancer choose the backend server?
341 |
342 | - Load balancers consider two factors before forwarding a request to a backend server. They will first ensure that the server they choose is actually responding appropriately to requests and then use a pre-configured algorithm to select one from the set of healthy servers. We will discuss these algorithms shortly.
343 |
344 | - __Health Checks__ - Load balancers should only forward traffic to “healthy” backend servers. To monitor the health of a backend server, “health checks” regularly attempt to connect to backend servers to ensure that servers are listening. If a server fails a health check, it is automatically removed from the pool, and traffic will not be forwarded to it until it responds to the health checks again.
345 |
346 | Methods.
347 |
348 | 1. Least Connection Method — This method directs traffic to the server with the fewest active connections. This approach is quite useful when there are a large number of persistent client connections which are unevenly distributed between the servers.
349 |
350 | 2. Least Response Time Method — This algorithm directs traffic to the server with the fewest active connections and the lowest average response time.
351 |
352 | 3. Least Bandwidth Method — This method selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps).
353 |
354 | 4. Round Robin Method — This method cycles through a list of servers and sends each new request to the next server. When it reaches the end of the list, it starts over at the beginning. It is most useful when the servers are of equal specification and there are not many persistent connections.
355 |
356 | 5. Weighted Round Robin Method — The weighted round-robin scheduling is designed to better handle servers with different processing capacities. Each server is assigned a weight (an integer value that indicates the processing capacity). Servers with higher weights receive new connections before those with lower weights, and servers with higher weights get more connections than those with lower weights.
357 |
358 | 6. IP Hash — Under this method, a hash of the IP address of the client is calculated to redirect the request to a server.
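
A minimal sketch of two of these strategies, Least Connection and Weighted Round Robin, over a pool of healthy servers; the server names and weights are illustrative.

```python
import itertools

class Server:
    def __init__(self, name: str, weight: int = 1):
        self.name = name
        self.weight = weight
        self.active_connections = 0

servers = [Server("app-1", weight=3), Server("app-2", weight=1)]

def least_connections(pool):
    # Pick the healthy server with the fewest active connections.
    return min(pool, key=lambda s: s.active_connections)

def weighted_round_robin(pool):
    # Repeat each server according to its weight, then cycle through the list.
    expanded = [s for s in pool for _ in range(s.weight)]
    return itertools.cycle(expanded)

rr = weighted_round_robin(servers)
print([next(rr).name for _ in range(4)])   # ['app-1', 'app-1', 'app-1', 'app-2']
```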
359 |
360 | # 13. Caching
361 |
362 | ## 13.1. Cache Invalidation
363 |
364 | While caching is fantastic, it requires some maintenance to keep the cache coherent with the source of truth (e.g., database). If the data is modified in the database, it should be invalidated in the cache; if not, this can cause inconsistent application behavior.
365 |
366 | Solving this problem is known as cache invalidation; there are three main schemes that are used:
367 |
368 | 1. Write-through cache: Under this scheme, data is written into the cache and the corresponding database simultaneously. The cached data allows for fast retrieval and, since the same data gets written in the permanent storage, we will have complete data consistency between the cache and the storage. Also, this scheme ensures that nothing will get lost in case of a crash, power failure, or other system disruptions.
369 |
370 | Although write-through minimizes the risk of data loss, every write operation must be done twice before returning success to the client, so this scheme has the disadvantage of higher latency for write operations.
371 |
372 | 2. Write-around cache: This technique is similar to write-through cache, but data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write operations that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a “cache miss” and must be read from slower back-end storage and experience higher latency.
373 |
374 | 3. Write-back cache: Under this scheme, data is written to cache alone, and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low-latency and high-throughput for write-intensive applications; however, this speed comes with the risk of data loss in case of a crash or other adverse event because the only copy of the written data is in the cache.
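
A toy sketch contrasting the three write paths, with plain dicts standing in for the cache and the database; the deferred flush step for write-back is an illustrative simplification.

```python
cache, database = {}, {}
dirty_keys = set()               # written to cache but not yet persisted (write-back)

def write_through(key, value):
    cache[key] = value
    database[key] = value        # both writes complete before acknowledging

def write_around(key, value):
    database[key] = value        # bypass the cache; the first read will be a miss

def write_back(key, value):
    cache[key] = value           # acknowledge immediately; persist later
    dirty_keys.add(key)

def flush():
    """Deferred persistence step for write-back (e.g. run on an interval)."""
    for key in list(dirty_keys):
        database[key] = cache[key]
    dirty_keys.clear()
```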
375 |
376 | ## 13.2. Cache eviction policies
377 |
378 | 1. First In First Out (FIFO): The cache evicts the block that was added earliest, without regard to how often or how many times it was accessed before.
379 |
380 | 2. Last In First Out (LIFO): The cache evicts the block that was added most recently, without regard to how often or how many times it was accessed before.
381 |
382 | 3. Least Recently Used (LRU): Discards the least recently used items first.
383 |
384 | 4. Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
385 |
386 | 5. Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
387 |
388 | 6. Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
389 |
390 | # 14. Data Partitioning
391 |
392 | ## 14.1. Partitioning Criteria
393 |
394 | 1. Key or Hash-based partitioning
395 |
396 | Under this scheme, we apply a hash function to some key attribute of the entity we are storing; that yields the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that gets incremented by one each time a new record is inserted; the hash function could then be ‘ID % 100’, which gives us the server number where we can store/read that record. This approach should ensure a uniform allocation of data among servers. The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function, which would require redistributing the data and downtime for the service. A workaround for this problem is to use Consistent Hashing (a short sketch of this re-mapping problem follows the partitioning schemes below).
397 |
398 | 2. List partitioning
399 |
400 | In this scheme, each partition is assigned a list of values, so whenever we want to insert a new record, we will see which partition contains our key and then store it there. For example, we can decide all users living in Iceland, Norway, Sweden, Finland, or Denmark will be stored in a partition for the Nordic countries.
401 |
402 | 3. Round-robin partitioning
403 |
404 | This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the i-th tuple is assigned to partition (i mod n).
405 |
406 | 4. Composite partitioning
407 |
408 | Under this scheme, we combine any of the above partitioning schemes to devise a new scheme. For example, first applying a list partitioning scheme and then a hash based partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.
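
A tiny sketch of the ‘ID % 100’ rule from the hash-based example above, showing why adding a server forces most keys to move (which is what Consistent Hashing avoids); the record counts are illustrative.

```python
def server_for(record_id: int, num_servers: int) -> int:
    return record_id % num_servers        # key- or hash-based partitioning

# Going from 100 to 101 servers changes the mapping for almost every record:
moved = sum(1 for record_id in range(10_000)
            if server_for(record_id, 100) != server_for(record_id, 101))
print(f"{moved / 10_000:.0%} of records would have to move")   # 99%
```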
409 |
410 | ## 14.2. Common Problems of Data Partitioning
411 |
412 | 1. Joins and Denormalization
413 |
414 | Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database partitions. Such joins will not be performance efficient since data has to be compiled from multiple servers. A common workaround for this problem is to denormalize the database so that queries that previously required joins can be performed from a single table. Of course, the service now has to deal with all the perils of denormalization such as data inconsistency.
415 |
416 | 2. Referential integrity
417 |
418 | As we saw, performing a cross-partition query on a partitioned database is not feasible; similarly, trying to enforce data integrity constraints such as foreign keys in a partitioned database can be extremely difficult.
419 |
420 | Most RDBMSs do not support foreign key constraints across databases on different database servers, which means that applications requiring referential integrity on partitioned databases often have to enforce it in application code. Often in such cases, applications have to run regular SQL jobs to clean up dangling references.
421 |
422 | 3. Rebalancing
423 |
424 | There could be many reasons we have to change our partitioning scheme:
425 |
426 | - The data distribution is not uniform, e.g., there are a lot of places for a particular ZIP code that cannot fit into one database partition.
427 | - There is a lot of load on a partition, e.g., there are too many requests being handled by the DB partition dedicated to user photos.
428 |
429 | In such cases, either we have to create more DB partitions or rebalance existing partitions, which means changing the partitioning scheme and moving all existing data to new locations. Doing this without incurring downtime is extremely difficult. Using a scheme like directory-based partitioning does make rebalancing a more palatable experience, at the cost of increasing the complexity of the system and creating a new single point of failure (i.e., the lookup service/database).
430 |
431 | # 15. Proxy Server
432 |
433 | ## 15.1. Open Proxy
434 |
435 | An open proxy is a proxy server that is accessible to any Internet user. Generally, a proxy server only allows users within a network group (a closed proxy) to store and forward Internet services such as DNS or web pages to reduce and control the bandwidth used by the group. With an open proxy, however, any user on the Internet is able to use this forwarding service. There are two well-known types of open proxy:
436 |
437 | 1. Anonymous Proxy: Reveals its identity as a server but doesn't disclose the initial IP address. Though this proxy server can be discovered easily, it can be beneficial for some users as it hides their IP address.
438 | 2. Transparent Proxy: This proxy server again identifies itself, and with the support of HTTP headers, the first IP address can be viewed. The main benefit of using this sort of server is its ability to cache the websites.
439 |
440 | ## 15.2. Reverse Proxy
441 |
442 | A reverse proxy retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client, appearing as if they originated from the proxy server itself.
443 |
444 | # 16. SQL & NoSQL
445 |
446 | ## 16.1. SQL
447 |
448 | Relational databases store data in rows and columns. Each row contains all the information about one entity and each column contains all the separate data points. Some of the most popular relational databases are MySQL, Oracle, MS SQL Server, SQLite, Postgres, and MariaDB.
449 |
450 | ## 16.2. NoSQL
451 |
452 | 1. __Key-Value Stores__: Data is stored in an array of key-value pairs. The ‘key’ is an attribute name which is linked to a ‘value’. Well-known key-value stores include Redis, Voldemort, and Dynamo.
453 |
454 | 2. __Document Databases__: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections. Each document can have an entirely different structure. Document databases include CouchDB and MongoDB.
455 |
456 | 3. __Wide-Column Databases__: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns up front and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets - big names include Cassandra and HBase.
457 |
458 | 4. __Graph Databases__: These databases are used to store data whose relations are best represented in a graph. Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities). Examples of graph database include Neo4J and InfiniteGraph.
459 |
460 | ## 16.3. Differences: SQL vs. NoSQL
461 |
462 | 1. Storage: SQL stores data in tables where each row represents an entity and each column represents a data point about that entity; for example, if we are storing a car entity in a table, different columns could be ‘Color’, ‘Make’, ‘Model’, and so on.
463 |
464 | NoSQL databases have different data storage models. The main ones are key-value, document, graph, and columnar. We will discuss differences between these databases below.
465 |
466 | 2. Schema: In SQL, each record conforms to a fixed schema, meaning the columns must be decided and chosen before data entry and each row must have data for each column. The schema can be altered later, but it involves modifying the whole database and going offline.
467 |
468 | In NoSQL, schemas are dynamic. Columns can be added on the fly and each ‘row’ (or equivalent) doesn’t have to contain data for each ‘column.’
469 |
470 | 3. Querying: SQL databases use SQL (structured query language) for defining and manipulating the data, which is very powerful. In a NoSQL database, queries are focused on a collection of documents. Sometimes it is also called UnQL (Unstructured Query Language). Different databases have different syntax for using UnQL.
471 |
472 | 4. Scalability: In most common situations, SQL databases are vertically scalable, i.e., by increasing the horsepower (higher Memory, CPU, etc.) of the hardware, which can get very expensive. It is possible to scale a relational database across multiple servers, but this is a challenging and time-consuming process.
473 |
474 | On the other hand, NoSQL databases are horizontally scalable, meaning we can add more servers easily in our NoSQL database infrastructure to handle a lot of traffic. Any cheap commodity hardware or cloud instances can host NoSQL databases, thus making it a lot more cost-effective than vertical scaling. A lot of NoSQL technologies also distribute data across servers automatically.
475 |
476 | 5. Reliability or ACID Compliance (Atomicity, Consistency, Isolation, Durability): The vast majority of relational databases are ACID compliant. So, when it comes to data reliability and the safe execution of transactions, SQL databases are still the better bet.
477 |
478 | Most of the NoSQL solutions sacrifice ACID compliance for performance and scalability.
479 |
480 | ## 16.4. Choose which?
481 |
482 | ### 16.4.1. SQL
483 |
484 | 1. We need to ensure ACID compliance. ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliance for scalability and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option.
485 |
486 | 2. Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and if you’re only working with data that is consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.
487 |
488 | ### 16.4.2. NoSQL
489 |
490 | When all the other components of our application are fast and seamless, NoSQL databases prevent data from being the bottleneck. Big data is contributing to a large success for NoSQL databases, mainly because it handles data differently than the traditional relational databases. A few popular examples of NoSQL databases are MongoDB, CouchDB, Cassandra, and HBase.
491 |
492 | 1. Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data we can store together and allows us to add new types as the need changes. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance.
493 |
494 | 2. Making the most of cloud computing and storage. Cloud-based storage is an excellent cost-saving solution but requires data to be easily spread across multiple servers to scale up. Using commodity (affordable, smaller) hardware on-site or in the cloud saves you the hassle of additional software and NoSQL databases like Cassandra are designed to be scaled across multiple data centers out of the box, without a lot of headaches.
495 |
496 | 3. Rapid development. NoSQL is extremely useful for rapid development as it doesn’t need to be prepped ahead of time. If you’re working on quick iterations of your system which require making frequent updates to the data structure without a lot of downtime between versions, a relational database will slow you down.
497 |
498 | # 17. CAP Theorem
499 |
500 | CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance. When we design a distributed system, trading off among CAP is almost the first thing we want to consider. CAP theorem says while designing a distributed system, we can pick only two of the following three options:
501 |
502 | - Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
503 |
504 | - Availability: Every request gets a response on success/failure. Availability is achieved by replicating the data across different servers.
505 |
506 | - Partition tolerance: The system continues to work despite message loss or partial failure. A partition-tolerant system can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
507 |
508 | 
509 |
510 | We cannot build a general data store that is continually available, sequentially consistent, and tolerant to any partition failures. We can only build a system that has any two of these three properties. Because, to be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partitions before a client reads from the out-of-date partition after having read from the up-to-date one. The only thing that can be done to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available.
511 |
512 | # 18. Consistent Hashing
513 |
514 | ## 18.1. Improve caching system
515 |
516 | Distributed Hash Table (DHT) is one of the fundamental components used in distributed scalable systems. Hash Tables need a key, a value, and a hash function where hash function maps the key to a location where the value is stored.
517 |
518 | index = hash_function(key)
519 |
520 | Suppose we are designing a distributed caching system. Given ‘n’ cache servers, an intuitive hash function would be ‘key % n’. It is simple and commonly used. But it has two major drawbacks:
521 |
522 | - It is NOT horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. This becomes a maintenance pain point if the caching system contains lots of data, because in practice it is difficult to schedule downtime to update all the cache mappings (see the short demo right after this list).
523 | - It may NOT be load balanced, especially for non-uniformly distributed data. In practice, we can safely assume that the data will not be distributed uniformly. For the caching system, this translates into some caches becoming hot and saturated while others sit almost empty and idle.
524 |
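Here is a minimal sketch (not from the original notes) of the first drawback: when the server count changes from 4 to 5, most keys map to a different server under ‘key % n’ placement. It uses Python's built-in `hash()` purely for illustration.

```py
def server_for(key, n):
    # naive placement: hash the key, then take it modulo the number of servers
    return hash(key) % n

keys = [f"user:{i}" for i in range(1000)]
before = {k: server_for(k, 4) for k in keys}  # 4 cache servers
after = {k: server_for(k, 5) for k in keys}   # a 5th server is added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys now map to a different server")  # roughly 80% for n = 4 -> 5
```
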
525 | ## 18.2. Consistent Hashing
526 |
527 | Consistent hashing is a very useful strategy for distributed caching systems and DHTs. It allows us to distribute data across a cluster in such a way that will minimize reorganization when nodes are added or removed. Hence, the caching system will be easier to scale up or scale down.
528 |
529 | In Consistent Hashing, when the hash table is resized (e.g. a new cache host is added to the system), only ‘k/n’ keys need to be remapped where ‘k’ is the total number of keys and ‘n’ is the total number of servers. Recall that in a caching system using the ‘mod’ as the hash function, all keys need to be remapped.
530 |
531 | In Consistent Hashing, objects are mapped to the same host if possible. When a host is removed from the system, the objects on that host are shared by other hosts; when a new host is added, it takes its share from a few hosts without touching other’s shares.
532 |
533 | ## 18.3. Algorithm
534 |
535 | As a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range of [0, 256]. Imagine that the integers in the range are placed on a ring such that the values are wrapped around.
536 |
537 | 1. Given a list of cache servers, hash them to integers in the range.
538 | 2. To map a key to a server,
539 |
540 | - Hash it to a single integer.
541 | - Move clockwise on the ring until finding the first cache it encounters.
542 | - That cache is the one that contains the key. See animation below as an example: key1 maps to cache A; key2 maps to cache C.
543 |
544 | To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched.
545 |
546 | To remove a cache or, if a cache fails, say A, all keys that were originally mapped to A will fall into B, and only those keys need to be moved to B; other keys will not be affected.
547 |
548 | For load balancing, as we discussed in the beginning, the real data is essentially randomly distributed and thus may not be uniform. It may make the keys on caches unbalanced.
549 |
550 | To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring.
551 |
552 | If the hash function “mixes well,” as the number of replicas increases, the keys will be more balanced.
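
The ring described above fits in a short sketch. The following is an illustrative implementation (not from the original notes): it hashes server names and keys onto the ring with MD5, keeps the ring sorted, and assigns a configurable number of virtual replicas to each server; the class and method names are my own.

```py
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual replicas per server, for smoother balancing
        self.ring = []            # sorted list of (point, server) pairs on the ring
        for server in servers:
            self.add(server)

    def _hash(self, key):
        # map any string to an integer point on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        # only keys on the removed server's points move, each to the next point clockwise
        self.ring = [(point, s) for point, s in self.ring if s != server]

    def get(self, key):
        if not self.ring:
            return None
        # move clockwise: the first point at or after the key's hash, wrapping around
        idx = bisect.bisect_left(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["cacheA", "cacheB", "cacheC"])
print(ring.get("key1"), ring.get("key2"))
```

Adding or removing a server in this sketch only remaps the keys whose closest clockwise point changed, roughly k/n of all keys, matching the claim above.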
--------------------------------------------------------------------------------
/Beyond Cracking the Coding Interview/Beyond Cracking the Coding Interview.md:
--------------------------------------------------------------------------------
1 | Beyond Cracking the Coding Interview - by Gayle Laakmann McDowell, Mike Mroczka, Aline Lerner, Nil Mamano, 2025
2 |
3 | > [Book Link](https://www.amazon.com/Beyond-Cracking-Coding-Interview-Successfully/dp/195570600X)
4 |
5 |
6 |
7 | # Contents
8 |
9 |
10 |
11 | - [20. Anatomy of Coding Interview](#20-anatomy-of-coding-interview)
12 | - [25. Dynamic Arrays](#25-dynamic-arrays)
13 | - [26. String Manipulation](#26-string-manipulation)
14 | - [27. Two Pointers](#27-two-pointers)
15 | - [28. Grids & Matrices](#28-grids-matrices)
16 | - [29. Binary Search](#29-binary-search)
17 | - [30. Set & Maps](#30-set-maps)
18 | - [31. Sorting](#31-sorting)
19 | - [32. Stacks & Queues](#32-stacks-queues)
20 | - [33. Recursion](#33-recursion)
21 | - [34. Linked Lists](#34-linked-lists)
22 | - [35. Trees ](#35-trees)
23 | - [36. Graphs](#36-graphs)
24 | - [37. Heaps ](#37-heaps)
25 | - [38. Sliding Windows ](#38-sliding-windows)
26 | - [39. Backtracking](#39-backtracking)
27 | - [40. Dynamic Programming](#40-dynamic-programming)
28 | - [41. Greedy Algorithms](#41-greedy-algorithms)
29 | - [42. Topological Sort ](#42-topological-sort)
30 | - [43. Prefix Sums ](#43-prefix-sums)
31 |
32 |
33 |
34 |
35 | ## 20. Anatomy of Coding Interview
36 |
37 | Get buy-in with magic question: I'd like to use A algorithm with B time and C space to solve problem. Should I code this now, or should I keep thinking?
38 |
39 |
40 | ## 25. Dynamic Arrays
41 |
42 | ```py
43 | class DynamicArray:
44 | def __init__(self):
45 | self.capacity = 10
46 | self._size = 0
47 | self.fixed_array = [None] * self.capacity
48 |
49 | def get(self, i):
50 | if i < 0 or i >= self._size:
51 | raise IndexError('index out of bounds')
52 | return self.fixed_array[i]
53 |
54 | def set(self, i, x):
55 | if i < 0 or i >= self._size:
56 | raise IndexError('index out of bounds')
57 | self.fixed_array[i] = x
58 |
59 | def size(self):
60 | return self._size
61 |
62 | def append(self, x):
63 | if self._size == self.capacity:
64 | self.resize(self.capacity * 2)
65 | self.fixed_array[self._size] = x
66 | self._size += 1
67 |
68 | def resize(self, new_capacity):
69 | new_fixed_size_arr = [None] * new_capacity
70 | for i in range(self._size):
71 | new_fixed_size_arr[i] = self.fixed_array[i]
72 | self.fixed_array = new_fixed_size_arr
73 | self.capacity = new_capacity
74 |
75 | def pop_back(self):
76 | if self._size == 0:
77 | raise IndexError('pop from empty array')
78 | self._size -= 1
79 | if self._size / self.capacity < 0.25 and self.capacity > 10:
80 | self.resize(self.capacity // 2)
81 |
82 | def pop(self, i):
83 | if i < 0 or i >= self._size:
84 | raise IndexError('Index out of bounds')
85 | saved_element = self.fixed_array[i]
86 | for index in range(i, self._size - 1):
87 | self.fixed_array[index] = self.fixed_array[index + 1]
88 | self.pop_back()
89 | return saved_element
90 | ```
91 |
92 | ## 26. String Manipulation
93 |
94 | ```py
95 | def is_uppercase(c):
96 | return ord(c) >= ord('A') and ord(c) <= ord('Z')
97 |
98 | def is_digit(c):
99 | return ord(c) >= ord('0') and ord(c) <= ord('9')
100 |
def is_lowercase(c):  # helper used by is_alphanumeric and to_uppercase below
    return ord(c) >= ord('a') and ord(c) <= ord('z')

101 | def is_alphanumeric(c):
102 | return is_lowercase(c) or is_uppercase(c) or is_digit(c)
103 |
104 | def to_uppercase(c):
105 | if not is_lowercase(c):
106 | return c
107 | return chr(ord(c) - ord('a') + ord('A'))
108 |
109 | def split(s, c):
110 | if not s:
111 | return []
112 | res = []
113 | current = []
114 | for char in s:
115 | if char == c:
116 | res.append(''.join(current))
117 | current = []
118 | else:
119 | current.append(char)
120 | res.append(''.join(current))
121 | return res
122 |
123 | def join(arr, s):
124 | res = []
125 | for i in range(len(arr)):
126 | if i != 0:
127 | for c in s:
128 | res.append(c)
129 | for c in arr[i]:
130 | res.append(c)
131 |     return ''.join(res)  # build the final string from the collected characters
132 | ```
133 |
134 |
135 | ## 27. Two Pointers
136 |
137 | ```py
138 | def palindrome(s):
139 | l, r = 0, len(s) - 1
140 | while l < r:
141 | if s[l] != s[r]:
142 | return False
143 | l += 1
144 | r -= 1
145 | return True
146 |
147 | def smaller_prefixes(arr):
148 | sp, fp = 0, 0
149 | slow_sum, fast_sum = 0, 0
150 | while fp < len(arr):
151 | slow_sum += arr[sp]
152 | fast_sum += arr[fp] + arr[fp+1]
153 | if slow_sum >= fast_sum:
154 | return False
155 | sp += 1
156 | fp += 2
157 | return True
158 |
159 | def common_elements(arr1, arr2):
160 | p1, p2 = 0, 0
161 | res = []
162 | while p1 < len(arr1) and p2 < len(arr2):
163 | if arr1[p1] == arr2[p2]:
164 | res.append(arr1[p1])
165 | p1 += 1
166 | p2 += 1
167 | elif arr1[p1] < arr2[p2]:
168 | p1 += 1
169 | else:
170 | p2 += 1
171 | return res
172 |
173 | def palindrome_sentence(s):
174 | l, r = 0, len(s) - 1
175 | while l < r:
176 | if not s[l].isalpha():
177 | l += 1
178 | elif not s[r].isalpha():
179 | r -= 1
180 | else:
181 | if s[l].lower() != s[r].lower():
182 | return False
183 | l += 1
184 | r -= 1
185 | return True
186 |
187 | def reverse_case_match(s):
188 | l, r = 0, len(s) - 1
189 | while l < len(s) and r >= 0:
190 | if not s[l].islower():
191 | l += 1
192 | elif not s[r].isupper():
193 | r -= 1
194 | else:
195 | if s[l] != s[r].lower():
196 | return False
197 | l += 1
198 | r -= 1
199 | return True
200 |
201 | def merge(arr1, arr2):
202 | p1, p2 = 0, 0
203 | res = []
204 | while p1 < len(arr1) and p2 < len(arr2):
205 | if arr1[p1] < arr2[p2]:
206 | res.append(arr1[p1])
207 | p1 += 1
208 | else:
209 | res.append(arr2[p2])
210 | p2 += 1
211 | while p1 < len(arr1):
212 | res.append(arr1[p1])
213 | p1 += 1
214 | while p2 < len(arr2):
215 | res.append(arr2[p2])
216 | p2 += 1
217 | return res
218 |
219 | def two_sum(arr):
220 | l, r = 0, len(arr) - 1
221 | while l < r:
222 | if arr[l] + arr[r] > 0:
223 | r -= 1
224 | elif arr[l] + arr[r] < 0:
225 | l += 1
226 | else:
227 | return True
228 | return False
229 |
230 | def sort_valley_array(arr):
231 | if len(arr) == 0:
232 | return []
233 | l, r = 0, len(arr) - 1
234 | res = [0] * len(arr)
235 | i = len(arr) - 1
236 | while l < r:
237 | if arr[l] >= arr[r]:
238 | res[i] = arr[l]
239 | l += 1
240 | i -= 1
241 | else:
242 | res[i] = arr[r]
243 | r -= 1
244 | i -= 1
245 | res[0] = arr[l]
246 | return res
247 |
248 | def intersection(int1, int2):
249 | overlap_start = max(int1[0], int2[0])
250 | overlap_end = min(int1[1], int2[1])
251 | return [overlap_start, overlap_end]
252 | def interval_intersection(arr1, arr2):
253 | p1, p2 = 0, 0
254 | n1, n2 = len(arr1), len(arr2)
255 | res = []
256 | while p1 < n1 and p2 < n2:
257 | int1, int2 = arr1[p1], arr2[p2]
258 | if int1[1] < int2[0]:
259 | p1 += 1
260 | elif int2[1] < int1[0]:
261 | p2 += 1
262 | else:
263 | res.append(intersection(int1, int2))
264 | if int1[1] < int2[1]:
265 | p1 += 1
266 | else:
267 | p2 += 1
268 | return res
269 |
270 | def reverse(arr):
271 | l, r = 0, len(arr) - 1
272 | while l < r:
273 | arr[l], arr[r] = arr[r], arr[l]
274 | l += 1
275 | r -= 1
276 |
277 | def sort_even(arr):
278 | l, r = 0, len(arr) - 1
279 | while l < r:
280 | if arr[l] % 2 == 0:
281 | l += 1
282 | elif arr[r] % 2 == 1:
283 | r -= 1
284 | else:
285 | arr[l], arr[r] = arr[r], arr[l]
286 | l += 1
287 | r -= 1
288 |
289 | def remove_duplicates(arr):
290 | s, w = 0, 0
291 | while s < len(arr):
292 | must_keep = s == 0 or arr[s] != arr[s-1]
293 | if must_keep:
294 | arr[w] = arr[s]
295 | w += 1
296 | s += 1
297 | return w
298 |
299 | def move_word(arr, word):
300 | seeker, writer = 0, 0
301 | i = 0
302 | while seeker < len(arr):
303 | if i < len(word) and arr[seeker] == word[i]:
304 | seeker += 1
305 | i += 1
306 | else:
307 | arr[writer] = arr[seeker]
308 | seeker += 1
309 | writer += 1
310 | for c in word:
311 | arr[writer] = c
312 | writer += 1
313 | ```
314 |
315 |
316 | ## 28. Grids & Matrices
317 |
318 | ```py
319 | def is_valid(room, r, c):
320 | return 0 <= r < len(room) and 0 <= c < len(room[0]) and room[r][c] != 1
321 | def valid_moves(room, r, c):
322 | moves = []
323 | directions = [[-1, 0], [1, 0], [0, -1], [0, 1]]
324 | for dir_r, dir_c in directions:
325 | new_r = r + dir_r
326 | new_c = c + dir_c
327 | if is_valid(room, new_r, new_c):
328 | moves.append([new_r, new_c])
329 | return moves
330 |
331 | def queen_valid_moves(board, piece, r, c):
332 | moves = []
333 | king_directions = [
334 |         [-1, 0], [1, 0], [0, -1], [0, 1],  # vertical and horizontal
335 | [-1, -1], [-1, 1], [1, -1], [1, 1] # diagonal
336 | ]
337 | knight_directions = [[-2, 1], [-1, 2], [1, 2], [2, 1], [2, -1], [1, -2], [-1, -2], [-2, -1]]
338 | if piece == "knight":
339 | directions = knight_directions
340 | else: # king and queen
341 | directions = king_directions
342 | for dir_r, dir_c in directions:
343 | new_r, new_c = r + dir_r, c + dir_c
344 | if piece == "queen":
345 |             while is_valid(board, new_r, new_c):
                moves.append([new_r, new_c])  # record every cell the queen can slide to in this direction
346 |                 new_r += dir_r
347 |                 new_c += dir_c
348 | elif is_valid(board, new_r, new_c):
349 | moves.append([new_r, new_c])
350 | return moves
351 |
352 | def safe_cells(board):
353 | n = len(board)
354 | res = [[0] * n for _ in range(n)]
355 | for r in range(n):
356 | for c in range(n):
357 | if board[r][c] == 1:
358 | res[r][c] = 1
359 | mark_reachable_cells(board, r, c, res)
360 | return res
361 | def mark_reachable_cells(board, r, c, res):
362 | directions = [
363 |         [-1, 0], [1, 0], [0, -1], [0, 1],  # vertical and horizontal
364 | [-1, -1], [-1, 1], [1, -1], [1, 1] # diagonal
365 | ]
366 | for dir_r, dir_c in directions:
367 | new_r, new_c = r + dir_r, c + dir_c
368 | while is_valid(board, new_r, new_c):
369 | res[new_r][new_c] = 1
370 | new_r += dir_r
371 | new_c += dir_c
372 |
373 | def is_valid(grid, r, c):
374 | return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0
375 | def spiral(n):
376 | val = n * n - 1
377 | res = [[0] * n for _ in range(n)]
378 | r, c = n - 1, n - 1
379 | directions = [[-1, 0], [0, -1], [1, 0], [0, 1]] # counterclockwise
380 | dir = 0 # start going up
381 | while val > 0:
382 | res[r][c] = val
383 | val -= 1
384 | if not is_valid(res, r + directions[dir][0], c + directions[dir][1]):
385 | dir = (dir + 1) % 4 # change directions counterclockwise
386 | r, c = r + directions[dir][0], c + directions[dir][1]
387 | return res
388 |
389 | def distance_to_river(field):
390 | R, C = len(field), len(field[0])
391 | def has_footprints(r, c):
392 | return 0 <= r < R and 0 <= c < C and field[r][c] == 1
393 | r, c = 0, 0
394 | while field[r][c] != 1:
395 | r += 1
396 | closest = r
397 | directions_row = [-1, 0, 1]
398 | while c < C:
399 | for dir_r in directions_row:
400 | new_r, new_c = r + dir_r, c + 1
401 | if has_footprints(new_r, new_c):
402 | r, c = new_r, new_c
403 | closest = min(closest, r)
404 | break
405 | return closest
406 |
407 | def valid_rows(board):
408 | R, C = len(board), len(board[0])
409 | for r in range(R):
410 | seen = set()
411 | for c in range(C):
412 | if board[r][c] in seen:
413 | return False
414 | if board[r][c] != 0:
415 | seen.add(board[r][c])
416 | return True
417 |
418 | def valid_subgrid(board, r, c):
419 | seen = set()
420 | for new_r in range(r, r + 3):
421 | for new_c in range(c, c + 3):
422 | if board[new_r][new_c] in seen:
423 | return False
424 | if board[new_r][new_c] != 0:
425 | seen.add(board[new_r][new_c])
426 | return True
427 | def valid_subgrids(board):
428 | for r in range(3):
429 | for c in range(3):
430 | if not valid_subgrid(board, r * 3, c * 3):
431 | return False
432 | return True
433 |
434 | def subgrid_maximums(grid):
435 | R, C = len(grid), len(grid[0])
436 | res = [row.copy() for row in grid]
437 | for r in range(R - 1, -1, -1):
438 | for c in range(C - 1, -1, -1):
439 | if r + 1 < R:
440 | res[r][c] = max(res[r][c], grid[r + 1][c])
441 | if c + 1 < C:
442 | res[r][c] = max(res[r][c], grid[r][c + 1])
443 | return res
444 |
445 | def backward_sum(grid):
446 | R, C = len(grid), len(grid[0])
447 | res = [row.copy() for row in grid]
448 | for r in range(R - 1, -1, -1):
449 | for c in range(C - 1, -1, -1):
450 | if r + 1 < R:
451 | res[r][c] += res[r + 1][c]
452 | if c + 1 < C:
453 | res[r][c] += res[r][c + 1]
454 | if r + 1 < R and c + 1 < C: # subtract double-counted subgrid
455 | res[r][c] -= res[r + 1][c + 1]
456 | return res
457 |
458 | class Matrix:
459 | def __init__(self, grid):
460 | self.matrix = [row.copy() for row in grid]
461 | def transpose(self):
462 | matrix = self.matrix
463 | for r in range(len(matrix)):
464 | for c in range(r):
465 | matrix[r][c], matrix[c][r] = matrix[c][r], matrix[r][c]
466 | def reflect_horizontally(self):
467 | for row in self.matrix:
468 | row.reverse()
469 | def reflect_vertically(self):
470 | self.matrix.reverse()
471 | def rotate_clockwise(self):
472 | self.transpose()
473 | self.reflect_horizontally()
474 | def rotate_counterclockwise(self):
475 | self.transpose()
476 | self.reflect_vertically()
477 | ```
478 |
479 |
480 | ## 29. Binary Search
481 |
482 | ```py
483 | def binary_search(arr, target):
484 | n = len(arr)
485 | if n == 0:
486 | return -1
487 | l, r = 0, n - 1
488 | if arr[l] >= target or arr[r] < target:
489 | if arr[l] == target:
490 | return 0
491 | return -1
492 | while r - l > 1:
493 | mid = (l + r) // 2
494 | if arr[mid] < target:
495 | l = mid
496 | else:
497 | r = mid
498 | if arr[r] == target:
499 | return r
500 | return -1
501 |
502 | def is_before(val):
503 | return not is_stolen(val)
504 | def find_bike(t1, t2):
505 | l, r = t1, t2
506 | while r - l > 1:
507 | mid = (l + r) // 2
508 | if is_before(mid):
509 | l = mid
510 | else:
511 | r = mid
512 | return r
513 |
514 | def valley_min_index(arr):
515 | def is_before(i):
516 | return i == 0 or arr[i] < arr[i - 1]
517 | l, r = 0, len(arr) - 1
518 | if is_before(r):
519 | return arr[r]
520 | while r - l > 1:
521 | mid = (l + r) // 2
522 | if is_before(mid):
523 | l = mid
524 | else:
525 | r = mid
526 |     return min(arr[l], arr[r])
527 |
528 | def two_array_two_sum(sorted_arr, unsorted_arr):
529 | for i, val in enumerate(unsorted_arr):
530 | idx = binary_search(sorted_arr, -val)
531 | if idx != -1:
532 | return [idx, i]
533 | return [-1, -1]
534 |
535 | def is_before(grid, i, target):
536 | num_cols = len(grid[0])
537 | row, col = i // num_cols, i % num_cols
538 | return grid[row][col] < target
539 |
540 | def find_through_api(target):
541 | def is_before(idx):
542 | return fetch(idx) < target
543 | l, r = 0, 1
544 | # get rightmost boundary
545 | while fetch(r) != -1:
546 | r *= 2
547 | # binary search
548 | # ...
549 |
550 | # is it impossible to split arr into k subarrays, each with sum <= max_sum?
551 | def is_before(arr, k, max_sum):
552 | splits_required = get_splits_required(arr, max_sum)
553 | return splits_required > k
554 | # return min number of subarrays with a given max sum, assume max_sum >= max(arr)
555 | def get_splits_required(arr, max_sum):
556 | splits_required = 1
557 | current_sum = 0
558 | for num in arr:
559 | if current_sum + num > max_sum:
560 | splits_required += 1
561 | current_sum = num # start new subarray with current number
562 | else:
563 | current_sum += num
564 | return splits_required
565 | def min_subarray_sum_split(arr, k):
566 | l, r = max(arr), sum(arr) # range for max subarray sum
567 | if not is_before(arr, k, l):
568 | return l
569 | while r - l > 1:
570 | mid = (l + r) // 2
571 | if is_before(arr, k, mid):
572 | l = mid
573 | else:
574 | r = mid
575 | return r
576 |
577 | def num_refills(a, b):
578 | # can we pour 'num_pours' times?
579 | def is_before(num_pours):
580 | return num_pours * b <= a
581 | # exponential search (repeatedly doubling until find upper bound)
582 | k = 1
583 | while is_before(k * 2):
584 | k *= 2
585 | # binary search between k and k * 2
586 | l, r = k, k * 2
587 | while r - l > 1:
588 | gap = r - l
589 | half_gap = gap >> 1 # bit shift instead of division
590 | mid = l + half_gap
591 | if is_before(mid):
592 | l = mid
593 | else:
594 | r = mid
595 | return l
596 |
597 | def get_ones_in_row(row):
598 | if row[0] == 0:
599 | return 0
600 | if row[-1] == 1:
601 | return len(row)
602 | def is_before_row(idx):
603 | return row[idx] == 1
604 | l, r = 0, len(row)
605 | while r - l > 1:
606 | mid = (l + r) // 2
607 | if is_before_row(mid):
608 | l = mid
609 | else:
610 | r = mid
611 | return r
612 | def is_before(picture):
613 | water = 0
614 | for row in picture:
615 | water += get_ones_in_row(row)
616 | total = len(picture[0]) ** 2
617 | return water / total < 0.5
618 | ```
619 |
620 |
621 | ## 30. Set & Maps
622 |
623 | ```py
624 | def account_sharing(connections):
625 | seen = set()
626 | for ip, username in connections:
627 | if username in seen:
628 | return ip
629 | seen.add(username)
630 | return ""
631 |
632 | def most_shared_account(connections):
633 | user_to_count = dict()
634 | for _, user in connections:
635 | if not user in user_to_count:
636 | user_to_count[user] = 0
637 | user_to_count[user] += 1
638 | most_shared_user = None
639 | for user, count in user_to_count.items():
640 |         if most_shared_user is None or count > user_to_count[most_shared_user]:
641 | most_shared_user = user
642 | return most_shared_user
643 |
644 | def multi_account_cheating(users):
645 | unique_lists = set()
646 | for _, ips in users:
647 | immutable_list = tuple(sorted(ips))
648 | if immutable_list in unique_lists:
649 | return True
650 | unique_lists.add(immutable_list)
651 | return False
652 |
653 | class DomainResolver:
654 | def __init__(self):
655 | self.ip_to_domains = dict()
656 | self.domain_to_subdomains = dict()
657 | def register_domain(self, ip, domain):
658 | if ip not in self.ip_to_domains:
659 | self.ip_to_domains[ip] = set()
660 | self.ip_to_domains[ip].add(domain)
661 | def register_subdomain(self, domain, subdomain):
662 | if domain not in self.domain_to_subdomains:
663 | self.domain_to_subdomains[domain] = set()
664 | self.domain_to_subdomains[domain].add(subdomain)
665 | def has_subdomain(self, ip, domain, subdomain):
666 | if ip not in self.ip_to_domains:
667 | return False
668 | if domain not in self.domain_to_subdomains:
669 | return False
670 | return subdomain in self.domain_to_subdomains[domain]
671 |
672 | def find_squared(arr):
673 | # create map from number to index (allow multiple indices per number)
674 | num_to_indices = {}
675 | for i, num in enumerate(arr):
676 | if num not in num_to_indices:
677 | num_to_indices[num] = []
678 | num_to_indices[num].append(i)
679 | res = []
680 | # iterate through each number and check if its square exists in map
681 | for i, num in enumerate(arr):
682 | square = num ** 2
683 | if square in num_to_indices:
684 | for j in num_to_indices[square]:
685 | res.append([i, j])
686 | return res
687 |
688 | def suspect_students(answers, m, students):
689 | def same_row(desk1, desk2):
690 | return (desk1 - 1) // m == (desk2 - 1) // m
691 | desk_to_index = {}
692 | for i, [student_id, desk, student_answers] in enumerate(students):
693 | if student_answers != answers:
694 | desk_to_index[desk] = i
695 | sus_pairs = []
696 | for student_id, desk, answers in students:
697 | other_desk = desk + 1
698 | if same_row(desk, other_desk) and other_desk in desk_to_index:
699 | other_student = students[desk_to_index[other_desk]]
700 | if answers == other_student[2]:
701 | sus_pairs.append([student_id, other_student[0]])
702 | return sus_pairs
703 |
704 | def alphabetic_sum_product(words, target):
705 | sums = set()
706 | for word in words:
707 | sums.add(alphabetical_sum(word))
708 | for i in sums:
709 | if target % i != 0:
710 | continue
711 | for j in sums:
712 | k = target / (i * j)
713 | if k in sums:
714 | return True
715 | return False
716 |
717 | def find_anomalies(log):
718 | opened = {} # ticket -> agent who opened it
719 | working_on = {} # agent -> ticket they're working on
720 | seen = set() # tickets that were opened or closed
721 | anomalies = set()
722 | for agent, action, ticket in log:
723 | if ticket in anomalies:
724 | continue
725 | if action == "open":
726 | if ticket in seen:
727 | anomalies.add(ticket)
728 | continue
729 | if agent in working_on:
730 | # if agent is working on another ticket, that ticket is anomalous
731 | anomalies.add(working_on[agent])
732 | opened[ticket] = agent
733 | working_on[agent] = ticket
734 | seen.add(ticket)
735 | else:
736 | if ticket not in opened or opened[ticket] != agent:
737 | anomalies.add(ticket)
738 | continue
739 | if agent not in working_on or working_on[agent] != ticket:
740 | anomalies.add(ticket)
741 | continue
742 | del working_on[agent]
743 | del opened[ticket]
744 | # any tickets still open are anomalous
745 | anomalies.update(opened.keys())
746 | return list(anomalies)
747 |
748 | def set_intersection(sets):
749 | res = sets[0]
750 | for i in range(1, len(sets)):
751 | res = {elem for elem in sets[i] if elem in res}
752 | return res
753 | ```
754 |
755 |
756 | ## 31. Sorting
757 |
758 | ```py
import random          # used by quicksort and quickselect below
from math import sqrt  # used by are_circles_nested below
# note: mergesort reuses merge() from the Two Pointers section

759 | def mergesort(arr):
760 | n = len(arr)
761 | if n <= 1:
762 | return arr
763 | left = mergesort(arr[:n // 2])
764 | right = mergesort(arr[n // 2:])
765 | return merge(left, right)
766 |
767 | def quicksort(arr):
768 | if len(arr) <= 1:
769 | return arr
770 | pivot = random.choice(arr)
771 | smaller, equal, larger = [], [], []
772 | for x in arr:
773 | if x < pivot: smaller.append(x)
774 | if x == pivot: equal.append(x)
775 | if x > pivot: larger.append(x)
776 | return quicksort(smaller) + equal + quicksort(larger)
777 |
778 | def counting_sort(arr):
779 | if not arr: return []
780 | R = max(arr)
781 | counts = [0] * (R + 1)
782 | for x in arr:
783 | counts[x] += 1
784 | res = []
785 | for x in range(R + 1):
786 | while counts[x] > 0:
787 | res.append(x)
788 | counts[x] -= 1
789 | return res
790 |
791 | def descending_sort(strings):
792 | return sorted(strings, key=lambda s: s.lower(), reverse=True)
793 |
794 | def sort_by_interval_end(intervals):
795 | return sorted(intervals, key=lambda interval: interval[1])
796 |
797 | def sort_value_then_suit(deck):
798 | suit_map = {'clubs': 0, 'hearts': 1, 'spades': 2, 'diamonds': 3}
799 | return sorted(deck, key=lambda card: (card.value, suit_map[card.suit]))
800 |
801 | def new_deck_order(deck):
802 | suit_map = {'hearts': 0, 'clubs': 1, 'diamonds': 2, 'spades': 3}
803 | return sorted(deck, key=lambda card: (suit_map[card.suit], card.value))
804 |
805 | def stable_sort_by_value(deck):
806 | return sorted(deck, key=lambda card: card.value)
807 |
808 | def letter_occurrences(word):
809 | letter_to_count = dict()
810 | for c in word:
811 | if c not in letter_to_count:
812 | letter_to_count[c] = 0
813 | letter_to_count[c] += 1
814 | tuples = []
815 | for letter, count in letter_to_count.items():
816 | tuples.append((letter, count))
817 | tuples.sort(key=lambda x: (-x[1], x[0]))
818 | res = []
819 | for letter, _ in tuples:
820 | res.append(letter)
821 | return res
822 |
823 | def are_circles_nested(circles):
824 | circles.sort(key = lambda c: c[1], reverse=True)
825 | for i in range(len(circles) - 1):
826 | if not contains(circles[i], circles[i + 1]):
827 | return False
828 | return True
829 | def contains(c1, c2):
830 | (x1, y1), r1 = c1
831 | (x2, y2), r2 = c2
832 | center_distance = sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
833 | return center_distance + r2 < r1
834 |
835 | def process_operations(nums, operations):
836 | n = len(nums)
837 | deleted = set()
838 | sorted_indices = []
839 | for i in range(n):
840 | sorted_indices.append(i)
841 | sorted_indices.sort(key=lambda i: nums[i])
842 | smallest_idx = 0
843 | for op in operations:
844 | if 0 <= op < n:
845 | deleted.add(op)
846 | else:
847 | # skip until the next non-deleted smallest index
848 | while smallest_idx < n and sorted_indices[smallest_idx] in deleted:
849 | smallest_idx += 1
850 | if smallest_idx < n:
851 | deleted.add(sorted_indices[smallest_idx])
852 | smallest_idx += 1
853 | res = []
854 | for i in range(n):
855 | if not i in deleted:
856 | res.append(nums[i])
857 | return res
858 |
859 | class Spreadsheet:
860 | def __init__(self, rows, cols):
861 | self.rows = rows
862 | self.cols = cols
863 | self.sheet = []
864 | for _ in range(rows):
865 | self.sheet.append([0] * cols)
866 | def set(self, row, col, value):
867 | self.sheet[row][col] = value
868 | def get(self, row, col):
869 | return self.sheet[row][col]
870 | def sort_rows_by_column(self, col):
871 | self.sheet.sort(key=lambda row: row[col])
872 | def sort_columns_by_row(self, row):
873 | columns_with_values = []
874 | for col in range(self.cols):
875 | columns_with_values.append((col, self.sheet[row][col]))
876 | sorted_columns = sorted(columns_with_values, key=lambda x: x[1])
877 | sorted_sheet = []
878 | for r in range(self.rows):
879 | new_row = []
880 | for col, _ in sorted_columns:
881 | new_row.append(self.sheet[r][col])
882 | sorted_sheet.append(new_row)
883 | self.sheet = sorted_sheet
884 |
885 | def bucket_sort(books):
886 | if not books: return []
887 | min_year = min(book.year_published for book in books)
888 | max_year = max(book.year_published for book in books)
889 | buckets = [[] for _ in range(max_year - min_year + 1)]
890 | for book in books:
891 | buckets[book.year_published - min_year].append(book)
892 | res = []
893 | for bucket in buckets:
894 | for book in bucket:
895 | res.append(book)
896 | return res
897 |
898 | def quickselect(arr, k):
899 | if len(arr) == 1:
900 | return arr[0]
901 | pivot_index = random.randint(0, len(arr) - 1)
902 | pivot = arr[pivot_index]
903 | smaller, larger = [], []
904 | for x in arr:
905 | if x < pivot: smaller.append(x)
906 | elif x > pivot: larger.append(x)
907 | if k <= len(smaller):
908 | return quickselect(smaller, k)
909 | elif k == len(smaller) + 1:
910 | return pivot
911 | else:
912 | return quickselect(larger, k - len(smaller) - 1)
913 | def first_k(arr, k):
914 | if len(arr) == 0: return []
915 | kth_val = quickselect(arr, k)
916 | return [x for x in arr if x <= kth_val]
917 | ```
918 |
919 |
920 | ## 32. Stacks & Queues
921 |
922 | ```py
923 | class Stack:
924 | def __init__(self):
925 | self.array = []
926 | def push(self, value):
927 | self.array.append(value)
928 | def pop(self):
929 | if self.is_empty():
930 | raise IndexError('stack is empty')
931 | val = self.array[-1]
932 | self.array.pop()
933 | return val
934 | def peek(self):
935 | if self.is_empty():
936 | raise IndexError('stack is empty')
937 | return self.array[-1]
938 | def size(self):
939 | return len(self.array)
940 |
941 | def compress_array(arr):
942 | stack = []
943 | for num in arr:
944 | while stack and stack[-1] == num:
945 | num += stack.pop()
946 | stack.append(num)
947 | return stack
948 |
949 | def compress_array_by_k(arr, k):
950 | stack = []
951 | def merge(num):
952 | if not stack or stack[-1][0] != num:
953 | stack.append([num, 1])
954 | elif stack[-1][1] < k - 1:
955 | stack[-1][1] += 1
956 | else:
957 | stack.pop()
958 | merge(num * k)
959 | for num in arr:
960 | merge(num)
961 | res = []
962 | for num, count in stack:
963 | for _ in range(count):
964 | res.append(num)
965 | return res
966 |
967 | class ViewerCounter:
968 | def __init__(self, window):
969 | self.queues = {"guest": Queue(), "follower": Queue(), "subscriber": Queue()}
970 | self.window = window
971 | def join(self, t, v):
972 | self.queues[v].put(t)
973 | def get_viewers(self, t, v):
974 | queue = self.queues[v]
975 | while not queue.empty() and queue.peek() < t - self.window:
976 | queue.pop()
977 | return queue.size()
978 |
979 | def current_url(actions):
980 | stack = []
981 | for action, value in actions:
982 | if action == "go":
983 | stack.append(value)
984 | else:
985 | while len(stack) > 1 and value > 0:
986 | stack.pop()
987 | value -= 1
988 | return stack[-1]
989 |
990 | def current_url_followup(actions):
991 | stack = []
992 | forward_stack = []
993 | for action, value in actions:
994 | if action == "go":
995 | stack.append(value)
996 | forward_stack = []
997 | elif action == "back":
998 | while len(stack) > 1 and value > 0:
999 | forward_stack.append(stack.pop())
1000 | value -= 1
1001 | else:
1002 | while forward_stack and value > 0:
1003 | stack.append(forward_stack.pop())
1004 | value -= 1
1005 | return stack[-1]
1006 |
1007 | def balanced(s):
1008 | height = 0
1009 | for c in s:
1010 | if c == '(':
1011 | height += 1
1012 | else:
1013 | height -= 1
1014 | if height < 0:
1015 | return False
1016 | return height == 0
1017 |
1018 | def max_balanced_partition(s):
1019 | height = 0
1020 | res = 0
1021 | for c in s:
1022 | if c == '(':
1023 | height += 1
1024 | else:
1025 | height -= 1
1026 | if height == 0:
1027 | res += 1
1028 | return res
1029 |
1030 | def balanced_brackets(s, brackets):
1031 | open_to_close = dict()
1032 | close_set = set()
1033 | for pair in brackets:
1034 | open_to_close[pair[0]] = pair[1]
1035 | close_set.add(pair[1])
1036 | stack = []
1037 | for c in s:
1038 | if c in open_to_close:
1039 | stack.append(open_to_close[c])
1040 | elif c in close_set:
1041 | if not stack or stack[-1] != c:
1042 | return False
1043 | stack.pop()
1044 | return len(stack) == 0
1045 | ```
1046 |
1047 |
1048 | ## 33. Recursion
1049 |
1050 | ```py
1051 | def moves(seq):
1052 | res = []
1053 | def moves_rec(pos):
1054 | if pos == len(seq):
1055 | return
1056 | if seq[pos] == '2':
1057 | moves_rec(pos+1)
1058 | moves_rec(pos+2)
1059 | else:
1060 | res.append(seq[pos])
1061 | moves_rec(pos+1)
1062 | moves_rec(0)
1063 | return ''.join(res)
1064 |
1065 | def nested_array_sum(arr):
1066 | res = 0
1067 | for elem in arr:
1068 | if isinstance(elem, int):
1069 | res += elem
1070 | else:
1071 | res += nested_array_sum(elem)
1072 | return res
1073 |
1074 | def reverse_in_place(arr):
1075 | reverse_rec(arr, 0, len(arr) - 1)
1076 | def reverse_rec(arr, i, j):
1077 | if i >= j:
1078 | return
1079 | arr[i], arr[j] = arr[j], arr[i]
1080 | reverse_rec(arr, i + 1, j - 1)
1081 |
1082 | def power(a, p, m):
1083 | if p == 0:
1084 | return 1
1085 | if p % 2 == 0:
1086 | half = power(a, p // 2, m)
1087 | return (half * half) % m
1088 | return (a * power(a, p - 1, m)) % m
1089 |
1090 | def fib(n):
1091 | memo = {}
1092 | def fib_rec(i):
1093 | if i <= 1:
1094 | return 1
1095 | if i in memo:
1096 | return memo[i]
1097 | memo[i] = fib_rec(i - 1) + fib_rec(i - 2)
1098 | return memo[i]
1099 | return fib_rec(n)
1100 |
1101 | def blocks(n):
1102 | memo = dict()
1103 | def roof(n):
1104 | if n == 1:
1105 | return 1
1106 | if n in memo:
1107 | return memo[n]
1108 | memo[n] = roof(n - 1) * 2 + 1
1109 | return memo[n]
1110 | def blocks_rec(n):
1111 | if n == 1:
1112 | return 1
1113 | return blocks_rec(n - 1) * 2 + roof(n)
1114 | return blocks_rec(n)
1115 |
1116 | def max_laminal_sum(arr):
1117 |     # return max sum for a laminal array within arr[l:r]
1118 | def max_laminal_sum_rec(l, r):
1119 | if r - l == 1:
1120 | return arr[l]
1121 | mid = (l + r) // 2
1122 | option1 = max_laminal_sum_rec(l, mid)
1123 | option2 = max_laminal_sum_rec(mid, r)
1124 |         option3 = sum(arr[l:r])  # the whole range arr[l:r] taken as a single block
1125 | return max(option1, option2, option3)
1126 | return max_laminal_sum_rec(0, len(arr))
1127 | ```
1128 |
1129 |
1130 | ## 34. Linked Lists
1131 |
1132 | ```py
1133 | class Node:
1134 | def __init__(self, val):
1135 | self.val = val
1136 | self.prev = None
1137 | self.next = None
1138 |
1139 | def add_to_end(head, val):
1140 | cur = head
1141 | while cur.next:
1142 | cur = cur.next
1143 | cur.next = Node(val)
1144 |
1145 | class SinglyLinkedList:
1146 | def __init__(self):
1147 | self.head = None
1148 | self._size = 0
1149 | def size(self):
1150 | return self._size
1151 | def push_front(self, val):
1152 | new = Node(val)
1153 | new.next = self.head
1154 | self.head = new
1155 | self._size += 1
1156 | def pop_front(self):
1157 | if not self.head:
1158 | return None
1159 | val = self.head.val
1160 | self.head = self.head.next
1161 | self._size -= 1
1162 | return val
1163 | def push_back(self, val):
1164 | new = Node(val)
1165 | self._size += 1
1166 | if not self.head:
1167 | self.head = new
1168 | return
1169 | cur = self.head
1170 | while cur.next:
1171 | cur = cur.next
1172 | cur.next = new
1173 | def pop_back(self):
1174 | if not self.head:
1175 | return None
1176 | self._size -= 1
1177 | if not self.head.next:
1178 | val = self.head.val
1179 | self.head = None
1180 | return val
1181 | cur = self.head
1182 | while cur.next and cur.next.next:
1183 | cur = cur.next
1184 | val = cur.next.val
1185 | cur.next = None
1186 | return val
1187 | def contains(self, val):
1188 | cur = self.head
1189 | while cur:
1190 | if cur.val == val:
1191 | return cur
1192 | cur = cur.next
1193 | return None
1194 |
1195 | class Node:
1196 | def __init__(self, val):
1197 | self.val = val
1198 | self.next = None
1199 | class Queue:
1200 | def __init__(self):
1201 | self.head = None
1202 | self.tail = None
1203 | self._size = 0
1204 | def empty(self):
1205 | return not self.head
1206 | def size(self):
1207 | return self._size
1208 | def push(self, val):
1209 | new = Node(val)
1210 | if self.tail:
1211 | self.tail.next = new
1212 | self.tail = new
1213 | if not self.head:
1214 | self.head = new
1215 | self._size += 1
1216 | def pop(self):
1217 | if self.empty():
1218 | raise IndexError('empty queue')
1219 | val = self.head.val
1220 | self.head = self.head.next
1221 | if not self.head:
1222 | self.tail = None
1223 | self._size -= 1
1224 | return val
1225 |
1226 | def copy_list(head):
1227 | if not head:
1228 | return None
1229 | new_head = Node(head.val)
1230 | cur_new = new_head
1231 | cur_old = head.next
1232 | while cur_old:
1233 | cur_new.next = Node(cur_old.val)
1234 | cur_new = cur_new.next
1235 | cur_old = cur_old.next
1236 | return new_head
1237 |
1238 | def reverse_list(head):
1239 | prev = None
1240 | cur = head
1241 | while cur:
1242 | nxt = cur.next
1243 | cur.next = prev
1244 | prev = cur
1245 | cur = nxt
1246 | return prev
1247 |
1248 | def reverse_section(head, left, right):
1249 | dummy = Node(0)
1250 | dummy.next = head
1251 | # find nodes before and after section
1252 | if left == 0:
1253 | prev = dummy
1254 | else:
1255 | prev = node_at_index(head, left - 1)
1256 | if not prev or not prev.next:
1257 | # nothing to reverse
1258 | return head
1259 | nxt = node_at_index(head, right + 1) # may be none
1260 | # break out section
1261 | section_head = prev.next
1262 | prev.next = None
1263 | section_tail = section_head
1264 | while section_tail.next != nxt:
1265 | section_tail = section_tail.next
1266 | section_tail.next = None
1267 | # reverse section, same as reverse linked list solution
1268 | old_section_head = section_head
1269 | new_section_head = reverse_list(section_head)
1270 | # reattach section
1271 | prev.next = new_section_head
1272 | old_section_head.next = nxt
1273 | return dummy.next
1274 |
1275 | def has_cycle(head):
1276 | slow, fast = head, head
1277 | while fast and fast.next:
1278 | slow = slow.next
1279 | fast = fast.next.next
1280 | if slow == fast:
1281 | return True
1282 | return False
1283 |
1284 | def convert_to_array(self, node):
1285 | cur = node
1286 | while cur.prev:
1287 | cur = cur.prev
1288 | res = []
1289 | while cur:
1290 | res.append(cur.val)
1291 | cur = cur.next
1292 | return res
1293 |
1294 | def get_middle(head):
1295 | slow, fast = head, head
1296 | while fast and fast.next:
1297 | slow = slow.next
1298 | fast = fast.next.next
1299 | return slow
1300 |
1301 | def remove_kth_node(head, k):
1302 | if not head:
1303 | return None
1304 | dummy = Node(0)
1305 | dummy.next = head
1306 | fast = dummy
1307 | slow = dummy
1308 | for _ in range(k):
1309 | fast = fast.next
1310 | while fast and fast.next:
1311 | fast = fast.next
1312 | slow = slow.next
1313 | slow.next = slow.next.next
1314 | return dummy.next
1315 |
1316 | def merge(head1, head2):
1317 | dummy = Node(0)
1318 | cur = dummy
1319 | p1, p2 = head1, head2
1320 | while p1 and p2:
1321 | cur.next = p1
1322 | cur = cur.next
1323 | p1 = p1.next
1324 | cur.next = p2
1325 | p2 = p2.next
1326 | cur = cur.next
1327 | if p1:
1328 | cur.next = p1
1329 | else:
1330 | cur.next = p2
1331 | return dummy.next
1332 |
1333 | def remove_duplicates(head):
1334 | cur = head
1335 | while cur and cur.next:
1336 | if cur.val == cur.next.val:
1337 | cur.next = cur.next.next
1338 | else:
1339 | cur = cur.next
1340 | return head
1341 | ```
1342 |
1343 |
1344 | ## 35. Trees
1345 |
1346 | ```py
1347 | # DFS
1348 | class Node:
1349 | def __init__(self, val, left=None, right=None):
1350 | self.val = val
1351 | self.left = left
1352 | self.right = right
1353 | def is_leaf(node):
1354 | if not node:
1355 | return False
1356 | return not node.left and not node.right
1357 | def children_values(node):
1358 | if not node:
1359 | return []
1360 | values = []
1361 | if node.left:
1362 | values.append(node.left.val)
1363 | if node.right:
1364 | values.append(node.right.val)
1365 | return values
1366 | def grandchildren_values(node):
1367 | if not node:
1368 | return []
1369 | values = []
1370 | for child in [node.left, node.right]:
1371 | if child and child.left:
1372 | values.append(child.left.val)
1373 | if child and child.right:
1374 | values.append(child.right.val)
1375 | return values
1376 | def subtree_size(node):
1377 | if not node:
1378 | return 0
1379 | left_size = subtree_size(node.left)
1380 | right_size = subtree_size(node.right)
1381 | return left_size + right_size + 1 # 1 for node
1382 | def subtree_height(node):
1383 | if not node:
1384 | return 0
1385 | left_height = subtree_height(node.left)
1386 | right_height = subtree_height(node.right)
1387 | return max(left_height, right_height) + 1
1388 |
1389 | class Node:
1390 | def __init__(self, id, parent, left, right):
1391 | self.id = id
1392 | self.parent = parent
1393 | self.left = left
1394 | self.right = right
1395 | def is_root(node):
1396 | return not node.parent
1397 | def ancestor_ids(node):
1398 | ids = []
1399 | while node.parent:
1400 | node = node.parent
1401 | ids.append(node.id)
1402 | return ids
1403 | def depth(node):
1404 | res = 0
1405 | while node.parent:
1406 | node = node.parent
1407 | res += 1
1408 | return res
1409 | def LCA(node1, node2):
1410 | depth1 = depth(node1)
1411 | depth2 = depth(node2)
1412 | while depth1 > depth2:
1413 | node1 = node1.parent
1414 | depth1 -= 1
1415 | while depth2 > depth1:
1416 | node2 = node2.parent
1417 | depth2 -= 1
1418 | while node1.id != node2.id:
1419 | node1 = node1.parent
1420 | node2 = node2.parent
1421 | return node1.id
1422 | def distance(node1, node2):
1423 | lca_id = LCA(node1, node2)
1424 | dist = 0
1425 | while node1.id != lca_id:
1426 | dist += 1
1427 | node1 = node1.parent
1428 | while node2.id != lca_id:
1429 | dist += 1
1430 | node2 = node2.parent
1431 | return dist
1432 | def size(node):
1433 | if not node:
1434 | return 0
1435 | return size(node.left) + size(node.right) + 1
1436 | def preorder(root):
1437 | if not root:
1438 | return
1439 | print(root.val)
1440 | preorder(root.left)
1441 | preorder(root.right)
1442 | def inorder(root):
1443 | if not root:
1444 | return
1445 | inorder(root.left)
1446 | print(root.val)
1447 | inorder(root.right)
1448 | def postorder(root):
1449 | if not root:
1450 | return
1451 | postorder(root.left)
1452 | postorder(root.right)
1453 | print(root.val)
1454 | def visit(node, info_passed_down):
1455 | if base_case:
1456 | return info_to_pass_up
1457 |     a = visit(node.left, info_to_pass_down)
1458 | b = visit(node.right, info_to_pass_down)
1459 | global_state = info_stored_globally
1460 | return info_to_pass_up
1461 |
1462 | def longest_aligned_chain(root):
1463 | res = 0
1464 | def visit(node, depth): # inner recursive function
1465 | nonlocal res # to make res visible inside visit()
1466 | if not node:
1467 | return 0
1468 | left_chain = visit(node.left, depth + 1)
1469 | right_chain = visit(node.right, depth + 1)
1470 | current_chain = 0
1471 | if node.val == depth:
1472 | current_chain = 1 + max(left_chain, right_chain)
1473 | res = max(res, current_chain)
1474 | return current_chain
1475 | visit(root, 0) # trigger DFS, which updates global res
1476 | return res
1477 |
1478 | def hidden_message(root):
1479 | message = []
1480 | def visit(node):
1481 | if not node:
1482 | return
1483 | if node.text[0] == 'b':
1484 | message.append(node.text[1])
1485 | visit(node.left)
1486 | visit(node.right)
1487 | elif node.text[0] == 'i':
1488 | visit(node.left)
1489 | message.append(node.text[1])
1490 | visit(node.right)
1491 | else:
1492 | visit(node.left)
1493 | visit(node.right)
1494 | message.append(node.text[1])
1495 | visit(root)
1496 | return ''.join(message)
1497 |
1498 | def most_stacked(root):
1499 | pos_to_count = dict()
1500 | def visit(node, r, c):
1501 | if not node:
1502 | return
1503 | if (r, c) not in pos_to_count:
1504 | pos_to_count[(r, c)] = 0
1505 | pos_to_count[(r, c)] += 1
1506 | visit(node.left, r + 1, c)
1507 | visit(node.right, r, c + 1)
1508 | visit(root, 0, 0)
1509 | return max(pos_to_count.values())
1510 |
1511 | def invert(root):
1512 | if not root:
1513 | return None
1514 | root.left, root.right = invert(root.right), invert(root.left)
1515 | return root
1516 |
1517 | def evaluate(root):
1518 | if root.kind == "num":
1519 | return root.num
1520 | children_evals = []
1521 | for child in root.children:
1522 | children_evals.append(evaluate(child))
1523 | if root.kind == "sum":
1524 | return sum(children_evals)
1525 | if root.kind == "product":
1526 | return product(children_evals)
1527 | if root.kind == "max":
1528 | return max(children_evals)
1529 | if root.kind == "min":
1530 | return min(children_evals)
1531 | raise ValueError('invalid node kind')
1532 |
1533 | # BFS
1534 | def level_order(root):
1535 | Q = Queue()
1536 | Q.add(root)
1537 | while not Q.empty():
1538 | node = Q.pop()
1539 | if not node:
1540 | continue
1541 |         print(node.val)
1542 | Q.add(node.left)
1543 | Q.add(node.right)
1544 |
1545 | def node_depth_queue_recipe(root):
1546 | Q = Queue()
1547 | Q.add((root, 0))
1548 | while not Q.empty():
1549 | node, depth = Q.pop()
1550 | if not node:
1551 | continue
1552 | # do something with node and depth
1553 | Q.add((node.left, depth+1))
1554 | Q.add((node.right, depth+1))
1555 |
1556 | def left_view(root):
1557 | if not root:
1558 | return []
1559 | Q = Queue()
1560 | Q.add((root, 0))
1561 | res = [root.val]
1562 | current_depth = 0
1563 | while not Q.empty():
1564 | node, depth = Q.pop()
1565 | if not node:
1566 | continue
1567 | if depth == current_depth + 1:
1568 | res.append(node.val)
1569 | current_depth += 1
1570 | Q.add((node.left, depth+1))
1571 | Q.add((node.right, depth+1))
1572 | return res
1573 |
1574 | def level_counts(root):
1575 | Q = Queue()
1576 | Q.add((root, 0))
1577 | level_count = defaultdict(int)
1578 | while not Q.empty():
1579 | node, depth = Q.pop()
1580 | if not node:
1581 | continue
1582 | level_count[depth] += 1
1583 | Q.add((node.left, depth + 1))
1584 | Q.add((node.right, depth + 1))
1585 | return level_count
1586 | def most_prolific_level(root):
1587 | level_count = level_counts(root)
1588 | res = -1
1589 | max_prolificness = -1 # less than any valid prolificness
1590 | for level in level_count:
1591 | if level + 1 not in level_count:
1592 | continue
1593 | prolificness = level_count[level + 1] / level_count[level]
1594 | if prolificness > max_prolificness:
1595 | max_prolificness = prolificness
1596 | res = level
1597 | return res
1598 |
1599 | def zig_zag_order(root):
1600 | res = []
1601 | Q = Queue()
1602 | Q.add((root, 0))
1603 | cur_level = []
1604 | cur_depth = 0
1605 | while not Q.empty():
1606 | node, depth = Q.pop()
1607 | if not node:
1608 | continue
1609 | if depth > cur_depth:
1610 | if cur_depth % 2 == 0:
1611 | res += cur_level
1612 | else:
1613 | res += cur_level[::-1] # reverse order
1614 | cur_level = []
1615 | cur_depth = depth
1616 | cur_level.append(node)
1617 | Q.add((node.left, depth + 1))
1618 | Q.add((node.right, depth + 1))
1619 | if cur_depth % 2 == 0: # add last level
1620 | res += cur_level
1621 | else:
1622 | res += cur_level[::-1]
1623 | return res
1624 |
1625 | # Binary Search Tree
1626 | def find(root, target):
1627 | cur_node = root
1628 | while cur_node:
1629 | if cur_node.val == target:
1630 | return True
1631 | elif cur_node.val > target:
1632 | cur_node = cur_node.left
1633 | else:
1634 | cur_node = cur_node.right
1635 | return False
1636 |
1637 | def find_closest(root, target):
1638 | cur_node = root
1639 | next_above, next_below = math.inf, -math.inf
1640 | while cur_node:
1641 | if cur_node.val == target:
1642 | return cur_node.val
1643 | elif cur_node.val > target:
1644 | next_above = cur_node.val
1645 | cur_node = cur_node.left
1646 | else:
1647 | next_below = cur_node.val
1648 | cur_node = cur_node.right
1649 | if next_above - target < target - next_below:
1650 | return next_above
1651 | return next_below
1652 |
1653 | def is_bst(root):
1654 | prev_value = -math.inf
1655 | res = True
1656 | def visit(node):
1657 | nonlocal prev_value, res
1658 | if not node or not res:
1659 | return
1660 | visit(node.left)
1661 | if node.val < prev_value:
1662 | res = False
1663 | else:
1664 | prev_value = node.val
1665 | visit(node.right)
1666 | visit(root)
1667 | return res
1668 | ```
1669 |
1670 |
1671 | ## 36. Graphs
1672 |
1673 | ```py
1674 | def num_nodes(graph):
1675 | return len(graph)
1676 | def num_edges(graph):
1677 | count = 0
1678 | for node in range(len(graph)):
1679 | count += len(graph[node])
1680 | return count // 2 # halved because we counted each edge from both endpoints
1681 | def degree(graph, node):
1682 | return len(graph[node])
1683 | def print_neighbors(graph, node):
1684 | for nbr in graph[node]:
1685 | print(nbr)
1686 | def build_adjacency_list(V, edge_list):
1687 | graph = [[] for _ in range(V)]
1688 | for node1, node2 in edge_list:
1689 | graph[node1].append(node2)
1690 | graph[node2].append(node1)
1691 | return graph
1692 | def adjacent(graph, node1, node2):
1693 | for nbr in graph[node1]:
1694 | if nbr == node2: return True
1695 | return False
1696 |
1697 | def validate(graph):
1698 | V = len(graph)
1699 | for node in range(V):
1700 |         seen = set()
1701 | for nbr in graph[node]:
1702 | if nbr < 0 or nbr >= V: return False # invalid node index
1703 | if nbr == node: return False # self-loop
1704 | if nbr in seen: return False # parallel edge
1705 | seen.add(nbr)
1706 | edges = set()
1707 | for node1 in range(V):
1708 | for node2 in graph[node1]:
1709 | edge = (min(node1, node2), max(node1, node2))
1710 | if edge in edges:
1711 | edges.remove(edge)
1712 | else:
1713 | edges.add(edge)
1714 | return len(edges) == 0
1715 |
1716 | def graph_dfs(graph, start):
1717 | visited = {start}
1718 | def visit(node):
1719 | # do something
1720 | for nbr in graph[node]:
1721 | if not nbr in visited:
1722 | visited.add(nbr)
1723 | visit(nbr)
1724 | visit(start)
1725 |
1726 | def tree_dfs(root):
1727 | def visit(node):
1728 | # do something
1729 |         if node.left:
1730 |             visit(node.left)
1731 |         if node.right:
1732 |             visit(node.right)
1733 | if root:
1734 | visit(root)
1735 |
1736 | def count_connected_components(graph):
1737 | count = 0
1738 | visited = set()
1739 | for node in range(len(graph)):
1740 | if node not in visited:
1741 | visited.add(node)
1742 | visit(node)
1743 | count += 1
1744 | return count
1745 |
1746 | def path(graph, node1, node2):
1747 | predecessors = {node2: None} # starting node doesn't have predecessor
1748 | def visit(node):
1749 | for nbr in graph[node]:
1750 | if nbr not in predecessors:
1751 | predecessors[nbr] = node
1752 | visit(nbr)
1753 | visit(node2)
1754 | if node1 not in predecessors:
1755 | return [] # node1 node2 disconnected
1756 | path = [node1]
1757 | while path[len(path) - 1] != node2:
1758 | path.append(predecessors[path[len(path) - 1]])
1759 | return path
1760 |
1761 | def is_tree(graph):
1762 | predecessors = {0: None} # start from node 0 (doesn't matter)
1763 | found_cycle = False
1764 | def visit(node):
1765 | nonlocal found_cycle
1766 | if found_cycle:
1767 | return
1768 | for nbr in graph[node]:
1769 | if nbr not in predecessors:
1770 | predecessors[nbr] = node
1771 | visit(nbr)
1772 | elif nbr != predecessors[node]:
1773 | found_cycle = True
1774 | visit(0)
1775 | connected = len(predecessors) == len(graph)
1776 | return not found_cycle and connected
1777 |
1778 | def connected_component_queries(graph, queries):
1779 | node_to_cc = {}
1780 | def visit(node, cc_id):
1781 | if node in node_to_cc:
1782 | return
1783 | node_to_cc[node] = cc_id
1784 | for nbr in graph[node]:
1785 | visit(nbr, cc_id)
1786 | cc_id = 0
1787 | for node in range(len(graph)):
1788 | if node not in node_to_cc:
1789 | visit(node, cc_id)
1790 | cc_id += 1
1791 | res = []
1792 | for node1, node2 in queries:
1793 | res.append(node_to_cc[node1] == node_to_cc[node2])
1794 | return res
1795 |
1796 | def strongly_connected(graph):
1797 | V = len(graph)
1798 | visited = set()
1799 | visit(graph, visited, 0)
1800 | if len(visited) < V:
1801 | return False
1802 | reverse_graph = [[] for _ in range(V)]
1803 | for node in range(V):
1804 | for nbr in graph[node]:
1805 | reverse_graph[nbr].append(node)
1806 | reverse_visited = set()
1807 | visit(reverse_graph, reverse_visited, 0)
1808 | return len(reverse_visited) == V
1809 |
1810 | def max_hilliness(graph, heights):
1811 | node_to_cc = label_nodes_with_cc_ids(graph)
1812 | V = len(graph)
1813 | cc_to_elevation_gain_sum = {}
1814 | cc_to_num_edges = {}
1815 | for node in range(V):
1816 | cc = node_to_cc[node]
1817 | if cc not in cc_to_num_edges:
1818 | cc_to_elevation_gain_sum[cc] = 0
1819 | cc_to_num_edges[cc] = 0
1820 | for nbr in graph[node]:
1821 | if nbr > node:
1822 | cc_to_num_edges[cc] += 1
1823 | cc_to_elevation_gain_sum[cc] += abs(heights[node] - heights[nbr])
1824 | res = 0
1825 | for cc in cc_to_num_edges:
1826 |         res = max(res, cc_to_elevation_gain_sum[cc] / cc_to_num_edges[cc])
    return res
1827 |
1828 | def first_time_all_connected(V, cables):
1829 | def visit(graph, visited, node):
1830 | for nbr in graph[node]:
1831 | if nbr not in visited:
1832 | visited.add(nbr)
1833 | visit(graph, visited, nbr)
1834 |
1835 | def is_before(cable_index):
1836 | graph = [[] for _ in range(V)]
1837 | for i in range(cable_index + 1):
1838 | node1, node2 = cables[i]
1839 | graph[node1].append(node2)
1840 | graph[node2].append(node1)
1841 | visited = {0}
1842 | visit(graph, visited, 0)
1843 | return len(visited) < V
1844 | l, r = 0, len(cables) - 1
1845 | if is_before(r):
1846 | return -1
1847 | while r - l > 1:
1848 | mid = l + (r - l) // 2
1849 | if is_before(mid):
1850 | l = mid
1851 | else:
1852 | r = mid
1853 | return r
1854 |
1855 | # BFS
1856 | def graph_bfs(graph, start):
1857 | Q = Queue()
1858 | Q.push(start)
1859 | distances = {start: 0}
1860 | while not Q.empty():
1861 | node = Q.pop()
1862 | for nbr in graph[node]:
1863 | if nbr not in distances:
1864 | distances[nbr] = distances[node] + 1
1865 | Q.push(nbr)
1866 | # do something
1867 |
1868 | def tree_bfs(root):
1869 | Q = Queue()
1870 | Q.push(root)
1871 | while not Q.empty():
1872 | node = Q.pop()
1873 | if not node:
1874 | continue
1875 | # do something
1876 | Q.push(node.left)
1877 | Q.push(node.right)
1878 |
1879 | def shortest_path_queries(graph, start, queries):
1880 | Q = Queue()
1881 | Q.push(start)
1882 | predecessors = {start: None}
1883 | while not Q.empty():
1884 | node = Q.pop()
1885 | for nbr in graph[node]:
1886 | if nbr not in predecessors:
1887 | predecessors[nbr] = node
1888 | Q.push(nbr)
1889 | res = []
1890 | for node in queries:
1891 | if node not in predecessors:
1892 | res.append([])
1893 | else:
1894 | path = [node]
1895 | while path[len(path) - 1] != start:
1896 | path.append(predecessors[path[len(path) - 1]])
1897 | path.reverse()
1898 | res.append(path)
1899 | return res
1900 |
1901 | def walking_distance_to_coffee(graph, node1, node2, node3):
1902 | distances1 = bfs(graph, node1) # BFS
1903 | distances2 = bfs(graph, node2)
1904 | distances3 = bfs(graph, node3)
1905 | res = math.inf
1906 | for i in range(len(graph)):
1907 | res = min(res, distances1[i] + distances2[i] + distances3[i])
1908 | return res
1909 |
1910 | def multisource_bfs(graph, sources):
1911 | Q = Queue()
1912 | distances = {}
1913 | for start in sources:
1914 | Q.push(start)
1915 | distances[start] = 0
1916 | while not Q.empty(): # BFS
1917 | node = Q.pop()
1918 | for nbr in graph[node]:
1919 | if nbr not in distances:
1920 | distances[nbr] = distances[node] + 1
1921 | Q.push(nbr)
1922 | # do something
1923 |
1924 | def grid_dfs(grid, start_r, start_c):
1925 | # returns if (r, c) is in bounds, not visited, and walkable
1926 | def is_valid(r, c):
1927 | # do something
1928 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1929 | visited = {(start_r, start_c)}
1930 | def visit(r, c):
1931 | # do something with (r, c)
1932 | for dir_r, dir_c in directions:
1933 | nbr_r, nbr_c = r + dir_r, c + dir_c
1934 | if is_valid(nbr_r, nbr_c):
1935 | visited.add((nbr_r, nbr_c))
1936 | visit(nbr_r, nbr_c)
1937 | visit(start_r, start_c)
1938 |
1939 | def grid_bfs(grid, start_r, start_c):
1940 | # returns if (r, c) is in bounds, not visited, and walkable
1941 | def is_valid(r, c):
1942 | # do something
1943 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1944 | Q = Queue()
1945 | Q.push((start_r, start_c))
1946 | distances = {(start_r, start_c): 0}
1947 | while not Q.empty():
1948 | r, c = Q.pop()
1949 | for dir_r, dir_c in directions:
1950 | nbr_r, nbr_c = r + dir_r, c + dir_c
1951 | if is_valid(nbr_r, nbr_c):
1952 | distances[(nbr_r, nbr_c)] = distances[(r, c)] + 1
1953 | Q.push((nbr_r, nbr_c))
1954 | # do something with distances
1955 |
1956 | def count_islands(grid):
1957 | R, C = len(grid), len(grid[0])
1958 | count = 0
1959 | visited = set()
1960 | for r in range(R):
1961 | for c in range(C):
1962 | if grid[r][c] == 1 and (r, c) not in visited:
1963 | visited.add((r, c))
1964 | dfs(grid, visited, r, c) # normal grid DFS
1965 | count += 1
1966 | return count
1967 |
1968 | def exit_distances(maze):
1969 | R, C = len(maze), len(maze[0])
1970 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1971 | distances = [[-1] * C for _ in range(R)]
1972 | Q = Queue()
1973 | for r in range(R):
1974 | for c in range(C):
1975 | if maze[r][c] == 'o':
1976 | distances[r][c] = 0
1977 | Q.push((r, c))
1978 | while not Q.empty():
1979 | r, c = Q.pop()
1980 | for dir_r, dir_c in directions:
1981 | nbr_r, nbr_c = r + dir_r, c + dir_c
1982 | if (0 <= nbr_r < R and 0 <= nbr_c < C and
1983 | maze[nbr_r][nbr_c] != 'x' and distances[nbr_r][nbr_c] == -1):
1984 | distances[nbr_r][nbr_c] = distances[r][c] + 1
1985 | Q.push((nbr_r, nbr_c))
1986 | return distances
1987 |
1988 | def segment_distance(min1, max1, min2, max2):
1989 | return max(0, max(min1, min2) - min(max1, max2))
1990 | def distance(furniture1, furniture2):
1991 | x_min1, y_min1, x_max1, y_max1 = furniture1
1992 | x_min2, y_min2, x_max2, y_max2 = furniture2
1993 | x_gap = segment_distance(x_min1, x_max1, x_min2, x_max2)
1994 | y_gap = segment_distance(y_min1, y_max1, y_min2, y_max2)
1995 | if x_gap == 0:
1996 | return y_gap
1997 | elif y_gap == 0:
1998 | return x_gap
1999 | else:
2000 | return math.sqrt(x_gap ** 2 + y_gap ** 2)
2001 | def can_reach(furniture, d):
2002 | V = len(furniture)
2003 | graph = [[] for _ in range(V)]
2004 | for i in range(V):
2005 | for j in range(i + 1, V):
2006 | if distance(furniture[i], furniture[j]) <= d:
2007 | graph[i].append(j)
2008 | graph[j].append(i)
2009 | visited = {0}
2010 | def visit(node): # DFS
2011 | # ...
2012 | visit(0)
2013 | return V-1 in visited
2014 | ```
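
The `Queue` type used in the BFS templates above (`push`/`pop`/`empty`) isn't defined in this section; a minimal FIFO stand-in using the standard library's `collections.deque`, assuming `push` enqueues at the back and `pop` dequeues from the front, could look like this:

```py
# Minimal sketch of a FIFO Queue compatible with the BFS templates above
# (assumption: push = enqueue at the back, pop = dequeue from the front).
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()
    def push(self, x):
        self.items.append(x)         # enqueue at the back
    def pop(self):
        return self.items.popleft()  # dequeue from the front
    def empty(self):
        return len(self.items) == 0
```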
2015 |
2016 |
2017 | ## 37. Heaps
2018 |
2019 | ```py
2020 | def first_k(arr, k):
2021 | arr.sort()
2022 | return arr[:k]
2023 | def first_k_min_heap(arr, k):
2024 |     min_heap = Heap(higher_priority=lambda x, y: x < y, heap=arr)
2025 | res = []
2026 | for i in range(k):
2027 | res.append(min_heap.pop())
2028 | return res
2029 |
2030 | def parent(idx):
2031 | if idx == 0:
2032 | return -1 # root has no parent
2033 | return (idx - 1) // 2
2034 | def left_child(idx):
2035 | return 2 * idx + 1
2036 | def right_child(idx):
2037 | return 2 * idx + 2
2038 |
2039 | class Heap:
2040 | # if higher_priority(x, y) is True, x has higher priority than y
2041 | def __init__(self, higher_priority=lambda x, y: x < y, heap=None):
2042 |         self.heap = heap if heap is not None else []
2046 | self.higher_priority = higher_priority
2047 | if heap:
2048 | self.heapify()
2049 | def size(self):
2050 | return len(self.heap)
2051 | def top(self):
2052 | if not self.heap:
2053 | return None
2054 | return self.heap[0]
2055 | def push(self, elem):
2056 | self.heap.append(elem)
2057 | self.bubble_up(len(self.heap)-1)
2058 | def bubble_up(self, idx):
2059 | if idx == 0:
2060 | return # root can't be bubbled up
2061 | parent_idx = parent(idx)
2062 | if self.higher_priority(self.heap[idx], self.heap[parent_idx]):
2063 | self.heap[idx], self.heap[parent_idx] = self.heap[parent_idx], self.heap[idx]
2064 |             self.bubble_up(parent_idx)
2065 | def pop(self):
2066 | if not self.heap: return None
2067 | top = self.heap[0]
2068 | if len(self.heap) == 1:
2069 | self.heap = []
2070 | return top
2071 | self.heap[0] = self.heap[-1]
2072 | self.heap.pop()
2073 | self.bubble_down(0)
2074 | return top
2075 | def bubble_down(self, idx):
2076 | l_i, r_i = left_child(idx), right_child(idx)
2077 | is_leaf = l_i >= len(self.heap)
2078 | if is_leaf: return # leaves can't be bubbled down
2079 | child_i = l_i # index for highest priority child
2080 | if r_i < len(self.heap) and self.higher_priority(self.heap[r_i], self.heap[l_i]):
2081 | child_i = r_i
2082 | if self.higher_priority(self.heap[child_i], self.heap[idx]):
2083 | self.heap[idx], self.heap[child_i] = self.heap[child_i], self.heap[idx]
2084 | self.bubble_down(child_i)
2085 | def heapify(self):
2086 | for idx in range(len(self.heap) // 2, -1, -1):
2087 | self.bubble_down(idx)
2088 |
2089 | def heapsort(arr):
2090 |     min_heap = Heap(higher_priority=lambda x, y: x < y, heap=arr)
2091 | res = []
2092 | for _ in range(len(arr)):
2093 | res.append(min_heap.pop())
2094 | return res
2095 |
2096 | class TopSongs:
2097 | def __init__(self, k):
2098 | self.k = k
2099 | self.min_heap = Heap(higher_priority=lambda x, y: x[1] < y[1])
2100 | def register_plays(self, title, plays):
2101 | if self.min_heap.size() < self.k:
2102 | self.min_heap.push((title, plays))
2103 | elif plays > self.min_heap.top()[1]:
2104 | self.min_heap.pop()
2105 | self.min_heap.push((title, plays))
2106 | def top_k(self):
2107 | top_songs = []
2108 | for title, _ in self.min_heap.heap:
2109 | top_songs.append(title)
2110 | return top_songs
2111 |
2112 | class TopSongsWithUpdates:
2113 | def __init__(self, k):
2114 | self.k = k
2115 | self.max_heap = Heap(higher_priority=lambda x, y: x[1] > y[1])
2116 | self.total_plays = {}
2117 | def register_plays(self, title, plays):
2118 | new_total_plays = plays
2119 | if title in self.total_plays:
2120 | new_total_plays += self.total_plays[title]
2121 |         self.total_plays[title] = new_total_plays
2122 | self.max_heap.push((title, new_total_plays))
2123 | def top_k(self):
2124 | top_songs = []
2125 | while len(top_songs) < self.k and self.max_heap.size() > 0:
2126 | title, plays = self.max_heap.pop()
2127 | if self.total_plays[title] == plays: # not stale
2128 | top_songs.append(title)
2129 | # restore max-heap
2130 | for title in top_songs:
2131 | self.max_heap.push((title, self.total_plays[title]))
2132 | return top_songs
2133 |
2134 | class PopularSongs:
2135 | def __init__(self):
2136 | # max-heap for lower half
2137 | self.lower_max_heap = Heap(higher_priority=lambda x, y: x > y)
2138 | # min-heap for upper half
2139 | self.upper_min_heap = Heap()
2140 | self.play_counts = {}
2141 | def register_plays(self, title, plays):
2142 | self.play_counts[title] = plays
2143 | if self.upper_min_heap.size() == 0 or plays >= self.upper_min_heap.top():
2144 | self.upper_min_heap.push(plays)
2145 | else:
2146 | self.lower_max_heap.push(plays)
2147 | # distribute elements if they're off by more than one
2148 | if self.lower_max_heap.size() > self.upper_min_heap.size():
2149 | self.upper_min_heap.push(self.lower_max_heap.pop())
2150 | elif self.upper_min_heap.size() > self.lower_max_heap.size() + 1:
2151 | self.lower_max_heap.push(self.upper_min_heap.pop())
2152 | def is_popular(self, title):
2153 | if title not in self.play_counts:
2154 | return False
2155 | if self.lower_max_heap.size() == self.upper_min_heap.size():
2156 | median = (self.upper_min_heap.top() + self.lower_max_heap.top()) / 2
2157 | else:
2158 | median = self.upper_min_heap.top()
2159 | return self.play_counts[title] > median
2160 |
2161 | def top_k_across_genres(genres, k):
2162 | initial_elems = [] # (plays, genre_index, song_index) tuples.
2163 | for genre_index, song_list in enumerate(genres):
2164 | plays = song_list[0][1]
2165 | initial_elems.append((plays, genre_index, 0))
2166 | max_heap = Heap(higher_priority=lambda x, y: x[0] > y[0], heap=initial_elems)
2167 | top_k = []
2168 | while len(top_k) < k and max_heap.size() > 0:
2169 | plays, genre_index, song_index = max_heap.pop()
2170 | song_name = genres[genre_index][song_index][0]
2171 | top_k.append(song_name)
2172 | song_index += 1
2173 | if song_index < len(genres[genre_index]):
2174 | plays = genres[genre_index][song_index][1]
2175 | max_heap.push((plays, genre_index, song_index))
2176 | return top_k
2177 |
2178 | def make_playlist(songs):
2179 | # group songs by artist
2180 | artist_to_songs = {}
2181 | for song, artist in songs:
2182 | if artist not in artist_to_songs:
2183 | artist_to_songs[artist] = []
2184 | artist_to_songs[artist].append(song)
2185 | heap = Heap(higher_priority=lambda a, b: len(a[1]) > len(b[1]))
2186 | for artist, song_list in artist_to_songs.items():
2187 | heap.push((artist, song_list))
2188 | res = []
2189 | last_artist = None
2190 | while heap.size() > 0:
2191 | artist, song_list = heap.pop()
2192 | if artist != last_artist:
2193 | res.append(song_list.pop())
2194 | last_artist = artist
2195 | if song_list: # if artist has more songs, readd it
2196 | heap.push((artist, song_list))
2197 | else:
2198 | # find different artist
2199 | if heap.size() == 0:
2200 | return [] # no valid solution
2201 | artist2, song_list2 = heap.pop()
2202 | res.append(song_list2.pop())
2203 | last_artist = artist2
2204 | # readd artists we popped
2205 | if song_list2:
2206 | heap.push((artist2, song_list2))
2207 | heap.push((artist, song_list))
2208 | return res
2209 |
2210 | def sum_of_powers(primes, n):
2211 | m = 10**9 + 7
2212 | # initialize heap with first power of each prime
2213 | # each element is tuple (power, base)
2214 | elems = [(p, p) for p in primes]
2215 | min_heap = Heap(higher_priority=lambda x, y: x[0] < y[0], heap=elems)
2216 | res = 0
2217 | for _ in range(n):
2218 | power, base = min_heap.pop()
2219 | res = (res + power) % m
2220 | min_heap.push(((power * base) % m, base))
2221 | return res
2222 | ```
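
As a cross-check on the `TopSongs` idea, here is a sketch of the same "keep the k largest in a size-k min-heap" pattern using the standard library's `heapq` (the custom `Heap` class above is what this chapter actually uses; `top_k_plays` and its `songs` argument are illustrative names):

```py
# Sketch only: size-k min-heap of (plays, title) tuples via heapq.
import heapq

def top_k_plays(songs, k):
    # songs: list of (title, plays) pairs
    min_heap = []  # smallest play count stays on top
    for title, plays in songs:
        if len(min_heap) < k:
            heapq.heappush(min_heap, (plays, title))
        elif plays > min_heap[0][0]:
            heapq.heapreplace(min_heap, (plays, title))  # pop smallest, push new
    return [title for _, title in min_heap]
```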
2223 |
2224 |
2225 | ## 38. Sliding Windows
2226 |
2227 | ```py
2228 | def most_weekly_sales(sales):
2229 | l, r = 0, 0
2230 | window_sum = 0
2231 | cur_max = 0
2232 | while r < len(sales):
2233 | window_sum += sales[r]
2234 | r += 1
2235 | if r - l == 7:
2236 | cur_max = max(cur_max, window_sum)
2237 | window_sum -= sales[l]
2238 | l += 1
2239 | return cur_max
2240 |
2241 | def has_unique_k_days(best_seller, k):
2242 | l, r = 0, 0
2243 | window_counts = {}
2244 | while r < len(best_seller):
2245 | if not best_seller[r] in window_counts:
2246 | window_counts[best_seller[r]] = 0
2247 | window_counts[best_seller[r]] += 1
2248 | r += 1
2249 | if r - l == k:
2250 | if len(window_counts) == k:
2251 | return True
2252 | window_counts[best_seller[l]] -= 1
2253 | if window_counts[best_seller[l]] == 0:
2254 | del window_counts[best_seller[l]]
2255 | l += 1
2256 | return False
2257 |
2258 | def max_no_bad_days(sales):
2259 | l, r = 0, 0
2260 | cur_max = 0
2261 | while r < len(sales):
2262 | can_grow = sales[r] >= 10
2263 | if can_grow:
2264 | r += 1
2265 | cur_max = max(cur_max, r - l)
2266 | else:
2267 | l = r + 1
2268 | r = r + 1
2269 | return cur_max
2270 |
2271 | def has_enduring_best_seller_streak(best_seller, k):
2272 | l, r = 0, 0
2273 | cur_max = 0
2274 | while r < len(best_seller):
2275 | can_grow = l == r or best_seller[l] == best_seller[r]
2276 | if can_grow:
2277 | r += 1
2278 | if r - l == k:
2279 | return True
2280 | else:
2281 | l = r
2282 | return False
2283 |
2284 | def max_subarray_sum(arr):
2285 | max_val = max(arr)
2286 | if max_val <= 0: # edge case without positive values
2287 | return max_val
2288 | l, r = 0, 0
2289 | window_sum = 0
2290 | cur_max = 0
2291 | while r < len(arr):
2292 | can_grow = window_sum + arr[r] >= 0
2293 | if can_grow:
2294 | window_sum += arr[r]
2295 | r += 1
2296 | cur_max = max(cur_max, window_sum)
2297 | else:
2298 | window_sum = 0
2299 | l = r + 1
2300 | r = r + 1
2301 | return cur_max
2302 |
2303 | def max_at_most_3_bad_days(sales):
2304 | l, r = 0, 0
2305 | window_bad_days = 0
2306 | cur_max = 0
2307 | while r < len(sales):
2308 | can_grow = sales[r] >= 10 or window_bad_days < 3
2309 | if can_grow:
2310 | if sales[r] < 10:
2311 | window_bad_days += 1
2312 | r += 1
2313 | cur_max = max(cur_max, r - l)
2314 | else:
2315 | if sales[l] < 10:
2316 | window_bad_days -= 1
2317 | l += 1
2318 | return cur_max
2319 |
2320 | def max_consecutive_with_k_boosts(projected_sales, k):
2321 | l, r = 0, 0
2322 | used_boosts = 0
2323 | cur_max = 0
2324 | while r < len(projected_sales):
2325 | can_grow = used_boosts + max(10 - projected_sales[r], 0) <= k
2326 | if can_grow:
2327 | used_boosts += max(10 - projected_sales[r], 0)
2328 | r += 1
2329 | cur_max = max(cur_max, r - l)
2330 | elif l == r:
2331 | r += 1
2332 | l += 1
2333 | else:
2334 | used_boosts -= max(10 - projected_sales[l], 0)
2335 | l += 1
2336 | return cur_max
2337 |
2338 | def max_at_most_k_distinct(best_seller, k):
2339 | l, r = 0, 0
2340 | window_counts = {}
2341 | cur_max = 0
2342 | while r < len(best_seller):
2343 | can_grow = best_seller[r] in window_counts or len(window_counts) + 1 <= k
2344 | if can_grow:
2345 | if not best_seller[r] in window_counts:
2346 | window_counts[best_seller[r]] = 0
2347 | window_counts[best_seller[r]] += 1
2348 | r += 1
2349 | cur_max = max(cur_max, r - l)
2350 | else:
2351 | window_counts[best_seller[l]] -= 1
2352 | if window_counts[best_seller[l]] == 0:
2353 | del window_counts[best_seller[l]]
2354 | l += 1
2355 | return cur_max
2356 |
2357 | def shortest_over_20_sales(sales):
2358 | l, r = 0, 0
2359 | window_sum = 0
2360 | cur_min = math.inf
2361 | while True:
2362 | must_grow = window_sum <= 20
2363 | if must_grow:
2364 | if r == len(sales):
2365 | break
2366 | window_sum += sales[r]
2367 | r += 1
2368 | else:
2369 | cur_min = min(cur_min, r - l)
2370 | window_sum -= sales[l]
2371 | l += 1
2372 | if cur_min == math.inf:
2373 | return -1
2374 | return cur_min
2375 |
2376 | def shortest_with_all_letters(s1, s2):
2377 | l, r = 0, 0
2378 | missing = {}
2379 | for c in s2:
2380 | if not c in missing:
2381 | missing[c] = 0
2382 | missing[c] += 1
2383 | distinct_missing = len(missing)
2384 | cur_min = math.inf
2385 | while True:
2386 | must_grow = distinct_missing > 0
2387 | if must_grow:
2388 | if r == len(s1):
2389 | break
2390 | if s1[r] in missing:
2391 | missing[s1[r]] -= 1
2392 | if missing[s1[r]] == 0:
2393 | distinct_missing -= 1
2394 | r += 1
2395 | else:
2396 | cur_min = min(cur_min, r - l)
2397 | if s1[l] in missing:
2398 | missing[s1[l]] += 1
2399 | if missing[s1[l]] == 1:
2400 | distinct_missing += 1
2401 | l += 1
2402 | return cur_min if cur_min != math.inf else -1
2403 |
2404 | def smallest_range_with_k_elements(arr, k):
2405 | arr.sort()
2406 | l, r = 0, 0
2407 | best_low, best_high = 0, math.inf
2408 | while True:
2409 | must_grow = (r - l) < k
2410 | if must_grow:
2411 | if r == len(arr):
2412 | break
2413 | r += 1
2414 | else:
2415 | if arr[r - 1] - arr[l] < best_high - best_low:
2416 | best_low, best_high = arr[l], arr[r - 1]
2417 | l += 1
2418 | return [best_low, best_high]
2419 |
2420 | def count_at_most_k_bad_days(sales, k):
2421 | l, r = 0, 0
2422 | window_bad_days = 0
2423 | count = 0
2424 | while r < len(sales):
2425 | can_grow = sales[r] >= 10 or window_bad_days < k
2426 | if can_grow:
2427 | if sales[r] < 10:
2428 | window_bad_days += 1
2429 | r += 1
2430 | count += r - l
2431 | else:
2432 | if sales[l] < 10:
2433 | window_bad_days -= 1
2434 | l += 1
2435 | return count
2436 |
2437 | def count_exactly_k_bad_days(sales, k):
2438 | if k == 0:
2439 | return count_at_most_k_bad_days(sales, 0)
2440 | return count_at_most_k_bad_days(sales, k) - count_at_most_k_bad_days(sales, k - 1)
2441 |
2442 | def count_at_least_k_bad_days(sales, k):
2443 | n = len(sales)
2444 | total_subarrays = n * (n + 1) // 2
2445 | if k == 0:
2446 | return total_subarrays
2447 | return total_subarrays - count_at_most_k_bad_days(sales, k - 1)
2448 |
2449 | def count_at_most_k_drops(arr, k):
2450 | l, r = 0, 0
2451 | window_drops = 0
2452 | count = 0
2453 | while r < len(arr):
2454 | can_grow = r == 0 or arr[r] >= arr[r - 1] or window_drops < k
2455 | if can_grow:
2456 | if r > 0 and arr[r] < arr[r - 1]:
2457 | window_drops += 1
2458 | r += 1
2459 | count += r - l
2460 | else:
2461 | if arr[l] > arr[l + 1]:
2462 | window_drops -= 1
2463 | l += 1
2464 | return count
2465 | def count_exactly_k_drops(arr, k):
2466 | if k == 0:
2467 |         return count_at_most_k_drops(arr, 0)
2468 | return count_at_most_k_drops(arr, k) - count_at_most_k_drops(arr, k - 1)
2469 | def count_at_least_k_drops(arr, k):
2470 | n = len(arr)
2471 | total_count = n * (n + 1) // 2
2472 | if k == 0:
2473 | return total_count
2474 | return total_count - count_at_most_k_drops(arr, k - 1)
2475 |
2476 | def count_bad_days_range(sales, k1, k2):
2477 | if k1 == 0:
2478 | return count_at_least_k_bad_days(sales, k2)
2479 | return count_at_least_k_bad_days(sales, k2) - count_at_least_k_bad_days(sales, k1 - 1)
2480 |
2481 | def count_all_3_groups(arr):
2482 | n = len(arr)
2483 | total_count = n * (n + 1) // 2
2484 | return total_count - count_at_most_2_groups(arr)
2485 | def count_at_most_2_groups(arr):
2486 | l, r = 0, 0
2487 | window_counts = {}
2488 | count = 0
2489 | while r < len(arr):
2490 | can_grow = arr[r] % 3 in window_counts or len(window_counts) < 2
2491 | if can_grow:
2492 | if not arr[r] % 3 in window_counts:
2493 | window_counts[arr[r] % 3] = 0
2494 | window_counts[arr[r] % 3] += 1
2495 | r += 1
2496 | count += r - l
2497 | else:
2498 | window_counts[arr[l] % 3] -= 1
2499 | if window_counts[arr[l] % 3] == 0:
2500 | del window_counts[arr[l] % 3]
2501 | l += 1
2502 | return count
2503 | ```
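
All of the variable-size examples above share one skeleton: grow the window from the right while a `can_grow` test holds, otherwise shrink (or reset) from the left, updating the answer as you go. Below is that skeleton instantiated on a concrete toy problem (longest subarray with sum at most a limit, assuming non-negative values) so it runs as-is; only the `can_grow` test changes between the problems above.

```py
# Distilled variable-size window skeleton on a toy problem.
def longest_window_with_sum_at_most(arr, limit):
    l, r = 0, 0
    window_sum = 0
    cur_max = 0
    while r < len(arr):
        can_grow = window_sum + arr[r] <= limit
        if can_grow:
            window_sum += arr[r]
            r += 1
            cur_max = max(cur_max, r - l)
        elif l == r:          # single element already too big, skip it
            l += 1
            r += 1
        else:                 # shrink from the left
            window_sum -= arr[l]
            l += 1
    return cur_max
```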
2504 |
2505 |
2506 | ## 39. Backtracking
2507 |
2508 | ```py
2509 | def max_sum_path(grid):
2510 | # inefficient backtracking solution, DP is better
2511 | max_sum = -math.inf
2512 | R, C = len(grid), len(grid[0])
2513 | def visit(r, c, cur_sum):
2514 | nonlocal max_sum
2515 | if r == R - 1 and c == C - 1:
2516 | max_sum = max(max_sum, cur_sum)
2517 | return
2518 | if r + 1 < R:
2519 | visit(r + 1, c, cur_sum + grid[r + 1][c]) # go down
2520 | if c + 1 < C:
2521 | visit(r, c + 1, cur_sum + grid[r][c + 1]) # go right
2522 | visit(0, 0, grid[0][0])
2523 | return max_sum
2524 |
2525 | # backtracking
2526 | def visit(partial_solution):
2527 | if full_solution(partial_solution):
2528 | # process leaf/full solution
2529 | else:
2530 | for choice in choices(partial_solution):
2531 | # prune children where possible
2532 | child = apply_choice(partial_solution)
2533 | visit(child)
2534 | visit(empty_solution)
2535 |
2536 | def all_subsets(S):
2537 |     res = [] # global list of subsets
2538 | subset = [] # state of current partial solution
2539 | def visit(i):
2540 | if i == len(S):
2541 | res.append(subset.copy())
2542 | return
2543 | # choice 1: pick S[i]
2544 | subset.append(S[i])
2545 | visit(i + 1)
2546 | subset.pop() # cleanup work, undo choice 1
2547 | # choice 2: skip S[i]
2548 | visit(i + 1)
2549 | visit(0)
2550 | return res
2551 |
2552 | def generate_permutation(arr):
2553 | res = []
2554 | perm = arr.copy()
2555 | def visit(i):
2556 | if i == len(perm) - 1:
2557 | res.append(perm.copy())
2558 | return
2559 | for j in range(i, len(perm)):
2560 | perm[i], perm[j] = perm[j], perm[i] # pick perm[j]
2561 | visit(i + 1)
2562 | perm[i], perm[j] = perm[j], perm[i] # cleanup work, undo change
2563 | visit(0)
2564 | return res
2565 |
2566 | def generate_sentences(sentence, synonyms):
2567 | words = sentence.split()
2568 | res = []
2569 | cur_sentence = []
2570 | def visit(i):
2571 | if i == len(words):
2572 | res.append(" ".join(cur_sentence))
2573 | return
2574 | if words[i] not in synonyms:
2575 | choices = [words[i]]
2576 | else:
2577 | choices = synonyms.get(words[i])
2578 | for choice in choices:
2579 | cur_sentence.append(choice)
2580 | visit(i + 1)
2581 | cur_sentence.pop() # undo change
2582 | visit(0)
2583 | return res
2584 |
2585 | def jumping_numbers(n):
2586 | res = []
2587 | def visit(num):
2588 | if num >= n:
2589 | return
2590 | res.append(num)
2591 | last_digit = num % 10
2592 | if last_digit > 0:
2593 | visit(num * 10 + (last_digit - 1))
2594 | if last_digit < 9:
2595 | visit(num * 10 + (last_digit + 1))
2596 | for num in range(1, 10):
2597 | visit(num)
2598 | return sorted(res)
2599 |
2600 | def maximize_style(budget, prices, ratings):
2601 | best_rating_sum = 0
2602 | best_items = []
2603 | n = len(prices)
2604 | items = []
2605 | def visit(i, cur_cost, cur_rating_sum):
2606 | nonlocal best_items, best_rating_sum
2607 | if i == n:
2608 | if cur_rating_sum > best_rating_sum:
2609 | best_rating_sum = cur_rating_sum
2610 | best_items = items.copy()
2611 | return
2612 | # choice 1: skip item i
2613 | visit(i + 1, cur_cost, cur_rating_sum)
2614 | # choice 2: pick item i (if within budget)
2615 | if cur_cost + prices[i] <= budget:
2616 | items.append(i)
2617 | visit(i + 1, cur_cost + prices[i], cur_rating_sum + ratings[i])
2618 | items.pop()
2619 | visit(0, 0, 0)
2620 | return best_items
2621 | ```
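
A quick sanity check of the subset template above on a tiny input (the order follows the pick-then-skip choice order used in `all_subsets`):

```py
# Expected output of the backtracking subset generator above.
print(all_subsets([1, 2]))
# [[1, 2], [1], [2], []]
```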
2622 |
2623 |
2624 | ## 40. Dynamic Programming
2625 |
2626 | ```py
2627 | def delay(times):
2628 | n = len(times)
2629 | if n < 3:
2630 | return 0
2631 | memo = {}
2632 | def delay_rec(i):
2633 | if i >= n - 3:
2634 | return times[i]
2635 | if i in memo:
2636 | return memo[i]
2637 | memo[i] = times[i] + min(delay_rec(i + 1), delay_rec(i + 2), delay_rec(i + 3))
2638 | return memo[i]
2639 |     return min(delay_rec(0), delay_rec(1), delay_rec(2))
2640 |
2641 | # memoization
2642 | # memo = empty map
2643 | # f(subproblem_id):
2644 | # if subproblem is base case:
2645 | #     return result directly
2646 | # if subproblem in memo map:
2647 | # return cached result
2648 | # memo[subproblem_id] = recurrence relation formula
2649 | # return memo[subproblem_id]
2650 | # return f(initial subproblem)
2651 |
2652 | def max_path(grid):
2653 | R, C = len(grid), len(grid[0])
2654 | memo = {}
2655 | def max_path_rec(r, c):
2656 | if r == R - 1 and c == C - 1:
2657 | return grid[r][c]
2658 | if (r, c) in memo:
2659 | return memo[(r, c)]
2660 | elif r == R - 1:
2661 | memo[(r, c)] = grid[r][c] + max_path_rec(r, c + 1)
2662 | elif c == C - 1:
2663 |             memo[(r, c)] = grid[r][c] + max_path_rec(r + 1, c)
2664 | else:
2665 | memo[(r, c)] = grid[r][c] + max(max_path_rec(r + 1, c), max_path_rec(r, c + 1))
2666 | return memo[(r, c)]
2667 | return max_path_rec(0, 0)
2668 |
2669 | def min_split(arr, k):
2670 | n = len(arr)
2671 | memo = {}
2672 | def min_split_rec(i, x):
2673 | if (i, x) in memo:
2674 | return memo[(i, x)]
2675 | # base case
2676 | if n - i == x: # put each element in its own subarray
2677 | memo[(i, x)] = max(arr[i:])
2678 | elif x == 1: # put all elements in one subarray
2679 | memo[(i, x)] = sum(arr[i:])
2680 | else: # general case
2681 | current_sum = 0
2682 | res = math.inf
2683 | for p in range(i, n - x + 1):
2684 | current_sum += arr[p]
2685 | res = min(res, max(current_sum, min_split_rec(p + 1, x - 1)))
2686 | memo[(i, x)] = res
2687 | return memo[(i, x)]
2688 | return min_split_rec(0, k)
2689 |
2690 | def num_ways():
2691 | memo = {}
2692 | def num_ways_rec(i):
2693 | if i > 21:
2694 | return 1
2695 | if 16 <= i <= 21:
2696 | return 0
2697 | if i in memo:
2698 | return memo[i]
2699 | memo[i] = 0
2700 | for card in range(1, 11):
2701 | memo[i] += num_ways_rec(i + card)
2702 | return memo[i]
2703 | return num_ways_rec(0)
2704 |
2705 | def lcs(s1, s2):
2706 | memo = {}
2707 | def lcs_rec(i1, i2):
2708 | if i1 == len(s1) or i2 == len(s2):
2709 | return 0
2710 | if (i1, i2) in memo:
2711 | return memo[(i1, i2)]
2712 | if s1[i1] == s2[i2]:
2713 | memo[(i1, i2)] = 1 + lcs_rec(i1 + 1, i2 + 1)
2714 | else:
2715 | memo[(i1, i2)] = max(lcs_rec(i1 + 1, i2), lcs_rec(i1, i2 + 1))
2716 | return memo[(i1, i2)]
2717 | return lcs_rec(0, 0)
2718 |
2719 | def lcs_reconstruction(s1, s2):
2720 | memo = {}
2721 | def lcs_res(i1, i2):
2722 | if i1 == len(s1) or i2 == len(s2):
2723 | return ""
2724 | if (i1, i2) in memo:
2725 | return memo[(i1, i2)]
2726 | if s1[i1] == s2[i2]:
2727 | memo[(i1, i2)] = s1[i1] + lcs_res(i1 + 1, i2 + 1)
2728 | else:
2729 |             opt1, opt2 = lcs_res(i1 + 1, i2), lcs_res(i1, i2 + 1)
2730 | if len(opt1) >= len(opt2):
2731 | memo[(i1, i2)] = opt1
2732 | else:
2733 | memo[(i1, i2)] = opt2
2734 | return memo[(i1, i2)]
2735 | return lcs_res(0, 0)
2736 |
2737 | def lcs_reconstruction_optimal(s1, s2):
2738 | memo = {}
2739 |     def lcs_rec(i1, i2):
2740 |         # same memoized helper as in lcs() above: returns the LCS length of s1[i1:] and s2[i2:]
2741 | i1, i2 = 0, 0
2742 | res = []
2743 | while i1 < len(s1) and i2 < len(s2):
2744 | if s1[i1] == s2[i2]:
2745 | res.append(s1[i1])
2746 | i1 += 1
2747 | i2 += 1
2748 | elif lcs_rec(i1 + 1, i2) > lcs_rec(i1, i2 + 1):
2749 | i1 += 1
2750 | else:
2751 | i2 += 1
2752 | return ''.join(res)
2753 |
2754 | def delay(times):
2755 | n = len(times)
2756 | if n < 3:
2757 | return 0
2758 | dp = [0] * n
2759 | dp[n - 1], dp[n - 2], dp[n - 3] = times[n - 1], times[n - 2], times[n - 3]
2760 | for i in range(n - 4, -1, -1):
2761 | dp[i] = times[i] + min(dp[i + 1], dp[i + 2], dp[i + 3])
2762 | return min(dp[0], dp[1], dp[2])
2763 |
2764 | def delay_optimized(times):
2765 | n = len(times)
2766 | if n < 3:
2767 | return 0
2768 | dp1, dp2, dp3 = times[n - 3], times[n - 2], times[n - 1]
2769 | for i in range(n - 4, -1, -1):
2770 | cur = times[i] + min(dp1, dp2, dp3)
2771 | dp1, dp2, dp3 = cur, dp1, dp2
2772 | return min(dp1, dp2, dp3)
2773 | ```
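
The memoization template above maintains the memo dict by hand; when the subproblem id is hashable, the standard library's `functools.lru_cache` can do the caching instead. A sketch of the same `delay` recurrence with `lru_cache` (the hand-rolled memo is what the rest of this chapter uses):

```py
# Sketch: delay(times) again, with functools.lru_cache as the memo.
from functools import lru_cache

def delay_cached(times):
    n = len(times)
    if n < 3:
        return 0
    @lru_cache(maxsize=None)
    def delay_rec(i):
        if i >= n - 3:
            return times[i]
        return times[i] + min(delay_rec(i + 1), delay_rec(i + 2), delay_rec(i + 3))
    return min(delay_rec(0), delay_rec(1), delay_rec(2))
```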
2774 |
2775 |
2776 | ## 41. Greedy Algorithms
2777 |
2778 | ```py
2779 | def most_non_overlapping_intervals(intervals):
2780 | intervals.sort(key=lambda x: x[1])
2781 | count = 0
2782 | prev_end = -math.inf
2783 | for l, r in intervals:
2784 | if l > prev_end:
2785 | count += 1
2786 | prev_end = r
2787 | return count
2788 |
2789 | def can_reach_goal(jumping_points, k, max_aging):
2790 | n = len(jumping_points)
2791 | gaps = []
2792 | for i in range(1, n):
2793 | gaps.append(jumping_points[i] - jumping_points[i - 1])
2794 | gaps.sort()
2795 | total_aging = sum(gaps[:n - 1 - k])
2796 | return total_aging <= max_aging
2797 |
2798 | def minimize_distance(points, center1, center2):
2799 | n = len(points)
2800 | assignment = [0] * n
2801 | baseline = 0
2802 | c1_count = 0
2803 | for i, p in enumerate(points):
2804 | if dist(p, center1) <= dist(p, center2):
2805 | assignment[i] = 1
2806 | baseline += dist(p, center1)
2807 | c1_count += 1
2808 | else:
2809 | assignment[i] = 2
2810 | baseline += dist(p, center2)
2811 | if c1_count == n // 2:
2812 | return baseline
2813 | switch_costs = []
2814 | for i, p in enumerate(points):
2815 | if assignment[i] == 1 and c1_count > n // 2:
2816 | switch_costs.append(dist(p, center2) - dist(p, center1))
2817 | if assignment[i] == 2 and c1_count < n // 2:
2818 | switch_costs.append(dist(p, center1) - dist(p, center2))
2819 | res = baseline
2820 | switch_costs.sort()
2821 | for cost in switch_costs[:abs(c1_count - n // 2)]:
2822 | res += cost
2823 | return res
2824 |
2825 | def min_middle_sum(arr):
2826 | arr.sort()
2827 | middle_sum = 0
2828 | for i in range(len(arr) // 3):
2829 | middle_sum += arr[i * 2 + 1]
2830 | return middle_sum
2831 |
2832 | def min_script_runs(meetings):
2833 | meetings.sort(key=lambda x: x[1])
2834 | count = 0
2835 | prev_end = -math.inf
2836 | for l, r in meetings:
2837 | if l > prev_end:
2838 | count += 1
2839 | prev_end = r
2840 | return count
2841 |
2842 | def latest_reachable_year(jumping_points, k, max_aging):
2843 | gaps = []
2844 | for i in range(1, len(jumping_points)):
2845 | gaps.append(jumping_points[i] - jumping_points[i - 1])
2846 | min_heap = Heap()
2847 | total_gap_sum = 0
2848 | sum_heap = 0
2849 | for i, gap in enumerate(gaps):
2850 | aged = total_gap_sum - sum_heap
2851 | min_heap.push(gap)
2852 | sum_heap += gap
2853 | total_gap_sum += gap
2854 | if min_heap.size() > k:
2855 | smallest_jump = min_heap.pop()
2856 | sum_heap -= smallest_jump
2857 | new_aged = total_gap_sum - sum_heap
2858 | if new_aged > max_aging:
2859 | # we can't reach the end of gap i
2860 | # we get to jumping_points[i] and age naturally from there
2861 | remaining_aging = max_aging - aged
2862 | return jumping_points[i] + remaining_aging
2863 | # reached last jumping point
2864 | aged = total_gap_sum - sum_heap
2865 | remaining_aging = max_aging - aged
2866 | return jumping_points[len(jumping_points) - 1] + remaining_aging
2867 | ```
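
A quick check of the interval-scheduling greedy above on a small input:

```py
# most_non_overlapping_intervals sorts by end time and keeps every interval
# that starts strictly after the previously kept end.
print(most_non_overlapping_intervals([[1, 3], [2, 4], [5, 7]]))  # 2, keeping [1, 3] and [5, 7]
```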
2868 |
2869 |
2870 | ## 42. Topological Sort
2871 |
2872 | ```py
2873 | def topological_sort(graph):
2874 | # initialization
2875 | V = len(graph)
2876 | in_degrees = [0 for _ in range(V)]
2877 | for node in range(V):
2878 | for nbr in graph[node]: # for weighted graphs, unpack edges: nbr, _
2879 | in_degrees[nbr] += 1
2880 | degree_zero = []
2881 | for node in range(V):
2882 | if in_degrees[node] == 0:
2883 | degree_zero.append(node)
2884 | # main peel-off loop
2885 | topo_order = []
2886 | while degree_zero:
2887 | node = degree_zero.pop()
2888 | topo_order.append(node)
2889 | for nbr in graph[node]: # for weighted graphs, unpack edges: nbr, _
2890 | in_degrees[nbr] -= 1
2891 | if in_degrees[nbr] == 0:
2892 | degree_zero.append(nbr)
2893 | if len(topo_order) < V:
2894 |         return [] # there's a cycle, some nodes couldn't be peeled off
2895 | return topo_order
2896 |
2897 | def distance(graph, start):
2898 | topo_order = topological_sort(graph)
2899 | distances = {start: 0}
2900 | for node in topo_order:
2901 |         if node not in distances: continue
2902 | for nbr, weight in graph[node]:
2903 | if nbr not in distances or distances[node] + weight < distances[nbr]:
2904 | distances[nbr] = distances[node] + weight
2905 | res = []
2906 | for i in range(len(graph)):
2907 | if i in distances:
2908 | res.append(distances[i])
2909 | else:
2910 | res.append(math.inf)
2911 | return res
2912 |
2913 | def shortest_path(graph, start, goal):
2914 | topo_order = topological_sort(graph)
2915 | distances = {start: 0}
2916 | predecessors = {}
2917 | for node in topo_order:
2918 | if node not in distances: continue
2919 | for nbr, weight in graph[node]:
2920 |             if nbr not in distances or distances[node] + weight < distances[nbr]:
2921 | distances[nbr] = distances[node] + weight
2922 | predecessors[nbr] = node
2923 | if goal not in distances:
2924 | return []
2925 | path = [goal]
2926 | while path[-1] != start:
2927 | path.append(predecessors[path[-1]])
2928 | path.reverse()
2929 | return path
2930 |
2931 | def path_count(graph, start):
2932 | topo_order = topological_sort(graph)
2933 | counts = [0] * len(graph)
2934 | counts[start] = 1
2935 | for node in topo_order:
2936 | for nbr in graph[node]:
2937 | counts[nbr] += counts[node]
2938 | return counts
2939 |
2940 | def compile_time(seconds, imports):
2941 | V = len(seconds)
2942 | graph = [[] for _ in range(V)]
2943 | for package in range(V):
2944 | for imported_package in imports[package]:
2945 | graph[imported_package].append(package)
2946 | topo_order = topological_sort(graph)
2947 | durations = {}
2948 | for node in topo_order:
2949 | if node not in durations:
2950 | durations[node] = seconds[node]
2951 | for nbr in graph[node]:
2952 | if nbr not in durations:
2953 | durations[nbr] = 0
2954 | durations[nbr] = max(durations[nbr], seconds[nbr] + durations[node])
2955 | return max(durations.values())
2956 | ```
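
A small sanity check of the peel-off loop (Kahn's algorithm) above, using the adjacency-list convention it expects:

```py
# Tiny DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.  Node 0 must come first, node 3 last.
graph = [[1, 2], [3], [3], []]
print(topological_sort(graph))  # [0, 2, 1, 3] with this implementation (pops from the end of degree_zero)
```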
2957 |
2958 |
2959 | ## 43. Prefix Sums
2960 |
2961 | ```py
2962 | def channel_views(views, periods):
2963 | prefix_sum = [0] * len(views)
2964 | prefix_sum[0] = views[0]
2965 | for i in range(1, len(views)):
2966 | prefix_sum[i] = prefix_sum[i - 1] + views[i]
2967 | res = []
2968 | for l, r in periods:
2969 | if l == 0:
2970 | res.append(prefix_sum[r])
2971 | else:
2972 | res.append(prefix_sum[r] - prefix_sum[l - 1])
2973 | return res
2974 |
2975 | # # initialization
2976 | # # initialize prefix_sum with the same length as input array
2977 | # prefix_sum[0] = arr[0] # at least one element
2978 | # for i from 1 to len(arr) - 1:
2979 | # prefix_sum[i] = prefix_sum[i - 1] + arr[i]
2980 | # # query: sum of subarray [l, r]
2981 | # if l == 0:
2982 | # return prefix_sum[r]
2983 | # return prefix_sum[r] - prefix_sum[l - 1]
2984 |
2985 | def good_reception_scores(likes, dislikes, periods):
2986 | positive_days = [0] * len(likes)
2987 |     for i in range(len(likes)):
2988 | if likes[i] > dislikes[i]:
2989 | positive_days[i] = 1
2990 | # build prefix sum for positive_days array and query it with each period
2991 |
2992 | def exclusive_product_array(arr):
2993 | m = 10 ** 9 + 7
2994 | n = len(arr)
2995 | prefix_product = [1] * n
2996 | prefix_product[0] = arr[0]
2997 | for i in range(1, n):
2998 | prefix_product[i] = (prefix_product[i - 1] * arr[i]) % m
2999 | postfix_product = [1] * n
3000 | postfix_product[n - 1] = arr[n - 1]
3001 | for i in range(n - 2, -1, -1):
3002 | postfix_product[i] = (postfix_product[i + 1] * arr[i]) % m
3003 | res = [1] * n
3004 | res[0] = postfix_product[1]
3005 | res[n - 1] = prefix_product[n - 2]
3006 | for i in range(1, n - 1):
3007 | res[i] = (prefix_product[i - 1] * postfix_product[i + 1]) % m
3008 | return res
3009 |
3010 | def balanced_index(arr):
3011 | prefix_sum = 0
3012 | postfix_sum = sum(arr) - arr[0]
3013 | for i in range(len(arr)):
3014 | if prefix_sum == postfix_sum:
3015 | return i
3016 | prefix_sum += arr[i]
3017 | if i + 1 < len(arr):
3018 | postfix_sum -= arr[i + 1]
3019 | return -1
3020 |
3021 | def max_total_deviation(likes, dislikes):
3022 | scores = [likes[i] - dislikes[i] for i in range(len(likes))]
3023 | scores.sort()
3024 | n = len(scores)
3025 | prefix_sum = [0] * n
3026 | prefix_sum[0] = scores[0]
3027 | for i in range(1, n):
3028 | prefix_sum[i] = prefix_sum[i - 1] + scores[i]
3029 | max_deviation = 0
3030 | for i in range(n):
3031 | left, right = 0, 0
3032 | if i > 0:
3033 | left = i * scores[i] - prefix_sum[i - 1]
3034 | if i < n - 1:
3035 | right = prefix_sum[n - 1] - prefix_sum[i] - (n - i - 1) * scores[i]
3036 | max_deviation = max(max_deviation, left + right)
3037 | return max_deviation
3038 |
3039 | def count_subarrays(arr, k):
2040 |     prefix_sum = ...  # build prefix sum of arr, as in channel_views above
3041 | prefix_sum_to_count = {0: 1} # for empty prefix
3042 | count = 0
3043 | for val in prefix_sum:
3044 | if val - k in prefix_sum_to_count:
3045 | count += prefix_sum_to_count[val - k]
3046 | if val not in prefix_sum_to_count:
3047 | prefix_sum_to_count[val] = 0
3048 | prefix_sum_to_count[val] += 1
3049 | return count
3050 |
3051 | def longest_subarray_with_sum_k(arr, k):
2052 |     prefix_sum = ...  # build prefix sum of arr, as in channel_views above
3053 | prefix_sum_to_index = {0: -1} # for empty prefix
3054 | res = -1
3055 | for r, val in enumerate(prefix_sum):
3056 | if val - k in prefix_sum_to_index:
3057 | l = prefix_sum_to_index[val - k]
3058 | res = max(res, r - l)
3059 | if val not in prefix_sum_to_index:
3060 | prefix_sum_to_index[val] = r
3061 | return res
3062 |
3063 | def range_updates(n, votes):
3064 | diff = [0] * n
3065 | for l, r, v in votes:
3066 | diff[l] += v
3067 | if r + 1 < n:
3068 | diff[r + 1] -= v
3069 | prefix_sum = [0] * n
3070 | prefix_sum[0] = diff[0]
3071 | for i in range(1, n):
3072 | prefix_sum[i] = prefix_sum[i - 1] + diff[i]
3073 | return prefix_sum
3074 |
3075 | def most_booked_slot(slots, bookings):
3076 | n = len(slots)
3077 | diff = [0] * n
3078 | for l, r, c in bookings:
3079 | diff[l] += c
3080 | if r + 1 < n:
3081 | diff[r + 1] -= c
3082 | prefix_sum = [0] * n
3083 | prefix_sum[0] = diff[0]
3084 | for i in range(1, n):
3085 | prefix_sum[i] = prefix_sum[i - 1] + diff[i]
3086 | max_bookings, max_index = 0, -1
3087 | for i in range(n):
3088 | total_bookings = prefix_sum[i] + slots[i]
3089 | if total_bookings > max_bookings:
3090 | max_bookings, max_index = total_bookings, i
3091 | return max_index
3092 | ```
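
A worked instance of the difference-array trick in `range_updates` above:

```py
# Each vote (l, r, v) adds v to every index in [l, r]: diff gets +v at l and
# -v at r + 1, and the prefix sum of diff recovers the per-index totals.
print(range_updates(5, [(1, 3, 2), (2, 4, -1)]))  # [0, 2, 1, 1, -1]
```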
3093 |
--------------------------------------------------------------------------------