├── Linux
│   └── Essential Linux.pdf
├── Kaggle Course Notes
│   └── Kaggle Course Notes.pdf
├── Princeton Algorithm
│   └── Princeton Algorithm Coursera Notes Junfan Zhu.pdf
├── Coding Interview Patterns: Nail Your Next Coding Interview
│   └── Bonus_Pdf.pdf
├── Grokking the System Design Interview
│   ├── Grokking the System Design Interview.pdf
│   └── Grokking the System Design Interview.md
├── README.md
└── Beyond Cracking the Coding Interview
    └── Beyond Cracking the Coding Interview.md
/Linux/Essential Linux.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Linux/Essential Linux.pdf
--------------------------------------------------------------------------------
/Kaggle Course Notes/Kaggle Course Notes.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Kaggle Course Notes/Kaggle Course Notes.pdf
--------------------------------------------------------------------------------
/Princeton Algorithm/Princeton Algorithm Coursera Notes Junfan Zhu.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Princeton Algorithm/Princeton Algorithm Coursera Notes Junfan Zhu.pdf
--------------------------------------------------------------------------------
/Coding Interview Patterns: Nail Your Next Coding Interview/Bonus_Pdf.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Coding Interview Patterns: Nail Your Next Coding Interview/Bonus_Pdf.pdf
--------------------------------------------------------------------------------
/Grokking the System Design Interview/Grokking the System Design Interview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/junfanz1/Software-Engineer-Coding-Interviews/HEAD/Grokking the System Design Interview/Grokking the System Design Interview.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Software-Engineer-Coding-Interviews
3 |
4 |
5 |
6 | - [1. System Design Interview](#1-system-design-interview)
7 | * [ByteByteGo - GenAI/ML/Modern System Design Interview](#bytebytego-genaimlmodern-system-design-interview)
8 | * [Educative - GenAI/Modern System Design Interview](#educative-genaimodern-system-design-interview)
9 | - [2. Coding Interview](#2-coding-interview)
10 | - [3. Linux, Git](#3-linux-git)
11 | - [4. Algorithms, Data Science](#4-algorithms-data-science)
12 | * [Star History](#star-history)
13 |
14 |
15 |
16 |
17 | ---
18 |
19 |
20 | # 1. System Design Interview
21 |
22 |
23 | ## ByteByteGo - GenAI/ML/Modern System Design Interview
24 |
25 |
26 | > [System Design Interview, An Insider's Guide, Second Edition - by Alex Xu, 2020](https://www.amazon.com/System-Design-Interview-insiders-Second/dp/B08CMF2CQF) | [__PDF Notes-Chinese__](https://github.com/junfanz1/Quant-Books-Notes/blob/main/System%20Design/Notes%20on%20System%20Design.pdf)
27 |
28 | > [Generative AI System Design Interview - by Ali Aminian, Hao Sheng, 2024](https://www.amazon.com/Generative-AI-System-Design-Interview/dp/1736049143) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Review/blob/main/System%20Design/GenAI%20System%20Design%20Interview.md)
29 |
30 | > [Machine Learning System Design Interview - by Ali Aminian, Alex Xu, 2023](https://www.amazon.com/Machine-Learning-System-Design-Interview/dp/1736049127) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Review/blob/main/System%20Design/ML%20System%20Design%20Interview.md)
31 |
32 |
37 |
38 |
39 | ## Educative - GenAI/Modern System Design Interview
40 |
41 | > [Educative - Grokking System Design Interview](https://www.educative.io/verify-certificate/B86jYxWPP3JhA8lAZw0B2Mhr92YjJNmG5Ty) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Grokking%20the%20System%20Design%20Interview/Grokking%20the%20System%20Design%20Interview.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Grokking%20the%20System%20Design%20Interview/Grokking%20the%20System%20Design%20Interview.md)
42 |
43 | > [Educative - Grokking the Modern System Design Interview](https://www.educative.io/courses/grokking-the-system-design-interview) | [__Markdown Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Overview/blob/main/System%20Design/Modern%20System%20Design.md)
44 |
45 | > [Educative - GenAI System Design](https://www.educative.io/verify-certificate/RgxzXQFQkKyYgKrGjTX1RQpE9J3vT6) | [__Notes__](https://github.com/junfanz1/AI-LLM-ML-CS-Quant-Readings/blob/main/System%20Design/GenAI%20System%20Design.md)
46 |
47 |
48 |
49 | # 2. Coding Interview
50 |
51 | > [Coding Interview Patterns: Nail Your Next Coding Interview - by Alex Xu, Shaun Gunawardane, 2024](https://www.amazon.com/Coding-Interview-Patterns-Nail-Your/dp/1736049135) | [__Markdown Notes__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Coding%20Interview%20Patterns:%20Nail%20Your%20Next%20Coding%20Interview/Coding%20Interview%20Patterns,%20Alex%20Xu.md) | [__Bonus PDF of the Book__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Coding%20Interview%20Patterns%3A%20Nail%20Your%20Next%20Coding%20Interview/Bonus_Pdf.pdf)
52 |
53 | > [Beyond Cracking the Coding Interview - by Gayle Laakmann McDowell, Mike Mroczka, Aline Lerner, Nil Mamano, 2025](https://www.amazon.com/Beyond-Cracking-Coding-Interview-Successfully/dp/195570600X) | [__Markdown Notes__](https://github.com/junfanz1/Software-Engineer-Coding-Interviews/blob/main/Beyond%20Cracking%20the%20Coding%20Interview/Beyond%20Cracking%20the%20Coding%20Interview.md)
54 |
55 | > [Educative - Grokking the Coding Interview Patterns in Python](https://www.educative.io/courses/grokking-coding-interview-in-python) | [__Markdown Notes__](https://github.com/junfanz1/Software-Engineer-Coding-Interviews/blob/main/Grokking%20the%20Coding%20Interview%20Patterns%20in%20Python/Grokking%20the%20Coding%20Interview%20Patterns%20in%20Python.md)
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 | # 3. Linux, Git
65 |
66 | Linux, Git CheatSheet | [__PDF Notes__](https://github.com/junfanz1/Coding-Interview-Practices/blob/main/Linux/Essential%20Linux.pdf)
67 |
68 |
69 | # 4. Algorithms, Data Science
70 |
71 | Algorithms, Part I and Part II, by Robert Sedgewick and Kevin Wayne, Princeton Coursera.
72 |
73 | > [Chapters 1-6](https://www.coursera.org/learn/algorithms-part1) | [Chapters 7-12](https://www.coursera.org/learn/algorithms-part2) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Princeton%20Algorithm/Princeton%20Algorithm%20Coursera%20Notes%20Junfan%20Zhu.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Princeton%20Algorithm/Princeton%20Algorithm%20Coursera%20Notes.md)
74 |
75 | Kaggle Notes
76 |
77 | > [Kaggle Mini-courses](https://www.kaggle.com/learn) | [__PDF Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Kaggle%20Course%20Notes/Kaggle%20Course%20Notes.pdf) | [__Markdown Notes__](https://github.com/junfanz1/CS-Online-Course-Notes/blob/main/Kaggle%20Course%20Notes/Kaggle%20Course%20Notes.md)
78 |
79 |
80 | ---
81 |
82 |
97 |
98 |
99 |
100 | ## Star History
101 |
102 | [](https://star-history.com/#junfanz1/Coding-Interview-Practices&Date)
103 |
104 |
--------------------------------------------------------------------------------
/Grokking the System Design Interview/Grokking the System Design Interview.md:
--------------------------------------------------------------------------------
1 | Grokking the System Design Interview
2 | ===========
3 |
4 | 2021-04-12
5 |
6 | Junfan Zhu
7 | -----------
8 |
9 | (`junfanz@gatech.edu`; `junfanzhu@uchicago.edu`)
10 |
11 | Course Links
12 | -------------
13 |
14 | https://www.educative.io/courses/grokking-the-system-design-interview/
15 |
16 | ----
17 |
18 | Table of Contents
19 |
20 |
21 |
22 | - [Grokking the System Design Interview](#grokking-the-system-design-interview)
23 | - [Junfan Zhu](#junfan-zhu)
24 | - [Course Links](#course-links)
25 | - [1. Back-of-the-envelope estimation](#1-back-of-the-envelope-estimation)
26 | - [2. Shortening URL](#2-shortening-url)
27 | - [2.1. Encoding actual URL](#21-encoding-actual-url)
28 | - [2.2. Cache](#22-cache)
29 | - [2.3. Load Balancer (LB)](#23-load-balancer-lb)
30 | - [3. DropBox](#3-dropbox)
31 | - [3.1. Clarify Requirements and Goals of the System](#31-clarify-requirements-and-goals-of-the-system)
32 | - [4. Facebook Messenger](#4-facebook-messenger)
33 | - [4.1. Message Handling](#41-message-handling)
34 | - [4.2. Storing and retrieving the messages from the database](#42-storing-and-retrieving-the-messages-from-the-database)
35 | - [5. YouTube](#5-youtube)
36 | - [5.1. Metadata Sharding](#51-metadata-sharding)
37 | - [5.1.1. Sharding based on UserID](#511-sharding-based-on-userid)
38 | - [5.1.2. Sharding based on VideoID](#512-sharding-based-on-videoid)
39 | - [5.2. Load Balancing](#52-load-balancing)
40 | - [6. Designing Typeahead Suggestion](#6-designing-typeahead-suggestion)
41 | - [7. API Rate Limiter](#7-api-rate-limiter)
42 | - [8. Web Crawler](#8-web-crawler)
43 | - [8.1. How to crawl?](#81-how-to-crawl)
44 | - [8.2. Component Design](#82-component-design)
45 | - [9. Facebook’s Newsfeed](#9-facebooks-newsfeed)
46 | - [9.1. Feed generation](#91-feed-generation)
47 | - [9.2. Feed publishing](#92-feed-publishing)
48 | - [9.3. Data Partitioning](#93-data-partitioning)
49 | - [10. Yelp](#10-yelp)
50 | - [10.1. Dynamic size grids](#101-dynamic-size-grids)
51 | - [11. Ticket Master](#11-ticket-master)
52 | - [11.1. Active Reservations Service](#111-active-reservations-service)
53 | - [11.2. Waiting Users Service](#112-waiting-users-service)
54 | - [11.3. Concurrency](#113-concurrency)
55 | - [12. Load Balancing](#12-load-balancing)
56 | - [12.1. Benefits](#121-benefits)
57 | - [12.2. Algorithms](#122-algorithms)
58 | - [13. Caching](#13-caching)
59 | - [13.1. Cache Invalidation](#131-cache-invalidation)
60 | - [13.2. Cache eviction policies](#132-cache-eviction-policies)
61 | - [14. Data Partitioning](#14-data-partitioning)
62 | - [14.1. Partitioning Criteria](#141-partitioning-criteria)
63 | - [14.2. Common Problems of Data Partitioning](#142-common-problems-of-data-partitioning)
64 | - [15. Proxy Server](#15-proxy-server)
65 | - [15.1. Open Proxy](#151-open-proxy)
66 | - [15.2. Reverse Proxy](#152-reverse-proxy)
67 | - [16. SQL & NoSQL](#16-sql--nosql)
68 | - [16.1. SQL](#161-sql)
69 | - [16.2. NoSQL](#162-nosql)
70 | - [16.3. Differences: SQL vs. NoSQL](#163-differences-sql-vs-nosql)
71 | - [16.4. Choose which?](#164-choose-which)
72 | - [16.4.1. SQL](#1641-sql)
73 | - [16.4.2. NoSQL](#1642-nosql)
74 | - [17. CAP Theorem](#17-cap-theorem)
75 | - [18. Consistent Hashing](#18-consistent-hashing)
76 | - [18.1. Improve caching system](#181-improve-caching-system)
77 | - [18.2. Consistent Hashing](#182-consistent-hashing)
78 | - [18.3. Algorithm](#183-algorithm)
79 |
80 |
81 |
82 | ---------
83 |
84 | # 1. Back-of-the-envelope estimation
85 |
86 | Scaling, partitioning, load balancing, and caching.
87 |
88 | - What scale is expected from the system?
89 | - How much storage will we need?
90 | - What network bandwidth usage are we expecting?
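
As a worked illustration of this kind of estimate, here is a tiny sketch; every number in it (write rate, read/write ratio, object size, retention) is an assumed value chosen only for the example, not a figure from the course.

```python
# Back-of-the-envelope sketch; all inputs below are illustrative assumptions.
new_objects_per_sec = 500        # assumed write rate
read_write_ratio = 100           # assumed reads per write
object_size_bytes = 500          # assumed average object size
retention_years = 5

read_qps = new_objects_per_sec * read_write_ratio
seconds_per_year = 365 * 24 * 3600
total_objects = new_objects_per_sec * seconds_per_year * retention_years
storage_tb = total_objects * object_size_bytes / 1e12

print(f"Read QPS: {read_qps:,}")                 # 50,000
print(f"Objects kept: {total_objects:,}")
print(f"Storage needed: {storage_tb:.1f} TB")    # ~39 TB
```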
91 |
92 | # 2. Shortening URL
93 |
94 | ## 2.1. Encoding actual URL
95 |
96 | Use the MD5 algorithm as the hash function $\Rightarrow$ it produces a 128-bit hash value.
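
A minimal sketch of this step: compute the 128-bit MD5 digest of the long URL and keep the first few characters of a URL-safe base64 encoding as the short key. The 6-character length and the base64 alphabet are illustrative choices, not values prescribed by the course.

```python
import base64
import hashlib

def short_key(long_url: str, length: int = 6) -> str:
    digest = hashlib.md5(long_url.encode("utf-8")).digest()     # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")  # compact text form
    return encoded[:length]                                     # illustrative 6-char key

print(short_key("https://example.com/some/very/long/path?with=params"))
```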
97 |
98 | ## 2.2. Cache
99 |
100 | __Which cache eviction policy would best fit our needs?__
101 |
102 | When the cache is full and we want to replace a link with a newer/hotter URL, how would we choose? Least Recently Used (LRU) would be a reasonable policy for our system: under this policy, we discard the least recently used URL first. We can use a Linked Hash Map or a similar data structure to store our URLs and Hashes, which will also keep track of the URLs that have been accessed recently.
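
A minimal LRU sketch along these lines, using Python's OrderedDict as the "Linked Hash Map"; the class and method names are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Maps URL hash -> original URL and evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, url_hash: str):
        if url_hash not in self._entries:
            return None
        self._entries.move_to_end(url_hash)        # mark as most recently used
        return self._entries[url_hash]

    def put(self, url_hash: str, original_url: str) -> None:
        if url_hash in self._entries:
            self._entries.move_to_end(url_hash)
        self._entries[url_hash] = original_url
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)       # discard the least recently used URL
```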
103 |
104 | To further increase the efficiency, we can replicate our caching servers to distribute the load between them.
105 |
106 | ## 2.3. Load Balancer (LB)
107 |
108 | We can add a Load balancing layer at three places in our system:
109 |
110 | - Between Clients and Application servers
111 | - Between Application Servers and database servers
112 | - Between Application Servers and Cache servers
113 |
114 | # 3. DropBox
115 |
116 | ## 3.1. Clarify Requirements and Goals of the System
117 |
118 | What do we wish to achieve from a Cloud Storage system? Here are the top-level requirements for our system:
119 |
120 | - Users should be able to upload and download their files/photos from any device.
121 | - Users should be able to share files or folders with other users.
122 | - Our service should support automatic synchronization between devices, i.e., after updating a file on one device, it should get synchronized on all devices.
123 | - The system should support storing large files up to a GB.
124 | - ACID-ity is required. Atomicity, Consistency, Isolation and Durability of all file operations should be guaranteed.
125 | - Our system should support offline editing. Users should be able to add/delete/modify files while offline, and as soon as they come online, all their changes should be synced to the remote servers and other online devices.
126 |
127 |
128 | # 4. Facebook Messenger
129 |
130 | ## 4.1. Message Handling
131 |
132 | How does the messenger maintain the sequencing of the messages? We can store a timestamp with each message, which is the time the message is received by the server. This will still not ensure the correct ordering of messages for clients. The scenario where the server timestamp cannot determine the exact order of messages would look like this:
133 |
134 | - User-1 sends a message M1 to the server for User-2.
135 | - The server receives M1 at T1.
136 | - Meanwhile, User-2 sends a message M2 to the server for User-1.
137 | - The server receives the message M2 at T2, such that T2 > T1.
138 | - The server sends message M1 to User-2 and M2 to User-1.
139 | - So User-1 will see M1 first and then M2, whereas User-2 will see M2 first and then M1.
140 |
141 | To resolve this, we need to keep a sequence number with every message for each client. This sequence number will determine the exact ordering of messages for EACH user. With this solution, both clients will see a different view of the message sequence, but this view will be consistent for them on all devices.
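
A toy sketch of the idea: the server stamps each message with a per-recipient sequence number, and every client orders its view by its own sequence. All names here are illustrative.

```python
from collections import defaultdict
from itertools import count

class ChatServer:
    def __init__(self):
        self._next_seq = defaultdict(lambda: count(1))   # one counter per client
        self._inbox = defaultdict(list)

    def deliver(self, sender: str, recipient: str, text: str) -> None:
        seq = next(self._next_seq[recipient])            # per-client sequence number
        self._inbox[recipient].append((seq, sender, text))

    def messages_for(self, user: str):
        # Sorting by the user's own sequence numbers gives that user a consistent
        # order on all devices, even if it differs from other users' views.
        return sorted(self._inbox[user])

server = ChatServer()
server.deliver("User-1", "User-2", "M1")
server.deliver("User-2", "User-1", "M2")
print(server.messages_for("User-2"))   # [(1, 'User-1', 'M1')]
```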
142 |
143 | ## 4.2. Storing and retrieving the messages from the database
144 |
145 | Whenever the chat server receives a new message, it needs to store it in the database. To do so, we have two options:
146 |
147 | - Start a separate thread, which will work with the database to store the message.
148 | - Send an asynchronous request to the database to store the message.
149 |
150 | We have to keep certain things in mind while designing our database:
151 |
152 | - How to efficiently work with the database connection pool.
153 | - How to retry failed requests.
154 | - Where to log those requests that failed even after some retries.
155 | - How to retry these logged requests (that failed after the retry) when all the issues have been resolved.
156 |
157 | __Which storage system should we use?__
158 |
159 | We need to have a database that can support a very high rate of small updates and also fetch a range of records quickly. We cannot use RDBMS like MySQL or NoSQL like MongoDB because we cannot afford to read/write a row from the database every time a user receives/sends a message. This will not only make the basic operations of our service run with high latency but also create a huge load on databases.
160 |
161 | Both of our requirements can be easily met with a wide-column database solution like HBase. HBase is a column-oriented key-value NoSQL database that can store multiple values against one key in multiple columns. HBase is modeled after Google’s BigTable and runs on top of the Hadoop Distributed File System (HDFS). HBase groups data together: new data is stored in a memory buffer and, once the buffer is full, it dumps the data to disk. This way of storing data not only helps store a lot of small records quickly but also makes fetching rows by key or scanning ranges of rows efficient. HBase is also an efficient database for storing variable-sized data, which our service also requires.
162 |
163 | __Design Summary:__
164 |
165 | Clients will open a connection to the chat server to send a message; the server will then pass it to the requested user. All the active users will keep a connection open with the server to receive messages. Whenever a new message arrives, the chat server will push it to the receiving user on the long poll request. Messages can be stored in HBase, which supports quick small updates, and range based searches. The servers can broadcast the online status of a user to other relevant users. Clients can pull status updates for users who are visible in the client’s viewport on a less frequent basis.
166 |
167 | # 5. YouTube
168 |
169 | ## 5.1. Metadata Sharding
170 |
171 | Since we have a huge number of new videos every day and our read load is extremely high, we need to distribute our data onto multiple machines so that we can perform read/write operations efficiently.
172 |
173 | ### 5.1.1. Sharding based on UserID
174 |
175 | We can try storing all the data for a particular user on one server. While storing, we can pass the UserID to our hash function, which will map the user to a database server where we will store all the metadata for that user’s videos. While querying for videos of a user, we can ask our hash function to find the server holding the user’s data and then read it from there. To search videos by titles, we will have to query all servers, and each server will return a set of videos. A centralized server will then aggregate and rank these results before returning them to the user.
176 |
177 | ### 5.1.2. Sharding based on VideoID
178 |
179 | Our hash function will map each VideoID to a random server where we will store that Video’s metadata. To find videos of a user, we will query all servers, and each server will return a set of videos. A centralized server will aggregate and rank these results before returning them to the user. This approach solves our problem of popular users but shifts it to popular videos.
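
A small sketch of VideoID-based sharding with the scatter-gather query described above; the modulo routing, the shard count, and the ranking key are illustrative assumptions.

```python
NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]           # VideoID -> metadata, per shard

def shard_for(video_id: str) -> dict:
    return shards[hash(video_id) % NUM_SHARDS]      # hash maps a video to one shard

def store_video(video_id: str, metadata: dict) -> None:
    shard_for(video_id)[video_id] = metadata

def videos_of_user(user_id: str) -> list:
    # Scatter: ask every shard; gather: aggregate and rank on a central server.
    results = [m for shard in shards for m in shard.values() if m["user_id"] == user_id]
    return sorted(results, key=lambda m: m["views"], reverse=True)

store_video("v1", {"user_id": "u1", "title": "Cats", "views": 900})
store_video("v2", {"user_id": "u1", "title": "Dogs", "views": 120})
print([m["title"] for m in videos_of_user("u1")])   # ['Cats', 'Dogs']
```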
180 |
181 | ## 5.2. Load Balancing
182 |
183 | We should use Consistent Hashing among our cache servers, which will also help in balancing the load between cache servers. Since we will be using a static hash-based scheme to map videos to hostnames, it can lead to an uneven load on the logical replicas due to each video’s different popularity. For instance, if a video becomes popular, the logical replica corresponding to that video will experience more traffic than other servers. These uneven loads for logical replicas can then translate into uneven load distribution on corresponding physical servers. To resolve this issue, any busy server in one location can redirect a client to a less busy server in the same cache location. We can use dynamic HTTP redirections for this scenario. Consistent hashing will not only help in replacing a dead server but also help in distributing load among servers.
184 |
185 | However, the use of redirections also has its drawbacks. First, since our service tries to load balance locally, it leads to multiple redirections if the host that receives the redirection can’t serve the video. Also, each redirection requires a client to make an additional HTTP request; it also leads to higher delays before the video starts playing back. Moreover, inter-tier (or cross data-center) redirections lead a client to a distant cache location because the higher tier caches are only present at a small number of locations.
186 |
187 | # 6. Designing Typeahead Suggestion
188 |
189 | We can have a Map-Reduce (MR) setup to process all the logging data periodically, say every hour. These MR jobs will calculate the frequencies of all searched terms in the past hour. We can then update our trie with this new data: we take the current snapshot of the trie and update it with all the new terms and their frequencies. We should do this offline, as we don’t want our read queries to be blocked by trie-update requests. We have two options:
190 |
191 | - We can make a copy of the trie on each server and update it offline. Once done, we can switch to the new copy and discard the old one.
192 | - Another option is we can have a primary-secondary configuration for each trie server. We can update the secondary while the primary is serving traffic. Once the update is complete, we can make the secondary our new primary. We can later update our old primary, which can then start serving traffic, too.
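
A compact trie sketch for serving suggestions; the frequency counts are what the periodic MapReduce jobs above would refresh. The structure and names are illustrative, not the course's reference implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.frequency = 0            # searched-term frequency; 0 for non-terminal nodes

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def update(self, term: str, frequency: int) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency += frequency

    def suggest(self, prefix: str, k: int = 5):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every completion under the prefix and keep the k most frequent.
        completions, stack = [], [(node, prefix)]
        while stack:
            current, term = stack.pop()
            if current.frequency:
                completions.append((current.frequency, term))
            for ch, child in current.children.items():
                stack.append((child, term + ch))
        return [t for _, t in sorted(completions, reverse=True)[:k]]

trie = Trie()
for term, freq in [("system design", 40), ("system call", 15), ("systemd", 9)]:
    trie.update(term, freq)
print(trie.suggest("sys"))   # ['system design', 'system call', 'systemd']
```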
193 |
194 | # 7. API Rate Limiter
195 |
196 | Rate Limiting helps to protect services against abusive behaviors targeting the application layer like Denial-of-Service (DoS) attacks, brute-force password attempts, brute-force credit card transactions, etc. These attacks are usually a barrage of HTTP/S requests which may look like they are coming from real users, but are typically generated by machines (or bots). As a result, these attacks are often harder to detect and can more easily bring down a service, application, or API.
197 |
198 | Rate limiting is also used to prevent revenue loss, to reduce infrastructure costs, to stop spam, and to stop online harassment. Following is a list of scenarios that can benefit from Rate limiting by making a service (or API) more reliable:
199 |
200 | - Misbehaving clients/scripts: Either intentionally or unintentionally, some entities can overwhelm a service by sending a large number of requests. Another scenario could be when a user is sending a lot of lower-priority requests and we want to make sure that it doesn’t affect the high-priority traffic. For example, users sending a high volume of requests for analytics data should not be allowed to hamper critical transactions for other users.
201 | - Security: By limiting the number of the second-factor attempts (in 2-factor auth) that the users are allowed to perform, for example, the number of times they’re allowed to try with a wrong password.
202 | - To prevent abusive behavior and bad design practices: Without API limits, developers of client applications would use sloppy development tactics, for example, requesting the same information over and over again.
203 | - To keep costs and resource usage under control: Services are generally designed for normal input behavior, for example, a user writing a single post in a minute. Computers could easily push thousands/second through an API. Rate limiter enables controls on service APIs.
204 | - Revenue: Certain services might want to limit operations based on the tier of their customer’s service and thus create a revenue model based on rate limiting. There could be default limits for all the APIs a service offers. To go beyond that, the user has to buy higher limits.
205 | - To eliminate spikiness in traffic: Make sure the service stays up for everyone else.
206 |
207 | # 8. Web Crawler
208 |
209 | ## 8.1. How to crawl?
210 |
211 | Breadth-first or depth-first? Breadth First Search (BFS) is usually used. However, Depth First Search (DFS) is also utilized in some situations; for example, if the crawler has already established a connection with a website, it might just DFS all the URLs within that website to save some handshaking overhead.
212 |
213 | Path-ascending crawling: Path-ascending crawling can help discover a lot of isolated resources or resources for which no inbound link would have been found in regular crawling of a particular Web site. In this scheme, a crawler would ascend to every path in each URL that it intends to crawl.
214 |
215 | ## 8.2. Component Design
216 |
217 | 1. The URL frontier: The URL frontier is the data structure that contains all the URLs that remain to be downloaded. We can crawl by performing a breadth-first traversal of the Web, starting from the pages in the seed set. Such traversals are easily implemented by using a FIFO queue.
218 |
219 | 2. The fetcher module: The purpose of a fetcher module is to download the document corresponding to a given URL using the appropriate network protocol like HTTP. As discussed above, webmasters create robots.txt to make certain parts of their websites off-limits for the crawler. To avoid downloading this file on every request, our crawler’s HTTP protocol module can maintain a fixed-sized cache mapping host-names to their robots exclusion rules.
220 |
221 | 3. Document input stream: Our crawler’s design enables the same document to be processed by multiple processing modules. To avoid downloading a document multiple times, we cache the document locally using an abstraction called a Document Input Stream (DIS).
222 |
223 | 4. Document Dedupe test: Many documents on the Web are available under multiple, different URLs. There are also many cases in which documents are mirrored on various servers. Both of these effects will cause any Web crawler to download the same document multiple times. To prevent the processing of a document more than once, we perform a dedupe test on each document to remove duplication.
224 |
225 | 5. URL filters: The URL filtering mechanism provides a customizable way to control the set of URLs that are downloaded. This is used to blacklist websites so that our crawler can ignore them. Before adding each URL to the frontier, the worker thread consults the user-supplied URL filter. We can define filters to restrict URLs by domain, prefix, or protocol type.
226 |
227 | 6. Domain name resolution: Before contacting a Web server, a Web crawler must use the Domain Name Service (DNS) to map the Web server’s hostname into an IP address. DNS name resolution will be a big bottleneck for our crawler given the number of URLs we will be working with. To avoid repeated requests, we can start caching DNS results by building our own local DNS server.
228 |
229 | 7. URL dedupe test: While extracting links, any Web crawler will encounter multiple links to the same document. To avoid downloading and processing a document multiple times, a URL dedupe test must be performed on each extracted link before adding it to the URL frontier.
230 |
231 | 8. Checkpointing: A crawl of the entire Web takes weeks to complete. To guard against failures, our crawler can write regular snapshots of its state to the disk. An interrupted or aborted crawl can easily be restarted from the latest checkpoint.
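
A toy sketch of the breadth-first crawl loop with a FIFO URL frontier and a URL dedupe test; fetching and link extraction are stubbed out, so those helper functions are assumptions rather than real modules.

```python
from collections import deque
from urllib.parse import urljoin

def fetch(url: str) -> str:
    """Stub for the fetcher module; a real crawler would issue an HTTP GET
    and honor robots.txt here."""
    return ""

def extract_links(base_url: str, html: str) -> list:
    """Stub for the link extractor; a real crawler would parse the HTML."""
    return []

def crawl(seed_urls, max_pages: int = 1000):
    frontier = deque(seed_urls)          # URL frontier: FIFO queue => breadth-first
    seen = set(seed_urls)                # URL dedupe test
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        crawled.append(url)
        for link in extract_links(url, html):
            absolute = urljoin(url, link)
            if absolute not in seen:     # only enqueue URLs we have not seen
                seen.add(absolute)
                frontier.append(absolute)
    return crawled
```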
232 |
233 | # 9. Facebook’s Newsfeed
234 |
235 | Component Design
236 |
237 | ## 9.1. Feed generation
238 |
239 | Offline generation for newsfeed: We can have dedicated servers that continuously generate users’ newsfeeds and store them in memory. So, whenever a user requests new posts for their feed, we can simply serve them from the pre-generated, stored location. Using this scheme, a user’s newsfeed is not compiled on load, but rather on a regular basis, and is returned to the user whenever they request it.
240 |
241 | We can store FeedItemIDs in a data structure similar to a Linked HashMap or a TreeMap, which allows us to not only jump to any feed item but also iterate through the map easily. Whenever users want to fetch more feed items, they can send the last FeedItemID they currently see in their newsfeed; we can then jump to that FeedItemID in our hash map and return the next batch/page of feed items from there.
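
A small sketch of that cursor-style pagination over a per-user ordered map of feed items; the page size, item shape, and variable names are illustrative.

```python
from collections import OrderedDict
from itertools import islice

# Pre-generated feed per user: FeedItemID -> feed item, newest first.
feeds = {
    "user-1": OrderedDict((f"item-{i}", {"id": f"item-{i}"}) for i in range(10, 0, -1))
}

def fetch_feed(user_id: str, after_item_id: str = None, page_size: int = 3):
    items = iter(feeds[user_id].values())
    if after_item_id is not None:
        # Jump past the last item the client has already seen.
        for item in items:
            if item["id"] == after_item_id:
                break
    return list(islice(items, page_size))

first_page = fetch_feed("user-1")
next_page = fetch_feed("user-1", after_item_id=first_page[-1]["id"])
print([i["id"] for i in first_page], [i["id"] for i in next_page])
# ['item-10', 'item-9', 'item-8'] ['item-7', 'item-6', 'item-5']
```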
242 |
243 | ## 9.2. Feed publishing
244 |
245 | The process of pushing a post to all the followers is called fanout. By analogy, the push approach is called fanout-on-write, while the pull approach is called fanout-on-load. Let’s discuss different options for publishing feed data to users.
246 |
247 | 1. "Pull" model or Fan-out-on-load
248 |
249 | This method involves keeping all the recent feed data in memory so that users can pull it from the server whenever they need it. Clients can pull the feed data on a regular basis or manually whenever they need it. Possible problems with this approach are: a) new data might not be shown to users until they issue a pull request; b) it’s hard to find the right pull cadence, as most pull requests will return an empty response when there is no new data, wasting resources.
250 |
251 | 2. "Push" model or Fan-out-on-write.
252 |
253 | For a push system, once a user has published a post, we can immediately push this post to all the followers. The advantage is that when fetching feed you don’t need to go through your friend’s list and get feeds for each of them. It significantly reduces read operations. To efficiently handle this, users have to maintain a Long Poll request with the server for receiving the updates. A possible problem with this approach is that when a user has millions of followers (a celebrity-user) the server has to push updates to a lot of people.
254 |
255 | 3. Hybrid
256 |
257 | An alternate method to handle feed data could be to use a hybrid approach, i.e., to do a combination of fan-out-on-write and fan-out-on-load. Specifically, we can stop pushing posts from users with a high number of followers (a celebrity user) and only push data for those users who have a few hundred (or thousand) followers. For celebrity users, we can let the followers pull the updates. Since the push operation can be extremely costly for users who have a lot of friends or followers, by disabling fanout for them, we can save a huge number of resources. Another alternate approach could be that, once a user publishes a post, we can limit the fanout to only her online friends. Also, to get benefits from both the approaches, a combination of ‘push to notify’ and ‘pull for serving’ end-users is a great way to go. Purely a push or pull model is less versatile.
258 |
259 | ## 9.3. Data Partitioning
260 |
261 | 1. Sharding posts and metadata
262 |
263 | Since we have a huge number of new posts every day and our read load is extremely high too, we need to distribute our data onto multiple machines such that we can read/write it efficiently. For sharding our databases that are storing posts and their metadata, we can have a similar design as discussed under Designing Twitter.
264 |
265 | 2. Sharding feed data
266 |
267 | For feed data, which is being stored in memory, we can partition it based on UserID. We can try storing all the data of a user on one server. When storing, we can pass the UserID to our hash function that will map the user to a cache server where we will store the user’s feed objects. Also, for any given user, since we don’t expect to store more than 500 FeedItemIDs, we will not run into a scenario where feed data for a user doesn’t fit on a single server. To get the feed of a user, we would always have to query only one server. For future growth and replication, we must use Consistent Hashing.
268 |
269 | # 10. Yelp
270 |
271 | ## 10.1. Dynamic size grids
272 |
273 | Let’s assume we don’t want to have more than 500 places in a grid so that we can search faster. So, whenever a grid reaches this limit, we break it down into four grids of equal size and distribute the places among them. This means thickly populated areas like downtown San Francisco will have a lot of grids, while sparsely populated areas like the Pacific Ocean will have large grids with places only around the coastlines.
274 |
275 | What data-structure can hold this information? A tree in which each node has four children can serve our purpose. Each node will represent a grid and will contain information about all the places in that grid. If a node reaches our limit of 500 places, we will break it down to create four child nodes under it and distribute places among them. In this way, all the leaf nodes will represent the grids that cannot be further broken down. So leaf nodes will keep a list of places with them. This tree structure in which each node can have four children is called a QuadTree.
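
A compact QuadTree sketch along these lines; the 500-place limit matches the text, while the coordinate handling and names are illustrative.

```python
MAX_PLACES = 500

class QuadTree:
    def __init__(self, x_min, y_min, x_max, y_max):
        self.bounds = (x_min, y_min, x_max, y_max)
        self.places = []          # only leaf nodes keep places
        self.children = None      # four child nodes once this grid is split

    def insert(self, x, y, place):
        if self.children is not None:
            self._child_for(x, y).insert(x, y, place)
            return
        self.places.append((x, y, place))
        if len(self.places) > MAX_PLACES:
            self._split()

    def _split(self):
        x_min, y_min, x_max, y_max = self.bounds
        x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
        self.children = [
            QuadTree(x_min, y_min, x_mid, y_mid), QuadTree(x_mid, y_min, x_max, y_mid),
            QuadTree(x_min, y_mid, x_mid, y_max), QuadTree(x_mid, y_mid, x_max, y_max),
        ]
        for x, y, place in self.places:            # redistribute places to the children
            self._child_for(x, y).insert(x, y, place)
        self.places = []

    def _child_for(self, x, y):
        x_min, y_min, x_max, y_max = self.bounds
        x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
        index = (1 if x >= x_mid else 0) + (2 if y >= y_mid else 0)
        return self.children[index]
```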
276 |
277 | # 11. Ticket Master
278 |
279 | How would the server keep track of all the active reservations that haven’t been booked yet? And how would the server keep track of all the waiting customers?
280 |
281 | We need two daemon services, one to keep track of all active reservations and remove any expired reservations from the system; let’s call it ActiveReservationService. The other service would keep track of all the waiting user requests and, as soon as the required number of seats becomes available, it will notify the longest-waiting user to choose the seats; let’s call it WaitingUserService.
282 |
283 | ## 11.1. Active Reservations Service
284 |
285 | We can keep all the reservations of a ‘show’ in memory in a data structure similar to Linked HashMap or a TreeMap in addition to keeping all the data in the database. We will need a linked HashMap kind of data structure that allows us to jump to any reservation to remove it when the booking is complete. Also, since we will have expiry time associated with each reservation, the head of the HashMap will always point to the oldest reservation record so that the reservation can be expired when the timeout is reached.
286 |
287 | To store every reservation for every show, we can have a HashTable where the ‘key’ would be ‘ShowID’, and the ‘value’ would be the Linked HashMap containing ‘BookingID’ and creation ‘Timestamp’.
288 |
289 | In the database, we will store the reservation in the ‘Booking’ table and the expiry time will be in the Timestamp column. The ‘Status’ field will have a value of ‘Reserved (1)’ and, as soon as a booking is complete, the system will update the ‘Status’ to ‘Booked (2)’ and remove the reservation record from the Linked HashMap of the relevant show. When the reservation is expired, we can either remove it from the Booking table or mark it ‘Expired (3)’ in addition to removing it from memory.
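
A small sketch of the in-memory side: a hash table keyed by ShowID whose value is an insertion-ordered map of BookingID to creation timestamp, so the head is always the oldest reservation. The timeout value and names are illustrative.

```python
import time
from collections import OrderedDict, defaultdict

RESERVATION_TIMEOUT_SEC = 600          # illustrative expiry window

# ShowID -> OrderedDict(BookingID -> creation timestamp); head = oldest reservation.
active_reservations = defaultdict(OrderedDict)

def reserve(show_id, booking_id):
    active_reservations[show_id][booking_id] = time.time()

def complete_booking(show_id, booking_id):
    # Jump directly to the reservation and remove it once the booking is complete.
    active_reservations[show_id].pop(booking_id, None)

def expire_old_reservations(show_id):
    """Daemon step: pop expired reservations from the head of the map."""
    reservations = active_reservations[show_id]
    now = time.time()
    while reservations:
        booking_id, created_at = next(iter(reservations.items()))
        if now - created_at < RESERVATION_TIMEOUT_SEC:
            break                          # the head is still fresh; the rest are newer
        reservations.popitem(last=False)   # also mark 'Expired' in the Booking table
```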
290 |
291 | ActiveReservationsService will also work with the external financial service to process user payments. Whenever a booking is completed, or a reservation gets expired, WaitingUsersService will get a signal so that any waiting customer can be served.
292 |
293 | ## 11.2. Waiting Users Service
294 |
295 | Just like ActiveReservationsService, we can keep all the waiting users of a show in memory in a Linked HashMap or a TreeMap. We need a data structure similar to Linked HashMap so that we can jump to any user to remove them from the HashMap when the user cancels their request. Also, since we are serving in a first-come-first-serve manner, the head of the Linked HashMap would always be pointing to the longest waiting user, so that whenever seats become available, we can serve users in a fair manner.
296 |
297 | We will have a HashTable to store all the waiting users for every Show. The ‘key’ would be ‘ShowID’, and the ‘value’ would be a Linked HashMap containing ‘UserIDs’ and their wait-start-time.
298 |
299 | Clients can use Long Polling for keeping themselves updated for their reservation status. Whenever seats become available, the server can use this request to notify the user.
300 |
301 | __Reservation Expiration__
302 | On the server, ActiveReservationsService keeps track of the expiry (based on reservation time) of active reservations. As the client will be shown a timer (for the expiration time) which could be a little out of sync with the server, we can add a buffer of five seconds on the server to safeguard against a broken experience, so that the client never times out after the server does (which could prevent a successful purchase).
303 |
304 | ## 11.3. Concurrency
305 |
306 | How do we handle concurrency so that no two users are able to book the same seat? We can use transactions in SQL databases to avoid any clashes. For example, if we are using an SQL server, we can utilize Transaction Isolation Levels to lock the rows before we update them. Here is the sample code:
307 |
308 | ```sql
309 | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
310 |
311 | BEGIN TRANSACTION;
312 |
313 | -- Suppose we intend to reserve three seats (IDs: 54, 55, 56) for ShowID=99
314 | SELECT * FROM Show_Seat WHERE ShowID=99 AND ShowSeatID IN (54, 55, 56) AND Status=0; -- 0 = free
315 |
316 | -- If the number of rows returned by the above statement is three, update the seats and the booking
317 | -- and return success; otherwise return failure to the user.
318 | update Show_Seat ...
319 | update Booking ...
320 |
321 | COMMIT TRANSACTION;
322 | ```
323 |
324 | ‘Serializable’ is the highest isolation level and guarantees safety from dirty, non-repeatable, and phantom reads. One thing to note here: within a transaction, if we read rows, we get a write lock on them so that they can’t be updated by anyone else.
325 |
326 | Once the above database transaction is successful, we can start tracking the reservation in ActiveReservationService.
327 |
328 | # 12. Load Balancing
329 |
330 | ## 12.1. Benefits
331 |
332 | - Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource.
333 | - Service providers experience less downtime and higher throughput. Even a full server failure won’t affect the end user experience as the load balancer will simply route around it to a healthy server.
334 | - Load balancing makes it easier for system administrators to handle incoming requests while decreasing wait time for users.
335 | - Smart load balancers provide benefits like predictive analytics that determine traffic bottlenecks before they happen. As a result, the smart load balancer gives an organization actionable insights. These are key to automation and can help drive business decisions.
336 | - System administrators experience fewer failed or stressed components. Instead of a single device performing a lot of work, load balancing has several devices perform a little bit of work.
337 |
338 | ## 12.2. Algorithms
339 |
340 | How does the load balancer choose the backend server?
341 |
342 | - Load balancers consider two factors before forwarding a request to a backend server. They will first ensure that the server they choose is actually responding appropriately to requests and then use a pre-configured algorithm to select one from the set of healthy servers. We will discuss these algorithms shortly.
343 |
344 | - __Health Checks__ - Load balancers should only forward traffic to “healthy” backend servers. To monitor the health of a backend server, “health checks” regularly attempt to connect to backend servers to ensure that servers are listening. If a server fails a health check, it is automatically removed from the pool, and traffic will not be forwarded to it until it responds to the health checks again.
345 |
346 | Methods.
347 |
348 | 1. Least Connection Method — This method directs traffic to the server with the fewest active connections. This approach is quite useful when there are a large number of persistent client connections which are unevenly distributed between the servers.
349 |
350 | 2. Least Response Time Method — This algorithm directs traffic to the server with the fewest active connections and the lowest average response time.
351 |
352 | 3. Least Bandwidth Method — This method selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps).
353 |
354 | 4. Round Robin Method — This method cycles through a list of servers and sends each new request to the next server. When it reaches the end of the list, it starts over at the beginning. It is most useful when the servers are of equal specification and there are not many persistent connections.
355 |
356 | 5. Weighted Round Robin Method — The weighted round-robin scheduling is designed to better handle servers with different processing capacities. Each server is assigned a weight (an integer value that indicates the processing capacity). Servers with higher weights receive new connections before those with lower weights, and servers with higher weights get more connections than those with lower weights.
357 |
358 | 6. IP Hash — Under this method, a hash of the IP address of the client is calculated to redirect the request to a server.
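
A minimal sketch of two of these strategies, Least Connection and Weighted Round Robin, over a pool of healthy servers; the server names and weights are illustrative.

```python
import itertools

class Server:
    def __init__(self, name: str, weight: int = 1):
        self.name = name
        self.weight = weight
        self.active_connections = 0

servers = [Server("app-1", weight=3), Server("app-2", weight=1)]

def least_connections(pool):
    # Pick the healthy server with the fewest active connections.
    return min(pool, key=lambda s: s.active_connections)

def weighted_round_robin(pool):
    # Repeat each server according to its weight, then cycle through the list.
    expanded = [s for s in pool for _ in range(s.weight)]
    return itertools.cycle(expanded)

rr = weighted_round_robin(servers)
print([next(rr).name for _ in range(4)])   # ['app-1', 'app-1', 'app-1', 'app-2']
```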
359 |
360 | # 13. Caching
361 |
362 | ## 13.1. Cache Invalidation
363 |
364 | While caching is fantastic, it requires some maintenance to keep the cache coherent with the source of truth (e.g., database). If the data is modified in the database, it should be invalidated in the cache; if not, this can cause inconsistent application behavior.
365 |
366 | Solving this problem is known as cache invalidation; there are three main schemes that are used:
367 |
368 | 1. Write-through cache: Under this scheme, data is written into the cache and the corresponding database simultaneously. The cached data allows for fast retrieval and, since the same data gets written in the permanent storage, we will have complete data consistency between the cache and the storage. Also, this scheme ensures that nothing will get lost in case of a crash, power failure, or other system disruptions.
369 |
370 | Although write-through minimizes the risk of data loss, every write operation must be done twice before returning success to the client, so this scheme has the disadvantage of higher latency for write operations.
371 |
372 | 2. Write-around cache: This technique is similar to write-through cache, but data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write operations that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a “cache miss” and must be read from slower back-end storage and experience higher latency.
373 |
374 | 3. Write-back cache: Under this scheme, data is written to cache alone, and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low-latency and high-throughput for write-intensive applications; however, this speed comes with the risk of data loss in case of a crash or other adverse event because the only copy of the written data is in the cache.
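
A toy sketch contrasting the three write paths, with plain dicts standing in for the cache and the database; the deferred flush step for write-back is an illustrative simplification.

```python
cache, database = {}, {}
dirty_keys = set()               # written to cache but not yet persisted (write-back)

def write_through(key, value):
    cache[key] = value
    database[key] = value        # both writes complete before acknowledging

def write_around(key, value):
    database[key] = value        # bypass the cache; the first read will be a miss

def write_back(key, value):
    cache[key] = value           # acknowledge immediately; persist later
    dirty_keys.add(key)

def flush():
    """Deferred persistence step for write-back (e.g. run on an interval)."""
    for key in list(dirty_keys):
        database[key] = cache[key]
    dirty_keys.clear()
```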
375 |
376 | ## 13.2. Cache eviction policies
377 |
378 | 1. First In First Out (FIFO): The cache evicts the block that was added earliest, without regard to how often or how many times it was accessed before.
379 |
380 | 2. Last In First Out (LIFO): The cache evicts the block that was added most recently, without regard to how often or how many times it was accessed before.
381 |
382 | 3. Least Recently Used (LRU): Discards the least recently used items first.
383 |
384 | 4. Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
385 |
386 | 5. Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
387 |
388 | 6. Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
389 |
390 | # 14. Data Partitioning
391 |
392 | ## 14.1. Partitioning Criteria
393 |
394 | 1. Key or Hash-based partitioning
395 |
396 | Under this scheme, we apply a hash function to some key attribute of the entity we are storing; that yields the partition number. For example, suppose we have 100 DB servers and our ID is a numeric value that gets incremented by one each time a new record is inserted; the hash function could then be ‘ID % 100’, which gives us the server number where we can store/read that record. This approach should ensure a uniform allocation of data among servers. The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function, which would require redistributing the data and downtime for the service. A workaround for this problem is to use Consistent Hashing (a short sketch of this re-mapping problem follows the partitioning schemes below).
397 |
398 | 2. List partitioning
399 |
400 | In this scheme, each partition is assigned a list of values, so whenever we want to insert a new record, we will see which partition contains our key and then store it there. For example, we can decide all users living in Iceland, Norway, Sweden, Finland, or Denmark will be stored in a partition for the Nordic countries.
401 |
402 | 3. Round-robin partitioning
403 |
404 | This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the i-th tuple is assigned to partition (i mod n).
405 |
406 | 4. Composite partitioning
407 |
408 | Under this scheme, we combine any of the above partitioning schemes to devise a new scheme. For example, first applying a list partitioning scheme and then a hash based partitioning. Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.
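
A tiny sketch of the ‘ID % 100’ rule from the hash-based example above, showing why adding a server forces most keys to move (which is what Consistent Hashing avoids); the record counts are illustrative.

```python
def server_for(record_id: int, num_servers: int) -> int:
    return record_id % num_servers        # key- or hash-based partitioning

# Going from 100 to 101 servers changes the mapping for almost every record:
moved = sum(1 for record_id in range(10_000)
            if server_for(record_id, 100) != server_for(record_id, 101))
print(f"{moved / 10_000:.0%} of records would have to move")   # 99%
```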
409 |
410 | ## 14.2. Common Problems of Data Partitioning
411 |
412 | 1. Joins and Denormalization
413 |
414 | Performing joins on a database which is running on one server is straightforward, but once a database is partitioned and spread across multiple machines it is often not feasible to perform joins that span database partitions. Such joins will not be performance efficient since data has to be compiled from multiple servers. A common workaround for this problem is to denormalize the database so that queries that previously required joins can be performed from a single table. Of course, the service now has to deal with all the perils of denormalization such as data inconsistency.
415 |
416 | 2. Referential integrity
417 |
418 | As we saw, performing a cross-partition query on a partitioned database is not feasible; similarly, trying to enforce data integrity constraints such as foreign keys in a partitioned database can be extremely difficult.
419 |
420 | Most RDBMSs do not support foreign key constraints across databases on different database servers, which means that applications requiring referential integrity on partitioned databases often have to enforce it in application code. Often in such cases, applications have to run regular SQL jobs to clean up dangling references.
421 |
422 | 3. Rebalancing
423 |
424 | There could be many reasons we have to change our partitioning scheme:
425 |
426 | - The data distribution is not uniform, e.g., there are a lot of places for a particular ZIP code that cannot fit into one database partition.
427 | - There is a lot of load on a partition, e.g., there are too many requests being handled by the DB partition dedicated to user photos.
428 |
429 | In such cases, either we have to create more DB partitions or rebalance existing partitions, which means changing the partitioning scheme and moving all existing data to new locations. Doing this without incurring downtime is extremely difficult. Using a scheme like directory-based partitioning does make rebalancing a more palatable experience, at the cost of increasing the complexity of the system and creating a new single point of failure (i.e., the lookup service/database).
430 |
431 | # 15. Proxy Server
432 |
433 | ## 15.1. Open Proxy
434 |
435 | An open proxy is a proxy server that is accessible to any Internet user. Generally, a proxy server only allows users within a network group (a closed proxy) to store and forward Internet services such as DNS or web pages to reduce and control the bandwidth used by the group. With an open proxy, however, any user on the Internet is able to use this forwarding service. There are two well-known types of open proxy:
436 |
437 | 1. Anonymous Proxy: Reveals its identity as a server but doesn't disclose the initial IP address. Though this proxy server can be discovered easily, it can be beneficial for some users as it hides their IP address.
438 | 2. Transparent Proxy: This proxy server again identifies itself, and with the support of HTTP headers, the first IP address can be viewed. The main benefit of using this sort of server is its ability to cache the websites.
439 |
440 | ## 15.2. Reverse Proxy
441 |
442 | A reverse proxy retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client, appearing as if they originated from the proxy server itself.
443 |
444 | # 16. SQL & NoSQL
445 |
446 | ## 16.1. SQL
447 |
448 | Relational databases store data in rows and columns. Each row contains all the information about one entity and each column contains all the separate data points. Some of the most popular relational databases are MySQL, Oracle, MS SQL Server, SQLite, Postgres, and MariaDB.
449 |
450 | ## 16.2. NoSQL
451 |
452 | 1. __Key-Value Stores__: Data is stored in an array of key-value pairs. The ‘key’ is an attribute name which is linked to a ‘value’. Well-known key-value stores include Redis, Voldemort, and Dynamo.
453 |
454 | 2. __Document Databases__: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections. Each document can have an entirely different structure. Document databases include CouchDB and MongoDB.
455 |
456 | 3. __Wide-Column Databases__: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns up front and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets - big names include Cassandra and HBase.
457 |
458 | 4. __Graph Databases__: These databases are used to store data whose relations are best represented in a graph. Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities). Examples of graph database include Neo4J and InfiniteGraph.
459 |
460 | ## 16.3. Differences: SQL vs. NoSQL
461 |
462 | 1. Storage: SQL stores data in tables where each row represents an entity and each column represents a data point about that entity; for example, if we are storing a car entity in a table, different columns could be ‘Color’, ‘Make’, ‘Model’, and so on.
463 |
464 | NoSQL databases have different data storage models. The main ones are key-value, document, graph, and columnar. We will discuss differences between these databases below.
465 |
466 | 2. Schema: In SQL, each record conforms to a fixed schema, meaning the columns must be decided and chosen before data entry and each row must have data for each column. The schema can be altered later, but it involves modifying the whole database and going offline.
467 |
468 | In NoSQL, schemas are dynamic. Columns can be added on the fly and each ‘row’ (or equivalent) doesn’t have to contain data for each ‘column.’
469 |
470 | 3. Querying: SQL databases use SQL (structured query language) for defining and manipulating the data, which is very powerful. In a NoSQL database, queries are focused on a collection of documents. Sometimes it is also called UnQL (Unstructured Query Language). Different databases have different syntax for using UnQL.
471 |
472 | 4. Scalability: In most common situations, SQL databases are vertically scalable, i.e., by increasing the horsepower (higher Memory, CPU, etc.) of the hardware, which can get very expensive. It is possible to scale a relational database across multiple servers, but this is a challenging and time-consuming process.
473 |
474 | On the other hand, NoSQL databases are horizontally scalable, meaning we can add more servers easily in our NoSQL database infrastructure to handle a lot of traffic. Any cheap commodity hardware or cloud instances can host NoSQL databases, thus making it a lot more cost-effective than vertical scaling. A lot of NoSQL technologies also distribute data across servers automatically.
475 |
476 | 5. Reliability or ACID Compliance (Atomicity, Consistency, Isolation, Durability): The vast majority of relational databases are ACID compliant. So, when it comes to data reliability and the safe execution of transactions, SQL databases are still the better bet.
477 |
478 | Most of the NoSQL solutions sacrifice ACID compliance for performance and scalability.
479 |
480 | ## 16.4. Choose which?
481 |
482 | ### 16.4.1. SQL
483 |
484 | 1. We need to ensure ACID compliance. ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. Generally, NoSQL databases sacrifice ACID compliance for scalability and processing speed, but for many e-commerce and financial applications, an ACID-compliant database remains the preferred option.
485 |
486 | 2. Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and if you’re only working with data that is consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.
487 |
488 | ### 16.4.2. NoSQL
489 |
490 | When all the other components of our application are fast and seamless, NoSQL databases prevent data from being the bottleneck. Big data is contributing to a large success for NoSQL databases, mainly because it handles data differently than the traditional relational databases. A few popular examples of NoSQL databases are MongoDB, CouchDB, Cassandra, and HBase.
491 |
492 | 1. Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data we can store together and allows us to add new types as the need changes. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance.
493 |
494 | 2. Making the most of cloud computing and storage. Cloud-based storage is an excellent cost-saving solution but requires data to be easily spread across multiple servers to scale up. Using commodity (affordable, smaller) hardware on-site or in the cloud saves you the hassle of additional software and NoSQL databases like Cassandra are designed to be scaled across multiple data centers out of the box, without a lot of headaches.
495 |
496 | 3. Rapid development. NoSQL is extremely useful for rapid development as it doesn’t need to be prepped ahead of time. If you’re working on quick iterations of your system which require making frequent updates to the data structure without a lot of downtime between versions, a relational database will slow you down.
497 |
498 | # 17. CAP Theorem
499 |
500 | CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance. When we design a distributed system, trading off among CAP is almost the first thing we want to consider. CAP theorem says while designing a distributed system, we can pick only two of the following three options:
501 |
502 | - Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
503 |
504 | - Availability: Every request gets a response on success/failure. Availability is achieved by replicating the data across different servers.
505 |
506 | - Partition tolerance: The system continues to work despite message loss or partial failure. A partition-tolerant system can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
507 |
508 | 
509 |
510 | We cannot build a general data store that is continually available, sequentially consistent, and tolerant to any partition failures. We can only build a system that has any two of these three properties. Because, to be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partitions before a client reads from the out-of-date partition after having read from the up-to-date one. The only thing that can be done to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available.
511 |
512 | # 18. Consistent Hashing
513 |
514 | ## 18.1. Improve caching system
515 |
516 | Distributed Hash Table (DHT) is one of the fundamental components used in distributed scalable systems. Hash Tables need a key, a value, and a hash function where hash function maps the key to a location where the value is stored.
517 |
518 | index = hash_function(key)
519 |
520 | Suppose we are designing a distributed caching system. Given ‘n’ cache servers, an intuitive hash function would be ‘key % n’. It is simple and commonly used. But it has two major drawbacks:
521 |
522 | - It is NOT horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. This becomes a maintenance pain point if the caching system contains lots of data, because in practice it is difficult to schedule downtime to update all the cache mappings (see the short demo right after this list).
523 | - It may NOT be load balanced, especially for non-uniformly distributed data. In practice, we can safely assume that the data will not be distributed uniformly. For the caching system, this translates into some caches becoming hot and saturated while others sit almost empty and idle.
524 |
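Here is a minimal sketch (not from the original notes) of the first drawback: when the server count changes from 4 to 5, most keys map to a different server under ‘key % n’ placement. It uses Python's built-in `hash()` purely for illustration.

```py
def server_for(key, n):
    # naive placement: hash the key, then take it modulo the number of servers
    return hash(key) % n

keys = [f"user:{i}" for i in range(1000)]
before = {k: server_for(k, 4) for k in keys}  # 4 cache servers
after = {k: server_for(k, 5) for k in keys}   # a 5th server is added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys now map to a different server")  # roughly 80% for n = 4 -> 5
```
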
525 | ## 18.2. Consistent Hashing
526 |
527 | Consistent hashing is a very useful strategy for distributed caching systems and DHTs. It allows us to distribute data across a cluster in such a way that will minimize reorganization when nodes are added or removed. Hence, the caching system will be easier to scale up or scale down.
528 |
529 | In Consistent Hashing, when the hash table is resized (e.g. a new cache host is added to the system), only ‘k/n’ keys need to be remapped where ‘k’ is the total number of keys and ‘n’ is the total number of servers. Recall that in a caching system using the ‘mod’ as the hash function, all keys need to be remapped.
530 |
531 | In Consistent Hashing, objects are mapped to the same host if possible. When a host is removed from the system, the objects on that host are shared by other hosts; when a new host is added, it takes its share from a few hosts without touching other’s shares.
532 |
533 | ## 18.3. Algorithm
534 |
535 | As a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range of [0, 256]. Imagine that the integers in the range are placed on a ring such that the values are wrapped around.
536 |
537 | 1. Given a list of cache servers, hash them to integers in the range.
538 | 2. To map a key to a server,
539 |
540 | - Hash it to a single integer.
541 | - Move clockwise on the ring until finding the first cache it encounters.
542 | - That cache is the one that contains the key. See animation below as an example: key1 maps to cache A; key2 maps to cache C.
543 |
544 | To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched.
545 |
546 | To remove a cache or, if a cache fails, say A, all keys that were originally mapped to A will fall into B, and only those keys need to be moved to B; other keys will not be affected.
547 |
548 | For load balancing, as we discussed in the beginning, the real data is essentially randomly distributed and thus may not be uniform. It may make the keys on caches unbalanced.
549 |
550 | To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring.
551 |
552 | If the hash function “mixes well,” as the number of replicas increases, the keys will be more balanced.
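
The ring described above fits in a short sketch. The following is an illustrative implementation (not from the original notes): it hashes server names and keys onto the ring with MD5, keeps the ring sorted, and assigns a configurable number of virtual replicas to each server; the class and method names are my own.

```py
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual replicas per server, for smoother balancing
        self.ring = []            # sorted list of (point, server) pairs on the ring
        for server in servers:
            self.add(server)

    def _hash(self, key):
        # map any string to an integer point on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        # only keys on the removed server's points move, each to the next point clockwise
        self.ring = [(point, s) for point, s in self.ring if s != server]

    def get(self, key):
        if not self.ring:
            return None
        # move clockwise: the first point at or after the key's hash, wrapping around
        idx = bisect.bisect_left(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["cacheA", "cacheB", "cacheC"])
print(ring.get("key1"), ring.get("key2"))
```

Adding or removing a server in this sketch only remaps the keys whose closest clockwise point changed, roughly k/n of all keys, matching the claim above.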
--------------------------------------------------------------------------------
/Beyond Cracking the Coding Interview/Beyond Cracking the Coding Interview.md:
--------------------------------------------------------------------------------
1 | Beyond Cracking the Coding Interview - by Gayle Laakmann McDowell, Mike Mroczka, Aline Lerner, Nil Mamano, 2025
2 |
3 | > [Book Link](https://www.amazon.com/Beyond-Cracking-Coding-Interview-Successfully/dp/195570600X)
4 |
5 |
6 |
7 | # Contents
8 |
9 |
10 |
11 | - [20. Anatomy of Coding Interview](#20-anatomy-of-coding-interview)
12 | - [25. Dynamic Arrays](#25-dynamic-arrays)
13 | - [26. String Manipulation](#26-string-manipulation)
14 | - [27. Two Pointers](#27-two-pointers)
15 | - [28. Grids & Matrices](#28-grids-matrices)
16 | - [29. Binary Search](#29-binary-search)
17 | - [30. Set & Maps](#30-set-maps)
18 | - [31. Sorting](#31-sorting)
19 | - [32. Stacks & Queues](#32-stacks-queues)
20 | - [33. Recursion](#33-recursion)
21 | - [34. Linked Lists](#34-linked-lists)
22 | - [35. Trees ](#35-trees)
23 | - [36. Graphs](#36-graphs)
24 | - [37. Heaps ](#37-heaps)
25 | - [38. Sliding Windows ](#38-sliding-windows)
26 | - [39. Backtracking](#39-backtracking)
27 | - [40. Dynamic Programming](#40-dynamic-programming)
28 | - [41. Greedy Algorithms](#41-greedy-algorithms)
29 | - [42. Topological Sort ](#42-topological-sort)
30 | - [43. Prefix Sums ](#43-prefix-sums)
31 |
32 |
33 |
34 |
35 | ## 20. Anatomy of Coding Interview
36 |
37 | Get buy-in with magic question: I'd like to use A algorithm with B time and C space to solve problem. Should I code this now, or should I keep thinking?
38 |
39 |
40 | ## 25. Dynamic Arrays
41 |
42 | ```py
43 | class DynamicArray:
44 | def __init__(self):
45 | self.capacity = 10
46 | self._size = 0
47 | self.fixed_array = [None] * self.capacity
48 |
49 | def get(self, i):
50 | if i < 0 or i >= self._size:
51 | raise IndexError('index out of bounds')
52 | return self.fixed_array[i]
53 |
54 | def set(self, i, x):
55 | if i < 0 or i >= self._size:
56 | raise IndexError('index out of bounds')
57 | self.fixed_array[i] = x
58 |
59 | def size(self):
60 | return self._size
61 |
62 | def append(self, x):
63 | if self._size == self.capacity:
64 | self.resize(self.capacity * 2)
65 | self.fixed_array[self._size] = x
66 | self._size += 1
67 |
68 | def resize(self, new_capacity):
69 | new_fixed_size_arr = [None] * new_capacity
70 | for i in range(self._size):
71 | new_fixed_size_arr[i] = self.fixed_array[i]
72 | self.fixed_array = new_fixed_size_arr
73 | self.capacity = new_capacity
74 |
75 | def pop_back(self):
76 | if self._size == 0:
77 | raise IndexError('pop from empty array')
78 | self._size -= 1
79 | if self._size / self.capacity < 0.25 and self.capacity > 10:
80 | self.resize(self.capacity // 2)
81 |
82 | def pop(self, i):
83 | if i < 0 or i >= self._size:
84 | raise IndexError('Index out of bounds')
85 | saved_element = self.fixed_array[i]
86 | for index in range(i, self._size - 1):
87 | self.fixed_array[index] = self.fixed_array[index + 1]
88 | self.pop_back()
89 | return saved_element
90 | ```
91 |
92 | ## 26. String Manipulation
93 |
94 | ```py
95 | def is_uppercase(c):
96 | return ord(c) >= ord('A') and ord(c) <= ord('Z')
97 |
98 | def is_digit(c):
99 | return ord(c) >= ord('0') and ord(c) <= ord('9')
100 |
def is_lowercase(c):  # helper used by is_alphanumeric and to_uppercase below
    return ord(c) >= ord('a') and ord(c) <= ord('z')

101 | def is_alphanumeric(c):
102 | return is_lowercase(c) or is_uppercase(c) or is_digit(c)
103 |
104 | def to_uppercase(c):
105 | if not is_lowercase(c):
106 | return c
107 | return chr(ord(c) - ord('a') + ord('A'))
108 |
109 | def split(s, c):
110 | if not s:
111 | return []
112 | res = []
113 | current = []
114 | for char in s:
115 | if char == c:
116 | res.append(''.join(current))
117 | current = []
118 | else:
119 | current.append(char)
120 | res.append(''.join(current))
121 | return res
122 |
123 | def join(arr, s):
124 | res = []
125 | for i in range(len(arr)):
126 | if i != 0:
127 | for c in s:
128 | res.append(c)
129 | for c in arr[i]:
130 | res.append(c)
131 |     return ''.join(res)  # build the final string from the collected characters
132 | ```
133 |
134 |
135 | ## 27. Two Pointers
136 |
137 | ```py
138 | def palindrome(s):
139 | l, r = 0, len(s) - 1
140 | while l < r:
141 | if s[l] != s[r]:
142 | return False
143 | l += 1
144 | r -= 1
145 | return True
146 |
147 | def smaller_prefixes(arr):
148 | sp, fp = 0, 0
149 | slow_sum, fast_sum = 0, 0
150 | while fp < len(arr):
151 | slow_sum += arr[sp]
152 | fast_sum += arr[fp] + arr[fp+1]
153 | if slow_sum >= fast_sum:
154 | return False
155 | sp += 1
156 | fp += 2
157 | return True
158 |
159 | def common_elements(arr1, arr2):
160 | p1, p2 = 0, 0
161 | res = []
162 | while p1 < len(arr1) and p2 < len(arr2):
163 | if arr1[p1] == arr2[p2]:
164 | res.append(arr1[p1])
165 | p1 += 1
166 | p2 += 1
167 | elif arr1[p1] < arr2[p2]:
168 | p1 += 1
169 | else:
170 | p2 += 1
171 | return res
172 |
173 | def palindrome_sentence(s):
174 | l, r = 0, len(s) - 1
175 | while l < r:
176 | if not s[l].isalpha():
177 | l += 1
178 | elif not s[r].isalpha():
179 | r -= 1
180 | else:
181 | if s[l].lower() != s[r].lower():
182 | return False
183 | l += 1
184 | r -= 1
185 | return True
186 |
187 | def reverse_case_match(s):
188 | l, r = 0, len(s) - 1
189 | while l < len(s) and r >= 0:
190 | if not s[l].islower():
191 | l += 1
192 | elif not s[r].isupper():
193 | r -= 1
194 | else:
195 | if s[l] != s[r].lower():
196 | return False
197 | l += 1
198 | r -= 1
199 | return True
200 |
201 | def merge(arr1, arr2):
202 | p1, p2 = 0, 0
203 | res = []
204 | while p1 < len(arr1) and p2 < len(arr2):
205 | if arr1[p1] < arr2[p2]:
206 | res.append(arr1[p1])
207 | p1 += 1
208 | else:
209 | res.append(arr2[p2])
210 | p2 += 1
211 | while p1 < len(arr1):
212 | res.append(arr1[p1])
213 | p1 += 1
214 | while p2 < len(arr2):
215 | res.append(arr2[p2])
216 | p2 += 1
217 | return res
218 |
219 | def two_sum(arr):
220 | l, r = 0, len(arr) - 1
221 | while l < r:
222 | if arr[l] + arr[r] > 0:
223 | r -= 1
224 | elif arr[l] + arr[r] < 0:
225 | l += 1
226 | else:
227 | return True
228 | return False
229 |
230 | def sort_valley_array(arr):
231 | if len(arr) == 0:
232 | return []
233 | l, r = 0, len(arr) - 1
234 | res = [0] * len(arr)
235 | i = len(arr) - 1
236 | while l < r:
237 | if arr[l] >= arr[r]:
238 | res[i] = arr[l]
239 | l += 1
240 | i -= 1
241 | else:
242 | res[i] = arr[r]
243 | r -= 1
244 | i -= 1
245 | res[0] = arr[l]
246 | return res
247 |
248 | def intersection(int1, int2):
249 | overlap_start = max(int1[0], int2[0])
250 | overlap_end = min(int1[1], int2[1])
251 | return [overlap_start, overlap_end]
252 | def interval_intersection(arr1, arr2):
253 | p1, p2 = 0, 0
254 | n1, n2 = len(arr1), len(arr2)
255 | res = []
256 | while p1 < n1 and p2 < n2:
257 | int1, int2 = arr1[p1], arr2[p2]
258 | if int1[1] < int2[0]:
259 | p1 += 1
260 | elif int2[1] < int1[0]:
261 | p2 += 1
262 | else:
263 | res.append(intersection(int1, int2))
264 | if int1[1] < int2[1]:
265 | p1 += 1
266 | else:
267 | p2 += 1
268 | return res
269 |
270 | def reverse(arr):
271 | l, r = 0, len(arr) - 1
272 | while l < r:
273 | arr[l], arr[r] = arr[r], arr[l]
274 | l += 1
275 | r -= 1
276 |
277 | def sort_even(arr):
278 | l, r = 0, len(arr) - 1
279 | while l < r:
280 | if arr[l] % 2 == 0:
281 | l += 1
282 | elif arr[r] % 2 == 1:
283 | r -= 1
284 | else:
285 | arr[l], arr[r] = arr[r], arr[l]
286 | l += 1
287 | r -= 1
288 |
289 | def remove_duplicates(arr):
290 | s, w = 0, 0
291 | while s < len(arr):
292 | must_keep = s == 0 or arr[s] != arr[s-1]
293 | if must_keep:
294 | arr[w] = arr[s]
295 | w += 1
296 | s += 1
297 | return w
298 |
299 | def move_word(arr, word):
300 | seeker, writer = 0, 0
301 | i = 0
302 | while seeker < len(arr):
303 | if i < len(word) and arr[seeker] == word[i]:
304 | seeker += 1
305 | i += 1
306 | else:
307 | arr[writer] = arr[seeker]
308 | seeker += 1
309 | writer += 1
310 | for c in word:
311 | arr[writer] = c
312 | writer += 1
313 | ```
314 |
315 |
316 | ## 28. Grids & Matrices
317 |
318 | ```py
319 | def is_valid(room, r, c):
320 | return 0 <= r < len(room) and 0 <= c < len(room[0]) and room[r][c] != 1
321 | def valid_moves(room, r, c):
322 | moves = []
323 | directions = [[-1, 0], [1, 0], [0, -1], [0, 1]]
324 | for dir_r, dir_c in directions:
325 | new_r = r + dir_r
326 | new_c = c + dir_c
327 | if is_valid(room, new_r, new_c):
328 | moves.append([new_r, new_c])
329 | return moves
330 |
331 | def queen_valid_moves(board, piece, r, c):
332 | moves = []
333 | king_directions = [
334 |         [-1, 0], [1, 0], [0, -1], [0, 1],  # vertical and horizontal
335 | [-1, -1], [-1, 1], [1, -1], [1, 1] # diagonal
336 | ]
337 | knight_directions = [[-2, 1], [-1, 2], [1, 2], [2, 1], [2, -1], [1, -2], [-1, -2], [-2, -1]]
338 | if piece == "knight":
339 | directions = knight_directions
340 | else: # king and queen
341 | directions = king_directions
342 | for dir_r, dir_c in directions:
343 | new_r, new_c = r + dir_r, c + dir_c
344 | if piece == "queen":
345 |             while is_valid(board, new_r, new_c):
                moves.append([new_r, new_c])  # record every cell the queen can slide to in this direction
346 |                 new_r += dir_r
347 |                 new_c += dir_c
348 | elif is_valid(board, new_r, new_c):
349 | moves.append([new_r, new_c])
350 | return moves
351 |
352 | def safe_cells(board):
353 | n = len(board)
354 | res = [[0] * n for _ in range(n)]
355 | for r in range(n):
356 | for c in range(n):
357 | if board[r][c] == 1:
358 | res[r][c] = 1
359 | mark_reachable_cells(board, r, c, res)
360 | return res
361 | def mark_reachable_cells(board, r, c, res):
362 | directions = [
363 |         [-1, 0], [1, 0], [0, -1], [0, 1],  # vertical and horizontal
364 | [-1, -1], [-1, 1], [1, -1], [1, 1] # diagonal
365 | ]
366 | for dir_r, dir_c in directions:
367 | new_r, new_c = r + dir_r, c + dir_c
368 | while is_valid(board, new_r, new_c):
369 | res[new_r][new_c] = 1
370 | new_r += dir_r
371 | new_c += dir_c
372 |
373 | def is_valid(grid, r, c):
374 | return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0
375 | def spiral(n):
376 | val = n * n - 1
377 | res = [[0] * n for _ in range(n)]
378 | r, c = n - 1, n - 1
379 | directions = [[-1, 0], [0, -1], [1, 0], [0, 1]] # counterclockwise
380 | dir = 0 # start going up
381 | while val > 0:
382 | res[r][c] = val
383 | val -= 1
384 | if not is_valid(res, r + directions[dir][0], c + directions[dir][1]):
385 | dir = (dir + 1) % 4 # change directions counterclockwise
386 | r, c = r + directions[dir][0], c + directions[dir][1]
387 | return res
388 |
389 | def distance_to_river(field):
390 | R, C = len(field), len(field[0])
391 | def has_footprints(r, c):
392 | return 0 <= r < R and 0 <= c < C and field[r][c] == 1
393 | r, c = 0, 0
394 | while field[r][c] != 1:
395 | r += 1
396 | closest = r
397 | directions_row = [-1, 0, 1]
398 | while c < C:
399 | for dir_r in directions_row:
400 | new_r, new_c = r + dir_r, c + 1
401 | if has_footprints(new_r, new_c):
402 | r, c = new_r, new_c
403 | closest = min(closest, r)
404 | break
405 | return closest
406 |
407 | def valid_rows(board):
408 | R, C = len(board), len(board[0])
409 | for r in range(R):
410 | seen = set()
411 | for c in range(C):
412 | if board[r][c] in seen:
413 | return False
414 | if board[r][c] != 0:
415 | seen.add(board[r][c])
416 | return True
417 |
418 | def valid_subgrid(board, r, c):
419 | seen = set()
420 | for new_r in range(r, r + 3):
421 | for new_c in range(c, c + 3):
422 | if board[new_r][new_c] in seen:
423 | return False
424 | if board[new_r][new_c] != 0:
425 | seen.add(board[new_r][new_c])
426 | return True
427 | def valid_subgrids(board):
428 | for r in range(3):
429 | for c in range(3):
430 | if not valid_subgrid(board, r * 3, c * 3):
431 | return False
432 | return True
433 |
434 | def subgrid_maximums(grid):
435 | R, C = len(grid), len(grid[0])
436 | res = [row.copy() for row in grid]
437 | for r in range(R - 1, -1, -1):
438 | for c in range(C - 1, -1, -1):
439 | if r + 1 < R:
440 | res[r][c] = max(res[r][c], grid[r + 1][c])
441 | if c + 1 < C:
442 | res[r][c] = max(res[r][c], grid[r][c + 1])
443 | return res
444 |
445 | def backward_sum(grid):
446 | R, C = len(grid), len(grid[0])
447 | res = [row.copy() for row in grid]
448 | for r in range(R - 1, -1, -1):
449 | for c in range(C - 1, -1, -1):
450 | if r + 1 < R:
451 | res[r][c] += res[r + 1][c]
452 | if c + 1 < C:
453 | res[r][c] += res[r][c + 1]
454 | if r + 1 < R and c + 1 < C: # subtract double-counted subgrid
455 | res[r][c] -= res[r + 1][c + 1]
456 | return res
457 |
458 | class Matrix:
459 | def __init__(self, grid):
460 | self.matrix = [row.copy() for row in grid]
461 | def transpose(self):
462 | matrix = self.matrix
463 | for r in range(len(matrix)):
464 | for c in range(r):
465 | matrix[r][c], matrix[c][r] = matrix[c][r], matrix[r][c]
466 | def reflect_horizontally(self):
467 | for row in self.matrix:
468 | row.reverse()
469 | def reflect_vertically(self):
470 | self.matrix.reverse()
471 | def rotate_clockwise(self):
472 | self.transpose()
473 | self.reflect_horizontally()
474 | def rotate_counterclockwise(self):
475 | self.transpose()
476 | self.reflect_vertically()
477 | ```
478 |
479 |
480 | ## 29. Binary Search
481 |
482 | ```py
483 | def binary_search(arr, target):
484 | n = len(arr)
485 | if n == 0:
486 | return -1
487 | l, r = 0, n - 1
488 | if arr[l] >= target or arr[r] < target:
489 | if arr[l] == target:
490 | return 0
491 | return -1
492 | while r - l > 1:
493 | mid = (l + r) // 2
494 | if arr[mid] < target:
495 | l = mid
496 | else:
497 | r = mid
498 | if arr[r] == target:
499 | return r
500 | return -1
501 |
502 | def is_before(val):
503 | return not is_stolen(val)
504 | def find_bike(t1, t2):
505 | l, r = t1, t2
506 | while r - l > 1:
507 | mid = (l + r) // 2
508 | if is_before(mid):
509 | l = mid
510 | else:
511 | r = mid
512 | return r
513 |
514 | def valley_min_index(arr):
515 | def is_before(i):
516 | return i == 0 or arr[i] < arr[i - 1]
517 | l, r = 0, len(arr) - 1
518 | if is_before(r):
519 | return arr[r]
520 | while r - l > 1:
521 | mid = (l + r) // 2
522 | if is_before(mid):
523 | l = mid
524 | else:
525 | r = mid
526 |     return min(arr[l], arr[r])
527 |
528 | def two_array_two_sum(sorted_arr, unsorted_arr):
529 | for i, val in enumerate(unsorted_arr):
530 | idx = binary_search(sorted_arr, -val)
531 | if idx != -1:
532 | return [idx, i]
533 | return [-1, -1]
534 |
535 | def is_before(grid, i, target):
536 | num_cols = len(grid[0])
537 | row, col = i // num_cols, i % num_cols
538 | return grid[row][col] < target
539 |
540 | def find_through_api(target):
541 | def is_before(idx):
542 | return fetch(idx) < target
543 | l, r = 0, 1
544 | # get rightmost boundary
545 | while fetch(r) != -1:
546 | r *= 2
547 | # binary search
548 | # ...
549 |
550 | # is it impossible to split arr into k subarrays, each with sum <= max_sum?
551 | def is_before(arr, k, max_sum):
552 | splits_required = get_splits_required(arr, max_sum)
553 | return splits_required > k
554 | # return min number of subarrays with a given max sum, assume max_sum >= max(arr)
555 | def get_splits_required(arr, max_sum):
556 | splits_required = 1
557 | current_sum = 0
558 | for num in arr:
559 | if current_sum + num > max_sum:
560 | splits_required += 1
561 | current_sum = num # start new subarray with current number
562 | else:
563 | current_sum += num
564 | return splits_required
565 | def min_subarray_sum_split(arr, k):
566 | l, r = max(arr), sum(arr) # range for max subarray sum
567 | if not is_before(arr, k, l):
568 | return l
569 | while r - l > 1:
570 | mid = (l + r) // 2
571 | if is_before(arr, k, mid):
572 | l = mid
573 | else:
574 | r = mid
575 | return r
576 |
577 | def num_refills(a, b):
578 | # can we pour 'num_pours' times?
579 | def is_before(num_pours):
580 | return num_pours * b <= a
581 | # exponential search (repeatedly doubling until find upper bound)
582 | k = 1
583 | while is_before(k * 2):
584 | k *= 2
585 | # binary search between k and k * 2
586 | l, r = k, k * 2
587 | while r - l > 1:
588 | gap = r - l
589 | half_gap = gap >> 1 # bit shift instead of division
590 | mid = l + half_gap
591 | if is_before(mid):
592 | l = mid
593 | else:
594 | r = mid
595 | return l
596 |
597 | def get_ones_in_row(row):
598 | if row[0] == 0:
599 | return 0
600 | if row[-1] == 1:
601 | return len(row)
602 | def is_before_row(idx):
603 | return row[idx] == 1
604 | l, r = 0, len(row)
605 | while r - l > 1:
606 | mid = (l + r) // 2
607 | if is_before_row(mid):
608 | l = mid
609 | else:
610 | r = mid
611 | return r
612 | def is_before(picture):
613 | water = 0
614 | for row in picture:
615 | water += get_ones_in_row(row)
616 | total = len(picture[0]) ** 2
617 | return water / total < 0.5
618 | ```
619 |
620 |
621 | ## 30. Set & Maps
622 |
623 | ```py
624 | def account_sharing(connections):
625 | seen = set()
626 | for ip, username in connections:
627 | if username in seen:
628 | return ip
629 | seen.add(username)
630 | return ""
631 |
632 | def most_shared_account(connections):
633 | user_to_count = dict()
634 | for _, user in connections:
635 | if not user in user_to_count:
636 | user_to_count[user] = 0
637 | user_to_count[user] += 1
638 | most_shared_user = None
639 | for user, count in user_to_count.items():
640 |         if most_shared_user is None or count > user_to_count[most_shared_user]:
641 | most_shared_user = user
642 | return most_shared_user
643 |
644 | def multi_account_cheating(users):
645 | unique_lists = set()
646 | for _, ips in users:
647 | immutable_list = tuple(sorted(ips))
648 | if immutable_list in unique_lists:
649 | return True
650 | unique_lists.add(immutable_list)
651 | return False
652 |
653 | class DomainResolver:
654 | def __init__(self):
655 | self.ip_to_domains = dict()
656 | self.domain_to_subdomains = dict()
657 | def register_domain(self, ip, domain):
658 | if ip not in self.ip_to_domains:
659 | self.ip_to_domains[ip] = set()
660 | self.ip_to_domains[ip].add(domain)
661 | def register_subdomain(self, domain, subdomain):
662 | if domain not in self.domain_to_subdomains:
663 | self.domain_to_subdomains[domain] = set()
664 | self.domain_to_subdomains[domain].add(subdomain)
665 | def has_subdomain(self, ip, domain, subdomain):
666 | if ip not in self.ip_to_domains:
667 | return False
668 | if domain not in self.domain_to_subdomains:
669 | return False
670 | return subdomain in self.domain_to_subdomains[domain]
671 |
672 | def find_squared(arr):
673 | # create map from number to index (allow multiple indices per number)
674 | num_to_indices = {}
675 | for i, num in enumerate(arr):
676 | if num not in num_to_indices:
677 | num_to_indices[num] = []
678 | num_to_indices[num].append(i)
679 | res = []
680 | # iterate through each number and check if its square exists in map
681 | for i, num in enumerate(arr):
682 | square = num ** 2
683 | if square in num_to_indices:
684 | for j in num_to_indices[square]:
685 | res.append([i, j])
686 | return res
687 |
688 | def suspect_students(answers, m, students):
689 | def same_row(desk1, desk2):
690 | return (desk1 - 1) // m == (desk2 - 1) // m
691 | desk_to_index = {}
692 | for i, [student_id, desk, student_answers] in enumerate(students):
693 | if student_answers != answers:
694 | desk_to_index[desk] = i
695 | sus_pairs = []
696 | for student_id, desk, answers in students:
697 | other_desk = desk + 1
698 | if same_row(desk, other_desk) and other_desk in desk_to_index:
699 | other_student = students[desk_to_index[other_desk]]
700 | if answers == other_student[2]:
701 | sus_pairs.append([student_id, other_student[0]])
702 | return sus_pairs
703 |
704 | def alphabetic_sum_product(words, target):
705 | sums = set()
706 | for word in words:
707 | sums.add(alphabetical_sum(word))
708 | for i in sums:
709 | if target % i != 0:
710 | continue
711 | for j in sums:
712 | k = target / (i * j)
713 | if k in sums:
714 | return True
715 | return False
716 |
717 | def find_anomalies(log):
718 | opened = {} # ticket -> agent who opened it
719 | working_on = {} # agent -> ticket they're working on
720 | seen = set() # tickets that were opened or closed
721 | anomalies = set()
722 | for agent, action, ticket in log:
723 | if ticket in anomalies:
724 | continue
725 | if action == "open":
726 | if ticket in seen:
727 | anomalies.add(ticket)
728 | continue
729 | if agent in working_on:
730 | # if agent is working on another ticket, that ticket is anomalous
731 | anomalies.add(working_on[agent])
732 | opened[ticket] = agent
733 | working_on[agent] = ticket
734 | seen.add(ticket)
735 | else:
736 | if ticket not in opened or opened[ticket] != agent:
737 | anomalies.add(ticket)
738 | continue
739 | if agent not in working_on or working_on[agent] != ticket:
740 | anomalies.add(ticket)
741 | continue
742 | del working_on[agent]
743 | del opened[ticket]
744 | # any tickets still open are anomalous
745 | anomalies.update(opened.keys())
746 | return list(anomalies)
747 |
748 | def set_intersection(sets):
749 | res = sets[0]
750 | for i in range(1, len(sets)):
751 | res = {elem for elem in sets[i] if elem in res}
752 | return res
753 | ```
754 |
755 |
756 | ## 31. Sorting
757 |
758 | ```py
import random          # used by quicksort and quickselect below
from math import sqrt  # used by are_circles_nested below
# note: mergesort reuses merge() from the Two Pointers section

759 | def mergesort(arr):
760 | n = len(arr)
761 | if n <= 1:
762 | return arr
763 | left = mergesort(arr[:n // 2])
764 | right = mergesort(arr[n // 2:])
765 | return merge(left, right)
766 |
767 | def quicksort(arr):
768 | if len(arr) <= 1:
769 | return arr
770 | pivot = random.choice(arr)
771 | smaller, equal, larger = [], [], []
772 | for x in arr:
773 | if x < pivot: smaller.append(x)
774 | if x == pivot: equal.append(x)
775 | if x > pivot: larger.append(x)
776 | return quicksort(smaller) + equal + quicksort(larger)
777 |
778 | def counting_sort(arr):
779 | if not arr: return []
780 | R = max(arr)
781 | counts = [0] * (R + 1)
782 | for x in arr:
783 | counts[x] += 1
784 | res = []
785 | for x in range(R + 1):
786 | while counts[x] > 0:
787 | res.append(x)
788 | counts[x] -= 1
789 | return res
790 |
791 | def descending_sort(strings):
792 | return sorted(strings, key=lambda s: s.lower(), reverse=True)
793 |
794 | def sort_by_interval_end(intervals):
795 | return sorted(intervals, key=lambda interval: interval[1])
796 |
797 | def sort_value_then_suit(deck):
798 | suit_map = {'clubs': 0, 'hearts': 1, 'spades': 2, 'diamonds': 3}
799 | return sorted(deck, key=lambda card: (card.value, suit_map[card.suit]))
800 |
801 | def new_deck_order(deck):
802 | suit_map = {'hearts': 0, 'clubs': 1, 'diamonds': 2, 'spades': 3}
803 | return sorted(deck, key=lambda card: (suit_map[card.suit], card.value))
804 |
805 | def stable_sort_by_value(deck):
806 | return sorted(deck, key=lambda card: card.value)
807 |
808 | def letter_occurrences(word):
809 | letter_to_count = dict()
810 | for c in word:
811 | if c not in letter_to_count:
812 | letter_to_count[c] = 0
813 | letter_to_count[c] += 1
814 | tuples = []
815 | for letter, count in letter_to_count.items():
816 | tuples.append((letter, count))
817 | tuples.sort(key=lambda x: (-x[1], x[0]))
818 | res = []
819 | for letter, _ in tuples:
820 | res.append(letter)
821 | return res
822 |
823 | def are_circles_nested(circles):
824 | circles.sort(key = lambda c: c[1], reverse=True)
825 | for i in range(len(circles) - 1):
826 | if not contains(circles[i], circles[i + 1]):
827 | return False
828 | return True
829 | def contains(c1, c2):
830 | (x1, y1), r1 = c1
831 | (x2, y2), r2 = c2
832 | center_distance = sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
833 | return center_distance + r2 < r1
834 |
835 | def process_operations(nums, operations):
836 | n = len(nums)
837 | deleted = set()
838 | sorted_indices = []
839 | for i in range(n):
840 | sorted_indices.append(i)
841 | sorted_indices.sort(key=lambda i: nums[i])
842 | smallest_idx = 0
843 | for op in operations:
844 | if 0 <= op < n:
845 | deleted.add(op)
846 | else:
847 | # skip until the next non-deleted smallest index
848 | while smallest_idx < n and sorted_indices[smallest_idx] in deleted:
849 | smallest_idx += 1
850 | if smallest_idx < n:
851 | deleted.add(sorted_indices[smallest_idx])
852 | smallest_idx += 1
853 | res = []
854 | for i in range(n):
855 | if not i in deleted:
856 | res.append(nums[i])
857 | return res
858 |
859 | class Spreadsheet:
860 | def __init__(self, rows, cols):
861 | self.rows = rows
862 | self.cols = cols
863 | self.sheet = []
864 | for _ in range(rows):
865 | self.sheet.append([0] * cols)
866 | def set(self, row, col, value):
867 | self.sheet[row][col] = value
868 | def get(self, row, col):
869 | return self.sheet[row][col]
870 | def sort_rows_by_column(self, col):
871 | self.sheet.sort(key=lambda row: row[col])
872 | def sort_columns_by_row(self, row):
873 | columns_with_values = []
874 | for col in range(self.cols):
875 | columns_with_values.append((col, self.sheet[row][col]))
876 | sorted_columns = sorted(columns_with_values, key=lambda x: x[1])
877 | sorted_sheet = []
878 | for r in range(self.rows):
879 | new_row = []
880 | for col, _ in sorted_columns:
881 | new_row.append(self.sheet[r][col])
882 | sorted_sheet.append(new_row)
883 | self.sheet = sorted_sheet
884 |
885 | def bucket_sort(books):
886 | if not books: return []
887 | min_year = min(book.year_published for book in books)
888 | max_year = max(book.year_published for book in books)
889 | buckets = [[] for _ in range(max_year - min_year + 1)]
890 | for book in books:
891 | buckets[book.year_published - min_year].append(book)
892 | res = []
893 | for bucket in buckets:
894 | for book in bucket:
895 | res.append(book)
896 | return res
897 |
898 | def quickselect(arr, k):
899 | if len(arr) == 1:
900 | return arr[0]
901 | pivot_index = random.randint(0, len(arr) - 1)
902 | pivot = arr[pivot_index]
903 | smaller, larger = [], []
904 | for x in arr:
905 | if x < pivot: smaller.append(x)
906 | elif x > pivot: larger.append(x)
907 | if k <= len(smaller):
908 | return quickselect(smaller, k)
909 | elif k == len(smaller) + 1:
910 | return pivot
911 | else:
912 | return quickselect(larger, k - len(smaller) - 1)
913 | def first_k(arr, k):
914 | if len(arr) == 0: return []
915 | kth_val = quickselect(arr, k)
916 | return [x for x in arr if x <= kth_val]
917 | ```
918 |
919 |
920 | ## 32. Stacks & Queues
921 |
922 | ```py
923 | class Stack:
924 | def __init__(self):
925 | self.array = []
926 | def push(self, value):
927 | self.array.append(value)
928 | def pop(self):
929 | if self.is_empty():
930 | raise IndexError('stack is empty')
931 | val = self.array[-1]
932 | self.array.pop()
933 | return val
934 | def peek(self):
935 | if self.is_empty():
936 | raise IndexError('stack is empty')
937 | return self.array[-1]
938 | def size(self):
939 | return len(self.array)
940 |
941 | def compress_array(arr):
942 | stack = []
943 | for num in arr:
944 | while stack and stack[-1] == num:
945 | num += stack.pop()
946 | stack.append(num)
947 | return stack
948 |
949 | def compress_array_by_k(arr, k):
950 | stack = []
951 | def merge(num):
952 | if not stack or stack[-1][0] != num:
953 | stack.append([num, 1])
954 | elif stack[-1][1] < k - 1:
955 | stack[-1][1] += 1
956 | else:
957 | stack.pop()
958 | merge(num * k)
959 | for num in arr:
960 | merge(num)
961 | res = []
962 | for num, count in stack:
963 | for _ in range(count):
964 | res.append(num)
965 | return res
966 |
967 | class ViewerCounter:
968 | def __init__(self, window):
969 | self.queues = {"guest": Queue(), "follower": Queue(), "subscriber": Queue()}
970 | self.window = window
971 | def join(self, t, v):
972 | self.queues[v].put(t)
973 | def get_viewers(self, t, v):
974 | queue = self.queues[v]
975 | while not queue.empty() and queue.peek() < t - self.window:
976 | queue.pop()
977 | return queue.size()
978 |
979 | def current_url(actions):
980 | stack = []
981 | for action, value in actions:
982 | if action == "go":
983 | stack.append(value)
984 | else:
985 | while len(stack) > 1 and value > 0:
986 | stack.pop()
987 | value -= 1
988 | return stack[-1]
989 |
990 | def current_url_followup(actions):
991 | stack = []
992 | forward_stack = []
993 | for action, value in actions:
994 | if action == "go":
995 | stack.append(value)
996 | forward_stack = []
997 | elif action == "back":
998 | while len(stack) > 1 and value > 0:
999 | forward_stack.append(stack.pop())
1000 | value -= 1
1001 | else:
1002 | while forward_stack and value > 0:
1003 | stack.append(forward_stack.pop())
1004 | value -= 1
1005 | return stack[-1]
1006 |
1007 | def balanced(s):
1008 | height = 0
1009 | for c in s:
1010 | if c == '(':
1011 | height += 1
1012 | else:
1013 | height -= 1
1014 | if height < 0:
1015 | return False
1016 | return height == 0
1017 |
1018 | def max_balanced_partition(s):
1019 | height = 0
1020 | res = 0
1021 | for c in s:
1022 | if c == '(':
1023 | height += 1
1024 | else:
1025 | height -= 1
1026 | if height == 0:
1027 | res += 1
1028 | return res
1029 |
1030 | def balanced_brackets(s, brackets):
1031 | open_to_close = dict()
1032 | close_set = set()
1033 | for pair in brackets:
1034 | open_to_close[pair[0]] = pair[1]
1035 | close_set.add(pair[1])
1036 | stack = []
1037 | for c in s:
1038 | if c in open_to_close:
1039 | stack.append(open_to_close[c])
1040 | elif c in close_set:
1041 | if not stack or stack[-1] != c:
1042 | return False
1043 | stack.pop()
1044 | return len(stack) == 0
1045 | ```
1046 |
1047 |
1048 | ## 33. Recursion
1049 |
1050 | ```py
1051 | def moves(seq):
1052 | res = []
1053 | def moves_rec(pos):
1054 | if pos == len(seq):
1055 | return
1056 | if seq[pos] == '2':
1057 | moves_rec(pos+1)
1058 | moves_rec(pos+2)
1059 | else:
1060 | res.append(seq[pos])
1061 | moves_rec(pos+1)
1062 | moves_rec(0)
1063 | return ''.join(res)
1064 |
1065 | def nested_array_sum(arr):
1066 | res = 0
1067 | for elem in arr:
1068 | if isinstance(elem, int):
1069 | res += elem
1070 | else:
1071 | res += nested_array_sum(elem)
1072 | return res
1073 |
1074 | def reverse_in_place(arr):
1075 | reverse_rec(arr, 0, len(arr) - 1)
1076 | def reverse_rec(arr, i, j):
1077 | if i >= j:
1078 | return
1079 | arr[i], arr[j] = arr[j], arr[i]
1080 | reverse_rec(arr, i + 1, j - 1)
1081 |
1082 | def power(a, p, m):
1083 | if p == 0:
1084 | return 1
1085 | if p % 2 == 0:
1086 | half = power(a, p // 2, m)
1087 | return (half * half) % m
1088 | return (a * power(a, p - 1, m)) % m
1089 |
1090 | def fib(n):
1091 | memo = {}
1092 | def fib_rec(i):
1093 | if i <= 1:
1094 | return 1
1095 | if i in memo:
1096 | return memo[i]
1097 | memo[i] = fib_rec(i - 1) + fib_rec(i - 2)
1098 | return memo[i]
1099 | return fib_rec(n)
1100 |
1101 | def blocks(n):
1102 | memo = dict()
1103 | def roof(n):
1104 | if n == 1:
1105 | return 1
1106 | if n in memo:
1107 | return memo[n]
1108 | memo[n] = roof(n - 1) * 2 + 1
1109 | return memo[n]
1110 | def blocks_rec(n):
1111 | if n == 1:
1112 | return 1
1113 | return blocks_rec(n - 1) * 2 + roof(n)
1114 | return blocks_rec(n)
1115 |
1116 | def max_laminal_sum(arr):
1117 |     # return max sum for a laminal array within arr[l:r]
1118 | def max_laminal_sum_rec(l, r):
1119 | if r - l == 1:
1120 | return arr[l]
1121 | mid = (l + r) // 2
1122 | option1 = max_laminal_sum_rec(l, mid)
1123 | option2 = max_laminal_sum_rec(mid, r)
1124 |         option3 = sum(arr[l:r])  # the whole range arr[l:r] taken as a single block
1125 | return max(option1, option2, option3)
1126 | return max_laminal_sum_rec(0, len(arr))
1127 | ```
1128 |
1129 |
1130 | ## 34. Linked Lists
1131 |
1132 | ```py
1133 | class Node:
1134 | def __init__(self, val):
1135 | self.val = val
1136 | self.prev = None
1137 | self.next = None
1138 |
1139 | def add_to_end(head, val):
1140 | cur = head
1141 | while cur.next:
1142 | cur = cur.next
1143 | cur.next = Node(val)
1144 |
1145 | class SinglyLinkedList:
1146 | def __init__(self):
1147 | self.head = None
1148 | self._size = 0
1149 | def size(self):
1150 | return self._size
1151 | def push_front(self, val):
1152 | new = Node(val)
1153 | new.next = self.head
1154 | self.head = new
1155 | self._size += 1
1156 | def pop_front(self):
1157 | if not self.head:
1158 | return None
1159 | val = self.head.val
1160 | self.head = self.head.next
1161 | self._size -= 1
1162 | return val
1163 | def push_back(self, val):
1164 | new = Node(val)
1165 | self._size += 1
1166 | if not self.head:
1167 | self.head = new
1168 | return
1169 | cur = self.head
1170 | while cur.next:
1171 | cur = cur.next
1172 | cur.next = new
1173 | def pop_back(self):
1174 | if not self.head:
1175 | return None
1176 | self._size -= 1
1177 | if not self.head.next:
1178 | val = self.head.val
1179 | self.head = None
1180 | return val
1181 | cur = self.head
1182 | while cur.next and cur.next.next:
1183 | cur = cur.next
1184 | val = cur.next.val
1185 | cur.next = None
1186 | return val
1187 | def contains(self, val):
1188 | cur = self.head
1189 | while cur:
1190 | if cur.val == val:
1191 | return cur
1192 | cur = cur.next
1193 | return None
1194 |
1195 | class Node:
1196 | def __init__(self, val):
1197 | self.val = val
1198 | self.next = None
1199 | class Queue:
1200 | def __init__(self):
1201 | self.head = None
1202 | self.tail = None
1203 | self._size = 0
1204 | def empty(self):
1205 | return not self.head
1206 | def size(self):
1207 | return self._size
1208 | def push(self, val):
1209 | new = Node(val)
1210 | if self.tail:
1211 | self.tail.next = new
1212 | self.tail = new
1213 | if not self.head:
1214 | self.head = new
1215 | self._size += 1
1216 | def pop(self):
1217 | if self.empty():
1218 | raise IndexError('empty queue')
1219 | val = self.head.val
1220 | self.head = self.head.next
1221 | if not self.head:
1222 | self.tail = None
1223 | self._size -= 1
1224 | return val
1225 |
1226 | def copy_list(head):
1227 | if not head:
1228 | return None
1229 | new_head = Node(head.val)
1230 | cur_new = new_head
1231 | cur_old = head.next
1232 | while cur_old:
1233 | cur_new.next = Node(cur_old.val)
1234 | cur_new = cur_new.next
1235 | cur_old = cur_old.next
1236 | return new_head
1237 |
1238 | def reverse_list(head):
1239 | prev = None
1240 | cur = head
1241 | while cur:
1242 | nxt = cur.next
1243 | cur.next = prev
1244 | prev = cur
1245 | cur = nxt
1246 | return prev
1247 |
1248 | def reverse_section(head, left, right):
1249 | dummy = Node(0)
1250 | dummy.next = head
1251 | # find nodes before and after section
1252 | if left == 0:
1253 | prev = dummy
1254 | else:
1255 | prev = node_at_index(head, left - 1)
1256 | if not prev or not prev.next:
1257 | # nothing to reverse
1258 | return head
1259 | nxt = node_at_index(head, right + 1) # may be none
1260 | # break out section
1261 | section_head = prev.next
1262 | prev.next = None
1263 | section_tail = section_head
1264 | while section_tail.next != nxt:
1265 | section_tail = section_tail.next
1266 | section_tail.next = None
1267 | # reverse section, same as reverse linked list solution
1268 | old_section_head = section_head
1269 | new_section_head = reverse_list(section_head)
1270 | # reattach section
1271 | prev.next = new_section_head
1272 | old_section_head.next = nxt
1273 | return dummy.next
1274 |
1275 | def has_cycle(head):
1276 | slow, fast = head, head
1277 | while fast and fast.next:
1278 | slow = slow.next
1279 | fast = fast.next.next
1280 | if slow == fast:
1281 | return True
1282 | return False
1283 |
1284 | def convert_to_array(self, node):
1285 | cur = node
1286 | while cur.prev:
1287 | cur = cur.prev
1288 | res = []
1289 | while cur:
1290 | res.append(cur.val)
1291 | cur = cur.next
1292 | return res
1293 |
1294 | def get_middle(head):
1295 | slow, fast = head, head
1296 | while fast and fast.next:
1297 | slow = slow.next
1298 | fast = fast.next.next
1299 | return slow
1300 |
1301 | def remove_kth_node(head, k):
1302 | if not head:
1303 | return None
1304 | dummy = Node(0)
1305 | dummy.next = head
1306 | fast = dummy
1307 | slow = dummy
1308 | for _ in range(k):
1309 | fast = fast.next
1310 | while fast and fast.next:
1311 | fast = fast.next
1312 | slow = slow.next
1313 | slow.next = slow.next.next
1314 | return dummy.next
1315 |
1316 | def merge(head1, head2):
1317 | dummy = Node(0)
1318 | cur = dummy
1319 | p1, p2 = head1, head2
1320 | while p1 and p2:
1321 | cur.next = p1
1322 | cur = cur.next
1323 | p1 = p1.next
1324 | cur.next = p2
1325 | p2 = p2.next
1326 | cur = cur.next
1327 | if p1:
1328 | cur.next = p1
1329 | else:
1330 | cur.next = p2
1331 | return dummy.next
1332 |
1333 | def remove_duplicates(head):
1334 | cur = head
1335 | while cur and cur.next:
1336 | if cur.val == cur.next.val:
1337 | cur.next = cur.next.next
1338 | else:
1339 | cur = cur.next
1340 | return head
1341 | ```
1342 |
1343 |
1344 | ## 35. Trees
1345 |
1346 | ```py
1347 | # DFS
1348 | class Node:
1349 | def __init__(self, val, left=None, right=None):
1350 | self.val = val
1351 | self.left = left
1352 | self.right = right
1353 | def is_leaf(node):
1354 | if not node:
1355 | return False
1356 | return not node.left and not node.right
1357 | def children_values(node):
1358 | if not node:
1359 | return []
1360 | values = []
1361 | if node.left:
1362 | values.append(node.left.val)
1363 | if node.right:
1364 | values.append(node.right.val)
1365 | return values
1366 | def grandchildren_values(node):
1367 | if not node:
1368 | return []
1369 | values = []
1370 | for child in [node.left, node.right]:
1371 | if child and child.left:
1372 | values.append(child.left.val)
1373 | if child and child.right:
1374 | values.append(child.right.val)
1375 | return values
1376 | def subtree_size(node):
1377 | if not node:
1378 | return 0
1379 | left_size = subtree_size(node.left)
1380 | right_size = subtree_size(node.right)
1381 | return left_size + right_size + 1 # 1 for node
1382 | def subtree_height(node):
1383 | if not node:
1384 | return 0
1385 | left_height = subtree_height(node.left)
1386 | right_height = subtree_height(node.right)
1387 | return max(left_height, right_height) + 1
1388 |
1389 | class Node:
1390 | def __init__(self, id, parent, left, right):
1391 | self.id = id
1392 | self.parent = parent
1393 | self.left = left
1394 | self.right = right
1395 | def is_root(node):
1396 | return not node.parent
1397 | def ancestor_ids(node):
1398 | ids = []
1399 | while node.parent:
1400 | node = node.parent
1401 | ids.append(node.id)
1402 | return ids
1403 | def depth(node):
1404 | res = 0
1405 | while node.parent:
1406 | node = node.parent
1407 | res += 1
1408 | return res
1409 | def LCA(node1, node2):
1410 | depth1 = depth(node1)
1411 | depth2 = depth(node2)
1412 | while depth1 > depth2:
1413 | node1 = node1.parent
1414 | depth1 -= 1
1415 | while depth2 > depth1:
1416 | node2 = node2.parent
1417 | depth2 -= 1
1418 | while node1.id != node2.id:
1419 | node1 = node1.parent
1420 | node2 = node2.parent
1421 | return node1.id
1422 | def distance(node1, node2):
1423 | lca_id = LCA(node1, node2)
1424 | dist = 0
1425 | while node1.id != lca_id:
1426 | dist += 1
1427 | node1 = node1.parent
1428 | while node2.id != lca_id:
1429 | dist += 1
1430 | node2 = node2.parent
1431 | return dist
1432 | def size(node):
1433 | if not node:
1434 | return 0
1435 | return size(node.left) + size(node.right) + 1
1436 | def preorder(root):
1437 | if not root:
1438 | return
1439 | print(root.val)
1440 | preorder(root.left)
1441 | preorder(root.right)
1442 | def inorder(root):
1443 | if not root:
1444 | return
1445 | inorder(root.left)
1446 | print(root.val)
1447 | inorder(root.right)
1448 | def postorder(root):
1449 | if not root:
1450 | return
1451 | postorder(root.left)
1452 | postorder(root.right)
1453 | print(root.val)
1454 | def visit(node, info_passed_down):
1455 | if base_case:
1456 | return info_to_pass_up
1457 |     a = visit(node.left, info_to_pass_down)
1458 | b = visit(node.right, info_to_pass_down)
1459 | global_state = info_stored_globally
1460 | return info_to_pass_up
1461 |
1462 | def longest_aligned_chain(root):
1463 | res = 0
1464 | def visit(node, depth): # inner recursive function
1465 | nonlocal res # to make res visible inside visit()
1466 | if not node:
1467 | return 0
1468 | left_chain = visit(node.left, depth + 1)
1469 | right_chain = visit(node.right, depth + 1)
1470 | current_chain = 0
1471 | if node.val == depth:
1472 | current_chain = 1 + max(left_chain, right_chain)
1473 | res = max(res, current_chain)
1474 | return current_chain
1475 | visit(root, 0) # trigger DFS, which updates global res
1476 | return res
1477 |
1478 | def hidden_message(root):
1479 | message = []
1480 | def visit(node):
1481 | if not node:
1482 | return
1483 | if node.text[0] == 'b':
1484 | message.append(node.text[1])
1485 | visit(node.left)
1486 | visit(node.right)
1487 | elif node.text[0] == 'i':
1488 | visit(node.left)
1489 | message.append(node.text[1])
1490 | visit(node.right)
1491 | else:
1492 | visit(node.left)
1493 | visit(node.right)
1494 | message.append(node.text[1])
1495 | visit(root)
1496 | return ''.join(message)
1497 |
1498 | def most_stacked(root):
1499 | pos_to_count = dict()
1500 | def visit(node, r, c):
1501 | if not node:
1502 | return
1503 | if (r, c) not in pos_to_count:
1504 | pos_to_count[(r, c)] = 0
1505 | pos_to_count[(r, c)] += 1
1506 | visit(node.left, r + 1, c)
1507 | visit(node.right, r, c + 1)
1508 | visit(root, 0, 0)
1509 | return max(pos_to_count.values())
1510 |
1511 | def invert(root):
1512 | if not root:
1513 | return None
1514 | root.left, root.right = invert(root.right), invert(root.left)
1515 | return root
1516 |
1517 | def evaluate(root):
1518 | if root.kind == "num":
1519 | return root.num
1520 | children_evals = []
1521 | for child in root.children:
1522 | children_evals.append(evaluate(child))
1523 | if root.kind == "sum":
1524 | return sum(children_evals)
1525 | if root.kind == "product":
1526 | return product(children_evals)
1527 | if root.kind == "max":
1528 | return max(children_evals)
1529 | if root.kind == "min":
1530 | return min(children_evals)
1531 | raise ValueError('invalid node kind')
1532 |
1533 | # BFS
1534 | def level_order(root):
1535 | Q = Queue()
1536 | Q.add(root)
1537 | while not Q.empty():
1538 | node = Q.pop()
1539 | if not node:
1540 | continue
1541 |         print(node.val)
1542 | Q.add(node.left)
1543 | Q.add(node.right)
1544 |
1545 | def node_depth_queue_recipe(root):
1546 | Q = Queue()
1547 | Q.add((root, 0))
1548 | while not Q.empty():
1549 | node, depth = Q.pop()
1550 | if not node:
1551 | continue
1552 | # do something with node and depth
1553 | Q.add((node.left, depth+1))
1554 | Q.add((node.right, depth+1))
1555 |
1556 | def left_view(root):
1557 | if not root:
1558 | return []
1559 | Q = Queue()
1560 | Q.add((root, 0))
1561 | res = [root.val]
1562 | current_depth = 0
1563 | while not Q.empty():
1564 | node, depth = Q.pop()
1565 | if not node:
1566 | continue
1567 | if depth == current_depth + 1:
1568 | res.append(node.val)
1569 | current_depth += 1
1570 | Q.add((node.left, depth+1))
1571 | Q.add((node.right, depth+1))
1572 | return res
1573 |
1574 | def level_counts(root):
1575 | Q = Queue()
1576 | Q.add((root, 0))
1577 | level_count = defaultdict(int)
1578 | while not Q.empty():
1579 | node, depth = Q.pop()
1580 | if not node:
1581 | continue
1582 | level_count[depth] += 1
1583 | Q.add((node.left, depth + 1))
1584 | Q.add((node.right, depth + 1))
1585 | return level_count
1586 | def most_prolific_level(root):
1587 | level_count = level_counts(root)
1588 | res = -1
1589 | max_prolificness = -1 # less than any valid prolificness
1590 | for level in level_count:
1591 | if level + 1 not in level_count:
1592 | continue
1593 | prolificness = level_count[level + 1] / level_count[level]
1594 | if prolificness > max_prolificness:
1595 | max_prolificness = prolificness
1596 | res = level
1597 | return res
1598 |
1599 | def zig_zag_order(root):
1600 | res = []
1601 | Q = Queue()
1602 | Q.add((root, 0))
1603 | cur_level = []
1604 | cur_depth = 0
1605 | while not Q.empty():
1606 | node, depth = Q.pop()
1607 | if not node:
1608 | continue
1609 | if depth > cur_depth:
1610 | if cur_depth % 2 == 0:
1611 | res += cur_level
1612 | else:
1613 | res += cur_level[::-1] # reverse order
1614 | cur_level = []
1615 | cur_depth = depth
1616 | cur_level.append(node)
1617 | Q.add((node.left, depth + 1))
1618 | Q.add((node.right, depth + 1))
1619 | if cur_depth % 2 == 0: # add last level
1620 | res += cur_level
1621 | else:
1622 | res += cur_level[::-1]
1623 | return res
1624 |
1625 | # Binary Search Tree
1626 | def find(root, target):
1627 | cur_node = root
1628 | while cur_node:
1629 | if cur_node.val == target:
1630 | return True
1631 | elif cur_node.val > target:
1632 | cur_node = cur_node.left
1633 | else:
1634 | cur_node = cur_node.right
1635 | return False
1636 |
1637 | def find_closest(root, target):
1638 | cur_node = root
1639 | next_above, next_below = math.inf, -math.inf
1640 | while cur_node:
1641 | if cur_node.val == target:
1642 | return cur_node.val
1643 | elif cur_node.val > target:
1644 | next_above = cur_node.val
1645 | cur_node = cur_node.left
1646 | else:
1647 | next_below = cur_node.val
1648 | cur_node = cur_node.right
1649 | if next_above - target < target - next_below:
1650 | return next_above
1651 | return next_below
1652 |
1653 | def is_bst(root):
1654 | prev_value = -math.inf
1655 | res = True
1656 | def visit(node):
1657 | nonlocal prev_value, res
1658 | if not node or not res:
1659 | return
1660 | visit(node.left)
1661 | if node.val < prev_value:
1662 | res = False
1663 | else:
1664 | prev_value = node.val
1665 | visit(node.right)
1666 | visit(root)
1667 | return res
1668 | ```
1669 |
1670 |
1671 | ## 36. Graphs
1672 |
1673 | ```py
1674 | def num_nodes(graph):
1675 | return len(graph)
1676 | def num_edges(graph):
1677 | count = 0
1678 | for node in range(len(graph)):
1679 | count += len(graph[node])
1680 | return count // 2 # halved because we counted each edge from both endpoints
1681 | def degree(graph, node):
1682 | return len(graph[node])
1683 | def print_neighbors(graph, node):
1684 | for nbr in graph[node]:
1685 | print(nbr)
1686 | def build_adjacency_list(V, edge_list):
1687 | graph = [[] for _ in range(V)]
1688 | for node1, node2 in edge_list:
1689 | graph[node1].append(node2)
1690 | graph[node2].append(node1)
1691 | return graph
1692 | def adjacent(graph, node1, node2):
1693 | for nbr in graph[node1]:
1694 | if nbr == node2: return True
1695 | return False
1696 |
1697 | def validate(graph):
1698 | V = len(graph)
1699 | for node in range(V):
1700 |         seen = set()
1701 | for nbr in graph[node]:
1702 | if nbr < 0 or nbr >= V: return False # invalid node index
1703 | if nbr == node: return False # self-loop
1704 | if nbr in seen: return False # parallel edge
1705 | seen.add(nbr)
1706 | edges = set()
1707 | for node1 in range(V):
1708 | for node2 in graph[node1]:
1709 | edge = (min(node1, node2), max(node1, node2))
1710 | if edge in edges:
1711 | edges.remove(edge)
1712 | else:
1713 | edges.add(edge)
1714 | return len(edges) == 0
1715 |
1716 | def graph_dfs(graph, start):
1717 | visited = {start}
1718 | def visit(node):
1719 | # do something
1720 | for nbr in graph[node]:
1721 | if not nbr in visited:
1722 | visited.add(nbr)
1723 | visit(nbr)
1724 | visit(start)
1725 |
1726 | def tree_dfs(root):
1727 | def visit(node):
1728 | # do something
1729 |         if node.left:
1730 |             visit(node.left)
1731 |         if node.right:
1732 |             visit(node.right)
1733 | if root:
1734 | visit(root)
1735 |
1736 | def count_connected_components(graph):
1737 | count = 0
1738 | visited = set()
1739 | for node in range(len(graph)):
1740 | if node not in visited:
1741 | visited.add(node)
1742 | visit(node)
1743 | count += 1
1744 | return count
1745 |
1746 | def path(graph, node1, node2):
1747 | predecessors = {node2: None} # starting node doesn't have predecessor
1748 | def visit(node):
1749 | for nbr in graph[node]:
1750 | if nbr not in predecessors:
1751 | predecessors[nbr] = node
1752 | visit(nbr)
1753 | visit(node2)
1754 | if node1 not in predecessors:
1755 | return [] # node1 node2 disconnected
1756 | path = [node1]
1757 | while path[len(path) - 1] != node2:
1758 | path.append(predecessors[path[len(path) - 1]])
1759 | return path
1760 |
1761 | def is_tree(graph):
1762 | predecessors = {0: None} # start from node 0 (doesn't matter)
1763 | found_cycle = False
1764 | def visit(node):
1765 | nonlocal found_cycle
1766 | if found_cycle:
1767 | return
1768 | for nbr in graph[node]:
1769 | if nbr not in predecessors:
1770 | predecessors[nbr] = node
1771 | visit(nbr)
1772 | elif nbr != predecessors[node]:
1773 | found_cycle = True
1774 | visit(0)
1775 | connected = len(predecessors) == len(graph)
1776 | return not found_cycle and connected
1777 |
1778 | def connected_component_queries(graph, queries):
1779 | node_to_cc = {}
1780 | def visit(node, cc_id):
1781 | if node in node_to_cc:
1782 | return
1783 | node_to_cc[node] = cc_id
1784 | for nbr in graph[node]:
1785 | visit(nbr, cc_id)
1786 | cc_id = 0
1787 | for node in range(len(graph)):
1788 | if node not in node_to_cc:
1789 | visit(node, cc_id)
1790 | cc_id += 1
1791 | res = []
1792 | for node1, node2 in queries:
1793 | res.append(node_to_cc[node1] == node_to_cc[node2])
1794 | return res
1795 |
1796 | def strongly_connected(graph):
1797 | V = len(graph)
1798 | visited = set()
1799 | visit(graph, visited, 0)
1800 | if len(visited) < V:
1801 | return False
1802 | reverse_graph = [[] for _ in range(V)]
1803 | for node in range(V):
1804 | for nbr in graph[node]:
1805 | reverse_graph[nbr].append(node)
1806 | reverse_visited = set()
1807 | visit(reverse_graph, reverse_visited, 0)
1808 | return len(reverse_visited) == V
1809 |
1810 | def max_hilliness(graph, heights):
1811 | node_to_cc = label_nodes_with_cc_ids(graph)
1812 | V = len(graph)
1813 | cc_to_elevation_gain_sum = {}
1814 | cc_to_num_edges = {}
1815 | for node in range(V):
1816 | cc = node_to_cc[node]
1817 | if cc not in cc_to_num_edges:
1818 | cc_to_elevation_gain_sum[cc] = 0
1819 | cc_to_num_edges[cc] = 0
1820 | for nbr in graph[node]:
1821 | if nbr > node:
1822 | cc_to_num_edges[cc] += 1
1823 | cc_to_elevation_gain_sum[cc] += abs(heights[node] - heights[nbr])
1824 | res = 0
1825 | for cc in cc_to_num_edges:
1826 |         res = max(res, cc_to_elevation_gain_sum[cc] / cc_to_num_edges[cc])
    return res
1827 |
1828 | def first_time_all_connected(V, cables):
1829 | def visit(graph, visited, node):
1830 | for nbr in graph[node]:
1831 | if nbr not in visited:
1832 | visited.add(nbr)
1833 | visit(graph, visited, nbr)
1834 |
1835 | def is_before(cable_index):
1836 | graph = [[] for _ in range(V)]
1837 | for i in range(cable_index + 1):
1838 | node1, node2 = cables[i]
1839 | graph[node1].append(node2)
1840 | graph[node2].append(node1)
1841 | visited = {0}
1842 | visit(graph, visited, 0)
1843 | return len(visited) < V
1844 | l, r = 0, len(cables) - 1
1845 | if is_before(r):
1846 | return -1
1847 | while r - l > 1:
1848 | mid = l + (r - l) // 2
1849 | if is_before(mid):
1850 | l = mid
1851 | else:
1852 | r = mid
1853 | return r
1854 |
1855 | # BFS
1856 | def graph_bfs(graph, start):
1857 | Q = Queue()
1858 | Q.push(start)
1859 | distances = {start: 0}
1860 | while not Q.empty():
1861 | node = Q.pop()
1862 | for nbr in graph[node]:
1863 | if nbr not in distances:
1864 | distances[nbr] = distances[node] + 1
1865 | Q.push(nbr)
1866 | # do something
1867 |
1868 | def tree_bfs(root):
1869 | Q = Queue()
1870 | Q.push(root)
1871 | while not Q.empty():
1872 | node = Q.pop()
1873 | if not node:
1874 | continue
1875 | # do something
1876 | Q.push(node.left)
1877 | Q.push(node.right)
1878 |
1879 | def shortest_path_queries(graph, start, queries):
1880 | Q = Queue()
1881 | Q.push(start)
1882 | predecessors = {start: None}
1883 | while not Q.empty():
1884 | node = Q.pop()
1885 | for nbr in graph[node]:
1886 | if nbr not in predecessors:
1887 | predecessors[nbr] = node
1888 | Q.push(nbr)
1889 | res = []
1890 | for node in queries:
1891 | if node not in predecessors:
1892 | res.append([])
1893 | else:
1894 | path = [node]
1895 | while path[len(path) - 1] != start:
1896 | path.append(predecessors[path[len(path) - 1]])
1897 | path.reverse()
1898 | res.append(path)
1899 | return res
1900 |
1901 | def walking_distance_to_coffee(graph, node1, node2, node3):
1902 | distances1 = bfs(graph, node1) # BFS
1903 | distances2 = bfs(graph, node2)
1904 | distances3 = bfs(graph, node3)
1905 | res = math.inf
1906 | for i in range(len(graph)):
1907 | res = min(res, distances1[i] + distances2[i] + distances3[i])
1908 | return res
1909 |
1910 | def multisource_bfs(graph, sources):
1911 | Q = Queue()
1912 | distances = {}
1913 | for start in sources:
1914 | Q.push(start)
1915 | distances[start] = 0
1916 | while not Q.empty(): # BFS
1917 | node = Q.pop()
1918 | for nbr in graph[node]:
1919 | if nbr not in distances:
1920 | distances[nbr] = distances[node] + 1
1921 | Q.push(nbr)
1922 | # do something
1923 |
1924 | def grid_dfs(grid, start_r, start_c):
1925 | # returns if (r, c) is in bounds, not visited, and walkable
1926 | def is_valid(r, c):
1927 | # do something
1928 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1929 | visited = {(start_r, start_c)}
1930 | def visit(r, c):
1931 | # do something with (r, c)
1932 | for dir_r, dir_c in directions:
1933 | nbr_r, nbr_c = r + dir_r, c + dir_c
1934 | if is_valid(nbr_r, nbr_c):
1935 | visited.add((nbr_r, nbr_c))
1936 | visit(nbr_r, nbr_c)
1937 | visit(start_r, start_c)
1938 |
1939 | def grid_bfs(grid, start_r, start_c):
1940 | # returns if (r, c) is in bounds, not visited, and walkable
1941 | def is_valid(r, c):
1942 | # do something
1943 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1944 | Q = Queue()
1945 | Q.push((start_r, start_c))
1946 | distances = {(start_r, start_c): 0}
1947 | while not Q.empty():
1948 | r, c = Q.pop()
1949 | for dir_r, dir_c in directions:
1950 | nbr_r, nbr_c = r + dir_r, c + dir_c
1951 | if is_valid(nbr_r, nbr_c):
1952 | distances[(nbr_r, nbr_c)] = distances[(r, c)] + 1
1953 | Q.push((nbr_r, nbr_c))
1954 | # do something with distances
1955 |
1956 | def count_islands(grid):
1957 | R, C = len(grid), len(grid[0])
1958 | count = 0
1959 | visited = set()
1960 | for r in range(R):
1961 | for c in range(C):
1962 | if grid[r][c] == 1 and (r, c) not in visited:
1963 | visited.add((r, c))
1964 | dfs(grid, visited, r, c) # normal grid DFS
1965 | count += 1
1966 | return count
1967 |
1968 | def exit_distances(maze):
1969 | R, C = len(maze), len(maze[0])
1970 | directions = [(-1, 0), (1, 0), (0, 1), (0, -1)]
1971 | distances = [[-1] * C for _ in range(R)]
1972 | Q = Queue()
1973 | for r in range(R):
1974 | for c in range(C):
1975 | if maze[r][c] == 'o':
1976 | distances[r][c] = 0
1977 | Q.push((r, c))
1978 | while not Q.empty():
1979 | r, c = Q.pop()
1980 | for dir_r, dir_c in directions:
1981 | nbr_r, nbr_c = r + dir_r, c + dir_c
1982 | if (0 <= nbr_r < R and 0 <= nbr_c < C and
1983 | maze[nbr_r][nbr_c] != 'x' and distances[nbr_r][nbr_c] == -1):
1984 | distances[nbr_r][nbr_c] = distances[r][c] + 1
1985 | Q.push((nbr_r, nbr_c))
1986 | return distances
1987 |
1988 | def segment_distance(min1, max1, min2, max2):
1989 | return max(0, max(min1, min2) - min(max1, max2))
1990 | def distance(furniture1, furniture2):
1991 | x_min1, y_min1, x_max1, y_max1 = furniture1
1992 | x_min2, y_min2, x_max2, y_max2 = furniture2
1993 | x_gap = segment_distance(x_min1, x_max1, x_min2, x_max2)
1994 | y_gap = segment_distance(y_min1, y_max1, y_min2, y_max2)
1995 | if x_gap == 0:
1996 | return y_gap
1997 | elif y_gap == 0:
1998 | return x_gap
1999 | else:
2000 | return math.sqrt(x_gap ** 2 + y_gap ** 2)
2001 | def can_reach(furniture, d):
2002 | V = len(furniture)
2003 | graph = [[] for _ in range(V)]
2004 | for i in range(V):
2005 | for j in range(i + 1, V):
2006 | if distance(furniture[i], furniture[j]) <= d:
2007 | graph[i].append(j)
2008 | graph[j].append(i)
2009 | visited = {0}
2010 | def visit(node): # DFS
2011 | # ...
2012 | visit(0)
2013 | return V-1 in visited
2014 | ```
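
The `Queue` type used in the BFS templates above (`push`/`pop`/`empty`) isn't defined in this section; a minimal FIFO stand-in using the standard library's `collections.deque`, assuming `push` enqueues at the back and `pop` dequeues from the front, could look like this:

```py
# Minimal sketch of a FIFO Queue compatible with the BFS templates above
# (assumption: push = enqueue at the back, pop = dequeue from the front).
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()
    def push(self, x):
        self.items.append(x)         # enqueue at the back
    def pop(self):
        return self.items.popleft()  # dequeue from the front
    def empty(self):
        return len(self.items) == 0
```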
2015 |
2016 |
2017 | ## 37. Heaps
2018 |
2019 | ```py
2020 | def first_k(arr, k):
2021 | arr.sort()
2022 | return arr[:k]
2023 | def first_k_min_heap(arr, k):
2024 |     min_heap = Heap(higher_priority=lambda x, y: x < y, heap=arr)
2025 | res = []
2026 | for i in range(k):
2027 | res.append(min_heap.pop())
2028 | return res
2029 |
2030 | def parent(idx):
2031 | if idx == 0:
2032 | return -1 # root has no parent
2033 | return (idx - 1) // 2
2034 | def left_child(idx):
2035 | return 2 * idx + 1
2036 | def right_child(idx):
2037 | return 2 * idx + 2
2038 |
2039 | class Heap:
2040 | # if higher_priority(x, y) is True, x has higher priority than y
2041 | def __init__(self, higher_priority=lambda x, y: x < y, heap=None):
2042 |         self.heap = heap if heap is not None else []
2046 | self.higher_priority = higher_priority
2047 | if heap:
2048 | self.heapify()
2049 | def size(self):
2050 | return len(self.heap)
2051 | def top(self):
2052 | if not self.heap:
2053 | return None
2054 | return self.heap[0]
2055 | def push(self, elem):
2056 | self.heap.append(elem)
2057 | self.bubble_up(len(self.heap)-1)
2058 | def bubble_up(self, idx):
2059 | if idx == 0:
2060 | return # root can't be bubbled up
2061 | parent_idx = parent(idx)
2062 | if self.higher_priority(self.heap[idx], self.heap[parent_idx]):
2063 | self.heap[idx], self.heap[parent_idx] = self.heap[parent_idx], self.heap[idx]
2064 |             self.bubble_up(parent_idx)
2065 | def pop(self):
2066 | if not self.heap: return None
2067 | top = self.heap[0]
2068 | if len(self.heap) == 1:
2069 | self.heap = []
2070 | return top
2071 | self.heap[0] = self.heap[-1]
2072 | self.heap.pop()
2073 | self.bubble_down(0)
2074 | return top
2075 | def bubble_down(self, idx):
2076 | l_i, r_i = left_child(idx), right_child(idx)
2077 | is_leaf = l_i >= len(self.heap)
2078 | if is_leaf: return # leaves can't be bubbled down
2079 | child_i = l_i # index for highest priority child
2080 | if r_i < len(self.heap) and self.higher_priority(self.heap[r_i], self.heap[l_i]):
2081 | child_i = r_i
2082 | if self.higher_priority(self.heap[child_i], self.heap[idx]):
2083 | self.heap[idx], self.heap[child_i] = self.heap[child_i], self.heap[idx]
2084 | self.bubble_down(child_i)
2085 | def heapify(self):
2086 | for idx in range(len(self.heap) // 2, -1, -1):
2087 | self.bubble_down(idx)
2088 |
2089 | def heapsort(arr):
2090 |     min_heap = Heap(higher_priority=lambda x, y: x < y, heap=arr)
2091 | res = []
2092 | for _ in range(len(arr)):
2093 | res.append(min_heap.pop())
2094 | return res
2095 |
2096 | class TopSongs:
2097 | def __init__(self, k):
2098 | self.k = k
2099 | self.min_heap = Heap(higher_priority=lambda x, y: x[1] < y[1])
2100 | def register_plays(self, title, plays):
2101 | if self.min_heap.size() < self.k:
2102 | self.min_heap.push((title, plays))
2103 | elif plays > self.min_heap.top()[1]:
2104 | self.min_heap.pop()
2105 | self.min_heap.push((title, plays))
2106 | def top_k(self):
2107 | top_songs = []
2108 | for title, _ in self.min_heap.heap:
2109 | top_songs.append(title)
2110 | return top_songs
2111 |
2112 | class TopSongsWithUpdates:
2113 | def __init__(self, k):
2114 | self.k = k
2115 | self.max_heap = Heap(higher_priority=lambda x, y: x[1] > y[1])
2116 | self.total_plays = {}
2117 | def register_plays(self, title, plays):
2118 | new_total_plays = plays
2119 | if title in self.total_plays:
2120 | new_total_plays += self.total_plays[title]
2121 |         self.total_plays[title] = new_total_plays
2122 | self.max_heap.push((title, new_total_plays))
2123 | def top_k(self):
2124 | top_songs = []
2125 | while len(top_songs) < self.k and self.max_heap.size() > 0:
2126 | title, plays = self.max_heap.pop()
2127 | if self.total_plays[title] == plays: # not stale
2128 | top_songs.append(title)
2129 | # restore max-heap
2130 | for title in top_songs:
2131 | self.max_heap.push((title, self.total_plays[title]))
2132 | return top_songs
2133 |
2134 | class PopularSongs:
2135 | def __init__(self):
2136 | # max-heap for lower half
2137 | self.lower_max_heap = Heap(higher_priority=lambda x, y: x > y)
2138 | # min-heap for upper half
2139 | self.upper_min_heap = Heap()
2140 | self.play_counts = {}
2141 | def register_plays(self, title, plays):
2142 | self.play_counts[title] = plays
2143 | if self.upper_min_heap.size() == 0 or plays >= self.upper_min_heap.top():
2144 | self.upper_min_heap.push(plays)
2145 | else:
2146 | self.lower_max_heap.push(plays)
2147 | # distribute elements if they're off by more than one
2148 | if self.lower_max_heap.size() > self.upper_min_heap.size():
2149 | self.upper_min_heap.push(self.lower_max_heap.pop())
2150 | elif self.upper_min_heap.size() > self.lower_max_heap.size() + 1:
2151 | self.lower_max_heap.push(self.upper_min_heap.pop())
2152 | def is_popular(self, title):
2153 | if title not in self.play_counts:
2154 | return False
2155 | if self.lower_max_heap.size() == self.upper_min_heap.size():
2156 | median = (self.upper_min_heap.top() + self.lower_max_heap.top()) / 2
2157 | else:
2158 | median = self.upper_min_heap.top()
2159 | return self.play_counts[title] > median
2160 |
2161 | def top_k_across_genres(genres, k):
2162 | initial_elems = [] # (plays, genre_index, song_index) tuples.
2163 | for genre_index, song_list in enumerate(genres):
2164 | plays = song_list[0][1]
2165 | initial_elems.append((plays, genre_index, 0))
2166 | max_heap = Heap(higher_priority=lambda x, y: x[0] > y[0], heap=initial_elems)
2167 | top_k = []
2168 | while len(top_k) < k and max_heap.size() > 0:
2169 | plays, genre_index, song_index = max_heap.pop()
2170 | song_name = genres[genre_index][song_index][0]
2171 | top_k.append(song_name)
2172 | song_index += 1
2173 | if song_index < len(genres[genre_index]):
2174 | plays = genres[genre_index][song_index][1]
2175 | max_heap.push((plays, genre_index, song_index))
2176 | return top_k
2177 |
2178 | def make_playlist(songs):
2179 | # group songs by artist
2180 | artist_to_songs = {}
2181 | for song, artist in songs:
2182 | if artist not in artist_to_songs:
2183 | artist_to_songs[artist] = []
2184 | artist_to_songs[artist].append(song)
2185 | heap = Heap(higher_priority=lambda a, b: len(a[1]) > len(b[1]))
2186 | for artist, song_list in artist_to_songs.items():
2187 | heap.push((artist, song_list))
2188 | res = []
2189 | last_artist = None
2190 | while heap.size() > 0:
2191 | artist, song_list = heap.pop()
2192 | if artist != last_artist:
2193 | res.append(song_list.pop())
2194 | last_artist = artist
2195 | if song_list: # if artist has more songs, readd it
2196 | heap.push((artist, song_list))
2197 | else:
2198 | # find different artist
2199 | if heap.size() == 0:
2200 | return [] # no valid solution
2201 | artist2, song_list2 = heap.pop()
2202 | res.append(song_list2.pop())
2203 | last_artist = artist2
2204 | # readd artists we popped
2205 | if song_list2:
2206 | heap.push((artist2, song_list2))
2207 | heap.push((artist, song_list))
2208 | return res
2209 |
2210 | def sum_of_powers(primes, n):
2211 | m = 10**9 + 7
2212 | # initialize heap with first power of each prime
2213 | # each element is tuple (power, base)
2214 | elems = [(p, p) for p in primes]
2215 | min_heap = Heap(higher_priority=lambda x, y: x[0] < y[0], heap=elems)
2216 | res = 0
2217 | for _ in range(n):
2218 | power, base = min_heap.pop()
2219 | res = (res + power) % m
2220 | min_heap.push(((power * base) % m, base))
2221 | return res
2222 | ```
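
As a cross-check on the `TopSongs` idea, here is a sketch of the same "keep the k largest in a size-k min-heap" pattern using the standard library's `heapq` (the custom `Heap` class above is what this chapter actually uses; `top_k_plays` and its `songs` argument are illustrative names):

```py
# Sketch only: size-k min-heap of (plays, title) tuples via heapq.
import heapq

def top_k_plays(songs, k):
    # songs: list of (title, plays) pairs
    min_heap = []  # smallest play count stays on top
    for title, plays in songs:
        if len(min_heap) < k:
            heapq.heappush(min_heap, (plays, title))
        elif plays > min_heap[0][0]:
            heapq.heapreplace(min_heap, (plays, title))  # pop smallest, push new
    return [title for _, title in min_heap]
```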
2223 |
2224 |
2225 | ## 38. Sliding Windows
2226 |
2227 | ```py
2228 | def most_weekly_sales(sales):
2229 | l, r = 0, 0
2230 | window_sum = 0
2231 | cur_max = 0
2232 | while r < len(sales):
2233 | window_sum += sales[r]
2234 | r += 1
2235 | if r - l == 7:
2236 | cur_max = max(cur_max, window_sum)
2237 | window_sum -= sales[l]
2238 | l += 1
2239 | return cur_max
2240 |
2241 | def has_unique_k_days(best_seller, k):
2242 | l, r = 0, 0
2243 | window_counts = {}
2244 | while r < len(best_seller):
2245 | if not best_seller[r] in window_counts:
2246 | window_counts[best_seller[r]] = 0
2247 | window_counts[best_seller[r]] += 1
2248 | r += 1
2249 | if r - l == k:
2250 | if len(window_counts) == k:
2251 | return True
2252 | window_counts[best_seller[l]] -= 1
2253 | if window_counts[best_seller[l]] == 0:
2254 | del window_counts[best_seller[l]]
2255 | l += 1
2256 | return False
2257 |
2258 | def max_no_bad_days(sales):
2259 | l, r = 0, 0
2260 | cur_max = 0
2261 | while r < len(sales):
2262 | can_grow = sales[r] >= 10
2263 | if can_grow:
2264 | r += 1
2265 | cur_max = max(cur_max, r - l)
2266 | else:
2267 | l = r + 1
2268 | r = r + 1
2269 | return cur_max
2270 |
2271 | def has_enduring_best_seller_streak(best_seller, k):
2272 | l, r = 0, 0
2273 | cur_max = 0
2274 | while r < len(best_seller):
2275 | can_grow = l == r or best_seller[l] == best_seller[r]
2276 | if can_grow:
2277 | r += 1
2278 | if r - l == k:
2279 | return True
2280 | else:
2281 | l = r
2282 | return False
2283 |
2284 | def max_subarray_sum(arr):
2285 | max_val = max(arr)
2286 | if max_val <= 0: # edge case without positive values
2287 | return max_val
2288 | l, r = 0, 0
2289 | window_sum = 0
2290 | cur_max = 0
2291 | while r < len(arr):
2292 | can_grow = window_sum + arr[r] >= 0
2293 | if can_grow:
2294 | window_sum += arr[r]
2295 | r += 1
2296 | cur_max = max(cur_max, window_sum)
2297 | else:
2298 | window_sum = 0
2299 | l = r + 1
2300 | r = r + 1
2301 | return cur_max
2302 |
2303 | def max_at_most_3_bad_days(sales):
2304 | l, r = 0, 0
2305 | window_bad_days = 0
2306 | cur_max = 0
2307 | while r < len(sales):
2308 | can_grow = sales[r] >= 10 or window_bad_days < 3
2309 | if can_grow:
2310 | if sales[r] < 10:
2311 | window_bad_days += 1
2312 | r += 1
2313 | cur_max = max(cur_max, r - l)
2314 | else:
2315 | if sales[l] < 10:
2316 | window_bad_days -= 1
2317 | l += 1
2318 | return cur_max
2319 |
2320 | def max_consecutive_with_k_boosts(projected_sales, k):
2321 | l, r = 0, 0
2322 | used_boosts = 0
2323 | cur_max = 0
2324 | while r < len(projected_sales):
2325 | can_grow = used_boosts + max(10 - projected_sales[r], 0) <= k
2326 | if can_grow:
2327 | used_boosts += max(10 - projected_sales[r], 0)
2328 | r += 1
2329 | cur_max = max(cur_max, r - l)
2330 | elif l == r:
2331 | r += 1
2332 | l += 1
2333 | else:
2334 | used_boosts -= max(10 - projected_sales[l], 0)
2335 | l += 1
2336 | return cur_max
2337 |
2338 | def max_at_most_k_distinct(best_seller, k):
2339 | l, r = 0, 0
2340 | window_counts = {}
2341 | cur_max = 0
2342 | while r < len(best_seller):
2343 | can_grow = best_seller[r] in window_counts or len(window_counts) + 1 <= k
2344 | if can_grow:
2345 | if not best_seller[r] in window_counts:
2346 | window_counts[best_seller[r]] = 0
2347 | window_counts[best_seller[r]] += 1
2348 | r += 1
2349 | cur_max = max(cur_max, r - l)
2350 | else:
2351 | window_counts[best_seller[l]] -= 1
2352 | if window_counts[best_seller[l]] == 0:
2353 | del window_counts[best_seller[l]]
2354 | l += 1
2355 | return cur_max
2356 |
2357 | def shortest_over_20_sales(sales):
2358 | l, r = 0, 0
2359 | window_sum = 0
2360 | cur_min = math.inf
2361 | while True:
2362 | must_grow = window_sum <= 20
2363 | if must_grow:
2364 | if r == len(sales):
2365 | break
2366 | window_sum += sales[r]
2367 | r += 1
2368 | else:
2369 | cur_min = min(cur_min, r - l)
2370 | window_sum -= sales[l]
2371 | l += 1
2372 | if cur_min == math.inf:
2373 | return -1
2374 | return cur_min
2375 |
2376 | def shortest_with_all_letters(s1, s2):
2377 | l, r = 0, 0
2378 | missing = {}
2379 | for c in s2:
2380 | if not c in missing:
2381 | missing[c] = 0
2382 | missing[c] += 1
2383 | distinct_missing = len(missing)
2384 | cur_min = math.inf
2385 | while True:
2386 | must_grow = distinct_missing > 0
2387 | if must_grow:
2388 | if r == len(s1):
2389 | break
2390 | if s1[r] in missing:
2391 | missing[s1[r]] -= 1
2392 | if missing[s1[r]] == 0:
2393 | distinct_missing -= 1
2394 | r += 1
2395 | else:
2396 | cur_min = min(cur_min, r - l)
2397 | if s1[l] in missing:
2398 | missing[s1[l]] += 1
2399 | if missing[s1[l]] == 1:
2400 | distinct_missing += 1
2401 | l += 1
2402 | return cur_min if cur_min != math.inf else -1
2403 |
2404 | def smallest_range_with_k_elements(arr, k):
2405 | arr.sort()
2406 | l, r = 0, 0
2407 | best_low, best_high = 0, math.inf
2408 | while True:
2409 | must_grow = (r - l) < k
2410 | if must_grow:
2411 | if r == len(arr):
2412 | break
2413 | r += 1
2414 | else:
2415 | if arr[r - 1] - arr[l] < best_high - best_low:
2416 | best_low, best_high = arr[l], arr[r - 1]
2417 | l += 1
2418 | return [best_low, best_high]
2419 |
2420 | def count_at_most_k_bad_days(sales, k):
2421 | l, r = 0, 0
2422 | window_bad_days = 0
2423 | count = 0
2424 | while r < len(sales):
2425 | can_grow = sales[r] >= 10 or window_bad_days < k
2426 | if can_grow:
2427 | if sales[r] < 10:
2428 | window_bad_days += 1
2429 | r += 1
2430 | count += r - l
2431 | else:
2432 | if sales[l] < 10:
2433 | window_bad_days -= 1
2434 | l += 1
2435 | return count
2436 |
2437 | def count_exactly_k_bad_days(sales, k):
2438 | if k == 0:
2439 | return count_at_most_k_bad_days(sales, 0)
2440 | return count_at_most_k_bad_days(sales, k) - count_at_most_k_bad_days(sales, k - 1)
2441 |
2442 | def count_at_least_k_bad_days(sales, k):
2443 | n = len(sales)
2444 | total_subarrays = n * (n + 1) // 2
2445 | if k == 0:
2446 | return total_subarrays
2447 | return total_subarrays - count_at_most_k_bad_days(sales, k - 1)
2448 |
2449 | def count_at_most_k_drops(arr, k):
2450 | l, r = 0, 0
2451 | window_drops = 0
2452 | count = 0
2453 | while r < len(arr):
2454 | can_grow = r == 0 or arr[r] >= arr[r - 1] or window_drops < k
2455 | if can_grow:
2456 | if r > 0 and arr[r] < arr[r - 1]:
2457 | window_drops += 1
2458 | r += 1
2459 | count += r - l
2460 | else:
2461 | if arr[l] > arr[l + 1]:
2462 | window_drops -= 1
2463 | l += 1
2464 | return count
2465 | def count_exactly_k_drops(arr, k):
2466 | if k == 0:
2467 |         return count_at_most_k_drops(arr, 0)
2468 | return count_at_most_k_drops(arr, k) - count_at_most_k_drops(arr, k - 1)
2469 | def count_at_least_k_drops(arr, k):
2470 | n = len(arr)
2471 | total_count = n * (n + 1) // 2
2472 | if k == 0:
2473 | return total_count
2474 | return total_count - count_at_most_k_drops(arr, k - 1)
2475 |
2476 | def count_bad_days_range(sales, k1, k2):
2477 | if k1 == 0:
2478 | return count_at_least_k_bad_days(sales, k2)
2479 | return count_at_least_k_bad_days(sales, k2) - count_at_least_k_bad_days(sales, k1 - 1)
2480 |
2481 | def count_all_3_groups(arr):
2482 | n = len(arr)
2483 | total_count = n * (n + 1) // 2
2484 | return total_count - count_at_most_2_groups(arr)
2485 | def count_at_most_2_groups(arr):
2486 | l, r = 0, 0
2487 | window_counts = {}
2488 | count = 0
2489 | while r < len(arr):
2490 | can_grow = arr[r] % 3 in window_counts or len(window_counts) < 2
2491 | if can_grow:
2492 | if not arr[r] % 3 in window_counts:
2493 | window_counts[arr[r] % 3] = 0
2494 | window_counts[arr[r] % 3] += 1
2495 | r += 1
2496 | count += r - l
2497 | else:
2498 | window_counts[arr[l] % 3] -= 1
2499 | if window_counts[arr[l] % 3] == 0:
2500 | del window_counts[arr[l] % 3]
2501 | l += 1
2502 | return count
2503 | ```
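
All of the variable-size examples above share one skeleton: grow the window from the right while a `can_grow` test holds, otherwise shrink (or reset) from the left, updating the answer as you go. Below is that skeleton instantiated on a concrete toy problem (longest subarray with sum at most a limit, assuming non-negative values) so it runs as-is; only the `can_grow` test changes between the problems above.

```py
# Distilled variable-size window skeleton on a toy problem.
def longest_window_with_sum_at_most(arr, limit):
    l, r = 0, 0
    window_sum = 0
    cur_max = 0
    while r < len(arr):
        can_grow = window_sum + arr[r] <= limit
        if can_grow:
            window_sum += arr[r]
            r += 1
            cur_max = max(cur_max, r - l)
        elif l == r:          # single element already too big, skip it
            l += 1
            r += 1
        else:                 # shrink from the left
            window_sum -= arr[l]
            l += 1
    return cur_max
```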
2504 |
2505 |
2506 | ## 39. Backtracking
2507 |
2508 | ```py
2509 | def max_sum_path(grid):
2510 | # inefficient backtracking solution, DP is better
2511 | max_sum = -math.inf
2512 | R, C = len(grid), len(grid[0])
2513 | def visit(r, c, cur_sum):
2514 | nonlocal max_sum
2515 | if r == R - 1 and c == C - 1:
2516 | max_sum = max(max_sum, cur_sum)
2517 | return
2518 | if r + 1 < R:
2519 | visit(r + 1, c, cur_sum + grid[r + 1][c]) # go down
2520 | if c + 1 < C:
2521 | visit(r, c + 1, cur_sum + grid[r][c + 1]) # go right
2522 | visit(0, 0, grid[0][0])
2523 | return max_sum
2524 |
2525 | # backtracking
2526 | def visit(partial_solution):
2527 | if full_solution(partial_solution):
2528 | # process leaf/full solution
2529 | else:
2530 | for choice in choices(partial_solution):
2531 | # prune children where possible
2532 | child = apply_choice(partial_solution)
2533 | visit(child)
2534 | visit(empty_solution)
2535 |
2536 | def all_subsets(S):
2537 |     res = [] # global list of subsets
2538 | subset = [] # state of current partial solution
2539 | def visit(i):
2540 | if i == len(S):
2541 | res.append(subset.copy())
2542 | return
2543 | # choice 1: pick S[i]
2544 | subset.append(S[i])
2545 | visit(i + 1)
2546 | subset.pop() # cleanup work, undo choice 1
2547 | # choice 2: skip S[i]
2548 | visit(i + 1)
2549 | visit(0)
2550 | return res
2551 |
2552 | def generate_permutation(arr):
2553 | res = []
2554 | perm = arr.copy()
2555 | def visit(i):
2556 | if i == len(perm) - 1:
2557 | res.append(perm.copy())
2558 | return
2559 | for j in range(i, len(perm)):
2560 | perm[i], perm[j] = perm[j], perm[i] # pick perm[j]
2561 | visit(i + 1)
2562 | perm[i], perm[j] = perm[j], perm[i] # cleanup work, undo change
2563 | visit(0)
2564 | return res
2565 |
2566 | def generate_sentences(sentence, synonyms):
2567 | words = sentence.split()
2568 | res = []
2569 | cur_sentence = []
2570 | def visit(i):
2571 | if i == len(words):
2572 | res.append(" ".join(cur_sentence))
2573 | return
2574 | if words[i] not in synonyms:
2575 | choices = [words[i]]
2576 | else:
2577 | choices = synonyms.get(words[i])
2578 | for choice in choices:
2579 | cur_sentence.append(choice)
2580 | visit(i + 1)
2581 | cur_sentence.pop() # undo change
2582 | visit(0)
2583 | return res
2584 |
2585 | def jumping_numbers(n):
2586 | res = []
2587 | def visit(num):
2588 | if num >= n:
2589 | return
2590 | res.append(num)
2591 | last_digit = num % 10
2592 | if last_digit > 0:
2593 | visit(num * 10 + (last_digit - 1))
2594 | if last_digit < 9:
2595 | visit(num * 10 + (last_digit + 1))
2596 | for num in range(1, 10):
2597 | visit(num)
2598 | return sorted(res)
2599 |
2600 | def maximize_style(budget, prices, ratings):
2601 | best_rating_sum = 0
2602 | best_items = []
2603 | n = len(prices)
2604 | items = []
2605 | def visit(i, cur_cost, cur_rating_sum):
2606 | nonlocal best_items, best_rating_sum
2607 | if i == n:
2608 | if cur_rating_sum > best_rating_sum:
2609 | best_rating_sum = cur_rating_sum
2610 | best_items = items.copy()
2611 | return
2612 | # choice 1: skip item i
2613 | visit(i + 1, cur_cost, cur_rating_sum)
2614 | # choice 2: pick item i (if within budget)
2615 | if cur_cost + prices[i] <= budget:
2616 | items.append(i)
2617 | visit(i + 1, cur_cost + prices[i], cur_rating_sum + ratings[i])
2618 | items.pop()
2619 | visit(0, 0, 0)
2620 | return best_items
2621 | ```
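
A quick sanity check of the subset template above on a tiny input (the order follows the pick-then-skip choice order used in `all_subsets`):

```py
# Expected output of the backtracking subset generator above.
print(all_subsets([1, 2]))
# [[1, 2], [1], [2], []]
```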
2622 |
2623 |
2624 | ## 40. Dynamic Programming
2625 |
2626 | ```py
2627 | def delay(times):
2628 | n = len(times)
2629 | if n < 3:
2630 | return 0
2631 | memo = {}
2632 | def delay_rec(i):
2633 | if i >= n - 3:
2634 | return times[i]
2635 | if i in memo:
2636 | return memo[i]
2637 | memo[i] = times[i] + min(delay_rec(i + 1), delay_rec(i + 2), delay_rec(i + 3))
2638 | return memo[i]
2639 |     return min(delay_rec(0), delay_rec(1), delay_rec(2))
2640 |
2641 | # memoization
2642 | # memo = empty map
2643 | # f(subproblem_id):
2644 | # if subproblem is base case:
2645 | #     return result directly
2646 | # if subproblem in memo map:
2647 | # return cached result
2648 | # memo[subproblem_id] = recurrence relation formula
2649 | # return memo[subproblem_id]
2650 | # return f(initial subproblem)
2651 |
2652 | def max_path(grid):
2653 | R, C = len(grid), len(grid[0])
2654 | memo = {}
2655 | def max_path_rec(r, c):
2656 | if r == R - 1 and c == C - 1:
2657 | return grid[r][c]
2658 | if (r, c) in memo:
2659 | return memo[(r, c)]
2660 | elif r == R - 1:
2661 | memo[(r, c)] = grid[r][c] + max_path_rec(r, c + 1)
2662 | elif c == C - 1:
2663 |             memo[(r, c)] = grid[r][c] + max_path_rec(r + 1, c)
2664 | else:
2665 | memo[(r, c)] = grid[r][c] + max(max_path_rec(r + 1, c), max_path_rec(r, c + 1))
2666 | return memo[(r, c)]
2667 | return max_path_rec(0, 0)
2668 |
2669 | def min_split(arr, k):
2670 | n = len(arr)
2671 | memo = {}
2672 | def min_split_rec(i, x):
2673 | if (i, x) in memo:
2674 | return memo[(i, x)]
2675 | # base case
2676 | if n - i == x: # put each element in its own subarray
2677 | memo[(i, x)] = max(arr[i:])
2678 | elif x == 1: # put all elements in one subarray
2679 | memo[(i, x)] = sum(arr[i:])
2680 | else: # general case
2681 | current_sum = 0
2682 | res = math.inf
2683 | for p in range(i, n - x + 1):
2684 | current_sum += arr[p]
2685 | res = min(res, max(current_sum, min_split_rec(p + 1, x - 1)))
2686 | memo[(i, x)] = res
2687 | return memo[(i, x)]
2688 | return min_split_rec(0, k)
2689 |
2690 | def num_ways():
2691 | memo = {}
2692 | def num_ways_rec(i):
2693 | if i > 21:
2694 | return 1
2695 | if 16 <= i <= 21:
2696 | return 0
2697 | if i in memo:
2698 | return memo[i]
2699 | memo[i] = 0
2700 | for card in range(1, 11):
2701 | memo[i] += num_ways_rec(i + card)
2702 | return memo[i]
2703 | return num_ways_rec(0)
2704 |
2705 | def lcs(s1, s2):
2706 | memo = {}
2707 | def lcs_rec(i1, i2):
2708 | if i1 == len(s1) or i2 == len(s2):
2709 | return 0
2710 | if (i1, i2) in memo:
2711 | return memo[(i1, i2)]
2712 | if s1[i1] == s2[i2]:
2713 | memo[(i1, i2)] = 1 + lcs_rec(i1 + 1, i2 + 1)
2714 | else:
2715 | memo[(i1, i2)] = max(lcs_rec(i1 + 1, i2), lcs_rec(i1, i2 + 1))
2716 | return memo[(i1, i2)]
2717 | return lcs_rec(0, 0)
2718 |
2719 | def lcs_reconstruction(s1, s2):
2720 | memo = {}
2721 | def lcs_res(i1, i2):
2722 | if i1 == len(s1) or i2 == len(s2):
2723 | return ""
2724 | if (i1, i2) in memo:
2725 | return memo[(i1, i2)]
2726 | if s1[i1] == s2[i2]:
2727 | memo[(i1, i2)] = s1[i1] + lcs_res(i1 + 1, i2 + 1)
2728 | else:
2729 |             opt1, opt2 = lcs_res(i1 + 1, i2), lcs_res(i1, i2 + 1)
2730 | if len(opt1) >= len(opt2):
2731 | memo[(i1, i2)] = opt1
2732 | else:
2733 | memo[(i1, i2)] = opt2
2734 | return memo[(i1, i2)]
2735 | return lcs_res(0, 0)
2736 |
2737 | def lcs_reconstruction_optimal(s1, s2):
2738 | memo = {}
2739 |     def lcs_rec(i1, i2):
2740 |         # same memoized helper as in lcs() above: returns the LCS length of s1[i1:] and s2[i2:]
2741 | i1, i2 = 0, 0
2742 | res = []
2743 | while i1 < len(s1) and i2 < len(s2):
2744 | if s1[i1] == s2[i2]:
2745 | res.append(s1[i1])
2746 | i1 += 1
2747 | i2 += 1
2748 | elif lcs_rec(i1 + 1, i2) > lcs_rec(i1, i2 + 1):
2749 | i1 += 1
2750 | else:
2751 | i2 += 1
2752 | return ''.join(res)
2753 |
2754 | def delay(times):
2755 | n = len(times)
2756 | if n < 3:
2757 | return 0
2758 | dp = [0] * n
2759 | dp[n - 1], dp[n - 2], dp[n - 3] = times[n - 1], times[n - 2], times[n - 3]
2760 | for i in range(n - 4, -1, -1):
2761 | dp[i] = times[i] + min(dp[i + 1], dp[i + 2], dp[i + 3])
2762 | return min(dp[0], dp[1], dp[2])
2763 |
2764 | def delay_optimized(times):
2765 | n = len(times)
2766 | if n < 3:
2767 | return 0
2768 | dp1, dp2, dp3 = times[n - 3], times[n - 2], times[n - 1]
2769 | for i in range(n - 4, -1, -1):
2770 | cur = times[i] + min(dp1, dp2, dp3)
2771 | dp1, dp2, dp3 = cur, dp1, dp2
2772 | return min(dp1, dp2, dp3)
2773 | ```
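
The memoization template above maintains the memo dict by hand; when the subproblem id is hashable, the standard library's `functools.lru_cache` can do the caching instead. A sketch of the same `delay` recurrence with `lru_cache` (the hand-rolled memo is what the rest of this chapter uses):

```py
# Sketch: delay(times) again, with functools.lru_cache as the memo.
from functools import lru_cache

def delay_cached(times):
    n = len(times)
    if n < 3:
        return 0
    @lru_cache(maxsize=None)
    def delay_rec(i):
        if i >= n - 3:
            return times[i]
        return times[i] + min(delay_rec(i + 1), delay_rec(i + 2), delay_rec(i + 3))
    return min(delay_rec(0), delay_rec(1), delay_rec(2))
```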
2774 |
2775 |
2776 | ## 41. Greedy Algorithms
2777 |
2778 | ```py
2779 | def most_non_overlapping_intervals(intervals):
2780 | intervals.sort(key=lambda x: x[1])
2781 | count = 0
2782 | prev_end = -math.inf
2783 | for l, r in intervals:
2784 | if l > prev_end:
2785 | count += 1
2786 | prev_end = r
2787 | return count
2788 |
2789 | def can_reach_goal(jumping_points, k, max_aging):
2790 | n = len(jumping_points)
2791 | gaps = []
2792 | for i in range(1, n):
2793 | gaps.append(jumping_points[i] - jumping_points[i - 1])
2794 | gaps.sort()
2795 | total_aging = sum(gaps[:n - 1 - k])
2796 | return total_aging <= max_aging
2797 |
2798 | def minimize_distance(points, center1, center2):
2799 | n = len(points)
2800 | assignment = [0] * n
2801 | baseline = 0
2802 | c1_count = 0
2803 | for i, p in enumerate(points):
2804 | if dist(p, center1) <= dist(p, center2):
2805 | assignment[i] = 1
2806 | baseline += dist(p, center1)
2807 | c1_count += 1
2808 | else:
2809 | assignment[i] = 2
2810 | baseline += dist(p, center2)
2811 | if c1_count == n // 2:
2812 | return baseline
2813 | switch_costs = []
2814 | for i, p in enumerate(points):
2815 | if assignment[i] == 1 and c1_count > n // 2:
2816 | switch_costs.append(dist(p, center2) - dist(p, center1))
2817 | if assignment[i] == 2 and c1_count < n // 2:
2818 | switch_costs.append(dist(p, center1) - dist(p, center2))
2819 | res = baseline
2820 | switch_costs.sort()
2821 | for cost in switch_costs[:abs(c1_count - n // 2)]:
2822 | res += cost
2823 | return res
2824 |
2825 | def min_middle_sum(arr):
2826 | arr.sort()
2827 | middle_sum = 0
2828 | for i in range(len(arr) // 3):
2829 | middle_sum += arr[i * 2 + 1]
2830 | return middle_sum
2831 |
2832 | def min_script_runs(meetings):
2833 | meetings.sort(key=lambda x: x[1])
2834 | count = 0
2835 | prev_end = -math.inf
2836 | for l, r in meetings:
2837 | if l > prev_end:
2838 | count += 1
2839 | prev_end = r
2840 | return count
2841 |
2842 | def latest_reachable_year(jumping_points, k, max_aging):
2843 | gaps = []
2844 | for i in range(1, len(jumping_points)):
2845 | gaps.append(jumping_points[i] - jumping_points[i - 1])
2846 | min_heap = Heap()
2847 | total_gap_sum = 0
2848 | sum_heap = 0
2849 | for i, gap in enumerate(gaps):
2850 | aged = total_gap_sum - sum_heap
2851 | min_heap.push(gap)
2852 | sum_heap += gap
2853 | total_gap_sum += gap
2854 | if min_heap.size() > k:
2855 | smallest_jump = min_heap.pop()
2856 | sum_heap -= smallest_jump
2857 | new_aged = total_gap_sum - sum_heap
2858 | if new_aged > max_aging:
2859 | # we can't reach the end of gap i
2860 | # we get to jumping_points[i] and age naturally from there
2861 | remaining_aging = max_aging - aged
2862 | return jumping_points[i] + remaining_aging
2863 | # reached last jumping point
2864 | aged = total_gap_sum - sum_heap
2865 | remaining_aging = max_aging - aged
2866 | return jumping_points[len(jumping_points) - 1] + remaining_aging
2867 | ```
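
A quick check of the interval-scheduling greedy above on a small input:

```py
# most_non_overlapping_intervals sorts by end time and keeps every interval
# that starts strictly after the previously kept end.
print(most_non_overlapping_intervals([[1, 3], [2, 4], [5, 7]]))  # 2, keeping [1, 3] and [5, 7]
```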
2868 |
2869 |
2870 | ## 42. Topological Sort
2871 |
2872 | ```py
2873 | def topological_sort(graph):
2874 | # initialization
2875 | V = len(graph)
2876 | in_degrees = [0 for _ in range(V)]
2877 | for node in range(V):
2878 | for nbr in graph[node]: # for weighted graphs, unpack edges: nbr, _
2879 | in_degrees[nbr] += 1
2880 | degree_zero = []
2881 | for node in range(V):
2882 | if in_degrees[node] == 0:
2883 | degree_zero.append(node)
2884 | # main peel-off loop
2885 | topo_order = []
2886 | while degree_zero:
2887 | node = degree_zero.pop()
2888 | topo_order.append(node)
2889 | for nbr in graph[node]: # for weighted graphs, unpack edges: nbr, _
2890 | in_degrees[nbr] -= 1
2891 | if in_degrees[nbr] == 0:
2892 | degree_zero.append(nbr)
2893 | if len(topo_order) < V:
2894 |         return [] # there's a cycle, some nodes couldn't be peeled off
2895 | return topo_order
2896 |
2897 | def distance(graph, start):
2898 | topo_order = topological_sort(graph)
2899 | distances = {start: 0}
2900 | for node in topo_order:
2901 |         if node not in distances: continue
2902 | for nbr, weight in graph[node]:
2903 | if nbr not in distances or distances[node] + weight < distances[nbr]:
2904 | distances[nbr] = distances[node] + weight
2905 | res = []
2906 | for i in range(len(graph)):
2907 | if i in distances:
2908 | res.append(distances[i])
2909 | else:
2910 | res.append(math.inf)
2911 | return res
2912 |
2913 | def shortest_path(graph, start, goal):
2914 | topo_order = topological_sort(graph)
2915 | distances = {start: 0}
2916 | predecessors = {}
2917 | for node in topo_order:
2918 | if node not in distances: continue
2919 | for nbr, weight in graph[node]:
2920 |             if nbr not in distances or distances[node] + weight < distances[nbr]:
2921 | distances[nbr] = distances[node] + weight
2922 | predecessors[nbr] = node
2923 | if goal not in distances:
2924 | return []
2925 | path = [goal]
2926 | while path[-1] != start:
2927 | path.append(predecessors[path[-1]])
2928 | path.reverse()
2929 | return path
2930 |
2931 | def path_count(graph, start):
2932 | topo_order = topological_sort(graph)
2933 | counts = [0] * len(graph)
2934 | counts[start] = 1
2935 | for node in topo_order:
2936 | for nbr in graph[node]:
2937 | counts[nbr] += counts[node]
2938 | return counts
2939 |
2940 | def compile_time(seconds, imports):
2941 | V = len(seconds)
2942 | graph = [[] for _ in range(V)]
2943 | for package in range(V):
2944 | for imported_package in imports[package]:
2945 | graph[imported_package].append(package)
2946 | topo_order = topological_sort(graph)
2947 | durations = {}
2948 | for node in topo_order:
2949 | if node not in durations:
2950 | durations[node] = seconds[node]
2951 | for nbr in graph[node]:
2952 | if nbr not in durations:
2953 | durations[nbr] = 0
2954 | durations[nbr] = max(durations[nbr], seconds[nbr] + durations[node])
2955 | return max(durations.values())
2956 | ```
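
A small sanity check of the peel-off loop (Kahn's algorithm) above, using the adjacency-list convention it expects:

```py
# Tiny DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.  Node 0 must come first, node 3 last.
graph = [[1, 2], [3], [3], []]
print(topological_sort(graph))  # [0, 2, 1, 3] with this implementation (pops from the end of degree_zero)
```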
2957 |
2958 |
2959 | ## 43. Prefix Sums
2960 |
2961 | ```py
2962 | def channel_views(views, periods):
2963 | prefix_sum = [0] * len(views)
2964 | prefix_sum[0] = views[0]
2965 | for i in range(1, len(views)):
2966 | prefix_sum[i] = prefix_sum[i - 1] + views[i]
2967 | res = []
2968 | for l, r in periods:
2969 | if l == 0:
2970 | res.append(prefix_sum[r])
2971 | else:
2972 | res.append(prefix_sum[r] - prefix_sum[l - 1])
2973 | return res
2974 |
2975 | # # initialization
2976 | # # initialize prefix_sum with the same length as input array
2977 | # prefix_sum[0] = arr[0] # at least one element
2978 | # for i from 1 to len(arr) - 1:
2979 | # prefix_sum[i] = prefix_sum[i - 1] + arr[i]
2980 | # # query: sum of subarray [l, r]
2981 | # if l == 0:
2982 | # return prefix_sum[r]
2983 | # return prefix_sum[r] - prefix_sum[l - 1]
2984 |
2985 | def good_reception_scores(likes, dislikes, periods):
2986 | positive_days = [0] * len(likes)
2987 |     for i in range(len(likes)):
2988 | if likes[i] > dislikes[i]:
2989 | positive_days[i] = 1
2990 | # build prefix sum for positive_days array and query it with each period
2991 |
2992 | def exclusive_product_array(arr):
2993 | m = 10 ** 9 + 7
2994 | n = len(arr)
2995 | prefix_product = [1] * n
2996 | prefix_product[0] = arr[0]
2997 | for i in range(1, n):
2998 | prefix_product[i] = (prefix_product[i - 1] * arr[i]) % m
2999 | postfix_product = [1] * n
3000 | postfix_product[n - 1] = arr[n - 1]
3001 | for i in range(n - 2, -1, -1):
3002 | postfix_product[i] = (postfix_product[i + 1] * arr[i]) % m
3003 | res = [1] * n
3004 | res[0] = postfix_product[1]
3005 | res[n - 1] = prefix_product[n - 2]
3006 | for i in range(1, n - 1):
3007 | res[i] = (prefix_product[i - 1] * postfix_product[i + 1]) % m
3008 | return res
3009 |
3010 | def balanced_index(arr):
3011 | prefix_sum = 0
3012 | postfix_sum = sum(arr) - arr[0]
3013 | for i in range(len(arr)):
3014 | if prefix_sum == postfix_sum:
3015 | return i
3016 | prefix_sum += arr[i]
3017 | if i + 1 < len(arr):
3018 | postfix_sum -= arr[i + 1]
3019 | return -1
3020 |
3021 | def max_total_deviation(likes, dislikes):
3022 | scores = [likes[i] - dislikes[i] for i in range(len(likes))]
3023 | scores.sort()
3024 | n = len(scores)
3025 | prefix_sum = [0] * n
3026 | prefix_sum[0] = scores[0]
3027 | for i in range(1, n):
3028 | prefix_sum[i] = prefix_sum[i - 1] + scores[i]
3029 | max_deviation = 0
3030 | for i in range(n):
3031 | left, right = 0, 0
3032 | if i > 0:
3033 | left = i * scores[i] - prefix_sum[i - 1]
3034 | if i < n - 1:
3035 | right = prefix_sum[n - 1] - prefix_sum[i] - (n - i - 1) * scores[i]
3036 | max_deviation = max(max_deviation, left + right)
3037 | return max_deviation
3038 |
3039 | def count_subarrays(arr, k):
2040 |     prefix_sum = ...  # build prefix sum of arr, as in channel_views above
3041 | prefix_sum_to_count = {0: 1} # for empty prefix
3042 | count = 0
3043 | for val in prefix_sum:
3044 | if val - k in prefix_sum_to_count:
3045 | count += prefix_sum_to_count[val - k]
3046 | if val not in prefix_sum_to_count:
3047 | prefix_sum_to_count[val] = 0
3048 | prefix_sum_to_count[val] += 1
3049 | return count
3050 |
3051 | def longest_subarray_with_sum_k(arr, k):
2052 |     prefix_sum = ...  # build prefix sum of arr, as in channel_views above
3053 | prefix_sum_to_index = {0: -1} # for empty prefix
3054 | res = -1
3055 | for r, val in enumerate(prefix_sum):
3056 | if val - k in prefix_sum_to_index:
3057 | l = prefix_sum_to_index[val - k]
3058 | res = max(res, r - l)
3059 | if val not in prefix_sum_to_index:
3060 | prefix_sum_to_index[val] = r
3061 | return res
3062 |
3063 | def range_updates(n, votes):
3064 | diff = [0] * n
3065 | for l, r, v in votes:
3066 | diff[l] += v
3067 | if r + 1 < n:
3068 | diff[r + 1] -= v
3069 | prefix_sum = [0] * n
3070 | prefix_sum[0] = diff[0]
3071 | for i in range(1, n):
3072 | prefix_sum[i] = prefix_sum[i - 1] + diff[i]
3073 | return prefix_sum
3074 |
3075 | def most_booked_slot(slots, bookings):
3076 | n = len(slots)
3077 | diff = [0] * n
3078 | for l, r, c in bookings:
3079 | diff[l] += c
3080 | if r + 1 < n:
3081 | diff[r + 1] -= c
3082 | prefix_sum = [0] * n
3083 | prefix_sum[0] = diff[0]
3084 | for i in range(1, n):
3085 | prefix_sum[i] = prefix_sum[i - 1] + diff[i]
3086 | max_bookings, max_index = 0, -1
3087 | for i in range(n):
3088 | total_bookings = prefix_sum[i] + slots[i]
3089 | if total_bookings > max_bookings:
3090 | max_bookings, max_index = total_bookings, i
3091 | return max_index
3092 | ```
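
A worked instance of the difference-array trick in `range_updates` above:

```py
# Each vote (l, r, v) adds v to every index in [l, r]: diff gets +v at l and
# -v at r + 1, and the prefix sum of diff recovers the per-index totals.
print(range_updates(5, [(1, 3, 2), (2, 4, -1)]))  # [0, 2, 1, 1, -1]
```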
3093 |
--------------------------------------------------------------------------------