├── Questions
    └── Cracking the Coding Interview
    │   ├── 4 Ways to Divide Data.md
    │   ├── Cache.md
    │   ├── Deadlock-Free Class.md
    │   ├── Duplicate URLs.md
    │   ├── Find Words in Millions of Documents.md
    │   ├── Synchronized Methods.md
    │   └── Thread vs. Process.md
├── Solutions
    └── Cracking the Coding Interview
    │   ├── 4 Ways to Divide Data.md
    │   ├── Cache.md
    │   ├── Deadlock-Free Class.md
    │   ├── Duplicate URLs.md
    │   ├── Find Words in Millions of Documents.md
    │   ├── Synchronized Methods.md
    │   └── Thread vs. Process.md
└── readme.md


/Questions/Cracking the Coding Interview/4 Ways to Divide Data.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | What are 4 ways to divide data (across machines)?
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Cache.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | Imagine a web server for a simplified search engine. This system has 100 machines to respond to search queries, which may then call out using `processSearch(String query)` to another cluster of machines to actually get the result. The machine which responds to a given query is chosen at random, so you cannot guarantee that the same machine will always respond to the same request. The method `processSearch` is very expensive. Design a caching mechanism for the most recent queries. Be sure to explain how you would update the cache when data changes.
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Deadlock-Free Class.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | Design a class which provides a lock only if there are no possible deadlocks.
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Duplicate URLs.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | You have 10 billion URLs. How do you detect the duplicate documents? In this case, assume "duplicate" means that the URLs are identical.
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Find Words in Millions of Documents.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | How would you find words in millions of documents?
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Synchronized Methods.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | You are given a class with synchronized method A and a normal method B. If you have two threads in one instance of a program, can they both execute A at the same time? Can they execute A and B at the same time?
4 |
--------------------------------------------------------------------------------
/Questions/Cracking the Coding Interview/Thread vs. Process.md:
--------------------------------------------------------------------------------
1 | ### Question
2 |
3 | What's the difference between a thread and a process?
4 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/4 Ways to Divide Data.md:
--------------------------------------------------------------------------------
1 | ### Solution
2 |
3 | - 4 ways to divide data (read the book's explanation for great tips):
4 |     - By order of appearance (Benefit: we never need more machines than necessary)
5 |     - By hashing the key mod the number of machines (Benefit: every machine knows exactly where the data is)
6 |     - By what the actual values represent (Benefit: similar things can be grouped together, like all people in Mexico)
7 |     - Arbitrarily (Benefit: better load balancing)
8 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Cache.md:
--------------------------------------------------------------------------------
1 | ### Solution
2 |
3 | - We basically design an LRU cache
4 | - Main idea: Use a LinkedList to keep track of the most recent queries, with the freshest at the front. Use a HashMap (in parallel with the LinkedList) for fast access to LinkedList nodes (key = the query string, value = the node in the LinkedList)
5 | - "A linked list would allow easy purging of old data, by moving 'fresh' items to the front. We could implement it to remove the last element of the linked list when the list exceeds a certain size."
6 | - Options for storing the cache
7 |     1. Each machine has its own cache of just its own searches (lame)
8 |     1. Each machine has its own cache of ALL machines' searches
9 |         - Pro: efficient lookup
10 |         - Con: takes up a ton of memory, and updating the cache means updating it on every machine
11 |     1. The cache is shared across machines (by hashing the keys, which are queries)
12 |     1. Rodney method: Each machine caches the most popular searches itself. Less popular searches are shared among machines
13 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Deadlock-Free Class.md:
--------------------------------------------------------------------------------
1 | ### Solution
2 |
3 | Model the locks as a directed graph: each lock is a node, and an edge from lock A to lock B means some thread may request B while holding A. Then use DFS (or a BFS-based topological sort) to detect a cycle.
4 |
5 | A deadlock can occur if and only if we find a cycle.
6 |
7 | ### Similar Solution
8 |
9 | From MIT's "Hacking a Google Interview"
10 |
11 | How can we ensure that deadlock does not occur?
12 |
13 | There are many possible answers to this problem, but the answer the interviewer will be looking for is this: we can prevent deadlock if we assign an order to our locks and require that locks always be acquired in order. For example, if a thread needs to acquire locks 1, 5, and 2, it must acquire lock 1, followed by lock 2, followed by lock 5. That way, we prevent one thread trying to acquire lock 1 then lock 2, and another thread trying to acquire lock 2 then lock 1, which could cause deadlock (note that this approach is not used very often in practice).
14 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Duplicate URLs.md:
--------------------------------------------------------------------------------
1 | ### Solution
2 |
3 | - Simple answer: use a HashMap to detect duplicates
4 | - 10 billion URLs at 100 characters each at 4 bytes each = 4 terabytes of data. We can't hold all of this in memory at once.
5 | - Create 4000 files of roughly 1GB each, writing each URL to the file numbered hash(url) % 4000.
6 |     - (The distribution is mostly uniform, but some files may be bigger or smaller than 1GB, since it's very unlikely we can create a perfect hash function)
7 | - Now all URLs with the same hash value are in the same file
8 |     - (This ensures that two copies of the same URL never land in different files, so our next step will find all duplicates)
9 | - Do a 2nd pass (through each of the 4000 files) and create a HashMap to detect duplicates
10 |     - This 2nd pass can be done in parallel on 4000 machines (Pro: faster. Con: if 1 machine crashes, it affects our final result)
11 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Find Words in Millions of Documents.md:
--------------------------------------------------------------------------------
1 | ### Solution
2 |
3 | - Preprocess the data into a HashMap that maps each word to the list of documents containing it
4 | - Divide the HashMap across machines, e.g. alphabetically by keyword
5 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Synchronized Methods.md:
--------------------------------------------------------------------------------
1 | From "Cracking the Coding Interview" book solution.
2 |
3 | ### Solution
4 |
5 | By applying the keyword `synchronized` to a method, we ensure that two threads cannot execute synchronized methods on the same object instance at the same time.
6 |
7 | So, the answer to the first part really depends. If the two threads have the same instance of the object, then no, they cannot simultaneously execute method A. However, if they have different instances of the object, then they can.
8 |
9 | Conceptually, you can see this by considering locks. A synchronized method applies a "lock" on all synchronized methods in that instance of the object. This blocks other threads from executing synchronized methods within that instance.
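The lock behavior described above (and part 2 of the question) can be checked with a minimal Java sketch. The class `Resource`, its methods `methodA`/`methodB`, and the `CountDownLatch` hand-off are hypothetical stand-ins for the question's class, not code from the book. One thread parks inside synchronized `methodA` while owning the instance's monitor; a second thread calling `methodA` on the same instance is then observed in the `BLOCKED` state, while the non-synchronized `methodB` and `methodA` on a different instance both complete right away. The 2-second polling timeout is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;

public class SynchronizedDemo {

    // Hypothetical stand-in for the question's class: synchronized A, plain B.
    static class Resource {
        private final CountDownLatch releaseSignal = new CountDownLatch(1);

        // Synchronized: the caller must own this instance's monitor to enter.
        // It signals `entered`, then keeps the monitor until release() is called
        // (CountDownLatch.await does not give up the monitor, unlike Object.wait).
        synchronized void methodA(CountDownLatch entered) throws InterruptedException {
            entered.countDown();
            releaseSignal.await();
        }

        // Not synchronized: never needs the monitor.
        boolean methodB() {
            return true;
        }

        void release() {
            releaseSignal.countDown();
        }
    }

    // Starts a thread that calls methodA on the given instance.
    static Thread callA(Resource r, CountDownLatch entered) {
        Thread t = new Thread(() -> {
            try {
                r.methodA(entered);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Polls until thread t reaches the given state or the timeout expires.
    static boolean waitForState(Thread t, Thread.State state, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == state) {
                return true;
            }
            Thread.sleep(5);
        }
        return false;
    }

    // Returns {second A blocked on same instance, B ran during A, A ran on other instance}.
    static boolean[] demo() throws InterruptedException {
        Resource shared = new Resource();

        CountDownLatch t1Entered = new CountDownLatch(1);
        Thread t1 = callA(shared, t1Entered);
        t1Entered.await(); // t1 is now inside A and owns shared's monitor

        // Part 1: a second thread calling A on the SAME instance must wait.
        Thread t2 = callA(shared, new CountDownLatch(1));
        boolean secondABlocked = waitForState(t2, Thread.State.BLOCKED, 2000);

        // Part 2: B is not synchronized, so it runs while t1 is still inside A.
        boolean bRanDuringA = shared.methodB() && t2.getState() == Thread.State.BLOCKED;

        // A different instance has its own monitor, so A there does not wait.
        Resource other = new Resource();
        other.release(); // let A on `other` return as soon as it is entered
        other.methodA(new CountDownLatch(1));
        boolean otherInstanceRan = t2.getState() == Thread.State.BLOCKED;

        shared.release(); // t1 leaves A; t2 can now enter A and finish
        t1.join();
        t2.join();
        return new boolean[] {secondABlocked, bRanDuringA, otherInstanceRan};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("Second thread blocked on A (same instance): " + r[0]);
        System.out.println("B ran while A held the lock: " + r[1]);
        System.out.println("A ran on a different instance meanwhile: " + r[2]);
    }
}
```

Running `main` should report `true` on all three lines. Polling `Thread.getState()` is only a teaching device for making the blocking visible; real code would coordinate threads with proper synchronization rather than by inspecting thread states.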
10 |
11 | In the second part, we're asked if thread1 can execute synchronized method A while thread2 is executing non-synchronized method B. Since B is not synchronized, there is nothing to block thread1 from executing A while thread2 is executing B. This is true regardless of whether thread1 and thread2 have the same instance of the object.
12 |
13 | Ultimately, the key concept to remember is that only one thread at a time can execute synchronized methods on a given instance of that object. Other threads can execute non-synchronized methods on that instance, or they can execute any method on a different instance of the object.
14 |
--------------------------------------------------------------------------------
/Solutions/Cracking the Coding Interview/Thread vs. Process.md:
--------------------------------------------------------------------------------
1 | From "Cracking the Coding Interview" book solution.
2 |
3 | ### Solution
4 |
5 | Processes and threads are related to each other but are fundamentally different.
6 |
7 | A process can be thought of as an instance of a program in execution. A process is an independent entity to which system resources (e.g., CPU time and memory) are allocated. Each process is executed in a separate address space, and one process cannot access the variables and data structures of another process. If a process wishes to access another process' resources, inter-process communication (IPC) must be used, such as pipes, files, or sockets.
8 |
9 | A thread exists within a process and shares the process' resources (including its heap space). Multiple threads within the same process will share the same heap space. This is very different from processes, which cannot directly access the memory of another process. Each thread still has its own registers and its own stack, but other threads can read and write the heap memory. A thread is a particular execution path of a process. When one thread modifies a process resource, the change is visible to sibling threads, since they share the same memory (though without proper synchronization they are not guaranteed to see it right away).
10 |
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 |
2 | ### Cracking the Coding Interview
3 |
4 | | Section | Question | Solution | Difficulty |
5 | |:---------:|:---:|:---:|:----------:|
6 | | 9 - Intro | [4 Ways to Divide Data](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/4%20Ways%20to%20Divide%20Data.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/4%20Ways%20to%20Divide%20Data.md) | Medium |
7 | | 9 - Intro | [Find Words in Millions of Documents](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Find%20Words%20in%20Millions%20of%20Documents.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Find%20Words%20in%20Millions%20of%20Documents.md) | Easy |
8 | | 9.4 | [Duplicate URLs](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Duplicate%20URLs.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Duplicate%20URLs.md) | Medium |
9 | | 9.5 | [Cache](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Cache.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Cache.md) | Medium |
10 | | 15.1 | [Thread vs. Process](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Thread%20vs.%20Process.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Thread%20vs.%20Process.md) | Medium |
11 | | 15.4 | [Deadlock-Free Class](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Deadlock-Free%20Class.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Deadlock-Free%20Class.md) | Medium |
12 | | 15.6 | [Synchronized Methods](https://github.com/RodneyShag/System_Design/blob/master/Questions/Cracking%20the%20Coding%20Interview/Synchronized%20Methods.md) | [Solution](https://github.com/RodneyShag/System_Design/blob/master/Solutions/Cracking%20the%20Coding%20Interview/Synchronized%20Methods.md) | Easy |
13 |
14 | ### Recommended Resources
15 |
16 | - [System Design Primer](https://github.com/donnemartin/system-design-primer) - Cover the topics, then dive into sample problems and their solutions.
17 | - [Grokking the System Design Interview](https://www.educative.io/collection/5668639101419520/5649050225344512) - $79 course that walks you through common system design problems.
18 | - [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) - look into this to see if it's good.
19 |
20 | ### More Resources
21 |
22 | - [Grokking the Coding Interview: Patterns for Coding Questions](https://www.educative.io/courses/grokking-the-coding-interview)
23 | - [LeetCode - System Design](https://leetcode.com/discuss/interview-question/system-design?currentPage=1&orderBy=hot&query=)
24 | - [YouTube - Tech Dummies (System Design Interview Prep)](https://www.youtube.com/channel/UCn1XnDWhsLS5URXTi5wtFTA)
25 | - [LeetCode Post with Resources](https://leetcode.com/discuss/career/216554/from-0-to-clearing-uberappleamazonlinedingoogle)
26 | - [Android System Design](https://www.facebook.com/careers/life/preparing-for-your-android-engineering-interview-at-facebook)
27 |
--------------------------------------------------------------------------------