├── .gitignore ├── Basic Knowledge ├── Computer Architecture │ ├── Introduction.md │ ├── Memory.md │ └── Processor.md ├── Computer Network │ ├── Application Layer.md │ ├── Introduction.md │ ├── Link Layer.md │ ├── Network Layer.md │ └── Transport Layer.md ├── Data Structure & Algorithm │ ├── Backtrack.md │ ├── Design.md │ ├── Dynamic Programming.md │ ├── Graph.md │ ├── Sorting and Searching.md │ └── Tree.md ├── Database │ ├── Data Model.md │ ├── Database Design.md │ ├── Introduction.md │ └── SQL.md ├── Information Security │ ├── Cryptography.md │ ├── Database Security.md │ └── Web Security.md ├── Machine Learning │ ├── Computer Vision.md │ ├── Deep Learning.md │ ├── Natural Language Processing.md │ ├── Recommder System.md │ ├── Reinforcement Learning.md │ └── Statistical Learning.md ├── Operating System │ ├── Device Management.md │ ├── Introduction.md │ ├── Memory Management.md │ ├── Process Management.md │ └── Thread Management.md ├── Programming Language │ ├── Golang.md │ ├── Java.md │ ├── Javascript.md │ └── Python.md └── Statistics │ ├── Probs and Stats.md │ ├── Stats Inference.md │ └── Stats Modelling.md ├── Frameworks ├── AWS │ ├── IAM.md │ ├── Lambda.md │ └── VPC.md ├── Big Data │ ├── Kafka.md │ └── Spark.md ├── DevOps │ ├── Docker.md │ ├── Git.md │ └── Kubernetes.md └── Web Development │ ├── Flask.md │ └── React.md ├── Images ├── 007D0E12-DC9B-4787-91A9-FCEA8EB25A4F.png ├── 008i3skNgy1gyu0c5s7gaj31c6092wfc.jpg ├── 008i3skNgy1gyu0c83rl1j31ci0ju76k.jpg ├── 008i3skNgy1gyu0cax566j31ac0is75t.jpg ├── 008i3skNgy1gyu0cen7olj319q0a0t9w.jpg ├── 04E687DF-CB15-4C86-A420-625B1288AEF9.png ├── 091-udp-user-datagram-protocol-01.jpeg ├── 1*0dsWFuc0pDmcAmHJUh7wqg.png ├── 1*CMGFtehu01ZEBgzHG71sMg.png ├── 1*F9QnGqmAMcXEr1lQeSiqVA.png ├── 1*HeDm_FpLVoR3d97NKdKNLw.png ├── 1*LwOBbwGXMZUy6OzkFAPTzw.png ├── 1*cuOmGT7NevP9oJFJfVpRKA.png ├── 1*kSRH4T8S1YmAuHbpgQ3Ylw.png ├── 1*u2el-HrqRPVk7x0xlvs_CA.png ├── 1*vbRXwvbk4ep_kK3dgfAWzQ.png ├── 1*x4BemYz3TpQxAiMx4E_1qQ.png ├── 
1200px-Huffman_tree_2.svg.png ├── 1200px-IPv4_Packet-en.svg.png ├── 1538057889628.png ├── 157-Datalink-PDU2.jpg ├── 1620764635917.png ├── 1783_2454_329-fat32-file-system.jpg ├── 1C2567BE-17C9-43AE-98C9-A864103CCE98.png ├── 20220829220512.png ├── 20220830151759.png ├── 2335a085-5678-472d-8bf6-509f073cce6c.png ├── 2BE5A89A-9473-420B-B659-D5F010B2BB8D.png ├── 2CDNtEvHIomxnzf.png ├── 3atNUyejpIVchEF.png ├── 400px-Sample-network-diagram.png ├── 440px-Virtual_address_space_and_physical_address_space_relationship.svg.png ├── 440px-X86_Paging_PAE_4K.svg.png ├── 4F00CAC7-1C7B-4065-A509-F7B57DD7FBE3.png ├── 58182variations_comparison.png ├── 5FyWjSG1DvLVph6.png ├── 5SEPxn9AGVq2J3w.png ├── 5bc71daf871a6.png ├── 5bc71ed75cefe.png ├── 8RktEjwCXJU92Wn.png ├── 8TaCWgzYcHkMu1x.png ├── 8wsLWXR1jaAvgKC.png ├── A320790E-26BD-4370-8B87-47E08D04FA9C.png ├── AkOG1FI9uYNVSiX.png ├── BECB3588-AFBF-42F2-9A1C-22916449B1B4.png ├── BERT.png ├── BfT2ScuOlW3DY1V.png ├── BiFHO5rXQyJ9CoE.png ├── Blank diagram (1).png ├── Blank diagram-3028745.png ├── Blank diagram.png ├── Bp8QcoewPVYhF9C.png ├── CB177F61-F635-4014-B166-2C9DDFE51FA3.png ├── CBOW-Model-from-Original-Paper15.png ├── CL84sPApiofEzGX.jpg ├── CN-20231012130356517.png ├── CN.png ├── Capture-24.png ├── CbJ2PA1qd4skVv9.png ├── Cl3Mdkr1HoXfe5z.png ├── DA5651B0-8568-45A2-A26E-B5DE1F699F9B.png ├── DB8B54EB-D3C3-4EB0-A8F4-71163B1426E0.png ├── E63DEDAE-89A8-4A35-B994-24482EC5FE27.png ├── EN-tcp-20231012130628676.png ├── EN-tcp-verbindungsabbau.png ├── EN-tcp.png ├── FAwhLygaZbrdlq1.png ├── File_Hierarchy.png ├── Flowcharts (1)-3026409.png ├── Flowcharts (1).png ├── Flowcharts.png ├── GEeuqFxMY9noD2b.jpg ├── GZaKDHj4Ir9LVCF.png ├── GeLMhaPE9muIjgd.png ├── HRwP1uQhVj9zEqo.jpg ├── IErKH5JjNV8eTn3.png ├── IJcUaSYs4uQ1vVl.png ├── IbzHhrMq9UwGd25.png ├── IckhsWpVL8TDnYi.png ├── Iz7Ti-20231025130710487.png ├── Iz7Ti.png ├── JPOM4bnYzKqX9DG.png ├── LCdpFfP8twkD6S4.png ├── LeNet5-fm.png ├── NHiFXYjkBSVM49L.png ├── NZgAcEaq74b3r1e.png ├── 
NkWiVebrlOf4dQ2.png ├── SGHnm1TwRgBVNyF.png ├── SYNXkEyBGf6l4vW.png ├── Sbs79dz2qLZrth8.jpg ├── Screenshot-2019-07-21-at-2.57.13-AM.png ├── Screenshot-from-2021-03-18-12-47-59.png ├── Sliding-Window-Protocol.jpg ├── Sliding_SET_2-1.jpg ├── Stop-and-Wait-ARQ.png ├── TWH2.png ├── The-architecture-of-Skip-gram-model-20.png ├── U3iEf2QnpCOFqsS.png ├── UWVkO6PIpX95DA7.png ├── Untitled drawing (1).png ├── VMPcZnrig6opvI1.png ├── XjicpDbhByIVZeo.png ├── Xy32mSYOxFnrVwz.png ├── ZSzqvTtModxWU3K.png ├── alipay.jpeg ├── attention_research_1.png ├── bkNRTaSdzp87BYA.png ├── cDJTWHUgOPkS2ua.png ├── components-of-kubernetes.svg ├── dg2W549oVLIzB6r.png ├── dns-root-server.png ├── efj4eq30q56ute916yfi.png ├── git-cheatsheet-6.jpg ├── git_commands-20231005101828239.png ├── git_commands.png ├── httpatomoreillycomsourceoreillyimages9320.png ├── https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F641bd7c1-f965-4144-99c3-06d733c3be14_3486x2853.jpeg ├── illu.png ├── image-20220305124311303.png ├── image-20220305150629948.png ├── image-20220305152405197.png ├── image-20220305153830056.png ├── image-20220305161459418.png ├── image-20220305201150385.png ├── image-20221009115237311.png ├── image-20230114111840096.png ├── image-20230114114412320.png ├── image-20230114115309584.png ├── image-20230114115445001.png ├── image-20230222223859233.png ├── image-20230222224048747.png ├── image-20230225211522140.png ├── image-20230225211557063.png ├── image-20230311142440848.png ├── image-20230311142622209.png ├── image-20230311142721864.png ├── image-20230311142910320.png ├── image-20230311143019440.png ├── image-20230311143031158.png ├── image-20230315204036268.png ├── image-20230315204104180.png ├── image-20230315204420119.png ├── image-20231116101501454.png ├── image-20231116104529337.png ├── image-20231116104607353.png ├── image-20231116110410692.png ├── image-20231116112227032.png ├── image-20231121131612791.png ├── image-20231201111643716.png ├── 
image-20231201145740349.png ├── image-20231205150904749.png ├── image-20231205151057529.png ├── image-20231206131354092.png ├── image-20231211145359412.png ├── image-20231212143108175.png ├── image-20231212150157274.png ├── image6-20231018142135500.png ├── image6-7664108.png ├── image6.png ├── kHyHs.png ├── lQj8AhFHwJYkEoV.png ├── man-in-the-middle-mitm-attack.png ├── mbKIngJhVoMj25d.png ├── nvJEei4abws3Rkr.jpg ├── oAjQbqfurwDtsan.png ├── oIqyPETceM7xbnD.jpg ├── pNktMSJAU3rHVf7.png ├── pyQ6Z7HBngEfkdI.png ├── requirement.png ├── sOnM6eDqWA3QjZP.png ├── smtp_1.png ├── tTBlm68bDk2Vc93.png ├── tcp-header.png ├── uFptSkiLynWCjzN.png ├── vlQH1xzqCYn2F3c.png ├── what-is-apache-spark.b3a3099296936df595d9a7d3610f1a77ff0749df.png └── yHekzjfKh6taDw2.png ├── README.md └── System Design ├── IAM.md ├── Incident Insights.md ├── Key-Value Store.md ├── URL Shortener.md ├── Web Application.md └── Web Crawler.md

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
Interviews

--------------------------------------------------------------------------------
/Basic Knowledge/Computer Architecture/Memory.md:
--------------------------------------------------------------------------------
## Cache

### Introduction

- Cache Mechanism: Cache memory is a type of high-speed volatile computer memory that provides high-speed data access to the CPU and improves overall system performance.

  - **Temporal Locality:** If a piece of data is accessed, it is likely to be accessed again soon. Cache stores this data so that future accesses can be faster.

  - **Spatial Locality:** If a piece of data is accessed, data located near it is likely to be accessed soon. Cache loads blocks of data to take advantage of this pattern.

- Cache Levels:

  - **L1 Cache:** The Level 1 cache is the smallest and fastest cache, located closest to the CPU.
    It is usually divided into separate instruction and data caches (L1i and L1d).
  - **L2 Cache:** The Level 2 cache is larger and slightly slower than the L1 cache. It serves as a secondary cache that stores data not found in the L1 cache.
  - **L3 Cache:** The Level 3 cache is larger and slower than L2 but faster than main memory. It is typically shared among the cores of a multi-core processor.

### Performance

**Average Memory Access Time (AMAT)** is a critical metric used to evaluate the performance of cache memory in computer systems.

1. **Hit Time (HT):** The time it takes to access data from the cache when there is a cache hit. This includes the time to determine if the data is in the cache and to read it.
2. **Miss Rate (MR):** The fraction of memory accesses that result in a cache miss. It is the ratio of the number of cache misses to the total number of memory accesses.
3. **Miss Penalty (MP):** The additional time required to fetch data from the next level of the memory hierarchy (e.g., the next cache level or main memory) when a cache miss occurs.

$$
\text{AMAT} = \text{Hit Time} + (\text{Miss Rate} \times \text{Miss Penalty})
$$

### Organization

- **Cache Size**

  - **Definition:** The total storage capacity of the cache, typically measured in kilobytes (KB), megabytes (MB), or gigabytes (GB).

  - **Impact:** A larger cache can store more data, reducing the miss rate and improving performance. However, larger caches are more expensive and can have higher access latencies.

- **Cache Lines (Blocks)**

  - **Definition:** The smallest unit of data that is stored in the cache and transferred between the cache and the next memory level. Each cache line or block typically ranges from 32 to 128 bytes in size.

  - Structure:
    - **Data Field:** Stores the actual data from memory.
    - **Tag Field:** Identifies the memory address of the data stored in the cache line.
    - **Status Bits:** Include valid bits (indicating if the data is valid) and dirty bits (indicating if the data has been modified).

- **Direct-Mapped Cache:**

  - **Structure:** Each block of main memory maps to exactly one cache line.

  - Indexing:

    The memory address is divided into three parts: tag, index, and offset.

    - **Tag:** Identifies the specific block.
    - **Index:** Identifies the specific cache line.
    - **Offset:** Identifies the specific byte within the cache line.

  - **Advantages:** Simple and fast to access.

  - **Disadvantages:** High conflict miss rate, as multiple memory blocks can map to the same cache line.

- **Fully Associative Cache:**

  - **Structure:** Any block can be stored in any cache line.

  - Indexing:

    The memory address is divided into tag and offset.

    - **Tag:** Identifies the specific block.
    - **Offset:** Identifies the specific byte within the cache line.

  - **Advantages:** Low conflict miss rate.

  - **Disadvantages:** Complex and slower due to the need for associative search.

- **Set-Associative Cache:**

  - **Structure:** Compromise between direct-mapped and fully associative caches. Cache lines are grouped into sets; each memory block maps to exactly one set and can be placed in any line within that set.

  - Indexing:

    The memory address is divided into tag, set index, and offset.

    - **Tag:** Identifies the specific block.
    - **Set Index:** Identifies the specific set within the cache.
    - **Offset:** Identifies the specific byte within the cache line.

  - **Advantages:** Balances complexity and performance.

  - **Disadvantages:** More complex than direct-mapped but less flexible than fully associative.

### Cache Lookup

1. **Address Mapping:**
   - When the CPU needs to access data, it generates a memory address. The address is divided into three parts: tag, index, and offset.
     - **Tag:** Identifies the specific block of memory.
     - **Index:** Identifies the specific cache set.
     - **Offset:** Identifies the specific byte within the cache line.
2. **Cache Lookup Process:**
   - **Indexing:** The cache controller uses the index part of the address to locate the relevant set in the cache.
   - **Tag Comparison:** Within the identified set, the cache controller compares the tag part of the address with the tags stored in the cache lines to check for a match.
   - Hit or Miss:
     - **Cache Hit:** If a matching tag is found on a valid line, the requested data is in the cache. The data is then accessed using the offset.
     - **Cache Miss:** If no matching tag is found, the data is not in the cache. The system then fetches the data from the next level of the memory hierarchy (L2, L3, or main memory).
3. **Handling Cache Misses:**
   - **Read Miss:** The data is fetched from the next memory level and loaded into the cache. If necessary, an existing cache line is evicted based on the replacement policy.
   - **Write Miss:** Depending on the write-miss policy, the block is either loaded into the cache and then updated (write-allocate, usually paired with write-back) or the write goes directly to the next memory level without loading the block (no-write-allocate, usually paired with write-through).
4. **Cache Write Policies:**
   - **Write-Through:** Data is written to both the cache and the next memory level on every write. This keeps the two levels consistent but makes writes slower.
   - **Write-Back:** Data is written only to the cache initially and marked as dirty. It is written back to the next memory level only when the cache line is evicted. This reduces write latency but requires more complex control logic.
5. **Replacement Policies:**
   - **Least Recently Used (LRU):** Replaces the cache line that has not been used for the longest time.
   - **First In, First Out (FIFO):** Replaces the oldest cache line.
   - **Random:** Replaces a randomly selected cache line.
   - **Least Frequently Used (LFU):** Replaces the cache line with the fewest accesses.

## Cache Improvement

1. **Reducing Hit Time**

   - Smaller, Faster Caches: Use a smaller L1 cache to keep the hit time low. Smaller caches are faster because they have less data to search through.

   - Optimized Cache Access Paths: Design the hardware to minimize the delay in accessing the cache. This includes optimizing the paths and logic gates involved in cache access.

   - Parallel Access: Allow simultaneous access to different parts of the cache to improve hit time, which is particularly useful for multi-core processors.

2. **Reducing Miss Rate**

   - Increasing Cache Size: Larger caches store more data, reducing the likelihood of cache misses. However, this comes with increased cost and potentially higher hit time.
   - Higher Associativity: Using set-associative or fully associative caches instead of direct-mapped caches can significantly reduce conflict misses.
     - Common configurations are 2-way, 4-way, and 8-way set-associative caches, where each block can be placed in any of the 2, 4, or 8 lines of its set.

   - Smarter Replacement Policies: Implementing more effective replacement policies like Least Recently Used (LRU) instead of simpler policies like FIFO can help keep more useful data in the cache.

   - Cache Line Size Optimization: Adjust the cache line size to balance spatial locality. Larger lines exploit spatial locality better but might lead to increased miss penalties if not utilized fully.

3. **Reducing Miss Penalty**

   - Multi-Level Caches: Use multiple levels of cache (L1, L2, L3) to create a hierarchy. The larger L2 and L3 caches hold more data, so an L1 miss can often be served from them instead of from main memory, which reduces the miss penalty.

   - Faster Memory Technologies: Use faster memory technologies (e.g., SRAM for caches, DDR for main memory) to reduce the time it takes to fetch data on a miss.

   - Prefetching: Implement hardware or software prefetching to load data into the cache before it is actually needed, thereby reducing miss penalties. Prefetching can be sequential (fetching the next consecutive lines) or stride-based (following a detected constant stride between accesses).

   - Write Policies: Use write-back caches instead of write-through caches to reduce the number of writes to main memory, which frees memory bandwidth for servicing misses.

   - Non-Blocking Caches: Allow the cache to continue servicing requests while handling a miss, improving overall performance and reducing the effective miss penalty.

### Pipelined Cache

### TLB Cache Overlap

### Virtually Accessed Cache

### Way prediction

### Replacement Policy



## Main Memory

### Virtual Memory

- **Virtual Address Space:** Each process is given its own virtual address space, which the process uses to address memory. This space can be larger than the actual physical memory and provides isolation between processes.
- **Paging:** Memory is divided into fixed-size blocks. Virtual memory is managed in units of pages, and physical memory is divided into frames of the same size (commonly 4 KB).
- **Page Table:** A data structure used to map virtual addresses to physical addresses. Each process has its own page table, which the operating system maintains to keep track of the mapping between virtual and physical memory.
- **Page Frames:** The physical memory is divided into page frames, which are the same size as pages. Virtual pages are mapped to these physical page frames.

### TLB

- **Purpose:** The primary purpose of the TLB is to reduce the time required to translate virtual addresses to physical addresses. Without a TLB, each memory access would require traversing the page table, which is much slower.
- **Structure:** The TLB is a small, fast cache located within the CPU. It typically holds a limited number of entries, each mapping a virtual page number to a physical frame number.
- **TLB Entries:**
  - **Virtual Page Number (VPN):** The part of the virtual address that identifies the page.
  - **Physical Frame Number (PFN):** The part of the physical address that identifies the frame.
  - **Control Bits:** Including valid bits, dirty bits, and other status flags.

### Address Mapping

1. **Virtual Address Generation:** When an application accesses memory, it generates a virtual address. This virtual address consists of two main parts:
   - **Virtual Page Number (VPN):** Identifies the specific page in the virtual address space.
   - **Offset:** Identifies the specific byte within the page.
2. **TLB Lookup:**
   - The first step in address translation is to check the Translation Lookaside Buffer (TLB). The TLB is a cache that stores recent translations of virtual addresses to physical addresses.
   - **TLB Hit:** If the TLB contains the translation for the given VPN, the corresponding Physical Frame Number (PFN) is retrieved. The physical address is then formed by combining the PFN with the offset.
   - **TLB Miss:** If the translation is not in the TLB, a page table lookup is required.
3. **Page Table Lookup:**
   - The page table is a data structure maintained by the operating system that maps virtual addresses to physical addresses. Each process has its own page table.
   - **Page Table Hierarchy:** Modern systems often use a multi-level page table to manage memory efficiently. Common structures include two-level, three-level, or even four-level page tables.
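
The translation steps above can be sketched in a few lines of Python. This is a minimal illustration, not how an MMU is actually built: the names `split_address` and `translate`, the flat `dict` used for both the TLB and the page table, and the example mappings are all assumptions made for this sketch, and a real system would walk a multi-level table and handle page faults.

```python
PAGE_SIZE = 4096   # 4 KB pages, matching the common size mentioned above
OFFSET_BITS = 12   # log2(PAGE_SIZE)


def split_address(vaddr):
    """Split a virtual address into (virtual page number, offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, tlb, page_table):
    """Return (physical address, "hit" or "miss") for a virtual address.

    tlb and page_table are plain dicts mapping VPN -> PFN (hypothetical
    stand-ins for the hardware structures). An unmapped page raises
    KeyError here; a real OS would raise a page fault instead.
    """
    vpn, offset = split_address(vaddr)
    if vpn in tlb:
        pfn, outcome = tlb[vpn], "hit"
    else:
        pfn = page_table[vpn]  # page table lookup on a TLB miss
        tlb[vpn] = pfn         # cache the translation for later accesses
        outcome = "miss"
    # The physical address is the frame number combined with the offset.
    return (pfn << OFFSET_BITS) | offset, outcome


page_table = {0: 5, 1: 9}  # hypothetical VPN -> PFN mappings
tlb = {}
for vaddr in (0x1234, 0x1238):
    paddr, outcome = translate(vaddr, tlb, page_table)
    print(hex(vaddr), "->", hex(paddr), outcome)
```

The first access misses in the TLB and falls back to the page table; the second access to the same page hits, because the translation was cached on the first miss.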
--------------------------------------------------------------------------------
/Basic Knowledge/Data Structure & Algorithm/Backtrack.md:
--------------------------------------------------------------------------------
# Backtrack

## Introduction

1. **Incremental Approach**: Solutions are built one step at a time. After each step, the algorithm checks if the current partial solution can still lead to a valid complete solution.
2. **Testing for Validity**: At each step, the algorithm tests whether the current path taken leads to a potential solution. If it violates any constraints, the algorithm backtracks and tries a different path.
3. **Depth-First Search**: Backtracking typically uses a depth-first search approach, exploring one branch to its conclusion before backtracking and trying another.
4. **Pruning the Search Space**: An essential aspect of backtracking is pruning, where the algorithm eliminates paths that cannot possibly lead to a solution, thereby reducing the search space.
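
The four ideas above can be seen together in a small sketch. The function name `combination_sum` and the problem itself (find every combination of distinct positive integers that sums to a target, using each number at most once) are chosen here purely for illustration and are not taken from the notes.

```python
def combination_sum(candidates, target):
    """Return all combinations of candidates that sum to target.

    Assumes distinct positive integers; each one is used at most once.
    """
    candidates = sorted(candidates)  # sorting makes the pruning test valid
    results = []

    def backtrack(start, remaining, path):
        if remaining == 0:            # validity test: a complete solution
            results.append(path[:])
            return
        for i in range(start, len(candidates)):
            value = candidates[i]
            if value > remaining:     # pruning: every later value is larger too
                break
            path.append(value)        # incremental step: extend the partial solution
            backtrack(i + 1, remaining - value, path)  # depth-first exploration
            path.pop()                # backtrack: undo the step, try the next branch

    backtrack(0, target, [])
    return results


print(combination_sum([2, 3, 5, 7], 10))  # [[2, 3, 5], [3, 7]]
```

Because the candidates are sorted, the `break` discards a whole range of branches at once instead of testing each one, which is the pruning described in item 4.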

## Subset Problem

### Combination

```python
def combine(n, k):
    """Return all k-element combinations of the integers 1..n."""
    def backtrack(start, path):
        if len(path) == k:
            results.append(path[:])
            return
        for i in range(start, n+1):
            path.append(i)
            backtrack(i+1, path)
            path.pop()
    results = []
    backtrack(1, [])
    return results
```

### Permutation

```python
def permute(n, k):
    """Return all k-element permutations of the integers 1..n."""
    def backtrack(path):
        if len(path) == k:
            results.append(path[:])
            return
        for i in range(1, n+1):
            if i not in path:
                path.append(i)
                backtrack(path)
                path.pop()
    results = []
    backtrack([])  # backtrack takes only the current path
    return results
```

### Subset Sum

```python
def subsetSum(arr, k):
    """Return True if some subset of arr sums to k.

    The `curr > k` cutoff assumes arr contains only non-negative numbers.
    """
    def backtrack(start, curr):
        if curr == k:
            return True
        elif curr > k or start >= len(arr):
            return False
        include = backtrack(start + 1, curr + arr[start])
        exclude = backtrack(start + 1, curr)
        return include or exclude
    return backtrack(0, 0)
```

## Game Solver
--------------------------------------------------------------------------------
/Basic Knowledge/Data Structure & Algorithm/Design.md:
--------------------------------------------------------------------------------
# Design

## Cache

### LRU

```python
class Node:
    def __init__(self, key=0, value=0, prev=None, next=None):
        self.key = key
        self.val = value
        self.prev = prev
        self.next = next

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.mapper = {}
        # Sentinel nodes: head.next is the least recently used entry,
        # tail.prev is the most recently used one.
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _insert(self, node):
        # Insert the node just before the tail (most recently used position).
        prev = self.tail.prev
        node.prev = prev
        prev.next = node
        self.tail.prev = node
        node.next = self.tail
        self.mapper[node.key] = node

    def _remove(self, node):
        prev, next = node.prev, node.next
        prev.next = next
        next.prev = prev
        node.prev = None
        node.next = None
        del self.mapper[node.key]

    def get(self, key: int) -> int:
        if key not in self.mapper:
            return -1
        node = self.mapper[key]
        # Move the node to the most recently used position.
        self._remove(node)
        self._insert(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.mapper:
            node = self.mapper[key]
            node.val = value
            self._remove(node)
            self._insert(node)
        else:
            if len(self.mapper) == self.capacity:
                # Evict the least recently used entry.
                self._remove(self.head.next)
            node = Node(key, value)
            self._insert(node)
```

### LFU

## HashTable

```python
class HashTable:
    def __init__(self, size=100):
        self.size = size
        self.table = [None] * size

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        index = self._hash(key)
        if self.table[index] is None:
            self.table[index] = []
        for item in self.table[index]:
            if item[0] == key:
                item[1] = value
                return
        # Append to the bucket (separate chaining), not to the table itself.
        self.table[index].append([key, value])

    def get(self, key):
        index = self._hash(key)
        if self.table[index] is not None:
            for item in self.table[index]:
                if item[0] == key:
                    return item[1]
        return None

    def remove(self, key):
        index = self._hash(key)
        if self.table[index] is not None:
            for item in self.table[index]:
                if item[0] == key:
                    # Remove the [key, value] pair, not the bare key.
                    self.table[index].remove(item)
                    return
```

Points to consider when resizing a hash table:

1. Load Factor: Monitor the load factor (number of stored elements divided by number of slots), and expand when it exceeds a threshold (usually around 0.7 to 0.8) to avoid performance degradation.
2. Expansion Strategy: Decide how much to increase the size when expanding, with doubling the size being a common choice.
3. Rehashing: Rehash existing elements efficiently when expanding by recalculating hash codes and redistributing them.
4. Timing: Choose when to trigger expansion (immediately or deferred) to balance collision prevention and computational cost.
5. Concurrent Access: Handle expansion carefully in concurrent environments, considering synchronization.
6. Memory: Be mindful of temporary memory usage increase during expansion.
7. Performance Testing: Test and benchmark to ensure your chosen parameters align with your use case and performance requirements.

## Queue

### Heap

```python
class MinHeap:
    def __init__(self):
        self.heap = []

    def push(self, value):
        self.heap.append(value)
        self._heapify_up(len(self.heap)-1)

    def pop(self):
        if len(self.heap) == 0:
            return None
        if len(self.heap) == 1:
            return self.heap.pop()
        root = self.heap[0]
        self.heap[0] = self.heap.pop()
        self._heapify_down(0)
        return root

    def _heapify_up(self, index):
        while index > 0:
            parent = (index - 1) // 2
            if self.heap[index] < self.heap[parent]:
                self.heap[index], self.heap[parent] = self.heap[parent], self.heap[index]
                index = parent  # continue from the parent; without this the loop never ends
            else:
                break

    def _heapify_down(self, index):
        while True:
            left = 2 * index + 1
            right = 2 * index + 2
            smallest = index
            if (left < len(self.heap)) and self.heap[left] < self.heap[smallest]:
                smallest = left
            if (right < len(self.heap)) and self.heap[right] < self.heap[smallest]:
                smallest = right
            if smallest != index:
                self.heap[smallest], self.heap[index] = self.heap[index], self.heap[smallest]
                index = smallest
            else:
                break
```

A linear-time heapify can be performed by starting from the last non-leaf node (the middle of the array) and working back to the root, sifting each node down so that the subtree rooted at it satisfies the heap property.

```python
def heapify(arr):
    n = len(arr)
    # Leaves are already valid heaps, so start at the last non-leaf node.
    for i in range(n // 2 - 1, -1, -1):
        helper(arr, i, n)

def helper(arr, index, size):
    # Sift arr[index] down until the min-heap property holds below it.
    smallest = index
    left = 2 * index + 1
    right = 2 * index + 2
    if (left < size) and arr[left] < arr[smallest]:
        smallest = left
    if (right < size) and arr[right] < arr[smallest]:
        smallest = right
    if smallest != index:
        arr[smallest], arr[index] = arr[index], arr[smallest]
        helper(arr, smallest, size)
```

### Priority Queue

```python
import heapq

class PriorityQueue:
    def __init__(self):
        self.heap = []

    def push(self, item, priority):
        heapq.heappush(self.heap, (priority, item))

    def pop(self):
        return heapq.heappop(self.heap)[1]

    def top(self):
        return self.heap[0][1]
```

## Stack

### Min Stack

### Max Stack

## Union-Find

```python
class UnionFind:
    def __init__(self, size):
        self.root = [i for i in range(size)]
        self.rank = [0] * size

    def find(self, x):
        if self.root[x] != x:
            self.root[x] = self.find(self.root[x])  # Path compression
        return self.root[x]

    def union(self, x, y):
        rootx = self.find(x)
        rooty = self.find(y)
        if rootx != rooty:
            # Union by rank: attach the shallower tree under the deeper one.
            if self.rank[rootx] < self.rank[rooty]:
                self.root[rootx] = rooty
            elif self.rank[rootx] > self.rank[rooty]:
                self.root[rooty] = rootx
            else:
                self.root[rooty] = rootx
                self.rank[rootx] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)
```

## File System



--------------------------------------------------------------------------------
/Basic Knowledge/Data Structure & Algorithm/Dynamic Programming.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Basic Knowledge/Data Structure & Algorithm/Dynamic Programming.md
--------------------------------------------------------------------------------
/Basic Knowledge/Data Structure & Algorithm/Sorting and Searching.md:
--------------------------------------------------------------------------------
[toc]

# Sorting and Searching

## Sorting

### Bubble Sort

- **How it works:** Bubble Sort repeatedly compares adjacent elements and swaps them if they are in the wrong order. It continues this process until no more swaps are needed.
- **Time Complexity:** O(n^2) in the worst and average cases, where 'n' is the number of elements.
- **Space Complexity:** O(1) - Bubble Sort is an in-place sorting algorithm, which means it doesn't require additional memory for sorting.

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        # After pass i, the last i+1 elements are in their final positions.
        for j in range(n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
```

### Insertion Sort

- **How it works:** Insertion Sort builds the final sorted array one item at a time by taking an element from the unsorted part and inserting it into its correct position within the sorted part.
- **Time Complexity:** O(n^2) in the worst and average cases.
- **Space Complexity:** O(1) - Like Bubble Sort, Insertion Sort is an in-place sorting algorithm.

```python
def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        j = i-1
        item = arr[i]
        # Shift larger elements one slot to the right, then drop item in place.
        while j >= 0 and arr[j] > item:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = item
```

### Selection Sort

- **How it works:** Selection Sort divides the input into two parts: a sorted part and an unsorted part. It repeatedly selects the minimum element from the unsorted part and appends it to the sorted part.
- **Time Complexity:** O(n^2) in the worst and average cases.
- **Space Complexity:** O(1) - Selection Sort is also an in-place sorting algorithm.

```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_index = i
        for j in range(i+1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
```

### Quick Sort

- **How it works:** Quick Sort uses a divide-and-conquer approach. It selects a pivot element and partitions the array into two subarrays: one with elements less than or equal to the pivot and one with elements greater than the pivot. It then recursively sorts the subarrays.
- **Time Complexity:** O(n log n) on average and in the best case, but O(n^2) in the worst case (for example, already-sorted input when the last element is used as the pivot, as in the code below).
- **Space Complexity:** O(log n) on average - Quick Sort is often in-place, and the space complexity is determined by the recursion stack.

```python
def quick_sort(arr, low, high):
    if low >= high:
        return
    mid = partition(arr, low, high)
    quick_sort(arr, low, mid-1)
    quick_sort(arr, mid+1, high)

def partition(arr, low, high):
    key = arr[high]  # pivot: the last element of the range
    j = low          # next slot for an element <= pivot
    for i in range(low, high):
        if arr[i] <= key:
            arr[j], arr[i] = arr[i], arr[j]
            j += 1
    arr[j], arr[high] = arr[high], arr[j]  # place the pivot in its final position
    return j
```

### Merge Sort

- **How it works:** Merge Sort is a divide-and-conquer algorithm. It divides the unsorted list into smaller sublists, sorts them, and then merges the sorted sublists until the entire list is sorted.
- **Time Complexity:** O(n log n) in the worst, average, and best cases.
- **Space Complexity:** O(n) - Merge Sort typically requires additional space for the merging step, making it not entirely in-place.
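
The O(n log n) bound for Merge Sort can be checked with a short derivation. This sketch assumes that merging two halves of total length n costs at most cn for some constant c (the constant is introduced here only for illustration) and that n is a power of two:

$$
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + cn, \qquad T(1) = c
$$

Expanding the recurrence k times gives

$$
T(n) = 2^{k}\,T\!\left(\tfrac{n}{2^{k}}\right) + k\,cn
$$

and setting k = log_2 n, where the subproblems reach size 1, yields

$$
T(n) = cn + cn\log_2 n = O(n \log n)
$$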

```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    sort_arr = merge(left, right)
    return sort_arr

def merge(left, right):
    sort_arr = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            sort_arr.append(left[i])
            i += 1
        else:
            sort_arr.append(right[j])
            j += 1
    while i < len(left):
        sort_arr.append(left[i])
        i += 1
    while j < len(right):
        sort_arr.append(right[j])
        j += 1
    return sort_arr
```

### Heap Sort

- **How it works:** Heap Sort uses a binary heap data structure to repeatedly extract the maximum element (for ascending order) and place it at the end of the sorted portion of the array.

- **Time Complexity:** O(n log n) in all cases.

- **Space Complexity:** O(1) - Heap Sort is an in-place sorting algorithm.

```python
def heapify(arr, n, i):
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2

    if left < n and arr[left] > arr[largest]:
        largest = left

    if right < n and arr[right] > arr[largest]:
        largest = right

    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)

    # Build a max heap.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

    # Extract elements one by one.
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # Swap the root (maximum element) with the last element.
        heapify(arr, i, 0)
```

## Searching

### Linear Search

```python
def linear_search(arr, target):
    n = len(arr)
    for i in range(n):
        if arr[i] == target:
            return i
    return -1
```

### Binary Search

```python
def binary_search(arr, target):
    # Lower-bound search on a sorted list: returns the leftmost index whose
    # value is >= target, or the last index if every value is smaller.
    # It does not signal "not found"; callers should check arr[left] == target.
    left, right = 0, len(arr) - 1
    while left < right:
        mid = left + (right - left) // 2
        if arr[mid] >= target:
            right = mid
        else:
            left = mid + 1
    return left
```
--------------------------------------------------------------------------------
/Basic Knowledge/Database/Data Model.md:
--------------------------------------------------------------------------------
[toc]

## Extended Entity-Relationship Model

### Entity

- Entities represent real-world objects, concepts, or things that are important to the database. They are typically nouns and have attributes that describe their properties.
- Attributes are characteristics or properties of an entity. They provide additional information about the entity.
  - Single-valued property: has only a single value; represented by an ellipse
  - Multi-valued property: can have more than one value; represented by a double ellipse
  - Composite property: combines two or more property types to create a composite property type
- Union Entity: Union types represent attributes that can have multiple types.

### Relationship

- Relationships represent associations or connections between entities.
  They describe how entities interact or are related to each other

- Cardinality: Cardinality defines the number of instances of one entity that can be related to instances of another entity in a relationship
  - One-to-One: each instance of one entity is associated with exactly one instance of another entity
  - One-to-Many: each instance of one entity can be associated with multiple instances of another entity, but each instance of the second entity is associated with only one instance of the first entity
  - Many-to-Many: multiple instances of one entity can be associated with multiple instances of another entity
- N-ary (ternary): the relationship is defined between multiple entities simultaneously; for example, a student enrolls in a course taught by a specific professor
- Identifying relationship: the existence of a relationship depends on the existence of certain entities; the primary key of the child entity includes the primary key of the parent entity as part of its own primary key; the child entity is a weak entity type
- Recursive relationship: an entity is related to itself through a relationship

### Constraint

- Entity Integrity Constraint: the primary key attribute of an entity must have a unique value and cannot be null
- Referential Integrity Constraint: foreign key values in a relationship must match the primary key values of the related entity or be null
- Domain Constraint: the values stored in an attribute adhere to specific data types, formats, or value restrictions
- Cardinality Constraint: specifies the minimum and maximum number of occurrences allowed in a relationship
- Nullability Constraint: whether an attribute can have null values or must have non-null values

### Inheritance

- subtypes: represent specialized entities that inherit attributes and relationships from a more general entity called a supertype
- supertypes: general entity
- Specialization: defining subtypes from a supertype
- Generalization: creating a supertype from existing subtypes

### E-R Diagram

- Entity: Entities are represented by rectangles (or squares) with the entity name written inside
- Attribute: typically shown as ovals or ellipses connected to the respective entity box
- Primary Key: underlined within the attribute oval or written separately below the attribute name
- Relationship: represented by diamonds (or rhombuses) connected by lines to the participating entities
- Cardinality Notations: indicate the number of instances of one entity that can be associated with instances of another entity in a relationship
- Inheritance Symbols:
  - Solid line with a triangle arrowhead: Indicates a specialization hierarchy where the subtype entities inherit attributes and relationships from the supertype
  - Dashed line with a triangle arrowhead: Represents a generalization hierarchy where a supertype is created from existing subtypes

- Disjointness and Completeness Constraints
  - Disjoint: subtypes are mutually exclusive
  - Overlapping: subtypes can overlap
- Ternary (N-ary) Relationship Notation: These relationships are typically represented by diamonds (or rhombuses) with lines connecting them to the participating entities. The lines may include labels to describe the nature of the relationship.
- Union Type Notation: They are represented by a circle or ellipse split into sections, each section corresponding to a possible type of the attribute.
- Attribute Inheritance Notation: a dashed line is drawn from the supertype's attribute to the subtype's attribute.

## Relational Model

- Tables/Relations: Tables are used to represent entities, and each table has a unique name.
- Rows/Tuples: Each row in a table represents a specific occurrence of an entity.
- Columns/Attributes: Columns define the properties or characteristics of an entity.
- Keys: Keys are used to uniquely identify rows in a table
- Relationships: Relationships represent associations between entities in the database. They are established through the use of keys

### Relation Mapping

- Entity Types: Each entity type in the ER model corresponds to a relation in the relational model.
  - The relation has the same name as the entity type
  - If the entity type has an identifying property, it becomes the primary key of the relation
  - Other properties of the entity type become attributes in the relation
- Composite Property Types: If an entity type has a composite property type, composed of multiple sub-properties, the sub-properties become separate attributes in the relation and we lose the composite property itself
- Multi-Valued Property Types: The relational model does not directly support multi-valued attributes. If an entity type has a multi-valued property type, a separate relation is created to represent the property

- Relationships:
  - One-to-One: For a 1:1 relationship between entity types ET1 and ET2, a foreign key can be added either in ET1 or ET2 to establish the connection
  - One-to-Many Relationships: In a 1:N relationship, where one instance of ET1 relates to many instances of ET2, the foreign key should be added in ET2.
  - Many-to-Many Relationships: In an N:M relationship, a separate relation R is created to represent the relationship. R contains foreign keys referencing the primary keys of ET1 and ET2

- Inheritance
  - Mandatory Disjoint: Create separate relations for ET1 and ET2, both inheriting the primary key A and attribute B from ET. ET1 has an additional attribute C, while ET2 has an additional attribute D.
  - Mandatory, Overlap Allowed: Create a single relation for ET with attributes A, B, C, and D.
    Include an attribute named "Type" to indicate whether the tuple is an instance of ET1, ET2, or both
  - Non-Mandatory, Overlap Allowed: Create relation ET and separate relations for ET1 and ET2, both inheriting the primary key A from ET. ET1 has an additional attribute C, and ET2 has an additional attribute D. Instances of ET are added as tuples in ET and can be referenced by ET1, ET2, both, or neither
  - Non-Mandatory, Disjoint: Create separate relations for ET, ET1, and ET2. ET1 and ET2 inherit the primary key A from ET. Tuples in ET can exist without being referenced by ET1 or ET2, and when referenced, they exclusively belong to either ET1 or ET2

- Union Type
  - Create a relation for ET that includes the property type B.
  - Introduce an artificial identifier in the ET relation called ET-ID. ET-ID will consist of either a value of C or D, representing the primary key of either ET1 or ET2.
  - Create separate relations for ET1 and ET2 with primary keys C and D respectively. The artificial identifier allows us to uniquely identify each tuple in ET, as every tuple must be an instance of either ET1 or ET2.
--------------------------------------------------------------------------------
/Basic Knowledge/Database/Database Design.md:
--------------------------------------------------------------------------------
[toc]

## Normalization

### Non-normalized Problem

- Redundancy: Non-normalized data that stores some attributes redundantly often suffers from insertion, deletion, or update anomalies
- Information loss: Decomposing a table into multiple tables can enforce functional dependencies, but improper handling of shared attributes may lead to information loss and inaccurate data representation.
- Dependency loss: Dependency loss happens when decomposing a table into multiple tables, leading to the inability to enforce functional dependencies that existed in the original table.
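
The information-loss problem above can be made concrete with a small sketch. The relation, attribute names, and rows below are hypothetical and chosen only for illustration: a table is split on a shared attribute that is not a key of either part, and joining the parts back produces tuples that were never in the original.

```python
# Hypothetical relation with columns (employee, department, city).
employees = {
    ("alice", "sales", "paris"),
    ("bob", "sales", "berlin"),
}

def project(relation, indices):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in indices) for row in relation}

def join_on_department(emp_dept, dept_city):
    """Natural join of (employee, department) with (department, city)."""
    return {
        (emp, dept, city)
        for emp, dept in emp_dept
        for dept2, city in dept_city
        if dept == dept2
    }

# Decompose on 'department', which does not determine employee or city.
emp_dept = project(employees, (0, 1))
dept_city = project(employees, (1, 2))

rejoined = join_on_department(emp_dept, dept_city)
spurious = rejoined - employees  # tuples that the join invented
print(sorted(spurious))
```

The join returns more rows than the original table, yet information has been lost: it is no longer possible to tell which employee is in which city.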

### Functional Dependency

- Functional dependencies express relationships between sets of attributes in a relation.
- For each value of X in R, there is precisely one value of Y in R that is functionally dependent on X.
- Full functional dependencies are those where Y is dependent on X, and Y is not dependent on any proper subset of X.
- Keys are used to enforce functional dependencies. Making X the key in a relation enforces the dependency that X determines Y.
- Enforcing functional dependencies ensures uniqueness: each X value is associated with exactly one Y value in the relation.

### Normal Form

- First normal form (1NF) requires that all domain values in a relation R are atomic (indivisible). All relations are automatically in 1NF, as they are data structures with atomic values.
- Second normal form (2NF) is achieved when R is in 1NF, and every non-key attribute is fully dependent on the key.
- Third normal form (3NF) is attained when R is in 2NF, and every non-key attribute is non-transitively dependent on the key.
- Boyce-Codd normal form (BCNF) is reached when R is in 3NF, and every determinant (set of attributes) is a candidate key. A determinant is a set of attributes in R on which some other attribute is fully functionally dependent.

### Armstrong Rule

- The rule of **reflexivity** states that if Y is a subset of X, then X functionally determines Y.
- The rule of **augmentation** states that if X functionally determines Y, then WX (X with additional attributes) functionally determines WY (Y with additional attributes).
- The rule of **transitivity** states that if X determines Y, and Y determines Z, then X determines Z through transitivity.
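
A common use of these rules is computing the closure of an attribute set, that is, everything it functionally determines. The sketch below is one minimal way to do that; the function name `attribute_closure` and the sample dependencies are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # closure -> lhs holds by reflexivity when lhs is a subset;
            # combined with lhs -> rhs, transitivity lets us add rhs.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# Hypothetical dependencies: A -> B, B -> C, CD -> A.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"A"})]

print(sorted(attribute_closure({"A"}, fds)))       # ['A', 'B', 'C']
print(sorted(attribute_closure({"A", "D"}, fds)))  # ['A', 'B', 'C', 'D']
```

If the closure of X contains every attribute of the relation, X is a superkey; here {A, D} qualifies while {A} alone does not.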

## Architecture

### Data Storage

- Secondary storage is a physical disk and holds data and applications
- Main memory (RAM) stores currently running programs and their data
- CPU can only query and update data in main memory.
- Data is transported between disk and main memory using a bus.

### Data Access

- Main memory access takes about 30 nanoseconds (roughly 10^-7 seconds as an order-of-magnitude estimate).
- Disk access takes about 10 milliseconds (10^-2 seconds).
- By these estimates, main memory access is about 100,000 (10^5) times faster than disk access.

### Disk Structure

- Disk consists of circular platters, read-write heads, and an actuator.
- Platter surfaces are split into sectors (512 bytes) and tracks.
- Collection of tracks passing under the heads is called a cylinder.

![An Overview of Disk Structure - Administration Practice Windows Server 2003](../../Images/1783_2454_329-fat32-file-system.jpg)

- Blocks are a unit of information transferred between disk and main memory.
- Several records can be stored in a single block.
- Files consist of multiple blocks connected by address pointers.

### Data Seek

- Seek times take 3-8ms, rotational delays take 2-3ms, and transfer times take 0.5-1ms.
- Page fault (disk to main memory transfer) takes approximately 10ms.
- Bulk transfers (extent transfers) can save seek and rotational delay costs.

- Buffer space may be needed for bulk transfers.
- Least Recently Used (LRU) strategy is a common buffer management approach.

## Data Organization

### Heap

- Heap is an unsorted data file organization strategy.
- Searching for a record in a heap requires scanning data pages as the location is unknown. On average, searching a heap with N data blocks requires examining N/2 data blocks.
- Sorted files improve lookup time compared to unsorted heaps.
  Even with sorted data, searching block by block still requires examining about half of the blocks on average.
- Binary search is a search technique used with sorted data. Binary search divides the search space in half on each iteration, making it more efficient. Binary search over N blocks needs about log2(N) block accesses.
- Binary search is significantly faster than linear search for large datasets. Binary search scales logarithmically, while linear search scales linearly with the number of blocks (N).

### Index

- Sparse primary index: Iterates through data blocks, appends the key value of the first record and a pointer to that block to the index block. The index is sorted. Ratio of index blocks to data blocks is called **fanout**
- Dense primary index: Iterates through data blocks, appends key value of each record and a pointer to that block to the index block. The index is sorted. Dense index has one index entry per record, while sparse index has one per data block.
- Secondary Index: Built when indexing records based on non-key attribute values. Secondary indices must be dense and are suited for point queries.

- Multilevel Index: Multilevel indices reduce search space further and improve efficiency. Top-level index points to lower-level indices. Searching through multilevel index is still logarithmic in n with base equal to the fanout.
- B+ tree index: In a B+ tree, data resides in leaf nodes, while other nodes are index nodes. B+ tree remains balanced, maintaining similar distance from root to leaves for all leaves. Overflows at lower levels rarely propagate up more than one or two levels, if at all.

### Hashing

- In static hashing, key values are mapped to buckets using a hash function, h. A good hash function should distribute values uniformly, avoid collisions, and fill the buckets as much as possible.
- Inserting records involves hashing the key to find the bucket, then locating or allocating the data block associated with that bucket, and writing the record to the block. If a data block is full, a new block is allocated for records associated with that bucket, and its pointer is stored in the previous block.
- To find a record, input the key to the hash function, get the bucket number, look up the block address in the directory, and access the block for the record.
- Static hashing provides constant time data access but may lead to longer chains of data blocks over time. Dynamic hash functions can expand the address space to maintain constant data access cost.
- A good hash function is essential to allocate records uniformly across buckets to avoid uneven chain lengths and degraded search performance.
--------------------------------------------------------------------------------
/Basic Knowledge/Database/Introduction.md:
--------------------------------------------------------------------------------
[toc]

## Databases

A *database* is an organized collection of inter-related data that models some aspect of the real world.

A database management system (DBMS) is the software that manages a database.

A general-purpose DBMS is designed to allow the definition, creation, querying, updating, and administration of databases.

## Data Models

A *data model* is a collection of concepts for describing the data in a database.

### Data Structure

- Data is represented using tables in the relational model. Tables consist of columns (attributes) and rows (entries).
- The number of columns is the degree of the table, and the number of rows is the cardinality. Each column has an associated data type that defines the domain of values it can hold.
- The schema defines the structure of the data and is relatively stable.
  The schema of a table includes the table name, column names, and column data types.
- Rows in the table represent the current state of reality as modeled by the table's structure.

### Constraints

- Constraints are rules defined to restrict the data values permitted in a system.
- Constraints go beyond the schema and column data types to provide additional restrictions.
- Common constraints include primary key constraint, unique constraint, not null constraint, etc.

### Operations

- We use **operations** to retrieve and change data.
- Common operations include select, delete, insert, etc.

### Integrity and Consistency

- Integrity refers to how accurately the database reflects reality.
- Poor integrity can be seen when the information in the database does not match the actual state of the data in reality.
- Consistency relates to the absence of internal conflicts within a database.
- Inconsistencies can occur when there are conflicting or contradictory values across different tables or columns within a table.

### Surrogate

- To record the information of regular users, a RegularUser table can be created in the database with Email, Name, and Address columns.
- Changes in the recorded information can occur over time, such as a user's name or address.
- An alternative approach to implementing the table is to include an additional column called UserID, which stores system-generated unique identifiers.
- These unique identifiers act as surrogates to identify users in the real world, even if other information about them changes.
- The use of surrogates provides a more reliable and consistent means of identifying users, regardless of any changes to their personal details.

## Database Architecture

- Database data is an extension of the schema, representing permitted data instances.
  The separation of schema from data enables efficient data access.
- The ANSI/SPARC three-level architecture includes conceptual, internal, and external schema. Conceptual schema focuses on data meaning, internal schema on storage, and external schema on application-specific use.
- The database management system translates queries and responses between schema for efficient data retrieval and presentation.

![A diagram of the ANSI/SPARC 3-level database architecture.](../../Images/20220829220512.png)

### Conceptual Schema

- The conceptual schema describes general and time-invariant structures of reality.
- It does not involve data representation, physical organization, access, or use.
- The conceptual schema defines the table and its columns.
- Queries can be written against the conceptual schema to fetch specific data.
- The conceptual schema does not control the display or underlying data access.

### External Schema

- The external schema represents a subset of information derived from the conceptual schema.
- It is designed to cater to the specific needs of a particular user group.
- An external schema can be created using the "create view" statement.
- Views serve as a window into the database rather than physically existing as separate tables.

### Internal Schema

- The internal schema describes the physical representation of data specified in the conceptual schema.
- Indexing is a primary technique used at the internal schema level. It aims to optimize query and update performance.
- Indexes are not directly accessed by applications, allowing for flexibility in replacing or removing indexes without impacting the applications.

### Data Independence

- Physical data independence refers to the ability to modify the internal schema without affecting external schema-based applications.
- Logical data independence describes how much we can change the conceptual schema without changing the applications that run on the external schemata.

## DBMS Framework

- The ANSI/SPARC DBMS Framework was proposed in 1975 and defines the components of a three-level architecture.

- The schema compiler is responsible for handling schema definitions. The enterprise administrator defines conceptual schemata, the application system administrator defines external schemata, and the database administrator defines internal schemata.
- The schema definitions are checked for syntax correctness and stored in the metadatabase.
- The query transformer translates user queries from the external schema level to the conceptual schema level and then to the internal schema level.
- The internal schema to storage transformer translates the query into operating system calls to retrieve data from storage.
- The process is reversed when the database system prepares a response to return to the user.
- In production database systems, this process is optimized for efficiency, but the overall functionality remains the same.

![The ANSI/SPARC DBMS Framework.](../../Images/20220830151759.png)
--------------------------------------------------------------------------------
/Basic Knowledge/Database/SQL.md:
--------------------------------------------------------------------------------
[toc]

## Structured Query Language

- Data Manipulation Language (DML): SELECT, INSERT, UPDATE, and DELETE statements.
- Data Definition Language (DDL): Schema definitions for tables, indexes, views, and other objects.
- Data Control Language (DCL): Security, access controls.
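
To make the DDL and DML categories above concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `student` table, its columns, and its rows are hypothetical. SQLite has no GRANT or REVOKE statements, so DCL is not demonstrated.

```python
import sqlite3

# In-memory database; table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)

# DML: insert, update, delete, and query rows.
conn.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ann", 3.9), (2, "Ben", 3.1), (3, "Cy", 2.4)],
)
conn.execute("UPDATE student SET gpa = 3.2 WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 3")
rows = conn.execute("SELECT id, name, gpa FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Ann', 3.9), (2, 'Ben', 3.2)]

conn.close()
```

In a server DBMS, DCL statements such as GRANT and REVOKE would control which users may run each of the statements above.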

### Aggregates

Types of aggregate functions:

- AVG(COL): The average of the values in COL
- MIN(COL): The minimum value in COL
- MAX(COL): The maximum value in COL
- COUNT(COL): The number of tuples with a non-NULL value in COL

DISTINCT keyword: removes duplicate values from the result, for example COUNT(DISTINCT COL)

Non-aggregated values in SELECT output clause must appear in GROUP BY clause.

The HAVING clause filters output results based on aggregation computation. This makes HAVING behave like a WHERE clause for a GROUP BY.

### String

The SQL standard says that strings are case sensitive and single-quotes only.

- Pattern Matching: The LIKE keyword is used for string matching in predicates. '%' matches any substring (including the empty one), '_' matches any one character
- Concatenation: Two vertical bars (“||”) will concatenate two or more strings together into a single string.
- String Functions: Many database systems implement other functions in addition to those in the standard. Examples of standard string functions include SUBSTRING(S, B, E) and UPPER(S).

### Output Control

- ORDER BY: impose a sort on tuples, ASC for ascending results, DESC for descending results
- LIMIT: restrict the number of result tuples, use OFFSET to return a range in the results

### Nested Query

- ALL: Must satisfy expression for all rows in sub-query.
- ANY: Must satisfy expression for at least one row in sub-query.
- IN: Equivalent to =ANY().
- EXISTS: At least one row is returned.

### Window Function

- ROW_NUMBER: assigns a number to each row in order, regardless of any ties
- RANK: assigns same rank to ties, and skips ahead to next number in position
- DENSE_RANK: assigns same rank to ties, but continues with the sequence
- PARTITION BY: Use PARTITION BY to specify group.
- ORDER BY: Use ORDER BY within OVER to ensure a deterministic ordering of results even if the database changes internally.

### Common Table Expressions

CTEs can be thought of as a temporary table that is scoped to a single query.

The WITH clause binds the output of the inner query to a temporary result with that name.

Adding the RECURSIVE keyword after WITH allows a CTE to reference itself.
--------------------------------------------------------------------------------
/Basic Knowledge/Machine Learning/Computer Vision.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Basic Knowledge/Machine Learning/Computer Vision.md
--------------------------------------------------------------------------------
/Basic Knowledge/Machine Learning/Recommder System.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Basic Knowledge/Machine Learning/Recommder System.md
--------------------------------------------------------------------------------
/Basic Knowledge/Machine Learning/Reinforcement Learning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Basic Knowledge/Machine Learning/Reinforcement Learning.md
--------------------------------------------------------------------------------
/Basic Knowledge/Operating System/Introduction.md:
--------------------------------------------------------------------------------
[toc]

# Introduction

## Overview

An operating system (OS) is like the conductor of an orchestra, managing and coordinating various components of a computer system to ensure it runs smoothly and efficiently.
It serves as the bridge between hardware and software, enabling users and applications to interact with the computer's resources.

## Components

**Prompt: What are key components for an operating system?**

An operating system is composed of several key components, each responsible for specific tasks and functions. These components work together to manage the computer's hardware and software resources effectively. Here are the primary components of an operating system:

1. **Kernel:**
   - The kernel is the core component of the operating system. It provides essential services and manages the most critical system functions, including process management, memory management, and hardware communication.
   - It interacts directly with the computer's hardware and ensures that different parts of the operating system and applications can access hardware resources securely.
2. **User Interface:**
   - The user interface (UI) is the part of the operating system that allows users to interact with the computer. It can take various forms:
     - **Graphical User Interface (GUI):** This is the familiar graphical desktop environment, which includes windows, icons, menus, and buttons. Popular examples include Windows Explorer in Windows and the desktop environment in macOS.
     - **Command-Line Interface (CLI):** Users interact with the computer by typing text-based commands. Examples include the Windows Command Prompt, macOS Terminal, and various Unix/Linux shells.
3. **System Libraries:**
   - System libraries are collections of pre-written code that provide standard functions and procedures for software applications. These libraries abstract the complexities of hardware and provide a higher-level interface for developers.
   - Application programmers can use these libraries to perform tasks such as file manipulation, input/output operations, and network communication.
4. **Device Drivers:**
   - Device drivers are software modules that serve as intermediaries between the operating system and hardware devices. They enable the operating system and applications to communicate with hardware components like printers, graphics cards, and network adapters.
   - Device drivers are specific to the hardware they support and are essential for proper hardware operation.
5. **File System:**
   - The file system manages the storage and organization of data on storage devices, such as hard drives and solid-state drives. It includes file structures, directories, and metadata.
   - The file system component is responsible for creating, deleting, reading, and writing files, as well as enforcing access permissions.
6. **Process Management:**
   - This component is responsible for creating, scheduling, and terminating processes, which are individual program units. It allocates CPU time to processes and handles process synchronization and communication.
7. **Memory Management:**
   - Memory management is responsible for allocating and deallocating memory resources to processes. It keeps track of used and available memory, preventing memory conflicts and protecting processes from interfering with one another.
8. **I/O (Input/Output) Management:**
   - I/O management handles input and output operations between the computer and external devices. It ensures that data is read from and written to devices correctly, efficiently, and in a controlled manner.
9. **Security and Access Control:**
   - This component enforces security measures to protect system resources and data. It manages user accounts, permissions, and authentication, ensuring that only authorized users can access certain functions or data.

## Types

**Prompt: What are some common operating systems?**

- **Desktop Operating Systems:** Examples include Microsoft Windows, macOS, and various Linux distributions.
These are designed for personal computers and workstations. 45 | - **Server Operating Systems:** These are optimized for managing server hardware and include systems like Windows Server, Linux servers, and Unix variants. 46 | - **Mobile Operating Systems:** Android and iOS are popular mobile operating systems for smartphones and tablets. 47 | - **Embedded Operating Systems:** These are used in devices like smart appliances, IoT devices, and industrial machinery. 48 | - **Real-Time Operating Systems (RTOS):** RTOSs are designed for applications where precise timing is critical, such as in automotive systems, medical devices, and aerospace. 49 | 50 | ## Architecture 51 | 52 | **Prompt: What are common operating system architectures?** 53 | 54 | Operating system architectures, also known as system architectures, refer to the underlying structure and design principles that determine how an operating system is organized and how it interacts with hardware and software. There are several common operating system architectures, each with its own characteristics. Some of the most common architectures include: 55 | 56 | 1. **Monolithic Kernel:** 57 | - In a monolithic kernel, the entire operating system is a single, large program that runs in kernel mode. All system services, such as process management, memory management, file systems, and device drivers, are part of this single program. 58 | - Monolithic kernels are typically efficient but less modular and less fault-tolerant. If one component crashes, it can potentially affect the entire system. 59 | - Examples include Linux and earlier versions of Microsoft Windows (e.g., Windows 95, 98). 60 | 2. **Microkernel:** 61 | - The microkernel architecture strives to keep the kernel as small as possible, with most of the OS services running as user-level processes or in separate modules. The core kernel handles only essential functions like inter-process communication and hardware abstraction. 
62 | - Microkernels are more modular, which can make them more robust and easier to extend. If one component fails, it's less likely to crash the whole system. 63 | - Examples include the QNX operating system and the Mach microkernel used in various systems. 64 | 3. **Hybrid Kernel:** 65 | - The hybrid kernel architecture combines elements of both monolithic and microkernel designs. It has a core set of essential services in kernel mode, but additional services run in user mode as separate modules. 66 | - This architecture aims to strike a balance between efficiency and modularity. It can provide better performance than pure microkernels while maintaining some of the advantages of modularity. 67 | - Microsoft Windows NT, Windows 2000, Windows XP, and later versions use a hybrid kernel. 68 | 5. **Virtual Machine (VM) Monitor:** 69 | - This architecture is used in virtualization environments where multiple guest operating systems run on a host system. A VM monitor (also known as a hypervisor) sits between the hardware and the guest OSs, providing virtualized resources to each guest. 70 | - Examples include VMware, Microsoft Hyper-V, and Xen. 71 | 6. **Client-Server Architecture:** 72 | - In this architecture, the operating system is divided into client and server components. The client interacts with the user or applications, while the server handles system services and resource management. 73 | - Examples include Windows Server, where the user interface and applications run on top of a server core. 74 | 75 | These architectural designs vary in their trade-offs between performance, modularity, and reliability. The choice of architecture often depends on the specific requirements and use cases of the operating system, as well as the goals of the OS developers. 
76 | 77 | ## Levels 78 | 79 | **Prompt: What are kernel mode and user mode, what's the difference?** 80 | 81 | Kernel mode and user mode are two distinct privilege levels that exist in modern operating systems to ensure system stability, security, and resource protection. These privilege levels define the extent of control and access that a program or process has over system resources. Here's an explanation of kernel mode and user mode and their key differences: 82 | 83 | **Kernel Mode:** 84 | 85 | 1. **Privilege Level:** Kernel mode, also known as supervisor mode or privileged mode, is the highest privilege level that the CPU can operate in. 86 | 2. **Access to Resources:** In kernel mode, a program or process has unrestricted access to system resources and hardware. It can execute any CPU instruction, access any memory location, and control all system components. 87 | 3. **Responsibilities:** The kernel (core of the operating system) runs in kernel mode and is responsible for managing system resources, executing privileged instructions, and ensuring system security. It handles tasks such as memory management, device driver execution, and process scheduling. 88 | 4. **Privileged Instructions:** Kernel mode allows the execution of privileged instructions that can control hardware, perform context switching, and perform critical system operations. 89 | 90 | **User Mode:** 91 | 92 | 1. **Privilege Level:** User mode is the lower privilege level for programs and processes. It is designed to restrict their access to system resources. 93 | 2. **Access to Resources:** In user mode, a program or process is restricted from directly accessing hardware and certain privileged instructions. It has a limited, controlled view of the system resources. 94 | 3. **Responsibilities:** User mode is where most applications and user-level processes run. 
These programs have access to a controlled set of system functions and services, provided by the kernel through well-defined interfaces. 95 | 4. **Privileged Instructions:** Privileged instructions, which can control hardware and manipulate system resources, are not allowed in user mode. 96 | 97 | **Key Differences:** 98 | 99 | 1. **Privilege Level:** The most significant difference is the privilege level. Kernel mode operates with the highest level of privilege, whereas user mode is a lower privilege level. 100 | 101 | 2. **Access Control:** Kernel mode has unrestricted access to hardware and system resources, while user mode has limited and controlled access. 102 | 103 | 3. **Responsibilities:** Kernel mode is responsible for core operating system functions and managing system resources, while user mode is where user applications and processes execute. 104 | 105 | 4. **Privileged Instructions:** Kernel mode can execute privileged instructions that control hardware and system-level operations, while user mode is restricted from executing such instructions. 106 | 107 | The differentiation between kernel mode and user mode is fundamental for ensuring system stability and security. The operating system uses these privilege levels to prevent user-level programs from interfering with critical system operations and resources. Unauthorized access to kernel mode can result in system crashes and security vulnerabilities, which is why modern operating systems enforce this separation of privilege levels. 
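The boundary described above can be observed from an ordinary user-mode program: it never touches kernel buffers directly, it asks the kernel through system-call wrappers. A minimal sketch in Python, assuming a platform where `os.pipe` is available; the helper name `roundtrip_through_kernel` is illustrative and not part of any library.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Send bytes through a kernel-managed pipe and read them back."""
    # os.pipe() asks the kernel to create a pipe. The process only receives
    # two integer file descriptors, never direct access to the pipe buffer.
    read_fd, write_fd = os.pipe()
    try:
        # os.write and os.read are thin wrappers over system calls: each call
        # enters kernel mode, the kernel copies the data, and control returns
        # to user mode with the result.
        written = os.write(write_fd, payload)
        return os.read(read_fd, written)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello from user mode"))
```

The program stays in user mode throughout; only the kernel code behind each wrapper runs with full privileges.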
-------------------------------------------------------------------------------- /Basic Knowledge/Programming Language/Golang.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Basic Knowledge/Programming Language/Golang.md -------------------------------------------------------------------------------- /Basic Knowledge/Programming Language/Java.md: -------------------------------------------------------------------------------- 1 | ## Java Basic 2 | 3 | ### Concepts 4 | 5 | - Java Virtual Machine (JVM): an abstract machine that provides a runtime environment in which Java bytecode can be executed 6 | - Java Runtime Environment (JRE): contains the JVM and the Java libraries required to run Java programs 7 | - Java Development Kit (JDK): includes everything needed for development, including the compiler, JRE, debuggers, etc. 8 | 9 | ### Features 10 | 11 | - Platform Independent: the compiler converts source code to bytecode, then the JVM executes the bytecode 12 | - Object Oriented: abstraction, encapsulation, inheritance, polymorphism 13 | - Robust: garbage collection, exception handling, memory allocation 14 | 15 | ## Variable 16 | 17 | ### Data Type 18 | 19 | Java is a statically typed language: the type of every variable and expression is known at compile time 20 | 21 | Primitive Data Types: 22 | 23 | - Boolean: boolean (true, false) 24 | - Character: char 25 | - Integer: byte, short, int, long 26 | - Floating-Point: float, double 27 | 28 | Non-primitive Data Types: 29 | 30 | - String: a sequence of characters; immutable, so it cannot be changed after creation 31 | - Array: declared like other variables with [] after the data type, stored in contiguous memory and dynamically allocated 32 | 33 | ### Variable 34 | 35 | - local variables: created in a block, exist only within that block, must be initialized before use 36 | - instance variables: declared in a class, created with an object and destroyed when object
is destroyed, default-initialized to 0 (false for boolean, null for references) 37 | - static variables: created when the class is loaded and destroyed when execution ends, default-initialized like instance variables 38 | 39 | ## Object Oriented 40 | 41 | ### Concepts 42 | 43 | - class: user-defined prototype, represents the set of properties and methods 44 | - object: basic unit of OOP that represents a real-life entity 45 | 46 | - access modifier: defines where a class member can be accessed from 47 | - public: accessible in all classes 48 | - protected: accessible within the package and in subclasses 49 | - private: accessible only in the class 50 | 51 | - 4 pillars of OOP 52 | - Abstraction: identify only the required characteristics of an object, ignoring the irrelevant details 53 | - Encapsulation: a wrapping up of data under a single unit; bind together the code and data it manipulates 54 | - Inheritance: one class is allowed to inherit the features of another class 55 | - Polymorphism: the ability to differentiate between entities with the same name efficiently 56 | 57 | ### Constructor 58 | 59 | Constructor: A constructor is a special method that is used to initialize objects.
60 | 61 | - constructors must have the same name as the class 62 | - constructors do not declare a return type 63 | - constructors are called only once, at the time of object creation 64 | 65 | Copy Constructor: a copy constructor creates a new object as a copy of an existing one 66 | 67 | - Java provides no default copy constructor; calling one that has not been defined is a compile error 68 | - Assigning a non-primitive value only copies the reference; if a real copy is needed, explicitly call a copy constructor 69 | 70 | ### Inheritance 71 | 72 | - single inheritance: subclasses inherit the features from one superclass 73 | - multilevel inheritance: the inherited class is the base class for other classes 74 | - hierarchical inheritance: one class serves as superclass for more than one subclass 75 | - multiple inheritance: a class can implement multiple interfaces but cannot extend multiple classes 76 | 77 | ### Polymorphism 78 | 79 | - Compile-time Polymorphism: achieved by method overloading 80 | - change the number of parameters 81 | - change data types of the arguments 82 | - change the order of the parameters of methods 83 | - Runtime Polymorphism: achieved by method overriding 84 | - At run-time, Java determines whether the superclass version or the subclass version of a method is executed 85 | - A superclass reference variable can refer to a subclass object 86 | 87 | ### Abstraction 88 | 89 | In Java, abstraction is achieved by interfaces and abstract classes. 90 | 91 | Abstraction provides a superclass without providing a complete implementation of every method 92 | 93 | - abstract class: a class declared with the abstract keyword 94 | - abstract method: a method that is declared without implementation 95 | 96 | An interface is an abstract type used to specify the behavior of a class; its methods are abstract unless declared default or static (allowed since Java 8) 97 | 98 | - Classes use the implements keyword to implement an interface 99 | - interfaces support multiple inheritance: a class may implement several interfaces 100 | 101 | ### Encapsulation 102 | 103 | Encapsulation is defined as the wrapping up of data under a single unit.
It binds together code and the data it manipulates 104 | 105 | - the variable or data is hidden from any other class and can be accessed only through the member functions of its own class 106 | - encapsulation can be achieved by declaring all variables in the class as private and writing public methods in the class to get and set them 107 | 108 | ## Exception 109 | 110 | - Built-in Exceptions 111 | - Checked Exception: exceptions checked by the compiler at compile time, including IOException, SQLException, etc. 112 | - Unchecked Exception: the compiler does not check these exceptions at compile time, including ClassCastException, etc. 113 | - User-Defined Exceptions 114 | 115 | ```java 116 | try { 117 | // block of code to monitor for errors 118 | } catch (ExceptionType1 ex1) { 119 | // exception handler for ExceptionType1 120 | } catch (ExceptionType2 ex2) { 121 | // exception handler for ExceptionType2 122 | } finally { 123 | // optional, block of code to be executed after try block ends 124 | } 125 | ``` 126 | 127 | ### Exception Handling 128 | 129 | - If an exception occurs, the method creates an exception object and hands it off to the JVM 130 | - The JVM searches the call stack to find a method that can handle the exception (the exception handler) 131 | - The JVM starts searching in the method where the exception occurred and proceeds through the call stack in reverse order of the calls 132 | - If it finds an appropriate handler, it passes the exception to it 133 | - If it cannot find an appropriate handler, it calls the default exception handler 134 | 135 | ## Collections 136 | 137 | ### List 138 | 139 | - ArrayList: a dynamic array whose size increases or decreases automatically as the collection grows or shrinks 140 | - LinkedList: every element is a separate object with a data part and an address part, linked to its neighbors by references 141 | - Vector: a dynamic array, similar to ArrayList in implementation; Vector is synchronized but ArrayList is not 142 | - Stack: a last-in, first-out data structure 143 | 144 | ### Queue 145 | 146 | - PriorityQueue: based on the
priority heap, ordered according to the natural ordering or by a comparator 147 | - ArrayDeque: double-ended queue 148 | 149 | ### Set 150 | 151 | - HashSet: inherent implementation of hash table data structure 152 | - LinkedHashSet: uses doubly linked list to store the data and retains the ordering of the elements 153 | - TreeSet: use a tree for storage, the order of the element is maintained using their natural ordering 154 | 155 | ### Map 156 | 157 | - HashMap: stores the data in key value pairs 158 | -------------------------------------------------------------------------------- /Basic Knowledge/Programming Language/Python.md: -------------------------------------------------------------------------------- 1 | # Python 2 | 3 | ## Multithread 4 | 5 | - **`threading` Module**: Python provides the `threading` module, a higher-level threading interface that includes classes and methods to create and manage threads, synchronize thread execution, and share data between threads. 6 | - **Global Interpreter Lock (GIL)**: Python's GIL ensures that only one thread executes Python bytecode at a time, which simplifies memory management but can limit the performance of CPU-bound multithreaded programs. However, I/O-bound and certain concurrent operations can still benefit significantly. 7 | - **Creating Threads**: Threads can be created by instantiating the `Thread` class with a target function and arguments. Threads are started with the `start()` method and wait for completion with the `join()` method. 8 | - **Synchronization Primitives**: The `threading` module provides various synchronization primitives such as Locks, RLocks, Semaphores, Conditions, and Barriers to manage concurrent access to resources and coordinate thread execution. 
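The claim that I/O-bound work still benefits from threads despite the GIL can be checked with a small timing sketch. It assumes `time.sleep` as a stand-in for a blocking I/O call (it releases the GIL while waiting); the names `fake_io` and `run_concurrently` are illustrative.

```python
import threading
import time


def fake_io(delay: float, results: list, index: int) -> None:
    # time.sleep releases the GIL, much like a blocking socket or file read.
    time.sleep(delay)
    results[index] = index


def run_concurrently(n_tasks: int = 4, delay: float = 0.2) -> tuple:
    """Run n_tasks simulated I/O waits in parallel threads and time them."""
    results = [None] * n_tasks
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_concurrently()
    print(f"results={results}, elapsed={elapsed:.2f}s")
```

Run one after another, the four waits would take about `n_tasks * delay` seconds; with threads the total should stay close to a single `delay`. A CPU-bound loop in place of `time.sleep` would not show this speedup.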
9 | 10 | ### Thread Create 11 | 12 | - **Creating a Thread:** To create a new thread, you instantiate a `Thread` object by providing the target callable and any arguments it needs: 13 | 14 | ```python 15 | from threading import Thread 16 | 17 | def my_function(arg1, arg2): 18 | print(f"Function running with arguments: {arg1}, {arg2}") 19 | 20 | thread = Thread(target=my_function, args=("Hello", "World")) 21 | ``` 22 | 23 | - **Starting a Thread:** After creating a `Thread` object, you start it with the `start()` method. This invokes the target callable in a new, concurrent thread: 24 | 25 | ```python 26 | thread.start() 27 | ``` 28 | 29 | - **Waiting for a Thread to Complete:** You can wait for a thread to finish its execution by calling its `join()` method. This is useful when the main program or other threads need to wait for the thread’s task to complete: 30 | 31 | ```python 32 | thread.join() 33 | ``` 34 | 35 | - **Daemon Threads:** Threads can be made daemon threads by setting the `daemon` attribute to `True`. Daemon threads are automatically killed when the main program exits, and they do not prevent the program from exiting. They are useful for background tasks that should not block the program from ending: 36 | 37 | ```python 38 | thread.daemon = True 39 | ``` 40 | 41 | Alternatively, you can set the `daemon` flag through the `Thread` constructor: 42 | 43 | ```python 44 | thread = Thread(target=my_function, args=("Hello", "Daemon"), daemon=True) 45 | ``` 46 | 47 | - **`start()`**: Begin the thread’s execution. 48 | - **`join(timeout=None)`**: Wait for the thread to finish. An optional `timeout` can be provided. 49 | - **`is_alive()`**: Returns `True` if the thread is still running. 50 | - **`name`**: You can get or set the name of the thread. This is useful for debugging. 51 | - **`daemon`**: A boolean value that indicates whether the thread is a daemon thread. 
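The attributes and methods listed above can be exercised together in one short sketch. It assumes a worker that blocks on a `threading.Event` so the thread stays alive until released; `worker` and `inspect_thread` are illustrative names.

```python
import threading


def worker(stop: threading.Event) -> None:
    # Block until the main thread signals that the worker may finish.
    stop.wait()


def inspect_thread() -> dict:
    """Record name, daemon flag, and liveness at several points in a thread's life."""
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,), name="demo-worker", daemon=True)
    observed = {
        "name": t.name,
        "daemon": t.daemon,
        "alive_before_start": t.is_alive(),  # not started yet
    }
    t.start()
    # join() with a timeout returns after at most 0.1 s even though the worker
    # is still blocked; is_alive() tells us whether the join actually succeeded.
    t.join(timeout=0.1)
    observed["alive_after_timeout"] = t.is_alive()
    stop.set()  # let the worker finish
    t.join()    # now the join waits for real completion
    observed["alive_after_join"] = t.is_alive()
    return observed


if __name__ == "__main__":
    print(inspect_thread())
```

Since `join(timeout=...)` always returns `None`, checking `is_alive()` afterwards is the way to tell a timeout from a finished thread.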
52 | 53 | ### Thread Synchronization 54 | 55 | - **Lock:** A `Lock` in Python's `threading` module is a synchronization primitive that is used to ensure that only one thread can execute a block of code or access a shared resource at a time. 56 | - **Acquire Lock:** Before accessing a shared resource, a thread must acquire the lock. If the lock is already held by another thread, the requesting thread will block until the lock becomes available. 57 | - **Release Lock:** After finishing with the shared resource, the thread must release the lock, allowing other threads to acquire it. 58 | - **With Statement:** Locks are typically used with the `with` statement in Python, which ensures that the lock is properly released, even if an exception occurs within the block. 59 | 60 | ```python 61 | import threading 62 | 63 | # A shared variable 64 | counter = 0 65 | # Create a lock 66 | lock = threading.Lock() 67 | 68 | # The target function for our threads 69 | def increment_counter(): 70 | global counter 71 | for _ in range(10000): 72 | # Acquire the lock before accessing the shared data 73 | # can also be used as follows 74 | # lock.acquire() 75 | # counter += 1 76 | # lock.release() 77 | with lock: 78 | counter += 1 79 | 80 | # Create threads 81 | threads = [threading.Thread(target=increment_counter) for _ in range(10)] 82 | 83 | # Start threads 84 | for thread in threads: 85 | thread.start() 86 | 87 | # Wait for all threads to complete 88 | for thread in threads: 89 | thread.join() 90 | 91 | print(f"Final counter value: {counter}") 92 | ``` 93 | 94 | - **Conditional Variable:** A threading Condition Variable is a synchronization primitive that allows threads to wait for certain conditions to be met before proceeding. 95 | - **Waiting for a Condition**: A thread that needs a certain condition to be true before it can proceed calls `wait()`, which releases the lock and blocks the thread until another thread calls `notify()` or `notify_all()`. 
96 | - **Notifying about a Condition Change**: A thread that changes a condition in such a way that it might allow waiting threads to proceed calls `notify()` to wake up one waiting thread or `notify_all()` to wake up all waiting threads. 97 | 98 | ```python 99 | import threading 100 | import time 101 | 102 | # Shared resource and condition variable 103 | items = [] 104 | condition = threading.Condition() 105 | 106 | # Producer thread class 107 | class Producer(threading.Thread): 108 | def run(self): 109 | global items 110 | for i in range(5): 111 | with condition: # Acquire the lock 112 | print(f"Producing item {i}") 113 | items.append(i) # Add an item to the resource 114 | condition.notify() # Notify the consumer that an item is available 115 | time.sleep(1) 116 | 117 | # Consumer thread class 118 | class Consumer(threading.Thread): 119 | def run(self): 120 | global items 121 | for _ in range(5): # Consume as many items as the producer makes, so the thread can exit and join() returns 122 | with condition: # Acquire the lock 123 | while not items: # Re-check the predicate after every wakeup 124 | condition.wait() # Wait for an item to be produced 125 | item = items.pop(0) 126 | print(f"Consuming item {item}") 127 | time.sleep(2) 128 | 129 | producer = Producer() 130 | consumer = Consumer() 131 | 132 | consumer.start() 133 | producer.start() 134 | 135 | producer.join() 136 | consumer.join() 137 | 138 | ``` 139 | 140 | - **Semaphore:** A semaphore allows a fixed number of threads to access a shared resource simultaneously. This is useful in scenarios where you want to limit the number of concurrent accesses to a resource. 141 | - **Initialization**: `sem = threading.Semaphore(value=N)` where `N` is the initial number of permits. By default, `N` is 1. 142 | - **Acquiring a Permit**: `sem.acquire(blocking=True, timeout=None)` attempts to decrement the semaphore's counter, blocking until a permit is available unless `blocking` is set to `False`. 143 | - **Releasing a Permit**: `sem.release()` increments the semaphore's counter, potentially unblocking a waiting thread.
144 | 145 | ```python 146 | import threading 147 | import time 148 | import random 149 | 150 | # A semaphore to limit access to a resource 151 | max_connections = 3 152 | pool_semaphore = threading.Semaphore(max_connections) 153 | 154 | def access_database(thread_id): 155 | print(f"Thread {thread_id} is waiting to access the database.") 156 | with pool_semaphore: 157 | # Simulate database access 158 | print(f"Thread {thread_id} has accessed the database.") 159 | time.sleep(random.uniform(0.5, 2.0)) # Simulate time taken for database operations 160 | print(f"Thread {thread_id} is done with the database.") 161 | 162 | # Create multiple threads that need database access 163 | threads = [threading.Thread(target=access_database, args=(i,)) for i in range(10)] 164 | 165 | for thread in threads: 166 | thread.start() 167 | 168 | for thread in threads: 169 | thread.join() 170 | 171 | print("All threads have completed their database operations.") 172 | 173 | ``` 174 | 175 | - **Barrier:** A threading Barrier is a synchronization primitive used to ensure that a fixed number of threads wait at a certain point in their execution until all of the threads have reached this point. 176 | - **Initialization**: `barrier = threading.Barrier(parties, action=None, timeout=None)` where `parties` is the number of threads that must call `wait()` for the barrier to trip, `action` is a callable to be executed by one of the threads when the barrier is tripped, and `timeout` is the default timeout for the `wait()` method. 177 | - **Waiting at the Barrier**: Each thread calls `barrier.wait(timeout=None)` to wait for the other threads. The call blocks until the specified number of threads have called `wait()`. Once the last thread calls `wait()`, all threads are simultaneously released. 178 | - **Resetting the Barrier**: The `reset()` method resets the barrier to its initial state. It can be used if you need to reuse the barrier. 
179 | - **Aborting the Barrier**: The `abort()` method puts the barrier into a broken state. This causes all current and future calls to `wait()` to fail with a `BrokenBarrierError`, which can be useful for error handling. 180 | 181 | ```python 182 | import threading 183 | import time 184 | 185 | # Define a barrier for 3 threads 186 | barrier = threading.Barrier(3) 187 | 188 | def phase1(): 189 | print(f"Thread {threading.current_thread().name} is starting phase 1") 190 | time.sleep(1) # Simulate work 191 | print(f"Thread {threading.current_thread().name} is waiting at the barrier") 192 | barrier.wait() # Wait for other threads 193 | print(f"Thread {threading.current_thread().name} passed the barrier") 194 | 195 | def phase2(): 196 | print(f"Thread {threading.current_thread().name} is starting phase 2") 197 | time.sleep(1) # Simulate work 198 | print(f"Thread {threading.current_thread().name} is waiting at the barrier") 199 | barrier.wait() # Wait for other threads 200 | print(f"Thread {threading.current_thread().name} passed the barrier and is completing") 201 | 202 | # Create threads for each phase 203 | threads = [threading.Thread(target=phase1) for _ in range(3)] + [threading.Thread(target=phase2) for _ in range(3)] 204 | 205 | # Start and join threads 206 | for thread in threads: 207 | thread.start() 208 | 209 | for thread in threads: 210 | thread.join() 211 | 212 | print("All threads have completed their work") 213 | 214 | ``` 215 | 216 | - **Event:** An event object manages an internal flag that can be set to true with the `set()` method and reset to false with the `clear()` method. Threads can wait for the flag to be set to true using the `wait()` method, which blocks until the flag becomes true. 217 | - **Set**: Marks the event as true. All threads waiting for it to become true are awakened. 218 | - **Clear**: Resets the event's state to false. 219 | - **Wait**: Blocks until the event's state becomes true. An optional timeout can be specified. 
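The flag semantics of `set()`, `clear()`, and a timed `wait()` can be checked in a single thread before involving workers. A minimal sketch; the function name `event_flag_demo` is illustrative.

```python
import threading


def event_flag_demo() -> list:
    """Record how an Event's flag responds to wait(timeout), set(), and clear()."""
    event = threading.Event()
    observations = []
    # The flag starts out false, so a timed wait gives up and returns False.
    observations.append(event.wait(timeout=0.05))
    event.set()
    observations.append(event.is_set())
    # With the flag already true, wait() returns True immediately.
    observations.append(event.wait(timeout=0.05))
    event.clear()
    observations.append(event.is_set())
    return observations


if __name__ == "__main__":
    print(event_flag_demo())
```

The return value of `wait(timeout=...)` is what distinguishes "the event was set" from "the timeout expired", which matters when a waiting thread needs a fallback path.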
220 | 221 | ```python 222 | import threading 223 | import time 224 | 225 | # Create the Event object 226 | event = threading.Event() 227 | 228 | def task(): 229 | print("Task started. Doing some work.") 230 | time.sleep(2) # Simulate a task taking some time 231 | print("Task completed. Notifying the main thread.") 232 | event.set() # Signal that the task is done 233 | 234 | # Create and start the worker thread 235 | worker = threading.Thread(target=task) 236 | worker.start() 237 | 238 | # Main thread waiting for the task to complete 239 | print("Main thread waiting for the task to complete.") 240 | event.wait() # Block until the event is set by the worker thread 241 | print("Main thread notified. Continuing execution.") 242 | 243 | # Optionally, you might want to join the thread 244 | # to wait for it to complete before exiting the program. 245 | worker.join() 246 | 247 | ``` 248 | 249 | -------------------------------------------------------------------------------- /Basic Knowledge/Statistics/Stats Inference.md: -------------------------------------------------------------------------------- 1 | # Stats Inference 2 | 3 | [toc] 4 | 5 | ## Stats Fundamental 6 | 7 | ### Convergence 8 | 9 | - Convergence in distribution 10 | 11 | $$ 12 | \sqrt n(\bar X_n-\mu)\stackrel{d}{\longrightarrow} N(0,\sigma^2) 13 | $$ 14 | 15 | - Convergence in probability 16 | 17 | $$ 18 | \bar X_n\stackrel{p}{\longrightarrow} \mu 19 | $$ 20 | 21 | If $X_n$ converges in probability to $X$, then $X_n$ also converges in distribution to $X$. The converse does not hold in general. 22 | 23 | ### Law of Large numbers 24 | 25 | Let $X_1,X_2,...,X_n$ be i.i.d. random variables with mean $\mu$ and finite variance. Then 26 | $$ 27 | \bar X_n=\frac{1}{n}\sum_{i=1}^nX_i\stackrel{p}{\longrightarrow}\mu 28 | $$ 29 | 30 | ### Central Limit Theorem 31 | 32 | Let $X_1,X_2,...,X_n$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$. Then
33 | $$ 34 | \sqrt n\frac{\bar X_n-\mu}{\sigma}\stackrel{d}{\longrightarrow}N(0,1) 35 | $$ 36 | 37 | ### Slutsky's Theorem 38 | 39 | If $X_n\stackrel{d}{\longrightarrow}X$ and $Y_n\stackrel{p}{\longrightarrow}c$ for a constant $c$, then the following hold: 40 | $$ 41 | X_n+Y_n\stackrel{d}{\longrightarrow}X+c 42 | $$ 43 | 44 | $$ 45 | X_nY_n\stackrel{d}{\longrightarrow}cX 46 | $$ 47 | 48 | $$ 49 | \text{if } c\ne0,\quad \frac{X_n}{Y_n}\stackrel{d}{\longrightarrow}\frac{X}{c} 50 | $$ 51 | 52 | ### Continuous Mapping Theorem 53 | 54 | Let $\bold X_i\in R$ and $\bold X_i\stackrel{d}{\longrightarrow}\bold X$. If $g$ is a continuous function, then $g(\bold X_i)\stackrel{d}{\longrightarrow} g(\bold X)$. 55 | 56 | ### Delta Method 57 | 58 | Suppose that $\sqrt n(T_n-t)\stackrel{d}{\longrightarrow}N(0,v)$. If $g(x)$ is differentiable at $t$ with derivative $g'(t)$, then 59 | $$ 60 | \sqrt n(g(T_n)-g(t))\stackrel{d}{\longrightarrow}N(0,g'(t)^2v) 61 | $$ 62 | 63 | ## Parameter Estimation 64 | 65 | An estimator of $\theta_0\in \Theta$ is a statistic $T(X_1,...,X_n)$ whose primary goal is to estimate $\theta_0$. If {$X_1=x_1,...,X_n=x_n$} are observed, then $T(x_1,...,x_n)$ is called an estimate of $\theta_0$. 66 | 67 | - An estimator $\hat \theta_n$ of $\theta_0$ is unbiased if $E[\hat \theta_n]=\theta_0$ 68 | - An estimator $\hat \theta_n$ of $\theta_0$ is consistent if $\hat \theta_n\stackrel{p}{\longrightarrow}\theta_0$ 69 | 70 | - Mean Square Error is a measure of the accuracy of an estimator; the ratio of two MSEs gives the relative efficiency of two estimators 71 | 72 | $$ 73 | MSE(\hat \theta_n)=tr[var(\hat \theta_n)]+||bias(\hat \theta_n,\theta_0)||_2^2 74 | $$ 75 | 76 | $$ 77 | Eff(\hat \theta_n, \tilde \theta_n)=\frac{MSE(\hat \theta_n)}{MSE(\tilde \theta_n)} 78 | $$ 79 | 80 | ### Method of Moments 81 | 82 | - Population moments: $\mu_k(\theta_0)=E[X_1^k]$ 83 | - Empirical moments: $\hat \mu_k=\frac{1}{n}\sum_{i=1}^n X_i^k$ 84 | 85 | As the sample size grows, the empirical moments converge to the population moments.
Typically we will choose the number of moments to match the dimension of the parameter that we want to estimate. 86 | $$ 87 | \sqrt n(\hat \theta_n^{MM}-\theta_0)\stackrel{d}{\longrightarrow}N(0,V(\theta_0)) 88 | $$ 89 | The method of moments estimator is consistent and asymptotically normally distributed, where $V(\theta_0)=[\triangledown\psi^{-1}(M(\theta_0))]\Sigma(\theta_0)[\triangledown\psi^{-1}(M(\theta_0))]^T$ 90 | 91 | ### Maximum Likelihood Estimation 92 | 93 | Let $X_1,...,X_n$ be an i.i.d. sample of random variables with density function $f(x;\theta_0)$. The likelihood function and the maximum likelihood estimator are 94 | $$ 95 | L(\theta;x_1,...,x_n)=\prod_{i=1}^nf(x_i;\theta) 96 | $$ 97 | 98 | $$ 99 | \hat \theta_n^{MLE}=\arg\max_{\theta}\log L(\theta;X_1,...,X_n) 100 | $$ 101 | 102 | Under regularity conditions, the maximum likelihood estimator is consistent and asymptotically normally distributed, with variance given by the inverse Fisher information. 103 | $$ 104 | I(\theta)=-E[\frac{\partial^2}{\partial \theta\partial \theta^T}\log L(\theta;X_1)] 105 | $$ 106 | 107 | $$ 108 | I_n(\theta)=-\sum_{i=1}^n E[\frac{\partial^2}{\partial \theta\partial \theta^T}\log L(\theta;X_i)]=nI(\theta) 109 | $$ 110 | 111 | $$ 112 | \sqrt n(\hat \theta_n^{MLE}-\theta_0)\stackrel{d}{\longrightarrow}N(0,I(\theta_0)^{-1}) 113 | $$ 114 | 115 | Cramer-Rao Lower Bound: let $X_1, X_2,...,X_n$ be an i.i.d. sample of random variables with density function $f(x;\theta_0)$. For an unbiased estimator $T(\bold X)$ of a scalar $\theta_0$, 116 | $$ 117 | var(T(\bold X))\ge \frac{1}{nI(\theta_0)} 118 | $$ 119 | 120 | ## Hypothesis Testing 121 | 122 | ### Confidence Interval 123 | 124 | The random interval $[L(\bold X), U(\bold X)]$ is called an $\alpha$ level confidence interval for $\theta$ if 125 | $$ 126 | P[L(\bold X)\le\theta\le U(\bold X)]\ge \alpha 127 | $$ 128 | $R(\bold X)$ is a subset of $\Theta$ depending on $\bold X$.
$R(\bold X)$ is called an $\alpha$ confidence region if 129 | $$ 130 | P[\bold \theta\in R(\bold X)]\ge\alpha 131 | $$ 132 | A random variable $g(\bold X;\theta)=g(X_1,...,X_n;\theta)$ is a pivot if the distribution of $g(\bold X;\theta)$ does not depend on the parameter $\theta$. Choose $a$ and $b$ with $P(a\le g(\bold X;\theta)\le b)=\alpha$, then solve the inequality $a\le g(\bold X;\theta)\le b$ for $\theta$ to get the confidence interval. 133 | 134 | The bootstrap method is a powerful computer-based alternative for the construction of confidence intervals. 135 | 136 | ### Hypothesis Testing 137 | 138 | - H0: $\theta \in \Theta_0$ is the null hypothesis 139 | - H1: $\theta \in \Theta_1$ is the alternative hypothesis 140 | - Rejection region: $R=\{\bold X: H_0 \text{ rejected}\}$ 141 | - Type 1 error: $\alpha=P_{\theta}(R)$, for all $\theta\in \Theta_0$ 142 | - Type 2 error: $1-\beta=1-P_\theta(R)$, for all $\theta \in\Theta\setminus\Theta_0$, where $\beta=P_\theta(R)$ is the power of the test 143 | 144 | Suppose $[L(X), U(X)]$ is a $(1-\alpha)$ level confidence interval for the parameter $\theta$. Then the test with rejection region $\{\theta_0\notin [L(X),U(X)]\}$ is a hypothesis test with significance level $\alpha$ for the null hypothesis H0: $\theta=\theta_0$ against the alternative hypothesis H1: $\theta\ne\theta_0$. 145 | 146 | The p-value of a test statistic $T_n$ is the probability under H0 of obtaining the observed value of $T_n$ or a more extreme one. 147 | $$ 148 | p=P_{H_0}[T_n\ge t_{obs}] 149 | $$ 150 | 151 | - The smaller the p-value, the more confident we are about rejecting H0 152 | - Commonly we consider the thresholds 0.05 or 0.01 153 | - A p-value does not give the probability that the null hypothesis is true and does not indicate which alternative is best supported 154 | 155 | ### Likelihood Ratio Test 156 | 157 | The likelihood ratio test is a method of hypothesis testing related to maximum likelihood estimators. Let $\bold X=(X_1,...,X_n)$ be random variables with joint distribution depending on $\theta$, and let $L(\theta|\bold X)$ be the likelihood function.
The LRT statistic for the null hypothesis and alternative hypothesis is 158 | $$ 159 | \lambda(\bold X)=\frac{\sup_{\theta\in\Theta_0}L(\theta|\bold X)}{\sup_{\theta\in\Theta}L(\theta|\bold X)} 160 | $$ 161 | where $\Theta_0$ is the set of parameter values allowed under H0 and $\Theta$ is the full parameter space. The rejection region is $\{\bold X:\lambda(\bold X)\le c\}$, where c is a number between 0 and 1. 162 | 163 | ### Comparison of Tests 164 | 165 | The level of a test only controls the type 1 error. Given two tests with the same level, the better test is the one that minimizes the type 2 error. 166 | 167 | Neyman-Pearson Lemma: suppose that $\bold X$ has joint density function $f(\bold x;\theta)$. For testing the simple hypothesis $H_0:\theta=\theta_0$ against $H_1:\theta=\theta_1$, the most powerful test at level $\alpha$ is given by 168 | $$ 169 | R=\{\bold x: \frac{f_{\theta_1}(\bold x)}{f_{\theta_0}(\bold x)}>C_{\alpha}\} 170 | $$ 171 | where $C_{\alpha}$ is a constant such that $P_{\theta_0}(\bold X\in R)=\alpha$. 172 | 173 | ### Wald Test 174 | 175 | When the unknown parameter $\theta$ is multivariate, many hypotheses can be expressed as linear combinations of the components of $\theta$. 176 | $$ 177 | H_0: A\theta=a_0 178 | $$ 179 | 180 | $$ 181 | H_1:A\theta\neq a_0 182 | $$ 183 | 184 | $$ 185 | W_n=(A\hat \theta-a_0)^T(AI_n(\hat \theta)^{-1}A^T)^{-1}(A\hat \theta-a_0) 186 | $$ 187 | 188 | $$ 189 | W_n\stackrel{d}{\longrightarrow}\chi^2_q\ \text{under}\ H_0,\ \text{where}\ q\ \text{is the number of rows of}\ A 190 | $$ 191 | 192 | ## Exponential Family 193 | 194 | Let $X=(X_1,...,X_n)$ have a joint distribution $F_{\theta}$. The family of distributions $\{F_{\theta}\}$ is said to be a k-parameter exponential family if its density or probability function can be written in the form 195 | $$ 196 | f(\bold x;\theta)=c(\theta)exp\{\sum_{j=1}^kQ_j(\theta)T_j(\bold x)\}h(\bold x) 197 | $$ 198 | 199 | ### Sufficient Statistic 200 | 201 | A statistic $T=T(\bold X)$ is a sufficient statistic for a parameter $\theta$ if for all sets A, $P[\bold X\in A|T=t]$ is independent of $\theta$ for all t in the range of T. 202 | 203 | Suppose that $X=(X_1,...,X_n)$ has a joint density or frequency function $f(\bold x;\theta)$.
Then $T=T(\bold X)$ is sufficient for $\theta$ if and only if 204 | $$ 205 | f(\bold x;\theta)=g(T(\bold x);\theta)h(\bold x) 206 | $$ 207 | A sufficient statistic $T(\bold X)$ is called a minimal sufficient statistic if, for any other sufficient statistic $T'(\bold X)$, $T(\bold x)$ is a function of $T'(\bold x)$. If $T(\bold x)=T(\bold y)$, the likelihood ratio below does not depend on $\theta$: 208 | $$ 209 | \frac{f(\bold x;\theta)}{f(\bold y;\theta)}=\frac{g(T(\bold x);\theta)h(\bold x)}{g(T(\bold y);\theta)h(\bold y)}=H(\bold x, \bold y) 210 | $$ 211 | 212 | ### Estimation 213 | 214 | If the data generating distribution belongs to the exponential family it follows that 215 | 216 | - MLE will be a function of sufficient statistics 217 | - Under regularity conditions, consistency and asymptotic normality of the MLE can be used for statistical inference 218 | - Optimality: minimal sufficient statistics can be shown to be complete for full-rank exponential families. Because of this, they can be used to find the unique Minimum Variance Unbiased Estimators (MVUE) via Rao-Blackwellization 219 | 220 | Rao-Blackwell Theorem: let $X_1,...,X_n$ be an i.i.d. sample of random variables with density or frequency function $f(x;\theta)$. Let $T(\bold X)$ be a sufficient statistic for $\theta_0$ and $U(\bold X)$ an unbiased estimator of $\theta_0$. If $\tilde U(\bold X)=E[U(\bold X)|T(\bold X)]$, then $\tilde U(\bold X)$ is an unbiased estimator of $\theta_0$, and $var[\tilde U(\bold X)]\le var[U(\bold X)]$ for all $\theta$.
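
The variance reduction from conditioning on a sufficient statistic can be checked numerically. The sketch below is only an illustration under assumptions that are not in the notes: a Bernoulli($p$) sample, the crude unbiased estimator $U(\bold X)=X_1$, and the sufficient statistic $T(\bold X)=\sum_i X_i$, for which $E[X_1|T]=T/n$. The function name `simulate` and its default arguments are made up for this example.

```python
import random
import statistics


def simulate(p=0.3, n=20, reps=5000, seed=0):
    """Draw `reps` Bernoulli(p) samples of size n and evaluate both estimators."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(reps):
        sample = [1 if p > rng.random() else 0 for _ in range(n)]
        crude.append(sample[0])           # U(X) = X_1, unbiased for p
        improved.append(sum(sample) / n)  # E[X_1 | T] = T / n with T = sum(X_i)
    return crude, improved


crude, improved = simulate()
print("mean of U          :", statistics.mean(crude))
print("mean of E[U | T]   :", statistics.mean(improved))
print("var of U           :", statistics.variance(crude))
print("var of E[U | T]    :", statistics.variance(improved))
```

Both sample means should sit close to $p$, while the conditioned estimator should show a much smaller variance, in line with the inequality stated above.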
221 | 222 | ## Multivariate Normal 223 | 224 | $$ 225 | f(\bold x)=\prod_{i=1}^d\frac{1}{\sqrt{2\pi}}exp(-\frac{1}{2}x_i^2)=\frac{1}{(2\pi)^{d/2}}exp(-\frac{1}{2}\sum_{i=1}^dx_i^2)=\frac{1}{(2\pi)^{d/2}}exp(-\frac{1}{2}\bold x^T\bold x) 226 | $$ 227 | 228 | $$ 229 | f(\bold x;\mu,\Sigma)=\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}exp(-\frac{1}{2}(\bold x-\mu)^T\Sigma^{-1}(\bold x-\mu)) 230 | $$ 231 | 232 | ### Maximum Likelihood Estimation 233 | 234 | $$ 235 | \hat\mu_{ML}=\bar {\bold X}_n=\frac{1}{n}\sum_{i=1}^n\bold X_i 236 | $$ 237 | 238 | $$ 239 | \Sigma_{ML}=\bold S_n=\frac{1}{n}\sum_{i=1}^n (\bold X_i - \bar{\bold X}_n)(\bold X_i - \bar{\bold X}_n)^T 240 | $$ 241 | 242 | ### Central Limit Theorem 243 | 244 | $$ 245 | \bar X_n\stackrel{P}{\longrightarrow}\mu 246 | $$ 247 | 248 | $$ 249 | \sqrt n(\bar X_n-\mu)\stackrel{D}{\longrightarrow}N(0,C) 250 | $$ 251 | 252 | ### Conditional Distribution 253 | 254 | $$ 255 | \mu_{A|B}=\mu_A+\Sigma_{AB}\Sigma_{BB}^{-1}(x_B-\mu_B) 256 | $$ 257 | 258 | $$ 259 | \Sigma_{A|B}=\Sigma_{AA}-\Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA} 260 | $$ 261 | 262 | ## Missing Data 263 | 264 | ### Missing Pattern 265 | 266 | - Missing completely at random(MCAR): Missing data are independent of the variable itself. 267 | 268 | $$ 269 | P(R=0|Y,X)=P(R=0) 270 | $$ 271 | 272 | - Non-ignorable non-response: The missing pattern is function of the variable itself. 273 | 274 | $$ 275 | P(R=0|Y,X) 276 | $$ 277 | 278 | - Missing at Random: The missing pattern is only a function of the auxiliary variable X. 279 | 280 | $$ 281 | P(R=0|Y,X)=P(R=0|X) 282 | $$ 283 | 284 | ### Covariance Matrix Estimation 285 | 286 | - Complete Case Analysis: drop all observations containing missing entries, and only look at the complete records. 287 | - Available Case Analysis: we use all univariate information available. 288 | - Imputation: replaces the missing entries by mean of corresponding column. 
289 | 290 | - EM Algorithm: replace the unobserved part of the likelihood with its conditional expectation given the data. 291 | - E-step: compute $Q(\theta;\theta^{(k)})=E[\log f(Y,U;\theta)|Y=y;\theta^{(k)}]$ 292 | - M-step: maximize $Q(\theta;\theta^{(k)})$ over $\theta$ to obtain $\theta^{(k+1)}=argmax_{\theta}Q(\theta;\theta^{(k)})$ 293 | -------------------------------------------------------------------------------- /Basic Knowledge/Statistics/Stats Modelling.md: -------------------------------------------------------------------------------- 1 | # Stats Modelling 2 | 3 | [toc] 4 | 5 | ## Markov Model 6 | 7 | $X_t$ denotes a process taking values in a state space $S=\{1,...,S\}$. The data is of the form $X_{t_0}=s_0,...,X_{t_k}=s_k$ with $t_0=0$ 8 | $$ 9 | P(X_{t_0}=s_0,...,X_{t_k}=s_k)=P(X_0=s_0)\prod_{j=1}^kP(X_{t_j}=s_j|X_0=s_0,X_{t_1}=s_1,...,X_{t_{j-1}}=s_{j-1}) 10 | $$ 11 | 12 | - Markov Property: given the present, the future is independent of the past (first order) 13 | 14 | $$ 15 | P(X_{t_0}=s_0,...,X_{t_k}=s_k)=P(X_0=s_0)\prod_{j=1}^kP(X_{t_j}=s_j|X_{t_{j-1}}=s_{j-1}) 16 | $$ 17 | 18 | - Stationary: the conditional probabilities only depend on the time differences.
$P(X_t=s|X_u=r)=P(X_{t-u}=s|X_0=r)$ 19 | - Transition Probabilities: $p_{rs}=P(X_1=s|X_0=r)$ 20 | 21 | ### Likelihood 22 | 23 | $$ 24 | P(X_0=s_0,...,X_k=s_k)=P(X_0=s_0)\prod_{i=0}^{k-1}p_{s_is_{i+1}}=p_0\prod_{r=1}^S\prod_{s=1}^Sp_{rs}^{n_{rs}} 25 | $$ 26 | where $p_0=P(X_0=s_0)$ and $n_{rs}$ is the number of observed transitions from state $r$ to state $s$. 27 | $$ 28 | l(\bold P)=\sum_{r=1}^S\sum_{s=1}^Sn_{rs}\log p_{rs}+\log p_0 29 | $$ 30 | 31 | ## Time Series 32 | 33 | ### Measure of Dependence 34 | 35 | - Auto Covariance: $\gamma(s,t)=cov(Y_s,Y_t)=E[(Y_t-\mu_t)(Y_s-\mu_s)]$ 36 | - Auto Correlation: $\rho(t,s)=corr(Y_t,Y_s)=\frac{\gamma(t,s)}{\sqrt{\gamma(t,t)\gamma(s,s)}}$ 37 | - Partial Auto Correlation: $\rho'_t=corr(Y_0,Y_t|Y_1,...,Y_{t-1})$ 38 | 39 | ### Stationarity 40 | 41 | - Strict Stationarity: for any finite set of time points $t_1,...,t_k$ and any shift $s$, the joint distribution of $(Y_{t_1+s},...,Y_{t_k+s})$ is the same as that of $(Y_{t_1},...,Y_{t_k})$ 42 | - Weak Stationarity: $E[Y_t]=\mu$ and $cov(Y_s,Y_{s+t})=\gamma_t$ does not depend on s 43 | - White Noise: A stochastic process $\{Y_t\}$ is called white noise if its elements are uncorrelated with mean 0 and variance $\sigma^2$ 44 | 45 | ### Autoregressive Models 46 | 47 | AR model of order p: 48 | $$ 49 | Y_t-\mu=\sum_{j=1}^{p}\alpha_j(Y_{t-j}-\mu)+\epsilon_t 50 | $$ 51 | 52 | - $Y_t$ is stationary under suitable conditions on the coefficients $\alpha_j$ 53 | - AR(p) is a p-order Markov process 54 | - The partial autocorrelation at lag p equals $\alpha_p$ and vanishes for lags greater than p 55 | 56 | ### Moving Average Models 57 | 58 | MA model of order q: 59 | $$ 60 | Y_t-\mu=\sum_{j=1}^q\beta_j\epsilon_{t-j}+\epsilon_t 61 | $$ 62 | 63 | - $E[Y_t]=\mu$ and $var(Y_t)=\sigma^2(1+\beta_1^2+...+\beta_q^2)$ for all t 64 | - This process is stationary and such that $\rho_t=0$ for $t>q$ 65 | - Stationary autoregressive processes and moving average processes are linear processes 66 | 67 | ### ARMA Models 68 | 69 | ARMA model of order p and q: 70 | $$ 71 | Y_t-\mu=\sum_{j=1}^p\alpha_j(Y_{t-j}-\mu)+\sum_{j=1}^q\beta_j\epsilon_{t-j}+\epsilon_t 72 | $$ 73 | 74 | ### Maximum Likelihood Estimation 75 | 76 | $Y_t$ is a Gaussian time series with mean zero and
autocovariance function $\gamma(i,j)=E[Y_iY_j]$. $\bold Y_n=(Y_1,...,Y_n)^T$ has covariance matrix $\bold T_n=E[\bold Y_n\bold Y_n^T]$. 77 | $$ 78 | L_n(\bold T_n)=\frac{1}{(2\pi)^{n/2}|\bold T_n|^{1/2}}exp(-\frac{1}{2}\bold Y_n^T\bold T_n^{-1}\bold Y_n) 79 | $$ 80 | 81 | - $\bold T_n$ is expressible in terms of $\theta=(\alpha_1,...,\alpha_p,\beta_1,...,\beta_q)^T$ and $\sigma^2$ for an ARMA(p,q) 82 | - Large sample distribution $\sqrt n(\hat \theta-\theta)\stackrel{d}{\longrightarrow}N(0,I(\theta)^{-1})$ 83 | 84 | ### ARIMA Models 85 | 86 | d-fold differencing can be shown to remove a polynomial trend of order d 87 | 88 | ARMA(p, q) applied to the d-fold difference of $Y_t$ is known as the integrated autoregressive moving average model ARIMA(p, d, q) 89 | 90 | ### ARCH Models 91 | 92 | $$ 93 | Y_t=\sigma_t\epsilon_t 94 | $$ 95 | 96 | $$ 97 | \sigma_t^2=\beta_0+\beta_1Y_{t-1}^2 98 | $$ 99 | 100 | - $\sigma_t^2$ is increased if the previous observation was far from 0 101 | - Model captures changing volatility common in financial time series 102 | - GARCH Model: $\sigma_t^2=\beta_0+\beta_1Y_{t-1}^2+\delta\sigma_{t-1}^2$ 103 | 104 | ### SARIMA Models 105 | 106 | Seasonality with period d can be removed by lag-d differencing ($Y_t-Y_{t-d}$). This is not the same as d-fold differencing, which removes polynomial trends of order d 107 | 108 | ## Linear Regression 109 | 110 | ### Straight Line Regression 111 | 112 | $$ 113 | Y_i=\beta_0+\beta_1X_{i1}+...+\beta_dX_{id}+\epsilon_i,i=1,...,n 114 | $$ 115 | 116 | - Response variable: $Y_i$ 117 | - Covariates: $X_{i1}, X_{i2}, ..., X_{id}$ 118 | - Noise term: $\epsilon_i$, assumed to be i.i.d.
with $E[\epsilon_i]=0$ and $var[\epsilon_i]=\sigma^2$ and independent of $X_{ij}$ 119 | 120 | $$ 121 | \hat \beta=argmin||\bold Y -\bold X\bold \beta||_2^2=(\bold X^T\bold X)^{-1}\bold X^T\bold Y=\beta+(\bold X^T\bold X)^{-1}\bold X^T\epsilon 122 | $$ 123 | 124 | - Consistency: $\hat \beta\stackrel{p}{\longrightarrow} \beta$ 125 | - Asymptotic normality: $\sqrt n(\hat \beta-\beta)\stackrel{d}{\longrightarrow}N(\bold 0,\sigma^2\bold Q^{-1})$, where $\bold Q=E[\bold X_1\bold X_1^T]$ 126 | 127 | ### Normal Linear Model 128 | 129 | $$ 130 | \hat \beta\sim N(\beta,\sigma^2(\bold X^T\bold X)^{-1}) 131 | $$ 132 | 133 | - $\hat \sigma^2=\frac{1}{n-p}||\bold Y-\bold X\hat \beta||_2^2$ is an unbiased estimator of $\sigma^2$. 134 | - $\hat \beta$ and $\hat \sigma^2$ are independent, and $\bar Y$ and $\sum_{i=1}^n(Y_i-\bar Y)^2$ are independent 135 | - for each parameter, $(\hat \beta_j-\beta_j)/\hat \sigma_{\hat \beta_j}\sim t_{n-p}$ 136 | 137 | ### Likelihood Ratio 138 | 139 | The full model with p parameters is compared with a submodel that keeps only q of them (design matrix $\bold X_1$): 140 | $$ 140 | \hat \beta=(\bold X^T\bold X)^{-1}\bold X^T\bold Y 141 | $$ 142 | 143 | $$ 144 | \hat \beta_1=(\bold X_1^T\bold X_1)^{-1}\bold X_1^T\bold Y 145 | $$ 146 | 147 | $$ 148 | 2(l(\hat \beta)-l(\hat \beta_1))=n\log (1+\frac{p-q}{n-p}\frac{n-p}{p-q}\frac{||\bold Y-\bold X_1\hat \beta_1||_2^2-||\bold Y-\bold X\hat \beta||_2^2}{||\bold Y-\bold X\hat \beta||_2^2}) 149 | $$ 150 | 151 | $$ 152 | 2(l(\hat \beta)-l(\hat \beta_1))=n\log (1+\frac{p-q}{n-p}F),\quad F=\frac{n-p}{p-q}\frac{||\bold Y-\bold X_1\hat \beta_1||_2^2-||\bold Y-\bold X\hat \beta||_2^2}{||\bold Y-\bold X\hat \beta||_2^2} 153 | $$ 154 | 155 | ### Box-Cox transformation 156 | 157 | $$ 158 | y^{(\lambda)}=\begin{cases}(y^{\lambda}-1)/\lambda,&\lambda\neq 0\\ \log(y),&\lambda=0\end{cases} 159 | $$ 160 | 161 | $$ 162 | f(\epsilon)=\frac{1}{\sqrt{2\pi \sigma^2}}exp(-\frac{\epsilon^2}{2\sigma^2}) 163 | $$ 164 | 165 | $$ 166 | f_Y(y_i)=f_{\epsilon}(g^{-1}(y_i))|\frac{\partial g^{-1}(y_i)}{\partial y_i}|=\frac{y_i^{\lambda-1}}{\sqrt{2\pi\sigma^2}}exp(-\frac{1}{2\sigma^2}(y_i^{(\lambda)}-\bold x_i^T\beta)^2) 167 | $$ 168 | 169 | $$ 170 | l=-\frac{n}{2}\log(2\pi)-\frac{1}{2}(n\log \sigma^2+\frac{1}{\sigma^2}\sum(y_i^{(\lambda)}-\bold x_i^T\beta)^2)+(\lambda-1)\sum \log y_i 171 | $$ 172 | 173 |
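
As a rough illustration of how the Box-Cox log-likelihood can be used to pick $\lambda$, the sketch below evaluates the profile log-likelihood on a grid. It makes simplifying assumptions that are not in the notes: the model has an intercept only (so $\bold x_i^T\beta$ is a single mean), $\beta$ and $\sigma^2$ are replaced by their maximum likelihood estimates for each $\lambda$, additive constants are dropped, and the data are simulated from a log-normal distribution so that $\lambda=0$ is the appropriate choice. The names `boxcox` and `profile_loglik` are made up for this example.

```python
import math
import random


def boxcox(y, lam):
    """Box-Cox transform of a positive value y."""
    if 1e-12 > abs(lam):
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(ys, lam):
    """Log-likelihood in lambda with mean and variance profiled out (constants dropped)."""
    n = len(ys)
    zs = [boxcox(y, lam) for y in ys]
    mean_z = sum(zs) / n
    sigma2 = sum((z - mean_z) ** 2 for z in zs) / n
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(sigma2) + jacobian


rng = random.Random(0)
ys = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(500)]  # log-normal data
grid = [k / 10 for k in range(-10, 11)]
best_lam = max(grid, key=lambda lam: profile_loglik(ys, lam))
print("lambda maximising the profile log-likelihood:", best_lam)
```

The term `jacobian` corresponds to $(\lambda-1)\sum\log y_i$ in the log-likelihood above; without it, the comparison across different values of $\lambda$ would not be on a common scale.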
### Goodness of fit 174 | 175 | $$ 176 | R^2=1-\frac{SS(\hat \beta)}{SS_0} 177 | $$ 178 | 179 | $$ 180 | R^2_{adj}=1-\frac{SS(\hat \beta)}{SS_0}\frac{n-1}{n-d-1} 181 | $$ 182 | 183 | ### Stepwise Methods 184 | 185 | - Forward Selection: greedy model selection that sequentially increases the dimension of the problem by including the covariate that leads to the best improvement of the model fit. 186 | - Backward Selection: Backward selection starts from the model containing all terms, and then successively drops the least significant term at each stage. 187 | 188 | Akaike Information Criterion (AIC) 189 | $$ 190 | AIC(\hat \beta)=n\log SS(\hat \beta)+2p 191 | $$ 192 | Bayesian Information Criterion (BIC) 193 | $$ 194 | BIC(\hat \beta)=n\log SS(\hat \beta)+p\log n 195 | $$ 196 | LASSO Regression 197 | $$ 198 | \hat \beta_{lasso}=argmin\sum_{i=1}^n(y_i-\beta_0-\sum_{j=1}^px_{ij}\beta_j)^2 199 | $$ 200 | 201 | $$ 202 | s.t. \sum_{j=1}^p|\beta_j|\le t 203 | $$ 204 | 205 | ## Generalized Linear Model 206 | 207 | ### Introduction 208 | 209 | - The response Y is allowed to follow any exponential family distribution, not just the normal distribution 210 | - Link the expectation of the response variable to the linear predictor through a function g instead of making them equal to each other.
The conditional variance is a known function of the mean parameter 211 | 212 | $$ 213 | g(\mu)=g(E[Y|\bold X=\bold x])=\bold x^T\beta=\eta 214 | $$ 215 | 216 | $$ 217 | V[Y_i|\bold X_i=\bold x_i]=\phi V(\mu_i) 218 | $$ 219 | 220 | - Canonical link: use the function that maps the mean of our response distribution to the parameter $\theta$ which we get by writing our density in the form $f(y;\theta,\phi)=exp\{(y\theta-b(\theta))/\phi+c(y,\phi)\}$, for which $\mu=b'(\theta)$ 221 | 222 | $$ 223 | g(\mu)=(b')^{-1}(\mu) 224 | $$ 225 | 226 | - Consistency and asymptotic normality 227 | 228 | $$ 229 | Consistency:\hat \beta\stackrel{p}{\longrightarrow} \beta 230 | $$ 231 | 232 | $$ 233 | Asymptotic\ normality:\sqrt n(\hat \beta-\beta)\stackrel{d}{\longrightarrow} N(0,I(\beta)^{-1}),I(\beta)=E[-\frac{\partial^2}{\partial \beta\partial \beta^{T}}\log f(Y_i,X_i,\beta)] 234 | $$ 235 | 236 | ### Goodness of fit 237 | 238 | One way to assess the adequacy of a model is to compare its fit with a saturated model. The maximum likelihood achievable in a model with n parameters is $l_n(y, y)$.
The log-likelihood of the fitted model is evaluated at the maximum likelihood fits $\hat \mu_i=g^{-1}(x_i^T\hat \beta)$ 239 | 240 | - Scaled deviance generalizes the sum of squared errors used in linear regression 241 | 242 | $$ 243 | D^*(\bold y,\hat \mu)=\sum_{i=1}^n2[l_n(y_i;y_i)-l_n(\hat \mu_i;y_i)]=\sum_{i=1}^n d_i 244 | $$ 245 | 246 | - Pearson's Statistic 247 | 248 | $$ 249 | P=\sum_{i=1}^n\frac{(y_i-\hat\mu_i)^2}{\phi V(\hat \mu_i)} 250 | $$ 251 | 252 | ## Survival Data 253 | 254 | - The data of interest represent the length of time until some event happens 255 | - Frequently survival data are not completely observed because the event of interest has not occurred at the time the data was recorded 256 | 257 | ### Function 258 | 259 | - Survival function: the survival function is the complement of the cdf and gives the probability of surviving beyond time t 260 | 261 | $$ 262 | S(t)=P(T>t)=1-P(T\le t)=1-F(t) 263 | $$ 264 | 265 | - Hazard function 266 | 267 | $$ 268 | h(t)=\lim_{\delta\to 0}\frac{P(t\le T\le t+\delta|T\ge t)}{\delta}=\frac{f(t)}{1-F(t)}=-\frac{\partial \log S(t)}{\partial t} 269 | $$ 270 | 271 | ### Censoring 272 | 273 | With event time $T_i$ and censoring time $C_i$, we observe 273 | $$ 274 | (Y_i,\delta_i)=(\min(T_i,C_i),1_{T_i\le C_i}) $$ $$ (Y_i,\delta_i)=(y_i,1)\ when\ T_i=y_i,C_i\ge y_i\ (event\ happened) 283 | $$ 284 | 285 | $$ 286 | (Y_i,\delta_i)=(y_i,0)\ when\ T_i\ge y_i,C_i=y_i\ (event\ not\ happened\ yet) 287 | $$ 288 | 289 | ### Likelihood 290 | 291 | Here $G$ and $g$ denote the cdf and density of the censoring time, assumed independent of the event time. 291 | $$ 292 | P(Y_i=y_i,\delta_i=1)=f_{\theta}(y_i)(1-G(y_i)) 293 | $$ 294 | 295 | $$ 296 | P(Y_i=y_i,\delta_i=0)=(1-F_{\theta}(y_i))g(y_i) 297 | $$ 298 | 299 | $$ 300 | L(\theta;y,\delta)=\prod f_{\theta}(y_i)^{\delta_i}(1-F_{\theta}(y_i))^{1-\delta_i} 301 | $$ 302 | 303 | $$ 304 | \hat \theta_{ML}=argmax\sum(\delta_i\log f_{\theta}(y_i)+(1-\delta_i)\log (1-F_{\theta}(y_i))) 305 | $$ 306 | 307 | ### Kaplan-Meier Estimator 308 | 309 | $$ 310 | \hat S(t)=\prod_{i:t_i\le t} (1-\frac{d_i}{r_i}) 311 | $$ 312 | 313 | $$ 314 | d_i=\sum_{j=1}^n\delta_j 1_{Y_j=t_i}(failures\ at\ t_i) 315 | $$ 316 | 317 | $$ 318 | r_i=\sum_{j=1}^n1_{Y_j\ge t_i}(at\ risk\ at\ t_i) 319 | $$ 320
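
The Kaplan-Meier formulas above translate almost line by line into code. The following is a minimal sketch, not a replacement for a survival analysis library; the function name `kaplan_meier` and the five-observation data set are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct failure time t_i.

    times  -- observed times Y_j = min(T_j, C_j)
    events -- indicators delta_j (1 = event observed, 0 = censored)
    """
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv = 1.0
    curve = []
    for t_i in failure_times:
        d_i = sum(1 for t, d in zip(times, events) if t == t_i and d == 1)  # failures at t_i
        r_i = sum(1 for t in times if t >= t_i)                             # at risk at t_i
        surv *= 1.0 - d_i / r_i
        curve.append((t_i, surv))
    return curve


# Hypothetical data: events at times 1, 2 and 3, censored observations at 2 and 4.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
for t_i, s in kaplan_meier(times, events):
    print(t_i, round(s, 3))
```

For this data the risk sets have sizes 5, 4 and 2 at the three failure times, so the estimate drops by the factors $4/5$, $3/4$ and $1/2$. Censored observations never cause a drop; they only shrink later risk sets.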
| 321 | ### Cox Regression 322 | 323 | $$ 324 | \lambda(t|X)=\lambda_0(t)exp(X^T\beta) 325 | $$ 326 | 327 | $\lambda_0(t)=\lambda(t|X=0)$ is called the baseline hazard function. This is a semi-parametric model because the baseline hazard function is left fully non-parametric while the covariate effect $exp(X^T\beta)$ is parametric. 328 | $$ 329 | \frac{\lambda(t|X=1)}{\lambda(t|X=0)}=exp(\beta) 330 | $$ 331 | 332 | $$ 333 | \log(\frac{\lambda(t|X_1=x_1,...,X_j=x_j+1,...,X_k=x_k)}{\lambda(t|X_1=x_1,...,X_j=x_j,...,X_k=x_k)})=\beta_j 334 | $$ 335 | 336 | With a one unit increase in $X_j$ while the other covariates remain the same, the log hazard increases by $\beta_j$, i.e. the hazard is multiplied by $exp(\beta_j)$. 337 | 338 | ## Nonparametric Regression 339 | 340 | ### Statistical Model 341 | 342 | $$ 343 | Y_i=f(X_i)+\epsilon_i,i=1,2,...,n 344 | $$ 345 | 346 | - $Y_i$: Response variable 347 | - $X_i$: Covariate 348 | - Unknown smooth regression function f 349 | - Noise term $\epsilon_i$, assumed to be i.i.d. and independent of $X_i$ 350 | 351 | ### Kernel Smoothing 352 | 353 | Kernel density estimator with kernel K and bandwidth h: 353 | $$ 354 | \hat f_h(x)=\frac{1}{nh}\sum_{i=1}^nK(\frac{x_i-x}{h}) 355 | $$ 356 | 357 | Nadaraya-Watson Estimator: a weighted average of the response variables Y within the bandwidth centered at x 358 | $$ 359 | \hat f_{NW}(x)=\sum_{i=1}^n\frac{K(\frac{x-x_i}{h})y_i}{\sum_{j=1}^nK(\frac{x-x_j}{h})} 360 | $$ 361 | Local Polynomial Regression: Taylor's theorem motivates the use of local polynomial fits, since it suggests that f(x) can be approximated by a polynomial in the neighbourhood of the observed points $x_i$ 362 | $$ 363 | f(x)\approx\beta_0+\beta_1(x-x_i)+...+\beta_p(x-x_i)^p 364 | $$ 365 | 366 | $$ 367 | \hat \beta_x=argmin(\sum_{i=1}^n(y_i-\beta_{x0}-\beta_{x1}(x-x_i)-...-\beta_{xp}(x-x_i)^p)^2K(\frac{x-x_i}{h})) 368 | $$ 369 | 370 | $$ 371 | \hat \beta_x=argmin((y-X_x\beta_x)^TW_x(y-X_x\beta_x))=(X_x^TW_xX_x)^{-1}X_x^TW_xy 372 | $$ 373 | 374 | ### Spline Smoothing 375 | 376 | Approximate $f(x)$ with a linear combination of known functions $b_j(x)$ 377 | $$ 378 |
f(x)=\sum_{j=0}^K\beta_j b_j(x) 379 | $$ 380 | 381 | -------------------------------------------------------------------------------- /Frameworks/AWS/IAM.md: -------------------------------------------------------------------------------- 1 | [toc] 2 | 3 | # IAM 4 | 5 | ## Overview 6 | 7 | **Prompt: Tell me some basics about AWS IAM.** 8 | 9 | AWS Identity And Access Management Fundamentals 10 | 11 | IAM plays a critical role in ensuring the security of your AWS environment by controlling who can do what within your AWS account. Properly configuring and managing IAM is essential for maintaining the security and compliance of your AWS infrastructure. 12 | 13 | 1. **User and Permission Management:** 14 | - Create and manage user accounts, groups, and roles. 15 | - Users can be individuals or applications. 16 | - Groups and roles help organize and assign permissions. 17 | 2. **Fine-Grained Access Control:** 18 | - Granular control over who can access AWS resources and what actions they can perform. 19 | - Define policies for specific resource access and operations. 20 | 21 | ## Identities 22 | 23 | ### Users 24 | 25 | **Prompt: Give me an introduction about AWS IAM Users.** 26 | 27 | Certainly, here's an introduction to AWS IAM Users with several bullet points: 28 | 29 | - **AWS IAM Users:** 30 | - AWS Identity and Access Management (IAM) users are entities within an AWS account. 31 | - They represent individuals, applications, or services that require access to AWS resources. 32 | - IAM users are separate from the AWS account owner and serve as the basis for controlled access. 33 | 34 | - **Access Control and Security:** 35 | - IAM users are created and managed to control access to AWS services and resources. 36 | - They allow organizations to enforce the principle of least privilege by granting specific permissions only to those who need them. 37 | - This fine-grained access control enhances security and reduces the risk of unauthorized access. 
38 | 39 | - **Access Types:** 40 | - IAM users can be configured for different types of access: 41 | - **Programmatic Access:** Via access keys (access key ID and secret access key) for API operations. 42 | - **Console Access:** For interaction with the AWS Management Console. 43 | 44 | - **Permissions and Policies:** 45 | - IAM policies are used to define what AWS resources users can access and what actions they can perform. 46 | - Policies are attached to IAM users and determine their level of access. 47 | 48 | - **Enhanced Security Measures:** 49 | - IAM users can have multi-factor authentication (MFA) enabled for an additional layer of security. 50 | - Password policies can be applied to enforce strong password requirements. 51 | 52 | - **Accountability and Auditing:** 53 | - IAM users enable organizations to maintain accountability by tracking who has access to AWS resources. 54 | - Detailed audit trails are maintained, aiding in security monitoring and compliance. 55 | 56 | In summary, AWS IAM Users are a fundamental component of access control and security in AWS. They allow organizations to manage and delegate access while enforcing security best practices and ensuring accountability for resource access. 57 | 58 | ### Roles 59 | 60 | **Prompt: Give me an introduction about AWS IAM Roles** 61 | 62 | - **AWS IAM Roles:** 63 | - AWS Identity and Access Management (IAM) Roles are a core feature of AWS IAM. 64 | - Roles are used to delegate permissions securely to AWS resources, services, and external entities. 65 | - They play a crucial role in managing access within AWS environments while enhancing security and eliminating the need for long-term credentials. 66 | - **Trusted Entities:** 67 | - Roles are intended to be assumed by trusted entities, such as AWS services, applications, or external users. 68 | - When a trusted entity assumes a role, it gains temporary permissions to perform actions based on the role's policies. 
69 | - **Temporary Credentials:** 70 | - One key aspect of roles is the issuance of temporary security credentials, including an access key ID, secret access key, and session token. 71 | - These temporary credentials have a limited lifespan (usually 1 hour by default) and are automatically rotated, reducing the risk associated with long-term access keys. 72 | - **Cross-Account Access:** 73 | - Roles enable secure sharing of resources between AWS accounts, providing a mechanism for cross-account access. 74 | - This facilitates collaboration, centralized resource management, and resource sharing with trusted partners or third-party services. 75 | - **Role-Based Access Control:** 76 | - Roles follow the principle of role-based access control (RBAC), where permissions are assigned to roles, and entities assume roles to acquire those permissions. 77 | - This approach simplifies access management and ensures that permissions are only granted when needed. 78 | - **Trust Policy and Permissions Policy:** 79 | - Roles consist of two key policies: the trust policy and permissions policy. 80 | - The trust policy defines the trusted entities that can assume the role. 81 | - The permissions policy defines the permissions granted to the entity that assumes the role. 82 | - **Secure and Dynamic Access:** 83 | - IAM roles contribute to a more secure AWS environment by reducing the exposure of long-term credentials. 84 | - They also facilitate dynamic and just-in-time access to resources. 85 | 86 | In summary, AWS IAM Roles are a fundamental component of AWS security and access management. They enable secure delegation of permissions, reduce the risk associated with access keys, and support a variety of use cases, enhancing security, and access control in AWS environments. 
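
To make the two policies of a role more concrete, the sketch below builds one example of each as a Python dictionary and prints it as JSON. This is only an illustrative pairing, not a recommended configuration: the choice of the Lambda service as the trusted entity, the single `s3:GetObject` action, and the bucket name `example-bucket` are assumptions made for this example.

```python
import json

# Trust policy: who may assume the role (here, the AWS Lambda service).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what the role may do once it has been assumed.
# "example-bucket" is a hypothetical bucket name.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

The trust policy has a `Principal` but no `Resource`, because it is attached to the role itself, while the permissions policy has a `Resource` but no `Principal`, because the principal is whoever has assumed the role.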
87 | 88 | ## Permissions 89 | 90 | ### Policies 91 | 92 | **Prompt: Give me an introduction about AWS IAM policies.** 93 | 94 | Certainly, here's an introduction to AWS IAM policies: 95 | 96 | - **AWS IAM Policies:** 97 | - AWS Identity and Access Management (IAM) policies are essential components for controlling access to AWS resources securely. 98 | - They define the permissions and access rights that determine what users, groups, and roles can do within an AWS environment. 99 | - **Access Control Rules:** 100 | - IAM policies serve as access control rules that specify who has access to which AWS resources and what actions they can perform on those resources. 101 | - They are used to enforce the principle of least privilege, granting only the necessary permissions for individuals and entities to carry out their tasks. 102 | - **Policy Components:** 103 | - IAM policies are typically written in JSON (JavaScript Object Notation) format, making them human-readable and machine-readable. 104 | - A policy consists of one or more statements, each describing a specific permission or restriction. 105 | - Each statement contains: 106 | - An effect (allow or deny). 107 | - A resource (the AWS resource to which the statement applies). 108 | - An action (the AWS actions that are allowed or denied). 109 | - Conditions (optional criteria for when the policy is applied). 110 | - **Managed and Custom Policies:** 111 | - AWS provides managed policies that cover common use cases and best practices. These managed policies can be attached to users, groups, and roles. 112 | - Organizations can also create custom policies tailored to their unique requirements, granting precise control over permissions. 113 | - **Permission Inheritance:** 114 | - IAM policies can be attached directly to users, groups, or roles. 115 | - When a user is a member of a group or assumes a role, they inherit the permissions of the group or role in addition to their individual permissions. 
116 | - **Versioning and Auditing:** 117 | - IAM policies support versioning, allowing organizations to manage changes to policies over time. 118 | - Organizations can audit and monitor access to AWS resources by reviewing detailed logs and activity reports generated based on policy evaluations. 119 | 120 | In summary, AWS IAM policies are a fundamental component of access control and security in AWS environments. They define who can access what resources and what actions they can perform, offering organizations the flexibility to tailor access permissions to their specific needs while maintaining a secure and compliant AWS environment. 121 | -------------------------------------------------------------------------------- /Frameworks/AWS/Lambda.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Frameworks/AWS/Lambda.md -------------------------------------------------------------------------------- /Frameworks/AWS/VPC.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Frameworks/AWS/VPC.md -------------------------------------------------------------------------------- /Frameworks/Big Data/Kafka.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Frameworks/Big Data/Kafka.md -------------------------------------------------------------------------------- /Frameworks/Big Data/Spark.md: -------------------------------------------------------------------------------- 1 | # Spark 2 | 3 | ![What is Spark? 
- Introduction to Apache Spark and Analytics - AWS](../../Images/what-is-apache-spark.b3a3099296936df595d9a7d3610f1a77ff0749df.png) 4 | 5 | ## Components 6 | 7 | ### Spark Driver 8 | 9 | - **Role**: The Spark Driver is the heart of a Spark application. It converts the user's program into tasks, schedules the tasks on the executors, and orchestrates the execution by distributing the tasks among the executors. 10 | - Components 11 | - **SparkContext**: Establishes the connection to the Spark execution environment and acts as the client through which the Spark application is executed. It's responsible for making RDDs resilient and distributed across the cluster. 12 | - **DAGScheduler**: Translates RDDs into execution graphs (DAGs) and stages (groups of tasks). 13 | - **TaskScheduler**: Responsible for sending tasks to the cluster, tracking their execution, and retrying if necessary. 14 | 15 | ### Spark Executor 16 | 17 | - **Role**: Executors are responsible for executing the tasks assigned to them by the driver. Each executor runs in its own JVM on a worker node. 18 | - **Functionality**: Executors perform computations on the tasks, read and write data to external sources, and store intermediate data in memory or disk. Executors report the state of the computation back to the driver node. 19 | 20 | ### Cluster Manager 21 | 22 | - **Role**: The Cluster Manager is an external service for acquiring resources on the cluster (e.g., nodes in the cluster where executors should be launched). It's responsible for managing and allocating resources for Spark applications. 23 | - Types 24 | - **Standalone**: A simple cluster manager included with Spark that makes it easy to set up a cluster. 25 | - **Apache Mesos**: A general cluster manager that can also run Hadoop MapReduce and service applications. 26 | - **Hadoop YARN**: The resource manager in Hadoop 2. 
27 | - **Kubernetes**: An open-source system for automating deployment, scaling, and management of containerized applications. 28 | 29 | ## Workflow 30 | 31 | 1. **Initialization**: When a Spark application starts, the Driver program creates a SparkContext, which then connects to a Cluster Manager to ask for resources to launch Executors. 32 | 2. **Task Distribution**: The Driver program converts the application into a set of stages that are broken down into tasks. These tasks are bundled and sent to the appropriate Executors for execution. 33 | 3. **Task Execution**: Executors run the tasks and save the results. If any computations require shuffling data across the executors (for operations like `reduceByKey`), Spark will manage the data transfer. 34 | 4. **Results**: Once all tasks are executed, the results are sent back to the Driver for aggregation or further processing. The final results can be collected at the Driver or written to an external data source. 35 | 36 | ## Data Model 37 | 38 | - **RDD (Resilient Distributed Datasets)**: The fundamental data structure of Spark. It represents an immutable, distributed collection of objects that can be processed in parallel. RDDs support two types of operations: transformations and actions. 39 | - **DAG Execution**: Unlike traditional MapReduce models which execute tasks in a linear sequence, Spark uses a DAG (Directed Acyclic Graph) to represent a sequence of computations. Each node represents an RDD partition, and each edge represents a transformation that leads to a new RDD. 40 | - **DataFrame:** The DataFrame is a key component of Apache Spark’s data processing model, introduced as an extension of the RDD (Resilient Distributed Dataset) concept to provide a higher-level abstraction. DataFrames offer a more structured and intuitive interface for data manipulation and analysis, making Spark more accessible for users familiar with SQL and data analysis tools. 
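
The split between transformations and actions can be mimicked in a few lines of plain Python. The class below is a toy model written for this note, not the Spark API: `ToyRDD` and its methods are invented names, there is no partitioning or cluster, and the "DAG" is just a linear list of recorded steps. It only illustrates that transformations record lineage and that nothing is computed until an action runs.

```python
class ToyRDD:
    """Toy stand-in for an RDD (NOT the Spark API): immutable data plus recorded lineage."""

    def __init__(self, data, lineage=()):
        self._data = list(data)
        self._lineage = tuple(lineage)  # recorded transformations, not yet executed

    # Transformations: return a new ToyRDD and only extend the lineage.
    def map(self, fn):
        return ToyRDD(self._data, self._lineage + (("map", fn),))

    def filter(self, fn):
        return ToyRDD(self._data, self._lineage + (("filter", fn),))

    # Actions: replay the lineage and return a plain Python value.
    def collect(self):
        items = self._data
        for kind, fn in self._lineage:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # lets us observe when the function really runs
    return x * x


numbers = ToyRDD(range(1, 6))
evens = numbers.map(square).filter(lambda x: x % 2 == 0)
print(len(calls))       # 0: no transformation has been executed yet
print(evens.collect())  # [4, 16]: the action triggers the recorded steps
```

Because each transformation returns a new object and leaves the original untouched, `numbers` can be reused for other pipelines, which loosely mirrors the immutability of RDDs described above.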
41 | 42 | ## Functions 43 | 44 | ### Transformation Functions 45 | Transformation functions create a new dataset from an existing one. They are lazy operations, meaning they don't compute their results right away. Instead, Spark remembers the set of transformations applied to some base dataset. The transformed dataset is only computed when an action is called. Examples include: 46 | 47 | - `map`: Applies a function to each element in the dataset. 48 | - `filter`: Returns a new dataset formed by selecting those elements of the source on which the function returns true. 49 | - `flatMap`: Similar to map, but each input item can be mapped to 0 or more output items. 50 | - `reduceByKey`: When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function. 51 | - `join`: Joins two datasets based on the keys. 52 | 53 | ### Action Functions 54 | Action functions trigger the computation and return values. Actions are the operations that produce a result from a Spark computation. Unlike transformations, actions force the computation of the transformations required for the result. Examples include: 55 | 56 | - `collect`: Returns all the elements of the dataset as an array. 57 | - `count`: Returns the number of elements in the dataset. 58 | - `first`: Returns the first element in the dataset. 59 | - `take`: Returns an array with the first n elements of the dataset. 60 | - `reduce`: Aggregates the elements of the dataset using a function, returning a single value. 61 | 62 | ### Aggregate Functions 63 | Aggregate functions compute a summary statistic (or statistics) about a dataset. Spark SQL provides built-in methods for aggregating data, such as counting, summing, or averaging. These functions are often used in conjunction with groupBy operations. Examples include: 64 | 65 | - `sum`: Calculates the sum of a numeric column. 66 | - `avg`: Computes the average of a numeric column. 
- `max`: Finds the maximum value in a numeric column.
- `min`: Finds the minimum value in a numeric column.
- `groupBy`: Groups the DataFrame by the specified columns so that aggregations can be run on each group.

### Window Functions
Window functions perform calculations across a set of rows related to the current row. They resemble aggregate functions, but instead of returning a single value for the entire table, they return a separate value for each row. Examples include:

- `rank`: Assigns a rank to each row within a partition of a result set, leaving gaps in the ranking sequence when there are ties.
- `dense_rank`: Similar to `rank`, but without gaps in the ranking sequence when there are ties.
- `row_number`: Assigns a unique sequential integer to rows within a partition of a result set, starting at 1 for the first row in each partition.

### UDFs (User-Defined Functions)
Spark allows you to define custom transformations using UDFs. A UDF is a user-written function that extends Spark's built-in functions. UDFs can be used in Spark SQL to perform operations that the standard Spark functions do not cover.

### Column Functions
In Spark DataFrames, column functions perform operations on DataFrame columns. They can be used for simple transformations like adding or subtracting values, or for more complex operations like applying regular expressions. Examples include:

- `lit`: Creates a Column holding a literal value.
- `col` or `column`: Returns a Column based on the given column name.
- `asc` or `desc`: Returns a sort expression based on the ascending or descending order of the given column.

## HDFS

### Architecture

- **NameNode**: The master server that manages the file system namespace and regulates access to files by clients. It maintains the file system tree and the metadata for all files and directories. This information is kept in memory for fast access.
- **DataNode**: The worker servers responsible for storing the actual data. An HDFS cluster may consist of hundreds or thousands of DataNodes. The DataNodes handle read and write requests from the file system's clients, and they also manage block creation, deletion, and replication based on instructions from the NameNode.
- **Secondary NameNode**: Despite its name, it does not serve as a backup to the NameNode. Instead it performs housekeeping tasks for it, such as periodically merging the edit log into the file system image (checkpointing).

### File Type

- **Text Files**

  - The simplest form of data storage, where data is stored in plain text format.

  - Easy to read and write but not the most efficient in terms of storage and processing, especially for large datasets.

- **Sequence Files**

  - A flat binary file format that stores key-value pairs.

  - Splittable, and thus suitable for MapReduce jobs.

  - Offers higher performance than plain text files and is suitable for storing intermediate data in Hadoop computations.

- **Avro**

  - A row-based format developed within the Apache Hadoop project.

  - Stores the schema alongside the serialized data and supports schema evolution, so data remains easy to interpret as the schema changes.

  - Offers efficient data compression and serialization, making it suitable for both storage and network transmission.

- **Parquet**

  - A columnar storage file format optimized for use in the Hadoop ecosystem.

  - Highly efficient in both storage and query performance, especially for complex nested data structures.

  - Supports schema evolution and is ideal for queries that fetch specific columns from large datasets, significantly reducing I/O operations.
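
The difference between a row-based layout (as in Avro) and a columnar layout (as in Parquet) can be illustrated with a small in-memory sketch. It is a toy model under stated assumptions: the sample `records` and the helper functions are hypothetical, and real Avro and Parquet files add encoding, compression, and metadata that are not modelled here. The sketch only counts how many stored values a query has to touch when it needs a single column.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data, invented for illustration only.
records = [
    {"id": 1, "name": "a", "score": 90},
    {"id": 2, "name": "b", "score": 75},
    {"id": 3, "name": "c", "score": 82},
]
COLUMNS = ["id", "name", "score"]


def to_row_layout(rows: List[dict]) -> List[list]:
    """Row-oriented: all values of one record are stored together."""
    return [[r[c] for c in COLUMNS] for r in rows]


def to_column_layout(rows: List[dict]) -> Dict[str, list]:
    """Column-oriented: all values of one column are stored together."""
    return {c: [r[c] for r in rows] for c in COLUMNS}


def sum_scores_from_rows(layout: List[list]) -> Tuple[int, int]:
    """Return (sum of scores, number of stored values read)."""
    total, touched = 0, 0
    score_idx = COLUMNS.index("score")
    for row in layout:
        touched += len(row)  # the whole record is read to reach one field
        total += row[score_idx]
    return total, touched


def sum_scores_from_columns(layout: Dict[str, list]) -> Tuple[int, int]:
    """Return (sum of scores, number of stored values read)."""
    scores = layout["score"]  # only the requested column is read
    return sum(scores), len(scores)


row_result = sum_scores_from_rows(to_row_layout(records))
col_result = sum_scores_from_columns(to_column_layout(records))

print(row_result)  # (247, 9)
print(col_result)  # (247, 3)
```

Both layouts give the same answer, but the columnar one reads a third of the values in this example, which is the intuition behind Parquet's reduced I/O for column-selective queries.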

## Spark Run

- Application Properties

  - **`--class`**: The entry point for your application (e.g., `org.apache.spark.examples.SparkPi`).

  - **`--name`**: The name of your Spark application.

  - **`--master`**: The master URL for the cluster (e.g., `spark://host:port`, `yarn`, `local[4]`).

- Deploy Mode

  - **`--deploy-mode`**: Whether to deploy your application in `client` or `cluster` mode.

- Resource Allocation

  - **`--executor-memory`**: Amount of memory to use per executor process (e.g., `1000M`, `2G`).

  - **`--driver-memory`**: Amount of memory to use for the driver process.

  - **`--executor-cores`**: The number of cores to use on each executor.

  - **`--num-executors`**: The number of executors to launch on YARN or Kubernetes. For standalone mode, use `--total-executor-cores` instead.

  - **`--total-executor-cores`**: For Spark standalone and Mesos only, the total number of cores across all executors.

```shell
# Submit the bundled SparkPi example to a YARN cluster.
# --num-executors applies to YARN and Kubernetes; on a standalone master
# (spark://host:port), use --total-executor-cores instead.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4G \
  --driver-memory 2G \
  --executor-cores 2 \
  --num-executors 10 \
  /path/to/spark-examples_*.jar \
  1000
```

-------------------------------------------------------------------------------- /Images/007D0E12-DC9B-4787-91A9-FCEA8EB25A4F.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/007D0E12-DC9B-4787-91A9-FCEA8EB25A4F.png -------------------------------------------------------------------------------- /Images/008i3skNgy1gyu0c5s7gaj31c6092wfc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/008i3skNgy1gyu0c5s7gaj31c6092wfc.jpg -------------------------------------------------------------------------------- /Images/008i3skNgy1gyu0c83rl1j31ci0ju76k.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/008i3skNgy1gyu0c83rl1j31ci0ju76k.jpg -------------------------------------------------------------------------------- /Images/008i3skNgy1gyu0cax566j31ac0is75t.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/008i3skNgy1gyu0cax566j31ac0is75t.jpg -------------------------------------------------------------------------------- /Images/008i3skNgy1gyu0cen7olj319q0a0t9w.jpg: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/008i3skNgy1gyu0cen7olj319q0a0t9w.jpg -------------------------------------------------------------------------------- /Images/04E687DF-CB15-4C86-A420-625B1288AEF9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/04E687DF-CB15-4C86-A420-625B1288AEF9.png -------------------------------------------------------------------------------- /Images/091-udp-user-datagram-protocol-01.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/091-udp-user-datagram-protocol-01.jpeg -------------------------------------------------------------------------------- /Images/1*0dsWFuc0pDmcAmHJUh7wqg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*0dsWFuc0pDmcAmHJUh7wqg.png -------------------------------------------------------------------------------- /Images/1*CMGFtehu01ZEBgzHG71sMg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*CMGFtehu01ZEBgzHG71sMg.png -------------------------------------------------------------------------------- /Images/1*F9QnGqmAMcXEr1lQeSiqVA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*F9QnGqmAMcXEr1lQeSiqVA.png -------------------------------------------------------------------------------- /Images/1*HeDm_FpLVoR3d97NKdKNLw.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*HeDm_FpLVoR3d97NKdKNLw.png -------------------------------------------------------------------------------- /Images/1*LwOBbwGXMZUy6OzkFAPTzw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*LwOBbwGXMZUy6OzkFAPTzw.png -------------------------------------------------------------------------------- /Images/1*cuOmGT7NevP9oJFJfVpRKA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*cuOmGT7NevP9oJFJfVpRKA.png -------------------------------------------------------------------------------- /Images/1*kSRH4T8S1YmAuHbpgQ3Ylw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*kSRH4T8S1YmAuHbpgQ3Ylw.png -------------------------------------------------------------------------------- /Images/1*u2el-HrqRPVk7x0xlvs_CA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*u2el-HrqRPVk7x0xlvs_CA.png -------------------------------------------------------------------------------- /Images/1*vbRXwvbk4ep_kK3dgfAWzQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*vbRXwvbk4ep_kK3dgfAWzQ.png -------------------------------------------------------------------------------- 
/Images/1*x4BemYz3TpQxAiMx4E_1qQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1*x4BemYz3TpQxAiMx4E_1qQ.png -------------------------------------------------------------------------------- /Images/1200px-Huffman_tree_2.svg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1200px-Huffman_tree_2.svg.png -------------------------------------------------------------------------------- /Images/1200px-IPv4_Packet-en.svg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1200px-IPv4_Packet-en.svg.png -------------------------------------------------------------------------------- /Images/1538057889628.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1538057889628.png -------------------------------------------------------------------------------- /Images/157-Datalink-PDU2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/157-Datalink-PDU2.jpg -------------------------------------------------------------------------------- /Images/1620764635917.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1620764635917.png -------------------------------------------------------------------------------- 
/Images/1783_2454_329-fat32-file-system.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1783_2454_329-fat32-file-system.jpg -------------------------------------------------------------------------------- /Images/1C2567BE-17C9-43AE-98C9-A864103CCE98.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/1C2567BE-17C9-43AE-98C9-A864103CCE98.png -------------------------------------------------------------------------------- /Images/20220829220512.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/20220829220512.png -------------------------------------------------------------------------------- /Images/20220830151759.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/20220830151759.png -------------------------------------------------------------------------------- /Images/2335a085-5678-472d-8bf6-509f073cce6c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/2335a085-5678-472d-8bf6-509f073cce6c.png -------------------------------------------------------------------------------- /Images/2BE5A89A-9473-420B-B659-D5F010B2BB8D.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/2BE5A89A-9473-420B-B659-D5F010B2BB8D.png 
-------------------------------------------------------------------------------- /Images/2CDNtEvHIomxnzf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/2CDNtEvHIomxnzf.png -------------------------------------------------------------------------------- /Images/3atNUyejpIVchEF.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/3atNUyejpIVchEF.png -------------------------------------------------------------------------------- /Images/400px-Sample-network-diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/400px-Sample-network-diagram.png -------------------------------------------------------------------------------- /Images/440px-Virtual_address_space_and_physical_address_space_relationship.svg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/440px-Virtual_address_space_and_physical_address_space_relationship.svg.png -------------------------------------------------------------------------------- /Images/440px-X86_Paging_PAE_4K.svg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/440px-X86_Paging_PAE_4K.svg.png -------------------------------------------------------------------------------- /Images/4F00CAC7-1C7B-4065-A509-F7B57DD7FBE3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/4F00CAC7-1C7B-4065-A509-F7B57DD7FBE3.png -------------------------------------------------------------------------------- /Images/58182variations_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/58182variations_comparison.png -------------------------------------------------------------------------------- /Images/5FyWjSG1DvLVph6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/5FyWjSG1DvLVph6.png -------------------------------------------------------------------------------- /Images/5SEPxn9AGVq2J3w.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/5SEPxn9AGVq2J3w.png -------------------------------------------------------------------------------- /Images/5bc71daf871a6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/5bc71daf871a6.png -------------------------------------------------------------------------------- /Images/5bc71ed75cefe.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/5bc71ed75cefe.png -------------------------------------------------------------------------------- /Images/8RktEjwCXJU92Wn.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/8RktEjwCXJU92Wn.png -------------------------------------------------------------------------------- /Images/8TaCWgzYcHkMu1x.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/8TaCWgzYcHkMu1x.png -------------------------------------------------------------------------------- /Images/8wsLWXR1jaAvgKC.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/8wsLWXR1jaAvgKC.png -------------------------------------------------------------------------------- /Images/A320790E-26BD-4370-8B87-47E08D04FA9C.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/A320790E-26BD-4370-8B87-47E08D04FA9C.png -------------------------------------------------------------------------------- /Images/AkOG1FI9uYNVSiX.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/AkOG1FI9uYNVSiX.png -------------------------------------------------------------------------------- /Images/BECB3588-AFBF-42F2-9A1C-22916449B1B4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/BECB3588-AFBF-42F2-9A1C-22916449B1B4.png -------------------------------------------------------------------------------- /Images/BERT.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/BERT.png -------------------------------------------------------------------------------- /Images/BfT2ScuOlW3DY1V.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/BfT2ScuOlW3DY1V.png -------------------------------------------------------------------------------- /Images/BiFHO5rXQyJ9CoE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/BiFHO5rXQyJ9CoE.png -------------------------------------------------------------------------------- /Images/Blank diagram (1).png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Blank diagram (1).png -------------------------------------------------------------------------------- /Images/Blank diagram-3028745.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Blank diagram-3028745.png -------------------------------------------------------------------------------- /Images/Blank diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Blank diagram.png -------------------------------------------------------------------------------- /Images/Bp8QcoewPVYhF9C.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Bp8QcoewPVYhF9C.png -------------------------------------------------------------------------------- /Images/CB177F61-F635-4014-B166-2C9DDFE51FA3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CB177F61-F635-4014-B166-2C9DDFE51FA3.png -------------------------------------------------------------------------------- /Images/CBOW-Model-from-Original-Paper15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CBOW-Model-from-Original-Paper15.png -------------------------------------------------------------------------------- /Images/CL84sPApiofEzGX.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CL84sPApiofEzGX.jpg -------------------------------------------------------------------------------- /Images/CN-20231012130356517.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CN-20231012130356517.png -------------------------------------------------------------------------------- /Images/CN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CN.png -------------------------------------------------------------------------------- /Images/Capture-24.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Capture-24.png -------------------------------------------------------------------------------- /Images/CbJ2PA1qd4skVv9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/CbJ2PA1qd4skVv9.png -------------------------------------------------------------------------------- /Images/Cl3Mdkr1HoXfe5z.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Cl3Mdkr1HoXfe5z.png -------------------------------------------------------------------------------- /Images/DA5651B0-8568-45A2-A26E-B5DE1F699F9B.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/DA5651B0-8568-45A2-A26E-B5DE1F699F9B.png -------------------------------------------------------------------------------- /Images/DB8B54EB-D3C3-4EB0-A8F4-71163B1426E0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/DB8B54EB-D3C3-4EB0-A8F4-71163B1426E0.png -------------------------------------------------------------------------------- /Images/E63DEDAE-89A8-4A35-B994-24482EC5FE27.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/E63DEDAE-89A8-4A35-B994-24482EC5FE27.png -------------------------------------------------------------------------------- /Images/EN-tcp-20231012130628676.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/EN-tcp-20231012130628676.png -------------------------------------------------------------------------------- /Images/EN-tcp-verbindungsabbau.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/EN-tcp-verbindungsabbau.png -------------------------------------------------------------------------------- /Images/EN-tcp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/EN-tcp.png -------------------------------------------------------------------------------- /Images/FAwhLygaZbrdlq1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/FAwhLygaZbrdlq1.png -------------------------------------------------------------------------------- /Images/File_Hierarchy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/File_Hierarchy.png -------------------------------------------------------------------------------- /Images/Flowcharts (1)-3026409.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Flowcharts (1)-3026409.png -------------------------------------------------------------------------------- /Images/Flowcharts (1).png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Flowcharts (1).png -------------------------------------------------------------------------------- /Images/Flowcharts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Flowcharts.png -------------------------------------------------------------------------------- /Images/GEeuqFxMY9noD2b.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/GEeuqFxMY9noD2b.jpg -------------------------------------------------------------------------------- /Images/GZaKDHj4Ir9LVCF.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/GZaKDHj4Ir9LVCF.png -------------------------------------------------------------------------------- /Images/GeLMhaPE9muIjgd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/GeLMhaPE9muIjgd.png -------------------------------------------------------------------------------- /Images/HRwP1uQhVj9zEqo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/HRwP1uQhVj9zEqo.jpg -------------------------------------------------------------------------------- /Images/IErKH5JjNV8eTn3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/IErKH5JjNV8eTn3.png -------------------------------------------------------------------------------- /Images/IJcUaSYs4uQ1vVl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/IJcUaSYs4uQ1vVl.png -------------------------------------------------------------------------------- /Images/IbzHhrMq9UwGd25.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/IbzHhrMq9UwGd25.png -------------------------------------------------------------------------------- /Images/IckhsWpVL8TDnYi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/IckhsWpVL8TDnYi.png -------------------------------------------------------------------------------- /Images/Iz7Ti-20231025130710487.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Iz7Ti-20231025130710487.png -------------------------------------------------------------------------------- /Images/Iz7Ti.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Iz7Ti.png -------------------------------------------------------------------------------- /Images/JPOM4bnYzKqX9DG.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/JPOM4bnYzKqX9DG.png -------------------------------------------------------------------------------- /Images/LCdpFfP8twkD6S4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/LCdpFfP8twkD6S4.png -------------------------------------------------------------------------------- /Images/LeNet5-fm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/LeNet5-fm.png -------------------------------------------------------------------------------- /Images/NHiFXYjkBSVM49L.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/NHiFXYjkBSVM49L.png -------------------------------------------------------------------------------- /Images/NZgAcEaq74b3r1e.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/NZgAcEaq74b3r1e.png -------------------------------------------------------------------------------- /Images/NkWiVebrlOf4dQ2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/NkWiVebrlOf4dQ2.png -------------------------------------------------------------------------------- /Images/SGHnm1TwRgBVNyF.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/SGHnm1TwRgBVNyF.png -------------------------------------------------------------------------------- /Images/SYNXkEyBGf6l4vW.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/SYNXkEyBGf6l4vW.png -------------------------------------------------------------------------------- /Images/Sbs79dz2qLZrth8.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Sbs79dz2qLZrth8.jpg -------------------------------------------------------------------------------- /Images/Screenshot-2019-07-21-at-2.57.13-AM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Screenshot-2019-07-21-at-2.57.13-AM.png -------------------------------------------------------------------------------- /Images/Screenshot-from-2021-03-18-12-47-59.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Screenshot-from-2021-03-18-12-47-59.png -------------------------------------------------------------------------------- /Images/Sliding-Window-Protocol.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Sliding-Window-Protocol.jpg -------------------------------------------------------------------------------- /Images/Sliding_SET_2-1.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Sliding_SET_2-1.jpg -------------------------------------------------------------------------------- /Images/Stop-and-Wait-ARQ.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Stop-and-Wait-ARQ.png -------------------------------------------------------------------------------- /Images/TWH2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/TWH2.png -------------------------------------------------------------------------------- /Images/The-architecture-of-Skip-gram-model-20.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/The-architecture-of-Skip-gram-model-20.png -------------------------------------------------------------------------------- /Images/U3iEf2QnpCOFqsS.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/U3iEf2QnpCOFqsS.png -------------------------------------------------------------------------------- /Images/UWVkO6PIpX95DA7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/UWVkO6PIpX95DA7.png -------------------------------------------------------------------------------- /Images/Untitled drawing (1).png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Untitled drawing (1).png -------------------------------------------------------------------------------- /Images/VMPcZnrig6opvI1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/VMPcZnrig6opvI1.png -------------------------------------------------------------------------------- /Images/XjicpDbhByIVZeo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/XjicpDbhByIVZeo.png -------------------------------------------------------------------------------- /Images/Xy32mSYOxFnrVwz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/Xy32mSYOxFnrVwz.png -------------------------------------------------------------------------------- /Images/ZSzqvTtModxWU3K.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/ZSzqvTtModxWU3K.png -------------------------------------------------------------------------------- /Images/alipay.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/alipay.jpeg -------------------------------------------------------------------------------- /Images/attention_research_1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/attention_research_1.png -------------------------------------------------------------------------------- /Images/bkNRTaSdzp87BYA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/bkNRTaSdzp87BYA.png -------------------------------------------------------------------------------- /Images/cDJTWHUgOPkS2ua.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/cDJTWHUgOPkS2ua.png -------------------------------------------------------------------------------- /Images/dg2W549oVLIzB6r.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/dg2W549oVLIzB6r.png -------------------------------------------------------------------------------- /Images/dns-root-server.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/dns-root-server.png -------------------------------------------------------------------------------- /Images/efj4eq30q56ute916yfi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/efj4eq30q56ute916yfi.png -------------------------------------------------------------------------------- /Images/git-cheatsheet-6.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/git-cheatsheet-6.jpg -------------------------------------------------------------------------------- /Images/git_commands-20231005101828239.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/git_commands-20231005101828239.png -------------------------------------------------------------------------------- /Images/git_commands.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/git_commands.png -------------------------------------------------------------------------------- /Images/httpatomoreillycomsourceoreillyimages9320.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/httpatomoreillycomsourceoreillyimages9320.png -------------------------------------------------------------------------------- /Images/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F641bd7c1-f965-4144-99c3-06d733c3be14_3486x2853.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F641bd7c1-f965-4144-99c3-06d733c3be14_3486x2853.jpeg -------------------------------------------------------------------------------- /Images/illu.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/illu.png -------------------------------------------------------------------------------- /Images/image-20220305124311303.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305124311303.png -------------------------------------------------------------------------------- /Images/image-20220305150629948.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305150629948.png -------------------------------------------------------------------------------- /Images/image-20220305152405197.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305152405197.png -------------------------------------------------------------------------------- /Images/image-20220305153830056.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305153830056.png -------------------------------------------------------------------------------- /Images/image-20220305161459418.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305161459418.png -------------------------------------------------------------------------------- /Images/image-20220305201150385.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20220305201150385.png -------------------------------------------------------------------------------- /Images/image-20221009115237311.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20221009115237311.png -------------------------------------------------------------------------------- /Images/image-20230114111840096.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230114111840096.png -------------------------------------------------------------------------------- /Images/image-20230114114412320.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230114114412320.png -------------------------------------------------------------------------------- /Images/image-20230114115309584.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230114115309584.png -------------------------------------------------------------------------------- /Images/image-20230114115445001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230114115445001.png -------------------------------------------------------------------------------- /Images/image-20230222223859233.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230222223859233.png -------------------------------------------------------------------------------- /Images/image-20230222224048747.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230222224048747.png -------------------------------------------------------------------------------- /Images/image-20230225211522140.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230225211522140.png -------------------------------------------------------------------------------- /Images/image-20230225211557063.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230225211557063.png -------------------------------------------------------------------------------- /Images/image-20230311142440848.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311142440848.png -------------------------------------------------------------------------------- /Images/image-20230311142622209.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311142622209.png -------------------------------------------------------------------------------- 
/Images/image-20230311142721864.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311142721864.png -------------------------------------------------------------------------------- /Images/image-20230311142910320.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311142910320.png -------------------------------------------------------------------------------- /Images/image-20230311143019440.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311143019440.png -------------------------------------------------------------------------------- /Images/image-20230311143031158.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230311143031158.png -------------------------------------------------------------------------------- /Images/image-20230315204036268.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230315204036268.png -------------------------------------------------------------------------------- /Images/image-20230315204104180.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230315204104180.png 
-------------------------------------------------------------------------------- /Images/image-20230315204420119.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20230315204420119.png -------------------------------------------------------------------------------- /Images/image-20231116101501454.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231116101501454.png -------------------------------------------------------------------------------- /Images/image-20231116104529337.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231116104529337.png -------------------------------------------------------------------------------- /Images/image-20231116104607353.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231116104607353.png -------------------------------------------------------------------------------- /Images/image-20231116110410692.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231116110410692.png -------------------------------------------------------------------------------- /Images/image-20231116112227032.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231116112227032.png -------------------------------------------------------------------------------- /Images/image-20231121131612791.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231121131612791.png -------------------------------------------------------------------------------- /Images/image-20231201111643716.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231201111643716.png -------------------------------------------------------------------------------- /Images/image-20231201145740349.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231201145740349.png -------------------------------------------------------------------------------- /Images/image-20231205150904749.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231205150904749.png -------------------------------------------------------------------------------- /Images/image-20231205151057529.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231205151057529.png -------------------------------------------------------------------------------- /Images/image-20231206131354092.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231206131354092.png -------------------------------------------------------------------------------- /Images/image-20231211145359412.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231211145359412.png -------------------------------------------------------------------------------- /Images/image-20231212143108175.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231212143108175.png -------------------------------------------------------------------------------- /Images/image-20231212150157274.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image-20231212150157274.png -------------------------------------------------------------------------------- /Images/image6-7664108.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/image6-7664108.png -------------------------------------------------------------------------------- /Images/kHyHs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/kHyHs.png -------------------------------------------------------------------------------- /Images/lQj8AhFHwJYkEoV.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/lQj8AhFHwJYkEoV.png -------------------------------------------------------------------------------- /Images/man-in-the-middle-mitm-attack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/man-in-the-middle-mitm-attack.png -------------------------------------------------------------------------------- /Images/mbKIngJhVoMj25d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/mbKIngJhVoMj25d.png -------------------------------------------------------------------------------- /Images/nvJEei4abws3Rkr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/nvJEei4abws3Rkr.jpg -------------------------------------------------------------------------------- /Images/oAjQbqfurwDtsan.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/oAjQbqfurwDtsan.png -------------------------------------------------------------------------------- /Images/oIqyPETceM7xbnD.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/oIqyPETceM7xbnD.jpg -------------------------------------------------------------------------------- /Images/pNktMSJAU3rHVf7.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/pNktMSJAU3rHVf7.png -------------------------------------------------------------------------------- /Images/pyQ6Z7HBngEfkdI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/pyQ6Z7HBngEfkdI.png -------------------------------------------------------------------------------- /Images/requirement.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/requirement.png -------------------------------------------------------------------------------- /Images/sOnM6eDqWA3QjZP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/sOnM6eDqWA3QjZP.png -------------------------------------------------------------------------------- /Images/smtp_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/smtp_1.png -------------------------------------------------------------------------------- /Images/tTBlm68bDk2Vc93.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/tTBlm68bDk2Vc93.png -------------------------------------------------------------------------------- /Images/tcp-header.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/tcp-header.png -------------------------------------------------------------------------------- /Images/uFptSkiLynWCjzN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/uFptSkiLynWCjzN.png -------------------------------------------------------------------------------- /Images/vlQH1xzqCYn2F3c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/vlQH1xzqCYn2F3c.png -------------------------------------------------------------------------------- /Images/what-is-apache-spark.b3a3099296936df595d9a7d3610f1a77ff0749df.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/what-is-apache-spark.b3a3099296936df595d9a7d3610f1a77ff0749df.png -------------------------------------------------------------------------------- /Images/yHekzjfKh6taDw2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhangjxCS/CSKnowledgeHub/6d7f0457d0746cb9cc51f15d8bcf5c3cd34a82f6/Images/yHekzjfKh6taDw2.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Computer Science Knowledge Hub 2 | 3 | Welcome to CSKnowledgeHub, your go-to open-source repository for sharing computer science knowledge and preparing for interviews. 
Our mission is to provide a dedicated space for aspiring software engineers, helping them navigate the path from being a student to becoming a successful engineer in U.S. tech companies. 4 | 5 | Within our repository, you will find a diverse range of content, including: 6 | 7 | 1. **Computer Science Fundamentals:** Access comprehensive course content covering the essential basics of computer science. 8 | 2. **Interview Preparation for System Design:** Sharpen your skills and knowledge to excel in system design interviews, a critical aspect of tech job interviews. 9 | 3. **Work-Related Computer Science and Frameworks:** Explore resources related to real-world applications of computer science concepts and various frameworks commonly used in industry settings. 10 | 4. **Study Resources:** Discover a curated collection of study materials to support your journey in becoming a proficient software engineer. 11 | 12 | ## About me 13 | 14 | [![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect%20with%20me-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/jingxiangzhang/) [![Zhihu](https://img.shields.io/badge/Zhihu-Follow%20Me-blue?style=for-the-badge&logo=zhihu)](https://www.zhihu.com/people/zhangjx831) [![Sponsor Me on Alipay](https://img.shields.io/badge/Alipay-Sponsor%20Me-blue?style=for-the-badge&logo=alipay)](Images/alipay.jpeg) [![Consult](https://img.shields.io/badge/Zoom-Consult%20Me-blue?style=for-the-badge&logo=zoom)](https://calendly.com/zhangjx/30min) 15 | 16 | I transitioned from a bachelor's degree in Environmental Engineering at Tsinghua University to a career in software engineering. I hold master's degrees in Data Science from Columbia University and Computer Science from Georgia Tech. This unique academic journey has given me valuable insights into becoming a software engineer, particularly from a non-computer science background. I excel at finding and utilizing resources effectively for continuous learning in this field. 
17 | 18 | - **Connect With Me**: Connect with me on LinkedIn to stay updated on my professional journey, share insights, or collaborate on interesting projects. 19 | - **Follow Me**: Follow me on Zhihu to learn why and how I switched to a software engineering career. 20 | 21 | - **Sponsor Me:** Did you find this repository useful? If so, consider sponsoring to buy me a beer as a token of appreciation. Your support goes a long way in helping me continue my work and provide valuable resources to the community. 22 | - **Consult Me:** Schedule a 30-minute phone call and follow-up services for 50 RMB. I specialize in consultation services in the following areas: 23 | - **School Admission:** Planning to pursue higher education? I can guide you through the admissions process, helping you choose the right school and navigate the application journey. 24 | - **Software Engineer Career:** Looking to transition into a software engineering career? I offer tailored suggestions and insights to help you succeed in your career switch. 25 | - **U.S. Life Preparation:** Preparing for life in the United States? I can provide valuable advice and tips to ensure a smooth transition and a successful stay in the U.S. 
26 | 27 | ## Catalog 28 | 29 | ### Basic Knowledge 30 | 31 | - [Data Structures & Algorithms](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/) 32 | - [Sorting and Searching](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Sorting%20and%20Searching.md) 33 | - [Tree](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Tree.md) 34 | - [Graph](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Graph.md) 35 | - [Backtrack](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Backtrack.md) 36 | - [Dynamic Programming](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Dynamic%20Programming.md) 37 | - [Design](Basic%20Knowledge/Data%20Structure%20&%20Algorithm/Design.md) 38 | - [Operating System](Basic%20Knowledge/Operating%20System/) 39 | - [Introduction](Basic%20Knowledge/Operating%20System/Introduction.md) 40 | - [Process Management](Basic%20Knowledge/Operating%20System/Process%20Management.md) 41 | - [Thread Management](Basic%20Knowledge/Operating%20System/Thread%20Management.md) 42 | - [Memory Management](Basic%20Knowledge/Operating%20System/Memory%20Management.md) 43 | - [Device Management](Basic%20Knowledge/Operating%20System/Device%20Management.md) 44 | - [Database](Basic%20Knowledge/Database/) 45 | - [Introduction](Basic%20Knowledge/Database/Introduction.md) 46 | - [SQL](Basic%20Knowledge/Database/SQL.md) 47 | - [Data Model](Basic%20Knowledge/Database/Data%20Model.md) 48 | - [Database Design](Basic%20Knowledge/Database/Database%20Design.md) 49 | - [Computer Network](Basic%20Knowledge/Computer%20Network/) 50 | - [Introduction](Basic%20Knowledge/Computer%20Network/Introduction.md) 51 | - [Application Layer](Basic%20Knowledge/Computer%20Network/Application%20Layer.md) 52 | - [Transport Layer](Basic%20Knowledge/Computer%20Network/Transport%20Layer.md) 53 | - [Network Layer](Basic%20Knowledge/Computer%20Network/Network%20Layer.md) 54 | - [Link Layer](Basic%20Knowledge/Computer%20Network/Link%20Layer.md) 55 | - [Machine 
Learning](Basic%20Knowledge/Machine%20Learning/) 56 | - [Machine Learning](Basic%20Knowledge/Machine%20Learning/Machine%20Learning.md) 57 | - [Deep Learning](Basic%20Knowledge/Machine%20Learning/Deep%20Learning.md) 58 | - [Computer Vision](Basic%20Knowledge/Machine%20Learning/Computer%20Vision.md) 59 | - [Natural Language Processing](Basic%20Knowledge/Machine%20Learning/Natural%20Language%20Processing.md) 60 | - [Recommender System](Basic%20Knowledge/Machine%20Learning/Recommder%20System.md) 61 | - [Reinforcement Learning](Basic%20Knowledge/Machine%20Learning/Reinforcement%20Learning.md) 62 | - [MLOps](Basic%20Knowledge/Machine%20Learning/MLOps.md) 63 | - [Statistics](Basic%20Knowledge/Statistics/) 64 | - [Probabilities and Statistics](Basic%20Knowledge/Statistics/Probs%20and%20Stats.md) 65 | - [Statistical Inference](Basic%20Knowledge/Statistics/Stats%20Inference.md) 66 | - [Statistical Modelling](Basic%20Knowledge/Statistics/Stats%20Modelling.md) 67 | 68 | ### Frameworks 69 | 70 | - [Web Development](Frameworks/Web%20Development/) 71 | - [React](Frameworks/Web%20Development/React.md) 72 | - [Flask](Frameworks/Web%20Development/Flask.md) 73 | - [DevOps](Frameworks/DevOps/) 74 | - [Git](Frameworks/DevOps/Git.md) 75 | - [Docker](Frameworks/DevOps/Docker.md) 76 | - [Kubernetes](Frameworks/DevOps/Kubernetes.md) 77 | - [AWS](Frameworks/AWS/) 78 | - [IAM](Frameworks/AWS/IAM.md) 79 | - [VPC](Frameworks/AWS/VPC.md) 80 | - [Lambda](Frameworks/AWS/Lambda.md) 81 | 82 | ### System Design 83 | 84 | - [Web Application](System%20Design/Web%20Application.md) 85 | 86 | ## Studying Resources 87 | 88 | - [Alex Xu's System Design](https://bytebytego.com/) 89 | - [Grokking System Design](https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers) 90 | - [labuladong's Algorithm Preparation](https://github.com/labuladong/fucking-algorithm) 91 | 92 | - [CS DIY Learning](https://csdiy.wiki/) 93 | 
-------------------------------------------------------------------------------- /System Design/Incident Insights.md: -------------------------------------------------------------------------------- 1 | # Incident Insights 2 | 3 | ## Requirements 4 | 5 | requirement 6 | 7 | - **User Authentication and Account Management:** 8 | - Implement a secure user signup and login process. 9 | - Allow users to create and manage their own accounts. 10 | - Enable users to invite others to join their account with appropriate access controls and permissions. 11 | - **Data Integration with Ticketing Tools:** 12 | - Develop integrations to ingest ticket data from various ticketing tools like PagerDuty, Opsgenie, etc. 13 | - Ensure compatibility and secure data transfer protocols for these integrations. 14 | - Provide options for users to customize the data they want to import from these tools. 15 | - **Data Analysis and Incident Clustering:** 16 | - Design algorithms to analyze ticket data and cluster incidents based on patterns. 17 | - Implement machine learning or statistical techniques to identify and categorize incident patterns. 18 | - Develop functionality to update and refine clustering algorithms based on new data. 19 | - **Incident Trend Visualization:** 20 | - Create a dashboard to visualize incident trends using charts and graphs. 21 | - Allow users to filter, sort, and drill down into specific data points for detailed analysis. 22 | - Provide options for users to customize the dashboard based on their preferences. 23 | - **Integration with GenAI for Command Generation:** 24 | - Integrate with GenAI or similar services to generate debugging commands for incidents or clusters. 25 | - Ensure that the integration securely handles data and respects user privacy. 26 | - Provide users with options to refine or customize the command suggestions. 
27 | - **Remote Command Execution Capability:** 28 | - Develop a secure, remote command execution feature that allows users to run commands on their clusters, virtual machines, etc., directly from the app. 29 | - Ensure robust authentication and authorization mechanisms to prevent unauthorized access. 30 | - Include features to track and log command execution for auditing and troubleshooting purposes. 31 | - **User Interface and Experience:** 32 | - Design an intuitive and user-friendly interface that allows easy navigation and access to all features. 33 | - Ensure the application is responsive and accessible on various devices and browsers. 34 | - Implement feedback mechanisms to gather user input for continuous improvement. 35 | - **Documentation and Support:** 36 | - Provide comprehensive documentation on how to use the application, including guides for integration and troubleshooting. 37 | - Set up a support system for users to report issues or seek help. 38 | 39 | ## High-Level Design 40 | 41 | ### Architecture 42 | 43 | Flowcharts 44 | 45 | 1. **Web App:** The main interface that users interact with. It serves the client-side of your application, typically consisting of HTML, CSS, and JavaScript. 46 | 2. **CDN (Content Delivery Network):** A distributed network of servers that delivers static content to users based on their geographic location to reduce latency. 47 | 3. **DNS Server:** Translates domain names into IP addresses so browsers can load internet resources. It directs the user's request to the appropriate server. 48 | 4. **Load Balancer:** Distributes incoming network traffic across multiple servers to ensure no single server becomes overwhelmed and to improve redundancy and reliability. 49 | 5. **Web Proxy:** Acts as an intermediary between the users’ requests from the internet and the API server. It can also cache content and handle SSL termination. 50 | 6. **API Server:** The server that processes API requests. 
It executes business logic, interacts with the database, and communicates with other services. 51 | 7. **Message Queue:** Decouples processes by allowing them to communicate asynchronously. The API server sends messages to the queue, which are then processed by background workers. 52 | 8. **Background Worker:** A service that processes jobs from the message queue. This can include tasks such as sending emails, processing files, or performing batch operations. 53 | 9. **Database:** Stores and manages data. The API server queries the database to retrieve, update, or delete data as requested. 54 | 10. **Cache:** Temporarily stores frequently accessed data to reduce database load and speed up response times. 55 | 11. **Data Pipeline Orchestrator:** Manages and automates the flow of data between systems. It ensures that data processing tasks are executed in the correct order and manages dependencies between tasks. 56 | 57 | ### Technology 58 | 59 | - **Web App:** React, TypeScript 60 | - **CDN (Content Delivery Network):** AWS CloudFront 61 | - **DNS Server:** AWS Route 53 62 | - **Load Balancer:** AWS Elastic Load Balancer (ELB) 63 | - **Web Proxy:** Nginx deployed on Kubernetes 64 | - **API Server:** Python Flask with Gunicorn, deployed on Kubernetes 65 | - **Message Queue:** Redis Queue 66 | - **Background Worker:** Kubernetes pods listening on Redis Queue 67 | - **Database:** PostgreSQL 68 | - **Cache:** Redis 69 | - **Data Pipeline Orchestrator:** Airbyte 70 | - **Monitoring:** Sentry, PagerDuty, Prometheus 71 | - **CI/CD:** GitHub Actions, ArgoCD 72 | - **Authorization:** Auth0 73 | 74 | ## Deep Dive 75 | 76 | ### Integrations 77 | 78 | Flowcharts (1) 79 | 80 | 1. **Website:** 81 | - A user provides credentials for their ticketing tool. 82 | - These credentials are then verified by the system to ensure they are correct. 83 | - Upon successful verification, a connection to the user's ticketing tool is established. 84 | 2. 
**API Server:** 85 | - The API server saves the integration details in the database. 86 | - It can submit a job to the message queue to load streaming data. 87 | - It can create a workflow in Airbyte, which is a data pipeline orchestrator. 88 | 3. **Message Queue:** 89 | - The message queue receives the data load job and holds it until a background worker is available to process it. 90 | 4. **Data Pipeline Orchestrator:** 91 | - The orchestrator schedules and manages the execution of data workflows. 92 | - It performs the initial run of the workflow and schedules subsequent runs as necessary. 93 | 5. **Background Worker:** 94 | - A background worker picks up the data load job from the message queue. This worker is responsible for loading streaming incidents into the system. 95 | - Another background worker can be triggered by the data pipeline orchestrator to load historical data or perform incremental updates. 96 | - Once the data load is finished, the data is saved into the database. 97 | 6. **Database:** 98 | - The database serves as the persistent storage for all data related to tickets and integration details. 99 | - It stores the processed ticket information from the streaming incidents and the integration details from the data pipeline orchestrator. 100 | 7. **Cron Job:** 101 | - The cron job runs every 15 minutes to transform the new data in the database into the unified schema. 102 | - The unified schema normalizes the differing schemas of the ticketing tools. 103 | 104 | ### Runbooks 105 | 106 | Blank diagram 107 | 108 | 1. **Website:** 109 | - Users input ticket data or prompts into the website. 110 | - The website sends this data via an API request to the backend to initiate runbook generation or conduct a search. 111 | - Periodically, the frontend makes API requests to check the status and results of the runbook generation job. 112 | - Users can save the generated runbooks, prompting the frontend to send a save request to the backend. 113 | 114 | 2. 
**API Server:** 115 | - On receiving a request to generate a runbook, the server submits the task to the message queue and records the job status in the database. 116 | - If it receives a save or delete runbook request, the server updates the runbook vector DB index with the new data or removes the existing data. 117 | 118 | 3. **Message Queue:** 119 | - Manages the queue of jobs for runbook generation, ensuring they're processed sequentially or based on priority. 120 | 121 | 4. **Background Worker:** 122 | - Listens to the message queue and dequeues jobs for processing. 123 | - Interacts with OpenAI using the RAG approach, where it first retrieves relevant commands or content from the runbooks index and then generates a new runbook by sending prompts based on ticket data or user input to OpenAI. 124 | - Saves the generated runbook or search results to the database and updates the job status to reflect completion. 125 | 126 | 5. **Cache:** 127 | - Stores frequently accessed data, like active runbook jobs and their statuses, to provide quicker access and reduce database load. 128 | 129 | 6. **Database:** 130 | - Holds temporary results of runbook generations and job statuses. 131 | - Permanently stores runbooks if a user decides to save them. 132 | 133 | 7. **Runbook Index:** 134 | - A vector database that indexes the content of runbooks, allowing for efficient retrieval during the RAG process. 135 | - Updated when runbooks are saved or deleted to maintain an accurate and searchable index. 136 | 137 | ### Automations 138 | 139 | Blank diagram (1) 140 | 141 | 1. **Website:** 142 | - Guides users through integrating with AWS, detailing how to set up roles with the necessary permissions to access AWS services like EKS, EC2, VPC, etc. 143 | - Collects AWS credentials, including the role and AWS Lambda function names, from the user and securely transmits them to the API server. 144 | 2. 
**User AWS Account:** 145 | - The user follows the website's instructions to create a role within their AWS account, granting specific permissions needed for automation tasks. 146 | - The user creates an AWS Lambda function intended for automation of tasks based on runbook instructions. 147 | - Once set up, the user submits their AWS credentials to the web app, enabling it to execute actions on their AWS resources. 148 | 3. **API Server:** 149 | - Safely stores the received AWS user credentials in a secure manner adhering to best practices for handling sensitive information. 150 | - Keeps a record of automation details, which include configurations and scripts that specify the automation tasks. 151 | 4. **User Runbook:** 152 | - Users compile runbooks with detailed instructions for automating tasks on AWS resources. 153 | - Users configure triggers for runbooks, such as scheduled executions or event-based activations, and communicate these settings to the API server. 154 | 5. **Lambda Function:** 155 | - Stands by for execution, set to carry out the automation tasks as outlined in the user-defined runbook. 156 | 6. **Database:** 157 | - The database acts as the persistent storage for AWS credentials, automation configurations, and runbook details, facilitating quick retrieval and secure storage. 158 | 7. **Workflow Triggers:** 159 | - Users have the option to manually trigger runbooks, initiating the execution of the associated Lambda function. 160 | - The API server orchestrates the automation workflow, which can automatically invoke the Lambda function with parameters derived from the runbook. 161 | - The API server can programmatically trigger the user's Lambda function as outlined by the runbook's automation workflow. 162 | 163 | ## Wrap up 164 | 165 | ### Summary 166 | 167 | 1. **User Authentication and Account Management**: Secure signup and login, account management, and invitation with access controls. 168 | 2. 
**Data Integration with Ticketing Tools**: Ingesting data from ticketing tools, ensuring compatibility and customizable data import. 169 | 3. **Data Analysis and Incident Clustering**: Using algorithms and machine learning to cluster and categorize incident patterns. 170 | 4. **Incident Trend Visualization**: A dashboard for visualizing trends with customization options. 171 | 5. **Integration with GenAI for Command Generation**: Generating debugging commands, ensuring secure data handling. 172 | 6. **Remote Command Execution Capability**: Secure feature for running commands remotely with robust authentication and logging. 173 | 174 | ### Improvement 175 | 176 | 1. **Enhanced Machine Learning Models**: Refine the data analysis algorithms to improve accuracy in incident clustering and prediction. 177 | 2. **Real-time Data Processing**: Implement real-time data analytics for faster incident response and trend detection. 178 | 3. **Advanced Security Features**: Introduce additional layers of security, especially for the remote command execution feature. 179 | -------------------------------------------------------------------------------- /System Design/URL Shortener.md: -------------------------------------------------------------------------------- 1 | # URL Shortener 2 | 3 | ## Requirements 4 | 5 | ### Functional 6 | 7 | - The system must accept a long URL and return a shorter, unique URL. 8 | 9 | - The short URL should redirect users to the original long URL. 10 | - The system should perform validation on input URLs to ensure they are properly formatted. 11 | 12 | - When a long URL is submitted more than once, the system can return the existing short URL. 13 | 14 | - The system must correctly redirect users accessing a short URL to the original long URL. 15 | - The system should track and report on metrics such as the number of times a short URL is accessed (click-through rate). 
16 | 17 | ### Non-functional 18 | 19 | - **Performance:** The system should handle URL redirection with low latency, ideally in under a few hundred milliseconds. 20 | - **Performance:** The system should be capable of processing a high volume of URL shortening requests per second. 21 | 22 | - **Scalability:** The system architecture should support scaling to accommodate growth in user traffic and data volume. 23 | - **Scalability:** It should be possible to scale the system horizontally by adding more servers. 24 | - **Availability:** The system should be highly available, with a target uptime of 99.99% or higher. 25 | - **Reliability:** The system should ensure that short URLs, once created, continue to function over time unless explicitly expired. 26 | 27 | ## Estimate 28 | 29 | ### Assumption 30 | 31 | - 1 million URL write requests per day; with a 100:1 read-write ratio, 100 million URL read requests per day 32 | - 500 bytes per URL 33 | - 80% cache hit rate from caching 20% of the data; cache entries expire after one day 34 | 35 | ### Traffic 36 | 37 | - Read requests: 100 million / 24 / 3600 = 1157 queries/second 38 | - Write requests: 1 million / 24 / 3600 = 11.6 queries/second 39 | 40 | ### Storage 41 | 42 | - URL storage for one day is 1 million * 500 bytes = 500 MB, and for 10 years is 500 MB * 365 * 10 = 1.825 TB 43 | 44 | ### Memory 45 | 46 | - With 101 million requests per day and 20% of the data cached, memory is 0.2 * 101 million * 500 bytes = 10.1 GB 47 | 48 | ## High Level Design 49 | 50 | ![img](../Images/1*HeDm_FpLVoR3d97NKdKNLw.png) 51 | 52 | ### Component 53 | 54 | - **Web Application:** A simple, user-friendly web interface where users can enter URLs to be shortened and receive their shortened versions. It can also display analytics for URLs the user has created if account management is supported. 55 | - **API Gateway:** Provides a programmable interface for developers to create, retrieve, and manage short URLs programmatically. This can be a RESTful or GraphQL API. 
56 | 57 | - **URL Shortening Service:** The core logic for accepting long URLs, generating short codes (unique identifiers), and storing the mapping between the short code and the long URL in the database. This service also handles the generation of custom URLs if requested by the user. 58 | - **Redirection Service:** Handles requests to short URLs, looks up the corresponding long URL, and redirects the client using HTTP 301 or 302 responses. 59 | 60 | - **URL Storage:** A database capable of storing mappings between short codes and long URLs, along with metadata such as creation date, expiration date (if applicable), and the owner (if user accounts are implemented). 61 | - **Load Balancer:** Distributes incoming traffic across multiple instances of the application to ensure scalability and reliability. 62 | - **Cache:** Reduces database load by caching popular URLs. Commonly accessed short URLs and their corresponding long URLs are stored temporarily to speed up redirection and analytics processing. 63 | - **CDN (Content Delivery Network):** (Optional) A CDN can be used to cache and serve static content, reducing latency for users distributed globally. 64 | 65 | ### Workflow 66 | 67 | 1. **Shortening a URL:** 68 | - User submits a long URL through the UI or API. 69 | - The URL shortening service validates the URL, generates a short code, stores the mapping in the URL storage, and returns the short URL to the user. 70 | 2. **Accessing a Short URL:** 71 | - A client requests a short URL. 72 | - The redirection service looks up the short code in the cache or the URL storage database. 73 | - If the short URL exists, the service redirects the client to the long URL; otherwise, it returns an error message. 74 | 3. **Viewing Analytics:** 75 | - (Optional) Users request analytics for a short URL through the UI or API. 76 | - The analytics service retrieves and aggregates data from the analytics database and presents it to the user. 
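The shortening and redirection steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the design's actual implementation: the class name `UrlShortener`, its dictionaries, and the hexadecimal counter used as a short code are all assumptions made for the example, and a real system would use a database, a shared cache, and a proper code generator.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple
from urllib.parse import urlparse


@dataclass
class UrlShortener:
    """Toy stand-in for the shortening and redirection services (hypothetical)."""

    storage: dict = field(default_factory=dict)      # short_code -> long URL ("URL storage")
    cache: dict = field(default_factory=dict)        # hot subset of storage
    by_long_url: dict = field(default_factory=dict)  # long URL -> short_code, for repeat submissions
    counter: int = 0

    def shorten(self, long_url: str) -> str:
        # Step 1: validate the input URL.
        parts = urlparse(long_url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"invalid URL: {long_url!r}")
        # Step 2: return the existing code if this URL was submitted before.
        if long_url in self.by_long_url:
            return self.by_long_url[long_url]
        # Step 3: generate a code (placeholder: counter in hex) and store the mapping.
        self.counter += 1
        code = format(self.counter, "x")
        self.storage[code] = long_url
        self.by_long_url[long_url] = code
        return code

    def resolve(self, code: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, redirect target), checking the cache before storage."""
        if code in self.cache:
            return 302, self.cache[code]
        long_url = self.storage.get(code)
        if long_url is None:
            return 404, None
        self.cache[code] = long_url  # populate the cache for later lookups
        return 302, long_url
```

Calling `shorten` twice with the same URL returns the same code, and `resolve` answers from the cache after the first lookup. The sketch returns 302 for every hit; whether to use 301 or 302 is a separate design decision.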
77 | 78 | ## Design Deep Dive 79 | 80 | ### Schema 81 | 82 | - URLs Table: 83 | - `short_code` (Primary Key): A unique identifier for the shortened URL. 84 | - `original_url`: The original URL that is being shortened. 85 | - `creation_date`: The date and time when the shortened URL was created. 86 | - `expiration_date` (Optional): If URLs are set to expire, this field stores that date. 87 | - `user_id` (Optional): If implementing user accounts, this associates a URL with a user. 88 | - Analytics Table: 89 | - `short_code`: A foreign key linking to the URLs table. 90 | - `access_date`: The date and time of access. 91 | - `referrer`: The referring site, if available. 92 | - `ip_address`: The IP address of the requester. 93 | 94 | ### Shortening Algorithm 95 | 96 | - **Base62 Encoding**: 97 | - Utilizes a character set comprising digits [0-9], lowercase [a-z], and uppercase [A-Z] letters. 98 | - Short URLs of varying lengths can represent a vast number of unique URLs (e.g., 5 characters for ~916 million URLs, 6 for ~56 billion, and 7 for ~3500 billion). 99 | - The decision to use 7 characters allows for the generation of approximately 3500 billion unique URLs. 100 | - **Techniques for Generating Unique Short URLs**: 101 | - **Random Number Generation**: Selects random characters for each position in the short URL and checks for uniqueness in the database, which becomes less efficient as the database grows. 102 | - **Base Conversion**: Converts a large numeric counter value to base62, ensuring uniqueness. This method requires incrementing the counter for each new URL request. 103 | - **MD5 Hashing**: Generates a 128-bit hash (32 hexadecimal digits) of the original URL. The first 7 characters are used to create the short URL, ensuring its uniqueness through database checks. 
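As a rough illustration of the base conversion technique, the sketch below encodes a numeric counter in base62 using the character set described above (digits, then lowercase, then uppercase). The function names and the ordering of the alphabet are assumptions for this example; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

# 62 characters: 0-9, a-z, A-Z (the ordering is an assumption of this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Convert a non-negative counter value to its base62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def capacity(length: int) -> int:
    """Number of distinct codes of exactly `length` characters."""
    return BASE ** length
```

`capacity(5)`, `capacity(6)`, and `capacity(7)` evaluate to 916,132,832, 56,800,235,584, and 3,521,614,606,208, which matches the approximate figures quoted above. Because each counter value maps to exactly one string, no uniqueness check against the database is needed, at the cost of maintaining the counter.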
104 | 105 | -------------------------------------------------------------------------------- /System Design/Web Application.md: -------------------------------------------------------------------------------- 1 | # Web Application 2 | 3 | image-20231116101501454 4 | 5 | ## Introduction 6 | 7 | 1. **User Initiates Request**: The user enters a URL in their browser or interacts with an app that generates a request to a web server. 8 | 9 | 2. **DNS Resolution** 10 | 11 | - The browser asks a DNS resolver to translate the domain name into an IP address. 12 | 13 | - The DNS resolver queries various DNS servers (root, TLD, and authoritative) to find the IP address of the server hosting the web application. 14 | 15 | 16 | 3. **CDN Interaction** 17 | 18 | - If the requested content is static (like images, CSS, JavaScript files), the DNS might direct the request to a Content Delivery Network (CDN). 19 | 20 | - The CDN serves the content from the nearest edge server to reduce latency. If the edge server doesn't have the content cached, it retrieves it from the origin server and caches it for future requests. 21 | 22 | 23 | 4. **Load Balancer Distribution** 24 | 25 | - If the request is for dynamic content, the IP address resolved by the DNS points to a load balancer. 26 | 27 | - The load balancer distributes the request to one of the web application servers, typically based on factors like current load, server health, or geographic location. 28 | 29 | 30 | 5. **Web Server Processing**: The web server receives the request and processes it. This may involve interacting with backend services, APIs, or a message queue for asynchronous processing tasks. 31 | 32 | 6. **Message Queue (Optional)**: For operations that don't require immediate processing (like sending emails or processing logs), the web server might place tasks in a message queue. Separate worker servers can asynchronously process these tasks later, reducing response time. 33 | 34 | 7. 
**Cache Check** 35 | 36 | - Before querying the database for data, the web server first checks if the requested data is available in the cache (like Redis or Memcached). 37 | 38 | - If a cache hit occurs (data found in cache), it returns the data to the user, significantly reducing the load on the database. 39 | 40 | 41 | 8. **Database Interaction** 42 | 43 | - If the data is not in the cache (cache miss), the web server queries the database. 44 | 45 | - The database processes the query and returns the results to the web server. 46 | 47 | 48 | 9. **Response to User**: The web server then sends the response back to the user, either the static content from the CDN, the dynamically generated content, or the data retrieved from the database/cache. 49 | 50 | 10. **Closing the Loop**: The web browser or client app receives the response and renders it for the user. 51 | 52 | ## Load Balancer 53 | 54 | image-20231116104529337 55 | 56 | ### Functions 57 | 58 | - **Distribution of Client Requests**: It distributes incoming network traffic across a group of backend servers (also known as a server farm or server pool). This distribution ensures that no single server bears too much demand. By spreading the requests evenly, it increases the application's responsiveness and availability. 59 | - **Health Checks**: Load balancers continually check the health of servers to ensure they can handle requests. If a server fails, the load balancer redirects traffic to the remaining online servers. When the offline server becomes healthy again, the load balancer resumes sending traffic to it. 60 | - **Traffic Management and Optimization**: They can optimize the service delivery based on various metrics like server response time, the number of connections to a server, or even the current load of each server. This ensures users receive the quickest and most reliable access to the resources they request. 
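The request distribution and health-check behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class `RoundRobinBalancer` and its `mark_down`/`mark_up` methods are hypothetical names, health status is set by the caller rather than by real probes, and simple rotation is used as the distribution policy.

```python
class RoundRobinBalancer:
    """Rotates through servers in order, skipping any marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # A failed health check removes the server from rotation.
        self._healthy.discard(server)

    def mark_up(self, server):
        # A recovered server rejoins the rotation.
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting at the rotation pointer.
        for _ in range(len(self._servers)):
            server = self._servers[self._next % len(self._servers)]
            self._next += 1
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With three servers, successive `pick()` calls cycle through them in order. After `mark_down("b")`, traffic goes only to the other two until `mark_up("b")` is called.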
61 | 62 | ### Algorithms 63 | 64 | - **Round Robin**: Distributes client requests sequentially among the servers in the pool. 65 | - **Least Connections**: Directs traffic to the server with the fewest active connections. 66 | - **IP Hash**: Determines which server to use based on the client’s IP address, ensuring a user is consistently sent to the same server. 67 | 68 | ## CDN 69 | 70 | image-20231116104607353 71 | 72 | ### Functions 73 | 74 | - **Caching Static Resources**: CDNs store (cache) static content like HTML pages, javascript files, stylesheets, images, and videos. When a user requests this content, it is served from the nearest CDN server rather than the origin server, significantly reducing delivery time. 75 | - **Reducing Bandwidth Costs**: By caching and delivering content from edge servers closer to the user, CDNs reduce the amount of data that the origin server needs to transmit, thus lowering bandwidth consumption and costs. 76 | - **Improving Website Load Times**: By serving content from locations closer to the end-users, CDNs reduce latency and packet loss, resulting in faster loading of web pages. 77 | - **Handling High Traffic Loads**: During traffic spikes, CDNs can efficiently distribute the load, preventing the origin server from becoming overwhelmed, ensuring the website remains available and responsive. 78 | - **Enhancing Security**: Many CDNs offer security features like DDoS protection, web application firewalls, and SSL/TLS encryption to secure data in transit. 79 | 80 | ### Workflow 81 | 82 | - When a user requests a webpage, the CDN redirects this request from the originating site’s server to a server in the CDN that is closest to the user. 83 | - The CDN server checks if it has a cached version of the content. If yes, it delivers this content to the user. If not, it requests the content from the origin server, caches it for future requests, and then delivers it to the user. 
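The edge-server lookup in the workflow above can be sketched as a small cache in front of an origin fetch. This is an illustrative model only: `EdgeServer`, `fetch_from_origin`, and the time-to-live (TTL) expiry are assumptions added for the example, since the text does not specify how long cached content is kept.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeServer:
    """Serves content from a local cache, falling back to the origin on a miss."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch_from_origin  # hypothetical callable that contacts the origin server
        self._ttl = ttl_seconds          # assumed expiry policy, not taken from the text
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"       # cached copy is still fresh
        body = self._fetch(path)         # miss or expired: go to the origin
        self._cache[path] = (now + self._ttl, body)
        return body, "MISS"
```

The first request for a path reaches the origin and is reported as a miss; repeated requests within the TTL are served from the edge without contacting the origin.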
84 | 85 | ## Server 86 | 87 | image-20231116110410692 88 | 89 | ### Scaling 90 | 91 | - **Vertical Scaling (Scaling Up/Down)** 92 | - Advantages 93 | - **Simplicity**: Easier to implement as it involves a single machine. 94 | - **Data Management**: Managing data is simpler since everything is stored on a single machine. 95 | - Disadvantages 96 | - **Limited Scalability**: There's a limit to how much you can upgrade a single machine. 97 | - **Downtime**: Upgrading hardware may require downtime. 98 | - **Single Point of Failure**: If the server goes down, the entire service can become unavailable. 99 | - **Cost**: Beyond certain limits, it becomes more expensive to scale vertically compared to horizontally. 100 | - **Horizontal Scaling (Scaling In/Out)** 101 | - Advantages 102 | - **Scalability**: Easily scales to handle increased load by simply adding more machines. 103 | - **Flexibility**: You can add or remove servers as needed, which is cost-effective and efficient. 104 | - **Redundancy**: Offers high availability and fault tolerance, as failure in one server does not bring down the entire system. 105 | - Disadvantages 106 | - **Complexity**: Requires more sophisticated networking and software architectures. 107 | - **Data Consistency**: Keeping data synchronized across servers can be challenging. 108 | 109 | ## Message Queue 110 | 111 | ### Functions 112 | 113 | - **Asynchronous Processing**: Message queues decouple the components of a web application, allowing for asynchronous processing. When a task is sent to the queue, it can be processed at a later time without blocking the sender. 114 | - **Load Balancing**: They help distribute workload evenly across multiple processing nodes, preventing any single node from being overwhelmed by high volumes of requests. 115 | - **Fault Tolerance and Reliability**: In case of a failure in processing, message queues can retain the messages, allowing for reprocessing once the issue is resolved. 
This enhances the reliability of the application.
- **Scalability**: By decoupling components, message queues enable scaling parts of the system independently based on their individual load and processing requirements.
- **Ordering and Guarantees**: Some message queues can guarantee message ordering and delivery, ensuring that messages are processed in the order they are received and without loss.

### Use Cases

- **Background Processing**: Handling tasks like sending emails, generating reports, or processing analytics in the background, away from the main flow of user requests.
- **Inter-Service Communication**: Facilitating communication in a microservices architecture, where different services interact through messages.
- **Event-Driven Architecture**: Acting as an event bus to propagate events across different parts of the application.

## Database

### Master Slave

1. **Master Database**: This is the primary database that handles all the write operations (INSERT, UPDATE, DELETE). It's the authoritative source of data. If the master database goes offline, a slave database will be promoted to be the new master.
2. **Slave Database(s)**: These are copies of the master database. They handle read operations (SELECT queries). Slaves replicate data from the master database to stay updated. If only one slave database is available and it goes offline, read operations will be directed to the master database temporarily.

### Sharding

- **Data Division**: In sharding, data is divided across multiple databases. Each shard contains a subset of the total data. The division can be based on various sharding strategies.

- **Sharding Key**: A key or attribute is used to determine how data is distributed among the shards.
This key is typically a column in the database, like user ID, geographical location, or a hash of one of the fields.

- **Range-Based Sharding**: Data is partitioned according to a range of values in the sharding key (e.g., date ranges, alphabetical ranges).

- **Hash-Based Sharding**: A hash function is applied to the sharding key to determine the shard. This helps in evenly distributing data.

- **List-Based Sharding**: Data is partitioned based on a list of values. For example, different shards may handle data for different countries.

- **Directory-Based Sharding**: A lookup service is used to map entries to the correct shard.

## Cache

- **Use Case:** Consider using a cache when data is read frequently but modified infrequently.
- **Expiration Policy:**
  - Avoid making the expiration time too short, as this causes the system to reload data from the database too frequently.
  - Avoid making the expiration time too long, as the data can become stale.
- **Consistency:** Inconsistency can happen because data-modifying operations on the data store and cache are not in a single transaction.
- **Eviction:** Once the cache is full, any request to add items to the cache might cause existing items to be removed. Common eviction policies:
  - LRU: Least Recently Used
  - LFU: Least Frequently Used
  - FIFO: First In First Out

## Session

To ensure a stateless architecture while handling user session data, the key is to store session information outside of your application servers. This approach allows the servers to process requests without maintaining any user-specific state, enabling scalability and flexibility, particularly in distributed environments.
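As a rough illustration of keeping session data outside the application servers, the sketch below lets two server objects share one session store. Everything here is assumed for the example: `SessionStore` is an in-memory stand-in for an external store such as a cache or database, and `AppServer`, the 30-minute TTL, and the user name are hypothetical.

```python
import secrets
import time
from typing import Any, Dict, Optional, Tuple


class SessionStore:
    """In-memory stand-in for an external session store (assumption)."""

    def __init__(self, ttl_seconds: float = 1800.0) -> None:
        self._ttl = ttl_seconds
        # session_id -> (expiry time on the monotonic clock, session data)
        self._sessions: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def create(self, data: Dict[str, Any]) -> str:
        """Store session data under a new random ID and return that ID."""
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = (time.monotonic() + self._ttl, dict(data))
        return session_id

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Return the session data, or None if it is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.monotonic() >= expires_at:
            del self._sessions[session_id]
            return None
        return data


class AppServer:
    """Stateless server: holds no per-user data, only a handle to the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self._store = store

    def login(self, username: str) -> str:
        return self._store.create({"user": username})

    def handle(self, session_id: str) -> str:
        session = self._store.load(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello, {session['user']}"


store = SessionStore()
server_a = AppServer("server-a", store)
server_b = AppServer("server-b", store)

sid = server_a.login("alice")
reply = server_b.handle(sid)  # a different server can serve the same session
```

Because neither server keeps user state, a load balancer is free to send each request to any instance.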

## Automation

#### Continuous Integration (CI)

- **Automated Testing**: Automatically run tests whenever new code is committed to the source repository to ensure that new changes don't break the application.
- **Integration of Changes**: Regularly and frequently integrating code changes into a shared repository, reducing integration conflicts.

#### Continuous Delivery

- **Automated Release Process**: Automatically preparing and building the application for release to a production-like environment. This ensures the application is always in a deployable state.

#### Continuous Deployment

- **Automated Deployment**: Every change that passes the automated tests is automatically deployed to the production environment.

--------------------------------------------------------------------------------
/System Design/Web Crawler.md:
--------------------------------------------------------------------------------
# Web Crawler

## Requirements

### Functional

- **URL Fetching**: Ability to retrieve web pages by URL.
- **Link Extraction**: Identifying and extracting links from a webpage to other pages.
- **Content Analysis**: Capability to analyze the content of webpages to determine relevance, categorize content, or extract specific information.
- **Robots.txt Compliance**: Respecting website crawl-delay, allow, and disallow directives to avoid overloading web servers.
- **Duplicate Detection**: Mechanism to identify and avoid crawling the same URL multiple times.

### Non-Functional

- **Performance**: High efficiency in processing URLs and analyzing content, with minimal latency.
- **Scalability**: Ability to scale horizontally to handle increases in load, potentially by adding more crawling instances.
- **Reliability**: High uptime and robustness, with mechanisms to recover from crashes or network failures.
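The Robots.txt Compliance requirement listed under the functional requirements can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the `example-crawler` user agent, and the `example.com` URLs are made up for illustration; a real crawler would download each site's robots.txt before parsing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
Allow: /
"""

USER_AGENT = "example-crawler"  # assumed user agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Check the allow/disallow rules before fetching a URL.
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/data")
allowed = parser.can_fetch(USER_AGENT, "https://example.com/index.html")

# Seconds to wait between requests to this host, or None if not specified.
delay = parser.crawl_delay(USER_AGENT)
```

A fetcher would skip URLs for which `can_fetch` returns `False` and wait at least `delay` seconds between requests to the same host.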

## High-Level Design

![high-level architecture of a distributed web crawler](../Images/1*F9QnGqmAMcXEr1lQeSiqVA.png)

### Component

- **Crawler Frontend**

  - **Scheduler**: Determines the order in which URLs are crawled, based on prioritization algorithms (e.g., depth-first, breadth-first, or priority-based on page importance).

  - **URL Frontier**: A queue of URLs to be visited, managed by the Scheduler. It stores URLs discovered but not yet crawled, ensuring they are unique to avoid duplicate crawling.

- **URL Fetcher**

  - Responsible for downloading web pages. This component sends HTTP requests to the URLs in the URL Frontier and fetches the web content.

  - Must handle various HTTP methods, status codes, and also respect `robots.txt` directives.

- **Link Extractor**

  - Parses the fetched web pages to extract links to other pages.

  - This module must be able to handle and normalize relative and absolute URLs, adding the new URLs back to the URL Frontier for crawling.

- **Content Processor**

  - Analyzes the content of each page to extract useful information, depending on the crawler's purpose (e.g., indexing content, data extraction).

  - Could include processing tasks like text extraction, keyword analysis, and metadata extraction.

- **Data Storage**

  - Stores the crawled information in a structured format for easy retrieval. This could involve a database or a file system, depending on the scale and purpose of the crawler.

  - Needs to be designed for efficient storage and retrieval, potentially including indexing mechanisms for quick searches.

- **Duplication Detector**

  - Ensures that the same content or URL is not processed multiple times, saving resources and avoiding data redundancy.

  - Often implemented using a fast lookup data structure, such as a hash table or Bloom filter.

### Workflow

1. The URL Frontier sends URLs to the URL Fetcher.
2. The URL Fetcher retrieves the content and passes it to the Link Extractor and Content Processor.
3. New URLs from the Link Extractor are filtered by the Duplication Detector and added back to the URL Frontier if unique.
4. Processed content from the Content Processor is stored in the Data Storage system.

## Deep Dive

### URL Frontier

- **Partitioning**: The URL frontier should be partitioned across multiple machines to balance the load. Partitioning can be based on URL domain, hash, or other criteria to ensure efficient distribution and minimize bottlenecks.
- **High Availability and Redundancy**: Implement redundancy and data replication strategies to prevent data loss and ensure the URL frontier is always available.
- **Central Coordinator**: A central system or service (like Apache ZooKeeper) that manages the distribution of tasks and coordination between different components of the crawler. It ensures that there are no conflicts and helps in managing the crawler state across the cluster.
- **Task Queue Management**: Use distributed queues (e.g., Apache Kafka, RabbitMQ) to manage URLs to be crawled. These queues serve as the URL Frontier, distributing crawl tasks across multiple crawling instances.

### Schema

URLs Table/Collections: This table stores information about each URL that needs to be or has been crawled.

- **URL_ID**: A unique identifier for each URL. This could be a hash of the URL.
- **URL**: The actual URL string.
- **Status**: Indicates the crawl status (e.g., pending, in-progress, completed, failed).
- **Priority**: A numerical value indicating the crawl priority of the URL. Higher values indicate higher priority.
- **Last_Crawled**: Timestamp of when the URL was last crawled.
- **Crawl_Frequency**: Indicates how frequently the URL should be recrawled.
- **Robots_Txt**: Cached `robots.txt` rules for the domain, to avoid fetching it before each crawl.
- **Hash_Content**: A hash of the last crawled content, used for detecting changes in the content.
- **Retry_Count**: Number of times the crawl was retried due to failures.

Content Table/Collections: NoSQL key-value store

--------------------------------------------------------------------------------