# The Ultimate Data Structures & Algorithms Guide
A comprehensive Data Structures and Algorithms reference guide that originated from a C-based Data Structures and Algorithms course and has expanded as I have taken more algorithms courses in college and worked through algorithms questions for software engineering interviews. I try to state definitions and examples in my own words to help clarify common questions and precisely compare different algorithms and data structures. I also include other concepts from my upper-level algorithms and theory courses to supplement your DS&A knowledge.

### Some of the illustrations below are taken from one of my favorite TA's of all time and [their notes from this class](https://charlierose.dev/ref/cs260.pdf)

### [To bottom](#hard-algorithms)

## Arraylists/Lists/Dynamic Arrays
* O(1) access by index, O(n) access by searching (if unsorted)

## Hashtables/Hashmaps/Hashsets/Dictionaries/Lookup Tables
* all of these terms mean the same idea: **map keys to values with O(1) lookup**
* collisions are handled either by storing colliding entries in linked lists or arraylists, known as **chaining** with an **Open Hash Table**, or with **open addressing**, which associates each key with a probe sequence and stores the new key in the next open slot of that sequence, known as a **Closed Hash Table** (though you need to double the size and rehash the values if you run out of space)
* put(key, value) into the hash table and get(key) the value back
![Alt text](/images/hashtable.png)

* Dictionaries are just hash tables without a hash function: the literal string key maps to a value
* useful for identifying duplicates or keeping track of only distinct values

## Linked Lists
* non-contiguous sequence of nodes that hold data and point to the next node (and the previous node if working with a doubly linked list)
* O(n) access to the nth element, but O(1) insertion/removal at the beginning of the list
![Alt text](/images/linkedlist.png)
![Alt text](/images/doublylinkedlist.png)

## Stacks
* LIFO (last in, first out): elements are added to and removed from the top of the stack
* implemented as a linked list or arraylist
* **pop() and push() from the top**, peek(), and isEmpty() are O(1), but accessing a particular element is O(n)
![Alt text](/images/stack.png)

Stacks are often used with backtracking algorithms and can be used with depth first search. Anything dealing with symbolic computation or notation problems (matching brackets, evaluating expressions) often uses stacks.
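For example, here's a minimal sketch of the classic balanced-brackets check (the `is_balanced` helper is my own illustration, not from the original course): push every opening bracket, and pop on every closing bracket to confirm it matches the most recent opener.
```
def is_balanced(s):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in s:
        if ch in "([{":       # push opening brackets
            stack.append(ch)
        elif ch in pairs:     # a closing bracket must match the most recent opener
            if not stack or stack.pop() != pairs[ch]:
                return False
    return len(stack) == 0    # leftover openers mean unbalanced

print(is_balanced("{[()]}"))  # True
print(is_balanced("([)]"))    # False
```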
## Queues
* FIFO (first in, first out): elements are added to the back of the queue
* also implemented as a linked list or arraylist
* **add() to the back, remove() from the front**; peek() and isEmpty() are also O(1), but searching is O(n)
![Alt text](/images/queue.png)

### Priority Queues
* elements are added with a comparison-based ordering or weighting to them, so removal always returns the highest (or lowest) priority element; typically backed by a heap

## More useful Array algorithms

### Two Pointer
Use two pointers to iterate through an array, often bringing array problems from $O(n^2)$ down to $O(n)$. This can be done in two ways:
1. Left pointer at the start of the array and right pointer at the end. While left < right, you move the pointers toward each other and cover the entire array in one pass. The Two Sum series, Container With Most Water, and Trapping Rain Water are great examples.
2. You can use the two pointer approach with linked lists by having a slow and a fast pointer (the fast pointer iterating through two nodes at a time), which solves many linked list problems like finding the middle node or detecting a cycle.

### Sliding Window
Keep a window (a contiguous subarray or substring) between two indices and slide it across the input, adding elements at one end and dropping them from the other instead of recomputing from scratch. This turns many subarray/substring problems (maximum sum of a window of size k, longest substring without repeating characters) into a single O(n) pass; see the sketch below.
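A minimal sketch of a fixed-size sliding window, using the maximum-sum-of-a-window-of-size-k problem (the `max_window_sum` name is my own):
```
def max_window_sum(nums, k):
    window = sum(nums[:k])               # sum of the first window
    best = window
    for i in range(k, len(nums)):
        window += nums[i] - nums[i - k]  # slide: add the new element, drop the oldest
        best = max(best, window)
    return best

print(max_window_sum([2, 1, 5, 1, 3, 2], 3))  # 9 (5 + 1 + 3)
```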
## Sorting & Searching Algorithms
Essential:
### Merge Sort
* O(n log(n))
* divide the array in half, sort each half, and then merge the sorted halves; keep dividing the halves recursively until you end up merging single element arrays
![Alt text](/images/mergesort.png)
* the log(n) comes from dividing the array in half, and the n comes from linearly scanning both halves to pick the next in-order element as you merge them back together

### Quick Sort
* O(n log(n)) on average, but O(n^2) worst case
* pick a random element and use it to partition the array such that all elements less than the partition come before it and all elements greater come after, swapping any elements that are out of order. Repeatedly partitioning each side leaves the array sorted
![Alt text](/images/quicksort.png)
* Quick Sort does everything in place and is **much more memory efficient than merge sort**

### Radix Sort
* O(kn) for k passes (one per digit)
* iterate through each digit position and group the numbers by that digit
![Alt text](/images/radixsort.png)

### Linear Search
* O(n)
### Binary Search
* O(log(n))
* only for sorted arrays & lists; repeatedly compare the target to the middle element and discard the half that can't contain it (see the sketch at the end of this section)

Nice to Know:
### Bubble Sort
* O(n^2)
* start at the beginning and swap the first two elements if the first is greater than the second, then do the same while iterating through the rest of the pairs, repeating passes until nothing needs swapping
### Selection Sort
* O(n^2)
* find the smallest element with a linear scan and move it to the front, then do the same for the 2nd smallest, 3rd, ... nth smallest until all the elements are in place
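A minimal iterative binary search sketch for the section above (standard textbook version; the `binary_search` name is my own). It returns the index of `target` or -1:
```
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```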
## Trees
* hierarchical data structure with a root node having zero or more child nodes, each child node in turn having zero or more child nodes, and so on
* acyclic graphs with n-1 edges for n nodes
* leaves are the nodes with no children
* the depth of a node is the number of edges from the root, the height of a node is the number of edges down to its deepest descendant leaf, and the tree has an overall height equal to the number of edges from the root to the deepest leaf
![Alt text](/images/tree.png)

* Binary trees have at most 2 children per node and come in these special forms:
  * a **Complete** binary tree has every level fully filled with children except possibly the last level, which is filled left to right
  * **Full** binary trees have all nodes with either 0 or 2 children, but never 1
  * **Perfect** binary trees have all internal nodes with 2 children and all leaves at the same level

* 3 ways to traverse a tree (do this recursively):
  * Preorder: Node, Left, Right
  * Inorder: Left, Node, Right
  * Postorder: Left, Right, Node


### Binary Search Trees
* binary tree where every node has its left descendants less than or equal to itself, and its right descendants greater than or equal to itself
* O(log n) search (when balanced)
![Alt text](/images/bst.png)

* to delete a node with 2 children, replace the node with either the maximum of its left subtree or the minimum of its right subtree; either takes O(n) in the worst case
* a perfect BST of height h has 2^(h+1) - 1 nodes

### Red-Black Trees
* **self balancing** binary search tree

![Alt text](/images/red-blacktrees.png)

1. every node is red or black
2. the **root** is **black**
3. **every leaf** is **null and black**
4. a **red node** has **both** of its **children black**
5. all paths from a node to a leaf have the same number of black nodes in between (the black-height is constant)

* rotating a red-black tree is done in O(1) pointer operations
* to insert a node, color it red
* if the insertion causes a violation, it falls into one of 3 cases; move the violating node up the tree until it's fixed

### Heaps
* complete binary tree (filled left to right, except for maybe the last level) where each node is either smaller (**Min Heap**) or greater (**Max Heap**) than its children
* this is a max heap ↓

![This is a max heap](/images/heap.png)
* for a min heap the root is the minimum, for a max heap the root is the maximum
![Alt text](/images/heapshape.png)

* useful for fast min/max lookups in O(1)
* O(n) initialization and O(log n) insertion and deletion
* we bubble elements up/down with a min/max heapify recursive function (also called upheaping and downheaping)


### Huffman Coding Tree
* store the characters of a string and their frequencies in a min heap, typically implemented as an array
* repeatedly remove the 2 smallest nodes from the heap, create a parent node that combines them, and insert the parent node back into the heap
* at the end your heap holds 1 node, the root of your Huffman tree, located at the first index (the other indices are merely pointers)
* once you've built your tree, generate codes for each character by going down the tree and concatenating a 0 for a left child and a 1 for a right child until you reach a leaf; then use your code table to encode a value
![Alt text](/images/huffmantree.png)
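A minimal sketch of the build-and-encode steps, using Python's `heapq` instead of the array-based heap described above; the `huffman_codes` helper and the tuple layout are my own illustration:
```
import heapq
from collections import Counter

def huffman_codes(text):
    # heap entries: (frequency, tiebreaker, tree) where tree is a char or a (left, right) pair
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # remove the two smallest nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (left, right)))  # ...and merge them under a parent
        i += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, str):
            codes[tree] = code or "0"        # single-character edge case
        else:
            walk(tree[0], code + "0")        # left child appends a 0
            walk(tree[1], code + "1")        # right child appends a 1
    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaabbc"))  # e.g. {'a': '0', 'c': '10', 'b': '11'}
```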
### Tries
* n-ary tree with each node a character of the alphabet, with paths down the tree representing words
* implemented with additional terminating nodes (or end-of-word flags) to indicate the end of a word down a path
* useful for checking if something is a valid prefix, e.g. if you store the English language in a trie, then you can easily look up whether a string is a prefix of another string in O(n) time where n is the length of the string
![Alt text](/images/trie.png)


## Graphs
* collection of **vertices** (nodes) containing data, connected through **edges**, $$G = (V, E)$$
* edges can be directed or undirected, weighted or unweighted
* a **path** is a sequence of vertices connected by edges
* a **cycle** is a path where the first and last vertex in the sequence are the same, forming a loop
![Alt text](/images/graph.png)

* an undirected graph has at most n(n-1)/2 edges while a directed graph has at most n(n-1) edges, both for n nodes
* in an undirected graph the sum of all degrees equals twice the number of edges, and in a directed graph the sum of all in-degrees (likewise out-degrees) equals the number of edges

* stored as an adjacency list or an adjacency matrix where a value or boolean value indicates an edge from node i to j (see the sketch below)
![Alt text](/images/adjacencymatrix.png)
* adjacency matrices have constant O(1) edge lookup, but always use O(n^2) memory
* adjacency lists have O(n) edge lookup, but best case O(n) memory, worst case O(n^2)
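A minimal sketch of both representations for the same small directed graph (plain dict/list code; the names are my own illustration):
```
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
n = 3

# adjacency list: one list of neighbors per vertex, O(n + E) memory
adj_list = defaultdict(list)
for u, v in edges:
    adj_list[u].append(v)        # for an undirected graph, also append u to adj_list[v]

# adjacency matrix: O(n^2) memory, but O(1) edge lookup
adj_matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = True

print(adj_list[0])       # [1, 2]
print(adj_matrix[2][0])  # True
```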
### Breadth First Search
* explore each neighbor of a vertex before going to any of their neighbors, in O(V + E) for V vertices and E edges, since we visit every vertex once
* important detail: the runtime is O(V + 2E) for undirected graphs because you examine every edge twice, and O(V + E) for directed graphs because you examine every edge once
* add each neighbor to a **queue** and then work through the queue in order
![Alt text](/images/bfs.png)
* better than DFS for finding the shortest path between 2 nodes, because it discovers vertices in increasing order of distance, and for finding which vertices are reachable from a node (it quickly finds its neighbors)
```
from collections import deque

def bfs(graph, node):
    visited = set()
    queue = deque()          # deque gives O(1) pops from the front
    visited.add(node)
    queue.append(node)

    while queue:             # keep adding to the queue and exploring
        s = queue.popleft()  # s is the current node
        print(s)
        for n in graph[s]:   # look at all the neighboring nodes of s
            if n not in visited:
                visited.add(n)
                queue.append(n)
```
* the runtime might seem like O(V^2 + E) (it also seems like it for depth first search) because we have a nested loop, but amortized analysis shows we don't actually visit each vertex more than once: the work at each vertex is its degree + 1, and the degrees sum to 2E


### Depth First Search
* explore each branch completely before moving on to the next branch, in O(V + E) for V vertices and E edges
* implemented with a **stack** or recursively
![Alt text](/images/dfs.png)
* better than BFS if we want to visit every node in a graph
* imposes a tree structure on the graph as it goes down the recursion tree

```
# recursively (better)
visited = set()
def dfs(graph, node, visited):
    if node not in visited:
        print(node)
        visited.add(node)
        for n in graph[node]:
            dfs(graph, n, visited)

# iteratively
def dfs_iterative(graph, node):
    visited = set()
    stack = [node]

    while stack:
        s = stack.pop()      # pop from the top: last in, first out
        if s not in visited:
            print(s)
            visited.add(s)
            for n in reversed(graph[s]):  # reversed so the first-listed neighbors are explored first
                stack.append(n)
```

* when doing a DFS on a directed graph, we can have multiple types of edges (u, v):
  * Tree edges, where v was discovered by exploring edge (u, v)
  * Forward edges, where v is a proper descendant of u (skips ahead in a path)
  * **Back edges**, where v is an ancestor of u (skips backward in a path, pointing to a vertex already visited on the way to u)
  * Cross edges, where u and v are not ancestors or descendants of one another (the edge usually connects different DFS trees)
* time stamps recorded during a DFS run allow you to determine ancestor/descendant relations and whether there are any **cycles**
* the **existence of a back edge** proves there's a **cycle**, and **no back edges** proves the graph is **acyclic**

### Directed Acyclic Graphs
* a directed, acyclic graph with source (start) and sink (end) nodes, useful for describing ordering constraints, precedence, dependencies, etc.
* a **Topological Sort** can be performed with DFS in O(V+E) to sort the vertices of a directed acyclic graph into a linear ordering in which every edge points forward (there are no back edges)

![Alt text](/images/DAG.png)
* a strongly connected component of a directed graph is a set of vertices where every pair of vertices u and v is reachable from each other
* to find them, call DFS to compute finishing times, then call DFS again on the graph with the directed edges inverted (processing vertices in decreasing order of finishing time), and output the vertices of each tree in the second DFS as a separate strongly connected component, all done in O(V+E)
* this two-pass approach is **Kosaraju's Algorithm**, which finds all the strongly connected components in O(V+E)
* the naive alternative of looping through pairs of vertices and checking whether each grouping of nodes can be reached from each other also finds the strongly connected components, but far more slowly

### Minimum Spanning Trees
* tree with minimum total weight that spans all nodes in a weighted graph
* calculated with a **Greedy Algorithm** that repeatedly chooses the locally optimal edge
* there are exponentially many (on the order of 2^V) possible spanning trees, and we want the V-1 edges of a minimum one
![Alt-text](/images/mst.PNG)
**Kruskal's Algorithm**
* O(ElogV)
* sort all the edges by weight and include the next smallest edge in the MST if it doesn't form a cycle with the edges already taken (checked using set unions)
* repeat until there are V-1 edges

**Prim's Algorithm**
* O(ElogV)
* start at a vertex and add the 2nd vertex connected to it through a minimum weight edge, then keep connecting new vertices to the tree through the smallest available edges using a min heap

* both algorithms find a minimum spanning tree in O(ElogV); a graph can have several minimum spanning trees, and they find one of them
* Prim's is better if you know nothing about your edges
* Kruskal's is better if your edges are already sorted or somewhat sorted

# Shortest Path Algorithms
Basic ideas:
* sub-paths of shortest paths must themselves be shortest paths, otherwise the overall path isn't a shortest path
* triangle inequality:
  * $\delta (s,v) \leq \delta (s,u) + \delta (u,v)$
  * taking a detour through another vertex can never be shorter than going directly to the destination along a shortest path

### Bellman-Ford Algorithm
* finds the shortest paths from a source to all other vertices in a weighted graph in O(VE) (when there are no negative cycles)
* set all nodes to $\infty$ (and the source to 0)
* starts with an estimate and re-examines edges to converge on the shortest paths
* iterate |V| - 1 times, re-examining all the edges each pass
  * set each node's value to the shortest path from the source found so far by adding edge weights to the values of the previous nodes on the same path
* we **"relax"** each edge by comparing the new path against the old path we previously found and setting the next node to the smaller of the two; otherwise we don't change the next node
* **there can only be |V| - 1 edges in a shortest path**; |V| or more edges indicates a cycle, since we would have a repeated vertex
* intuitively, after the first iteration we keep checking whether the edges offer any improvement to the shortest paths
* if an extra pass can still relax an edge, there exists at least one negative weight cycle, so the algorithm reports it and returns
* a negative cycle implies there doesn't exist a shortest path (you could keep looping to lower the cost)
* can terminate early if a full pass doesn't improve anything, depending on how smartly you pick the order of edges to traverse
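A minimal sketch, assuming the graph is given as a list of (u, v, w) edges with vertices numbered 0..n-1 (the `bellman_ford` name and conventions are my own); it returns None if a negative cycle is detected:
```
def bellman_ford(n, edges, src):
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):            # a shortest path has at most n-1 edges
        for u, v, w in edges:
            if dist[u] + w < dist[v]: # relax the edge
                dist[v] = dist[u] + w
    for u, v, w in edges:             # one extra pass: any improvement means a negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist

print(bellman_ford(3, [(0, 1, 4), (0, 2, 5), (1, 2, -2)], 0))  # [0, 4, 2]
```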
### Dijkstra's Algorithm
![Alt-text](https://raw.githubusercontent.com/thelazyaz/data-structures-algorithms/master/images/lovedijk.png)
* finds the shortest paths from a source to all other vertices in a weighted graph in O(VlogV + ElogV) (with no negative weights)
* **greedy algorithm** that uses a min heap to select the smallest edge to visit
* set all nodes to $\infty$ (and the source to 0)
* examine the edges leaving the node and choose the smallest edge that hasn't been seen before:
  * gets the smallest edge through extract-min on a min heap
  * we also upheap the neighboring edges, which takes O(logV), so that extract-min is O(1)
  * set the next node to its shortest path value by relaxing the neighboring edges of the min edge we just extracted
* unlike in Bellman-Ford, we don't relax all the edges V times; the total work over all vertices is O(ElogV)
* Dijkstra's doesn't work with negative edges, because once it visits a node it assumes it has found the shortest path to it and won't revisit it
  * which means that if a negative edge exists somewhere, it may not be found, and Dijkstra's won't produce the actual shortest path for vertices that could be shortened with negative edges
* this is an issue with the greedy approach, since we assumed that minimality won't change if we add a number to any vertex path, which is only true for non-negative numbers
```
import collections
import heapq
from queue import PriorityQueue

# assumes a Graph class with self.v (number of vertices), self.edges
# (an adjacency matrix where -1 means no edge), and self.visited (a list)
def dijkstra(self, start):
    D = {v: float('inf') for v in range(self.v)}
    D[start] = 0

    pq = PriorityQueue()
    pq.put((0, start))
    while not pq.empty():
        (dist, curr) = pq.get()
        self.visited.append(curr)
        for neighbor in range(self.v):
            if self.edges[curr][neighbor] != -1:
                distance = self.edges[curr][neighbor]
                if neighbor not in self.visited:
                    old_cost = D[neighbor]
                    new_cost = D[curr] + distance
                    if new_cost < old_cost:
                        pq.put((new_cost, neighbor))
                        D[neighbor] = new_cost
    return D

# for single source shortest path between 2 nodes (inside a graph class):
def __init__(self, n, edges):
    self.INF = float("inf")
    self.n = n

    adj_list = collections.defaultdict(list)
    for u, v, c in edges:
        adj_list[u].append((v, c))
    self.adj_list = adj_list

def shortestPath(self, node1: int, node2: int) -> int:
    h = []
    best = [self.INF] * self.n
    heapq.heappush(h, (0, node1))
    best[node1] = 0

    while len(h) > 0:
        d, current = heapq.heappop(h)

        if current == node2:
            return d
        if d > best[current]:  # stale heap entry, a shorter path was already found
            continue
        for v, c in self.adj_list[current]:
            if best[v] > d + c:
                best[v] = d + c
                heapq.heappush(h, (d + c, v))
    return -1
```

### Floyd Warshall Algorithm
* finds the shortest paths between all pairs of vertices in O(V^3), working on the distance matrix
* initialize the entries to the neighboring edge weights (0 on the diagonal, $\infty$ where there's no edge)
* for each pair of nodes, find the cost of the shortest path by adding intermediate nodes and seeing if they improve the shortest path:
  * given $D^{k-1}(u,v)$, we want to see if, for an intermediate node k, $$D^{k}(u,v) = \min\left(D^{k-1}(u,v),\; D^{k-1}(u,k) + D^{k-1}(k,v)\right)$$ improves on the old value
  * if an intermediate node makes a shorter path, we update the shortest path for that pair of vertices u and v
  * else the shortest path remains the same
* since each vertex is considered as an intermediate node at some point, the algorithm runs the calculation in a triple nested for loop:
```
# D is the V x V distance matrix, initialized with the edge weights
for k in range(V):
    for i in range(V):
        for j in range(V):
            D[i][j] = min(D[i][j], D[i][k] + D[k][j])
```
* Floyd Warshall can **detect negative cycles** if any **diagonal entries turn negative** (a vertex can then reach itself with negative total weight)
* by comparison, finding all-pairs shortest paths by running Dijkstra's from every vertex would take O(V * VlogV + V * ElogV), and Bellman-Ford O(V * VE)
_______
***Anything hereafter is beyond the scope of the original class, but it does show up in Leetcode and coding interview problems, and preparing for those is typically the main motivation most people have for studying these data structures and algorithms***

### Dynamic Programming
* break down a problem into smaller sub-problems (that all depend on each other), store the calculations of those sub-problems in an array or hash table, and then reuse them to find the overall solution
* commonly used whenever you have to find the min/max or optimal value of something
* done top-down through **Memoization** with a recursively called function

Example: finding the nth Fibonacci number is the same as summing the previous two Fibonacci numbers, with our 2 base cases fib(0) = 0 and fib(1) = 1:
```
def fib(n):
    memo = {}
    memo[0] = 0
    memo[1] = 1

    def fibHelper(n, memo):
        if n in memo:
            return memo[n]
        else:
            memo[n] = fibHelper(n - 1, memo) + fibHelper(n - 2, memo)
            return memo[n]
    return fibHelper(n, memo)
```

* also done with a **Bottom Up** approach where we create an array or hash table, store calculations in it, and iterate through them, oftentimes using the index as the key and its calculated value as the value

Example: for the climbing stairs problem (the same recurrence as Fibonacci), you can climb 1 or 2 steps at a time, and to find the number of distinct ways to climb to the top we just reuse the previous two calculations:
```
def climbStairs(n: int) -> int:
    dp = [0] * (n + 1)
    dp[0] = 1
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]
```

* you'll also notice that while Dynamic Programming problems can be the most difficult data structures and algorithms problems, they usually have a very simple and short solution


### Union Find
* whenever you have a **disjoint** graph or **disjoint** set, this is where Union Find is most naturally used, especially for finding the number of connected components
* Union merges 2 groups by making the root node of one group the parent of the root node of the other
* Find figures out what group an element belongs to by following the parent nodes until a self-loop is reached (a node whose parent is itself); both operations are O(n) worst case, but nearly O(1) amortized with path compression
* You typically write:
```
def find(node):
    while node != parent[node]:
        parent[node] = parent[parent[node]]  # path compression: point node at its grandparent
        node = parent[node]
    return node

def union(node1, node2):
    p1 = find(node1)
    p2 = find(node2)
    if p1 != p2:
        parent[p1] = p2   # make one root the parent of the other
```
* the grandparent shortcut in `find` is the **path compression** that keeps the trees flat and later finds fast
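A minimal sketch of the connected-components use case, assuming n nodes labeled 0..n-1 and an edge list (`count_components` is my own name):
```
def count_components(n, edges):
    parent = list(range(n))          # every node starts as its own root

    def find(node):
        while node != parent[node]:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(a, b):
        pa, pb = find(a), find(b)
        if pa != pb:
            parent[pa] = pb          # merging two groups removes one component

    for a, b in edges:
        union(a, b)
    return len({find(v) for v in range(n)})  # number of distinct roots

print(count_components(5, [(0, 1), (1, 2), (3, 4)]))  # 2
```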
### Knuth Morris Pratt (KMP) Pattern Searching Algorithm
Returns all indices of occurrences of a string **pat** within another larger string **txt**.

Whenever we detect a mismatch after some matches, we already know some of the characters in the text of the next window, so we can avoid re-matching the characters that we know will match anyway. As we slide the window, we only have to compare the next character after the window, avoiding re-comparing the same *n-1* characters from the previous window. Each value lps[i] is the length of the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].

We do this using a longest proper prefix (LPS) array, built as a preprocessing step, that tells us how many characters can be skipped. A proper prefix is a prefix that doesn't include the whole string. For example, the prefixes of "abc" are "", "a", "ab" and "abc", but the proper prefixes are "", "a" and "ab" only. The suffixes of the string are "", "c", "bc", and "abc".
```
def constructLps(pat, lps):

    # len_ stores the length of the longest prefix which
    # is also a suffix for the previous index
    len_ = 0
    m = len(pat)

    # lps[0] is always 0
    lps[0] = 0

    i = 1
    while i < m:

        # If characters match, increment the size of lps
        if pat[i] == pat[len_]:
            len_ += 1
            lps[i] = len_
            i += 1

        # If there is a mismatch
        else:
            if len_ != 0:

                # Update len_ to the previous lps value
                # to avoid redundant comparisons
                len_ = lps[len_ - 1]
            else:

                # If no matching prefix found, set lps[i] to 0
                lps[i] = 0
                i += 1

def search(pat, txt):
    n = len(txt)
    m = len(pat)

    lps = [0] * m
    res = []

    constructLps(pat, lps)

    # Pointers i and j, for traversing
    # the text and pattern
    i = 0
    j = 0

    while i < n:

        # If characters match, move both pointers forward
        if txt[i] == pat[j]:
            i += 1
            j += 1

            # If the entire pattern is matched
            # store the start index in result
            if j == m:
                res.append(i - j)

                # Use LPS of previous index to
                # skip unnecessary comparisons
                j = lps[j - 1]

        # If there is a mismatch
        else:

            # Use lps value of previous index
            # to avoid redundant comparisons
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return res
```
Code from: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
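For example, calling the functions above; note the occurrences can overlap:
```
print(search("aba", "abacababa"))  # [0, 4, 6]
```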
# Hard Algorithms
By **hard**, we mean a problem that does not have a (known) polynomial time solution, like how when you play chess you must search through a tree of moves. All problems are either decision problems (yes or no answer) or optimization problems (find the min or max of something). You can convert an optimization problem into a decision problem by rewording it to ask whether an optimization achieving a given value exists.

Given a problem **P**, if you want to prove it's as hard as an NP-Hard problem, you **reduce** a known NP-Hard problem to it (the reduction itself should be done in polynomial time). Given two problems **A** and **B**:
* if I can convert **A** => **B**:
  1. If **A** is hard, then **B** is at least as hard
  2. If **B** is hard, then we don't know about **A**. We can use a hard problem to solve an easy problem, but we can't use an easy problem to solve a hard problem
  3. If **A** is easy, **B** could be anything. Just because you can't solve **A** in polynomial time with **B** doesn't mean you can't use some other polynomial algorithm to try to solve **A**
  4. If **B** is easy, **A** is easy

* **NP** (Nondeterministic Polynomial) means a **solution** to an NP problem can be **verified in polynomial time**, but we don't know if the problem is solvable in polynomial time
* **NP-Complete** problems are NP problems that are also as hard as NP-Hard problems. If any NP-Complete problem can be solved quickly (in polynomial time), then all problems in NP can also be solved quickly
* **NP-Hard** problems are at least as hard as, but can be harder than, NP problems. We don't know if they're solvable or verifiable in P. If you can solve an NP-Hard problem in polynomial time, you can solve all NP problems in polynomial time

* the Cook-Levin theorem says all NP problems can be converted into a satisfiability problem

* For proving hardness:
  * the source problem should be as simple and restricted as possible while the target is as hard as possible
  * good source problems:
    * 3SAT
    * Integer Partition
    * Vertex Cover
    * Hamiltonian Cycle
## Common NP-Complete Reduction Algorithms:
* 3 Variable Satisfiability (3SAT, or really any SAT) by brute force (a sketch follows after this list):
  * generate all $2^{n}$ permutations of true/false values for the $n$ variables
  * for each permutation:
    * plug the variables in to see if they satisfy all the clauses, and if they do, return the permutation
  * $\Theta(2^{n} \cdot k)$ worst case, where you have to try all $k$ clauses for all $2^{n}$ assignments of the $n$ variables
  * $O(2^{n} \cdot n^{3})$ for 3SAT, where we have $2n$ literals in clauses of $3$, giving ~$n^3$ as the max number of unique clauses from ${2n \choose 3}$
* Hamiltonian Cycle to Travelling Salesman Problem:
  * Hamiltonian Cycle checks if all nodes can be visited exactly once in a cycle in a graph
  * for every vertex i:
    * for every vertex j:
      * if (i, j) $\in E$, set w(i, j) = 1, meaning we set the edges in the original graph to 1
      * else set w(i, j) = 2, where we set the introduced edges to 2
  * return Travelling Salesman asking for a tour of cost n
  * this reduction algorithm runs in $O(V^{2})$
* Vertex Cover to Independent Set:
  * Vertex Cover asks: is there a set of $k$ vertices such that every edge contains at least 1 vertex in the set?
  * the complement of a vertex cover is an independent set in the same graph, so set $k' = |V| - k$
  * return Independent Set(G, $k'$), where Independent Set finds an independent set of $k'$ vertices
* Independent Set to Max Clique:
  * Max Clique asks if a graph has a clique of $k$ vertices (a set of vertices where every pair of vertices defines an edge)
  * Independent Set:
    * invert the graph: for every edge (i, j) not in $E$, add (i, j) to the inverted graph's edges
    * return Clique(inverted graph, $k$ vertices)
* 3SAT to Vertex Cover:
  * create $2n$ variable vertices so that each of the $n$ variables can be either included or not included in the cover; put an edge between each variable and its negation
  * for each of the $k$ clauses, create "gadgets" that connect the clause's vertices in a triangle, and put an edge between each variable vertex and its occurrences in the clause triangles
  * if a vertex cover of size $n + 2k$ exists, then that implies the original expression is satisfiable
![Alt text](/images/VCexample.gif)
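A minimal brute-force SAT sketch for the first bullet above, assuming a CNF formula given as a list of clauses with positive/negative integers for literals (e.g. `[1, -2]` means $x_1 \lor \lnot x_2$); the encoding and `brute_force_sat` name are my own choices:
```
from itertools import product

def brute_force_sat(num_vars, clauses):
    # try all 2^n true/false assignments; assignment[i] is the value of variable i+1
    for assignment in product([True, False], repeat=num_vars):
        def lit_true(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        # every clause needs at least one true literal
        if all(any(lit_true(lit) for lit in clause) for clause in clauses):
            return assignment            # satisfying assignment found
    return None                          # unsatisfiable

# (x1 or not x2) and (not x1 or x2 or x3)
print(brute_force_sat(3, [[1, -2], [-1, 2, 3]]))  # (True, True, True)
```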