├── LICENSE
└── README.md


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 ssivart

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# TOC
* [Introduction](#introduction)
* [Array](#array)
* [Linked List & Doubly Linked List](#linked-list--doubly-linked-list)
* [Stack](#stack)
* [Queue](#queue)
* [Binary Search Tree](#binary-search-tree)
* [Balanced Binary Search Tree, AVL Tree](#balanced-binary-search-tree-avl-tree)
* [Red-Black Tree](#red-black-tree)
* [Priority Queue](#priority-queue)
* [Binary Heap](#binary-heap)
* [Associative Array/ Map/ Dictionary](#associative-array-map-dictionary)
* [Ternary Search Tree](#ternary-search-tree-tst)

# Introduction

### What is a data structure, and why use one?

A data structure is a way of storing and organizing data in a computer. It lets us **store data efficiently** so that operations on that data run as efficiently as possible.

An algorithm's running time depends on the data structures it uses, so choosing a suitable one lowers the time complexity. For example:

* Without a suitable data structure, a shortest-path algorithm runs in O(N^2); with a heap/priority queue the running time drops to about O(N*logN)

### Abstract Data Types
Put simply, an ADT is a "specification" or "description" of a data structure, much like an interface in an object-oriented language: it declares the behavior but leaves out the implementation details.

For example, the ADT description of a stack:

* push(): insert an item at the top of the stack
* pop(): remove and return the item at the top of the stack
* peek(): look at the item at the top of the stack without removing it
* size(): return the number of items in the stack

### How ADTs relate to data structures
Every ADT is backed by a concrete data structure that implements the behavior (methods) the ADT defines.

| ADT                | Data Structures    |
|--------------------|--------------------|
| Stack              | array, linked list |
| Queue              | array, linked list |
| Priority Queue     | heap               |
| Dictionary/Hashmap | array              |

### Time complexity: Big O notation
Big O describes the efficiency (complexity) of an algorithm. For example, suppose geek A wants to share the contents of his D: drive with geek B. He has the following options:
1. Ride a scooter from **Taipei** to B's home in **Pingtung**
2. Send it over the network, ignoring the chance of being intercepted by the FBI

|                          | 1GB     | 1TB      | 500TB       |
|--------------------------|---------|----------|-------------|
| Deliver drive by scooter | 600 min | 600 min  | 600 min     |
| Network transfer         | 3 min   | 3072 min | 1536000 min |

Going by the table, the scooter option sounds silly, but no matter how large the drive is, delivery is guaranteed within 10 hours: `O(1)`. Network transfer takes longer as the data grows: `O(N)`. This shows how much the difference between constant time and linear time affects efficiency.

A few common rules apply when writing a complexity function:

* Add the costs of sequential steps: **O(a+b)**

```python
def func():
    # step a
    # step b
    pass
```

* Drop constants: ~~O(3n)~~ **O(n)**

```python
def func(lst):
    for i in lst: # O(n)
        pass  # do something ...
    for i in lst: # O(n)
        pass  # do something ...
    for i in lst: # O(n)
        pass  # do something ...
```

* Use different variables for different inputs: ~~O(N^2)~~ **O(a*b)**

```python
def func(la, lb):
    for a in la:
        for b in lb:
            pass  # do something ...
```

* Drop non-dominant terms: ~~O(n+n^2)~~ **O(n^2)**

```
O(n^2) <= O(n+n^2) <= O(n^2 + n^2)
```

```python
# n^2 is the dominant term, so n is dropped
def func(la):

    for a in la: # O(n)
        pass  # do something ...

    for a in la: # O(n^2)
        for b in la:
            pass  # do something
```

# Array

A collection of objects or values, each of which can be identified by an array index (index, key).

* Indexes start at 0
* Because of the index, arrays support **random access**

Pros:

* Random access reaches any value in the array without searching, in O(1)
* No data is lost to a broken link
* Fast sequential access

Cons:

* Rebuilding the array or inserting into it means copying the values one by one, which is O(N)
* The array size must be known ahead of time at compile time, which makes the array not very dynamic
* An array can usually hold only one type
* No support for the node sharing that linked lists allow

### Implements

|            | Behavior                          | big O |
|------------|-----------------------------------|-------|
| access     | Read by index                     | O(1)  |
| search     | Find by value (linear search)     | O(N)  |
| insert     | Insert as the first item          | O(N)  |
| append     | Insert as the last item           | O(1)  |
| remove     | Remove the first item             | O(N)  |
| removeLast | Remove the last item              | O(1)  |

### Implementation in Python

**random indexing: O(1)**
```python
arr = [1, 2, 3]
arr[0]
```

**linear search: O(n)**
```python
arr = [1, 2, 3]
largest = arr[0]  # avoid shadowing the built-in max()
for i in arr:
    if i > largest:
        largest = i
```

# Linked List & Doubly Linked List

* A node holds `data` and a `referenced object`
* Nodes are linked by each node keeping a reference to another node
* The reference of the last node is NULL

Pros

* Nodes need not share the same type or memory size
* Memory is allocated dynamically, so no size has to be declared up front
* Fast insertion and deletion at a known position: O(1)

Cons

* No random access, only sequential access, which is O(N)
* Extra space is needed to store the references to other nodes
* Less reliable: a broken link easily loses data
* Hard to traverse backward; a doubly linked list solves this at the cost of more memory

### Implements

|             | Behavior                 | big O |
|-------------|--------------------------|-------|
| search      | Search                   | O(N)  |
| insert      | Insert as the first item | O(1)  |
| append      | Insert as the last item  | O(N)  |
| remove      | Remove the first item    | O(1)  |
| removeLast  | Remove the last item     | O(N)  |

Note: a linked list has no index, so inserting or removing the Nth item first requires walking to that position sequentially, which takes O(N) time.

### Implementation in Python

The code below is my own sample implementation; corrections are welcome.

The main idea is to implement `__getitem__` for index-based sequential access. The Doubly Linked List additionally supports backward traversal, so both `lst[0]` and `lst[-1]` can be reached in O(1).
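The linked list in this section defines `__getitem__` but no `__iter__`, so its `for node in self` loops depend on Python's fallback iteration protocol: the loop calls `__getitem__` with 0, 1, 2, ... until `IndexError` is raised. A minimal sketch of that mechanism follows; the `Countdown` class is a hypothetical illustration and not part of the gist.

```python
class Countdown:
    """Hypothetical sequence-like class: __getitem__ only, no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Raising IndexError is what tells the for loop to stop.
        if index < 0 or index >= self.start:
            raise IndexError(index)
        return self.start - index


# The comprehension asks for index 0, 1, 2 and stops at the IndexError for 3.
values = [v for v in Countdown(3)]
print(values)  # [3, 2, 1]
```

One consequence for a linked list: every `__getitem__` call walks from the head, so a full loop driven this way costs O(N^2) rather than O(N). Adding an `__iter__` that follows `next_node` would presumably bring it back to O(N).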
For sample output, see [travishen/gist/linked-list.md](https://gist.github.com/travishen/df37a04582c48d386781077742908107)

```python
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

class Node:
    def __init__(self, data=None, next_node=None):
        self.data = data
        self.next_node = next_node

    def __repr__(self):
        return 'Node(data={!r}, next_node={!r})'.format(self.data, self.next_node)

class LinkedList(object):
    def __init__(self, inital_nodes=None):
        self.head = None
        self.inital_nodes = inital_nodes
        if isinstance(inital_nodes, Iterable):
            for node in reversed(list(inital_nodes)):
                self.insert(node)  # insert to head
        elif inital_nodes:
            raise NotImplementedError('Initial value must be iterable')

    def __repr__(self):
        return 'LinkedList(inital_nodes={!r})'.format(self.inital_nodes)

    def __len__(self):
        count = 0
        for node in self:  # iterates through __getitem__ (no __iter__ defined)
            count += 1
        return count

    def __setitem__(self, index, data):
        self.insert(data, index)  # note: inserts a new node, does not overwrite

    def __delitem__(self, index):
        self.remove(index, by='index')

    def __getitem__(self, index):
        count = 0
        current = self.head
        index = self.positive_index(index)
        while count < index and current is not None:
            current = current.next_node
            count += 1
        if index >= 0 and current is not None:
            return current
        raise IndexError(index)

    def positive_index(self, index):  # implement negative indexing
        """
        Negative indexing costs an extra O(N) pass to compute the length
        A doubly linked list improves on this
        """
        if index < 0:
            index = len(self) + index
        return index

    def insert(self, data, index=0):
        index = self.positive_index(index)
        if self.head is None:  # initial
            self.head = Node(data, None)
        elif index == 0:  # insert to head
            new_node = Node(data, self.head)
            self.head = new_node
        else:  # insert after lst[index]
            last_node = self[index]
            last_node.next_node = Node(data, last_node.next_node)
        return None  # mutates this instance in place; no new list is created

    def search(self, data):
        for node in self:
            if node.data == data:
                return node
        return None

    def remove(self, data_or_index, by='data'):
        for i, node in enumerate(self):
            if (by == 'data' and node.data == data_or_index) or (by == 'index' and i == data_or_index):
                if i == 0:
                    self.head = node.next_node
                    node.next_node = None
                else:
                    prev_node.next_node = node.next_node
                break
            prev_node = node
        return None  # mutates this instance in place; no new list is created

class DoubleLinkedNode(Node):
    def __init__(self, data=None, last_node=None, next_node=None):
        self.data = data
        self.next_node = next_node
        self.last_node = last_node
        if next_node:
            next_node.last_node = self

class DoubleLinkedList(LinkedList):
    def __init__(self, *args, **kwargs):
        self.foot = None
        super(DoubleLinkedList, self).__init__(*args, **kwargs)

    def __repr__(self):
        return 'DoubleLinkedList(inital_nodes={!r})'.format(self.inital_nodes)

    def __getitem__(self, index):
        """
        Walk forward from head for index >= 0 and backward from foot
        for index < 0, so lst[0] and lst[-1] are both O(1)
        """
        count = 0
        if index >= 0:
            current = self.head
            while count < index and current is not None:
                current = current.next_node
                count += 1
        else:
            current = self.foot
            while count > (index + 1) and current is not None:
                current = current.last_node
                count -= 1
        if current is not None:
            return current
        raise IndexError(index)

    def insert(self, data, index=0):
        if self.head is None:  # initial
            self.head = self.foot = DoubleLinkedNode(data, None, None)
        elif index == 0:  # insert to head
            new_node = DoubleLinkedNode(data, None, self.head)
            self.head = new_node
        else:  # insert after lst[index]
            last_node = self[index]
            last_node.next_node = DoubleLinkedNode(data, last_node, last_node.next_node)
            if last_node.next_node.next_node is None:  # set foot
                self.foot = last_node.next_node
        return None  # mutates this instance in place; no new list is created
```

### Linked lists in the real world

1. Low-level memory management, taking C as an example:

    * `malloc()`, `free()`: see [Heap Management](https://www.syslinux.org/wiki/index.php?title=Heap_Management)
    * `char *char_ptr = (char *)malloc(30);`: obtains 30 bytes of heap memory

2. Many Windows applications: taskbar window switching, PhotoViewer

3. Blockchain technology

![image](https://i.imgur.com/FcqNnmz.png)
[[Image source]](https://codingislove.com/simple-blockchain-javascript/)

# Stack

* A stack is an abstract data type whose defining property is last in, first out (LIFO)
* In high-level languages it is easy to implement with an array or a linked list
* Most programming languages depend on a stack to handle method calls (the call stack). See [Call Stack, Scope & Lifetime of Variables](https://www.youtube.com/watch?v=1cPSeJLspT8) and [Python Function Calls and the Stack](https://www.cs.ucsb.edu/~pconrad/cs8/topics.beta/theStack/02/)

### Implements

|      | Behavior                                           | big O |
|------|----------------------------------------------------|-------|
| push | Put an item on top of the stack                    | O(1)  |
| pop  | Remove and return the item on top of the stack     | O(1)  |
| peek | Look at the item on top without removing it        | O(1)  |

### Applications
* call stack + stack memory
* Depth-First Search
* Eulerian Circuit
* Browser back button
* Photoshop undo

Note: any recursive algorithm, DFS for example, can be rewritten with an explicit stack. Even when we write it recursively, the calls still end up running on the call stack.

```python
def factorial(n):
    if n == 0:  # declare base case to prevent stack overflow
        return 1
    return n * factorial(n-1)
```

### Stack memory vs Heap memory
See [Stack vs. Heap](https://medium.com/joe-tsai/stack-vs-heap-b4bd500667cd)

| stack memory                                          | heap memory                                        |
|-------------------------------------------------------|----------------------------------------------------|
| Limited amount of memory                              | Larger amount of memory                            |
| Lifetimes are regular and predictable                 | Lifetimes are irregular and unpredictable          |
| Managed automatically (freed when the function returns) | Managed by the programmer                        |
| The space declared for a local variable cannot change | An allocation can be resized, e.g. with realloc()  |

There is also a PTT discussion of how much memory each of the two takes: [stack v.s. heap sizes](https://www.ptt.cc/man/C_and_CPP/DD8B/M.1460666895.A.07A.html)

### Implementation in Python
```python
from collections.abc import Iterable

class Stack(object):
    def __init__(self, initial_data):
        self.stack = []
        self.initial_data = initial_data
        if isinstance(initial_data, Iterable):
            self.stack = list(initial_data)
        else:
            raise NotImplementedError('Initial value must be iterable')

    def __repr__(self):
        return 'Stack(initial_data={!r})'.format(self.initial_data)

    def __len__(self):
        return len(self.stack)

    def __getitem__(self, i):
        return self.stack[i]

    @property
    def is_empty(self):
        return len(self.stack) == 0

    def push(self, data):
        self.stack.append(data)

    def pop(self):
        if not self.is_empty:  # returns None on an empty stack
            return self.stack.pop()

    def peek(self):
        if not self.is_empty:  # returns None on an empty stack
            return self.stack[-1]
```
Using Lists as Stacks
```
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
```

# Queue
* A queue is an abstract data type whose defining property is first in, first out (FIFO)
* In high-level languages it is easy to implement with an array or a linked list

### Applications
* Sharing a resource among multiple processes, e.g. CPU scheduling
* Asynchronous task queues, e.g. an I/O buffer
* Breadth-First Search

### Implementation in Python
```python
from collections.abc import Iterable

class Queue(object):
    def __init__(self, initial_data):
        self.queue = []
        self.initial_data = initial_data
        if isinstance(initial_data, Iterable):
            self.queue = list(initial_data)
        else:
            raise NotImplementedError('Initial value must be iterable')

    def __repr__(self):
        return 'Queue(initial_data={!r})'.format(self.initial_data)

    def __len__(self):
        return len(self.queue)

    def __getitem__(self, i):
        return self.queue[i]

    @property
    def is_empty(self):
        return len(self.queue) == 0

    def enqueue(self, data):
        self.queue.append(data)

    def dequeue(self):
        # list.pop(0) shifts every remaining item, so this is O(N);
        # collections.deque.popleft() does the same job in O(1)
        return self.queue.pop(0)

    def peek(self):
        return self.queue[0]
```
References

* [The Queue implemented in multiprocessing](https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py)
* Using Lists as Queues
```python
>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])
```

# Binary Search Tree
The main advantage is that insert, delete and search average O(logN)

* Each node has at most two children
* Children are distinguished as left and right
* Nodes in the left subtree are smaller than the root; nodes in the right subtree are larger
* Node values are unique

|        | Average case | Worst case |
|--------|--------------|------------|
| insert | O(logN)      | O(N)       |
| delete | O(logN)      | O(N)       |
| search | O(logN)      | O(N)       |

insert, remove and search implemented in Python; for sample output see the [gist](https://gist.github.com/travishen/c4cc5797f8905b2a5b90f2545c374a26)
```python
class Node(object):
    def __init__(self, data):
        self._left, self._right = None, None
        self.data = int(data)

    def __repr__(self):
        return 'Node({})'.format(self.data)

    @property
    def left(self):
        return self._left

    @left.setter
    def left(self, node):
        self._left = node

    @property
    def right(self):
        return self._right

    @right.setter
    def right(self, node):
        self._right = node

class BinarySearchTree(object):
    def __init__(self, root=None):
        self.root = root
        self.search_mode = 'in_order'

    # O(logN) time complexity if balanced; it can degrade to O(N)
    def insert(self, data, **kwargs):
        """Insert from root"""
        if self.root is None:  # empty tree: the new node becomes the root
            self.root = (kwargs.get('node_constructor') or Node)(data)
        else:
            BinarySearchTree.insert_node(self.root, data, **kwargs)

    # O(logN) time complexity if balanced; it can degrade to O(N)
    def remove(self, data):
        """Remove from root"""
        self.root = BinarySearchTree.remove_node(self.root, data)

    @staticmethod
    def insert_node(node, data, **kwargs):
        node_constructor = kwargs.get('node_constructor', None) or Node
        if node:
            if data < node.data:
                if node.left is None:
                    node.left = node_constructor(data)
                else:
                    BinarySearchTree.insert_node(node.left, data, **kwargs)
            elif data > node.data:
                if node.right is None:
                    node.right = node_constructor(data)
                else:
                    BinarySearchTree.insert_node(node.right, data, **kwargs)
            else:
                node.data = data
        return node

    @staticmethod
    def remove_node(node, data):

        if not node:
            return None

        if data < node.data:
            node.left = BinarySearchTree.remove_node(node.left, data)
        elif data > node.data:
            node.right = BinarySearchTree.remove_node(node.right, data)
        else:
            if not node.left and not node.right:  # leaf
                return None
            if not node.left:  # only a right child
                return node.right
            if not node.right:  # only a left child
                return node.left
            # two children: copy the in-order predecessor, then remove it
            predeccessor = BinarySearchTree.get_max_node(node.left)
            node.data = predeccessor.data
            node.left = BinarySearchTree.remove_node(node.left, predeccessor.data)
        return node

    def get_min(self):
        return self.get_min_node(self.root)

    @staticmethod
    def get_min_node(node):
        if node.left:
            return BinarySearchTree.get_min_node(node.left)
        return node

    def get_max(self):
        return self.get_max_node(self.root)

    @staticmethod
    def get_max_node(node):
        if node.right:
            return BinarySearchTree.get_max_node(node.right)
        return node

    def search_decorator(func):
        # Returns the matching node as soon as one is found;
        # otherwise returns the list of visited nodes
        def interface(*args, **kwargs):
            res = func(*args, **kwargs)
            if isinstance(res, Node):
                return res
            elif 'data' in kwargs:
                for node in res:
                    if node.data == kwargs['data']:
                        return node
            return res
        return interface

    @staticmethod
    @search_decorator
    def in_order(root, **kwargs):
        """left -> root -> right"""
        f = BinarySearchTree.in_order
        res = []
        if root:
            left = f(root.left, **kwargs)
            if isinstance(left, Node):
                return left
            right = f(root.right, **kwargs)
            if isinstance(right, Node):
                return right
            res = left + [root] + right
        return res

    @staticmethod
    @search_decorator
    def pre_order(root, **kwargs):
        """root -> left -> right"""
        f = BinarySearchTree.pre_order
        res = []
        if root:
            left = f(root.left, **kwargs)
            if isinstance(left, Node):
                return left
            right = f(root.right, **kwargs)
            if isinstance(right, Node):
                return right
            res = [root] + left + right
        return res

    @staticmethod
    @search_decorator
    def post_order(root, **kwargs):
        """left -> right -> root"""
        f = BinarySearchTree.post_order
        res = []
        if root:
            left = f(root.left, **kwargs)
            if isinstance(left, Node):
                return left
            right = f(root.right, **kwargs)
            if isinstance(right, Node):
                return right
            res = left + right + [root]
        return res

    def traversal(self,
                  order:"in_order|pre_order|post_order"=None,
                  data=None):
        order = order or self.search_mode
        if order == 'in_order':
            return BinarySearchTree.in_order(self.root, data=data)
        elif order == 'pre_order':
            return BinarySearchTree.pre_order(self.root, data=data)
        elif order == 'post_order':
            return BinarySearchTree.post_order(self.root, data=data)
        else:
            raise NotImplementedError()

    def search(self, data, *args, **kwargs):
        # Note: this walks the tree by traversal, which is O(N); following the
        # left/right comparisons instead would give the O(logN) average case
        res = self.traversal(*args, data=data, **kwargs)
        return res if isinstance(res, Node) else None  # None when not found
```

### BSTs in the real world

* OS file system
* Machine learning: decision trees

# Balanced Binary Search Tree, AVL Tree
* Guarantees O(logN) time complexity
* Every insert and delete has to check the balance; an unbalanced tree needs extra rotations
* Checking the balance:
    - `left subtree height - right subtree height > 1`: rotate to right
    - `left subtree height - right subtree height < -1`: rotate to left
    - ![image](https://storage.googleapis.com/ssivart/super9-blog/not-balancing-tree.png)

|        | Average case | Worst case |
|--------|--------------|------------|
| insert | O(logN)      | O(logN)    |
| delete | O(logN)      | O(logN)    |
| search | O(logN)      | O(logN)    |

Not the best fit for sorting; the total time complexity is O(N*logN):

* Inserting n items: O(N*logN)
* in-order iteration: O(N)

This builds on the BST above by inheritance; please point out any bugs. For sample output see the [gist](https://gist.github.com/travishen/c4cc5797f8905b2a5b90f2545c374a26)

* Whenever a node's left or right is set, update that node's height
* Every frame of the insert call stack checks whether its node is balanced and rotates if it is not

```python
class HNode(Node):
    def __init__(self, *args, **kwargs):
        super(HNode, self).__init__(*args, **kwargs)
        self._height = 0

    def __repr__(self):
        return 'HNode({})'.format(self.data)

    @property
    def height(self):
        return self._height

    def set_height(self):
        if self.left is None and self.right is None:
            self._height = 0
        else:
            self._height = max(self.left_height, self.right_height) + 1
        return self._height


    @Node.left.setter
    def left(self, node):
        self._left = node
        self.set_height()

    @Node.right.setter
    def right(self, node):
        self._right = node
        self.set_height()

    @property
    def sub_diff(self):
        return self.left_height - self.right_height

    @property
    def left_height(self):
        if self.left:
            return self.left.height
        return -1  # an empty subtree has height -1

    @property
    def right_height(self):
        if self.right:
            return self.right.height
        return -1  # an empty subtree has height -1

    @property
    def is_balance(self):
        return abs(self.sub_diff) <= 1

    def balance(self, data):

        if self.sub_diff > 1:
            if data < self.left.data:  # left left heavy
                return self.rotate('right')
            if data > self.left.data:  # left right heavy
                self.left = self.left.rotate('left')
                return self.rotate('right')

        if self.sub_diff < -1:
            if data > self.right.data:  # right right heavy
                return self.rotate('left')
            if data < self.right.data:  # right left heavy
                self.right = self.right.rotate('right')
                return self.rotate('left')

        return self

    def rotate(self, to:"left|right"):
        if to == 'right':
            tmp = self.left
            tmp_right = tmp.right
            # update: re-attach the lower node first so that the setters
            # refresh the heights bottom-up (self, then the new root)
            self.left = tmp_right
            tmp.right = self
            print('Node {} right rotate to {}!'.format(self, tmp))
            return tmp  # return new root
        if to == 'left':
            tmp = self.right
            tmp_left = tmp.left
            # update: same ordering as above
            self.right = tmp_left
            tmp.left = self
            print('Node {} left rotate to {}!'.format(self, tmp))
            return tmp  # return new root
        raise NotImplementedError()

class AVLTree(BinarySearchTree):
    def __init__(self, *args, **kwargs):
        super(AVLTree, self).__init__(*args, **kwargs)

    def insert(self, data):
        self.root = AVLTree.insert_node(self.root, data)

    def remove(self, data):
        self.root = AVLTree.remove_node(self.root, data)

    @property
    def is_balance(self):
        return self.root is None or self.root.is_balance

    @staticmethod
    def rotate(node, to):
        """Rotate, then refresh the height of the new subtree root"""
        new_root = node.rotate(to)
        new_root.set_height()
        return new_root

    @staticmethod
    def rebalance(node):
        """Runs on every node of the call stack on the way back to the root"""
        node.set_height()
        if node.sub_diff > 1:  # left heavy
            if node.left.sub_diff < 0:  # left right heavy
                node.left = AVLTree.rotate(node.left, 'left')
            return AVLTree.rotate(node, 'right')
        if node.sub_diff < -1:  # right heavy
            if node.right.sub_diff > 0:  # right left heavy
                node.right = AVLTree.rotate(node.right, 'right')
            return AVLTree.rotate(node, 'left')
        return node

    @staticmethod
    def insert_node(node, data):
        if node is None:
            return HNode(data)
        if data < node.data:
            node.left = AVLTree.insert_node(node.left, data)
        elif data > node.data:
            node.right = AVLTree.insert_node(node.right, data)
        return AVLTree.rebalance(node)  # returns the (possibly new) subtree root

    @staticmethod
    def remove_node(node, data):
        if node is None:
            return None
        if data < node.data:
            node.left = AVLTree.remove_node(node.left, data)
        elif data > node.data:
            node.right = AVLTree.remove_node(node.right, data)
        elif node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        else:  # two children: copy the in-order predecessor, then remove it
            predecessor = BinarySearchTree.get_max_node(node.left)
            node.data = predecessor.data
            node.left = AVLTree.remove_node(node.left, predecessor.data)
        return AVLTree.rebalance(node)
```

# Red-Black Tree
* Compared with an AVL tree, a red-black tree gives up some balance in exchange for fewer rotations on insert/delete, so overall performance is better (faster inserts and deletes)
* Unlike an AVL tree, which uses a node's height to decide whether to rotate, it uses red/black colors
    - The root and the terminal (NULL) nodes are black
    - The parent and children of a red node are black
    - Every path from a node down to its terminal nodes has the same number of black nodes
    - Every new node is red by default; if that breaks a rule above:
        - rotate, or
        - recolor nodes

![image](https://storage.googleapis.com/ssivart/super9-blog/red-black-tree.png)

|        | Average case | Worst case |
|--------|--------------|------------|
| insert | O(logN)      | O(logN)    |
| delete | O(logN)      | O(logN)    |
| search | O(logN)      | O(logN)    |

A Python implementation on GitHub: [Red-Black-Tree](https://github.com/stanislavkozlovski/Red-Black-Tree/blob/master/rb_tree.py)

# Priority Queue
* Unlike a Stack or Queue, the order in which items come out is decided by their priority
* Usually implemented with a heap

# Binary Heap
* A binary tree data structure, usually implemented with a one-dimensional array
* Split into `min` and `max` by ordering:
    - max heap: a parent's value (or key) is greater than its children's
    - min heap: a parent's value (or key) is less than its children's
* Must be a complete or nearly complete binary tree

Notes:
* The heap data structure has nothing to do with heap memory
* Its strength is reading the item with the highest or lowest priority (the root) in O(1)

|        | time complexity                                          |
|--------|----------------------------------------------------------|
| insert | O(1) to place the item + O(logN) to restore heap order   |
| delete | O(1) to remove the root + O(logN) to restore heap order  |

### Applications
* Heap Sort
* Prim's Algorithm
* Dijkstra's Algorithm

### Heapsort
* A comparison sort
* Its main advantage is a guaranteed O(NlogN) time complexity
* An in-place algorithm; the drawback is that the heap has to be rebuilt for every sort, which adds O(N) time
* Indexing in a one-dimensional array that starts at position 0:

![image](https://storage.googleapis.com/ssivart/super9-blog/heap-indexing.png)

For the operations, see this article: [Comparison Sort: Heap Sort](http://alrightchiu.github.io/SecondRound/comparison-sort-heap-sortdui-ji-pai-xu-fa.html)

A Max Binary Heap implemented in Python; see the [gist](https://gist.github.com/travishen/1230001923ddfac2e6bf5c752f4daa12)
```python
class Heap(object):
    """Max Binary Heap"""

    def __init__(self, capacity=10):
        self._default = object()  # sentinel marking an unused slot
        self.capacity = capacity
        self.heap = [self._default] * self.capacity

    def __len__(self):
        # Counts the unused slots, which is O(N); keeping a size counter
        # would be needed for the O(1)/O(logN) bounds to hold strictly
        return len(self.heap) - self.heap.count(self._default)

    def __getitem__(self, i):
        return self.heap[i]

    def insert(self, item):
        """O(1) + O(logN) time complexity"""
        index = len(self)  # first free slot
        if index == self.capacity:  # full
            return

        self.heap[index] = item

        self.fix_up(index)  # check item's validation

    def fix_up(self, index):
        """
        O(logN) time complexity
        Violate:
            1. child value > parent value
        """
        parent_index = (index-1)//2
        if index > 0 and self.heap[index] > self.heap[parent_index]:
            # swap
            self.swap(index, parent_index)
            self.fix_up(parent_index)  # recursive

    def fix_down(self, index):
        """
        O(logN) time complexity
        Violate:
            1. child value > parent value
        """
        parent = self.heap[index]
        left_child_index = 2 * index + 1
        right_child_index = 2 * index + 2
        largest_index = index

        if left_child_index < len(self) and self.heap[left_child_index] > parent:
            largest_index = left_child_index

        if right_child_index < len(self) and self.heap[right_child_index] > self.heap[largest_index]:
            largest_index = right_child_index

        if index != largest_index:
            self.swap(index, largest_index)
            self.fix_down(largest_index)  # recursive

    def heap_sort(self):
        """
        O(NlogN) time complexity
        Empties the heap and returns its items in descending order
        """
        return [self.poll() for _ in range(len(self))]

    def swap(self, i1, i2):
        self.heap[i1], self.heap[i2] = self.heap[i2], self.heap[i1]

    def poll(self):
        if len(self) == 0:  # nothing to remove
            return None
        max_ = self.max_

        self.swap(0, len(self) - 1)  # swap first and last
        self.heap[len(self) - 1] = self._default
        self.fix_down(0)

        return max_

    @property
    def max_(self):
        return self.heap[0]
```

[python built-in heapq](https://gist.github.com/travishen/295ff80289bf54869d9842d285faa1a5)

# Associative Array/ Map/ Dictionary
* Key-value pairs
* Compared with tree data structures, the drawback is that sorting is hard
* Main operations:
    - add, delete and update values
    - look up a known key

![image](https://storage.googleapis.com/ssivart/super9-blog/assosiate-array.png)

### hash function
* division method: modulo operator

> h(x) = x % m

> x: the (integer) key, m: number of buckets

#### Collision
A collision happens when several keys map to the same bucket (slot); resolving collisions raises the time complexity.

```
h(26) = 26 mod 6 = 2
h(50) = 50 mod 6 = 2
```

Solutions:

* chaining: keep multiple entries in the same slot with a linked list
* open addressing: assign another empty slot
    - linear probing
    - quadratic probing, e.g. offsets 1, 4, 9, 16...
    - rehashing

Second Round explains both in detail:
* [Hash Table:Open Addressing](http://alrightchiu.github.io/SecondRound/hash-tableopen-addressing.html)
* [Hash Table:Chaining](http://alrightchiu.github.io/SecondRound/hash-tablechaining.html)

#### Dynamic resizing
> load factor: n / m (n: number of stored keys, m: number of buckets)

* The load factor affects access performance, so the array has to be resized dynamically according to how full it is
* For example, Java triggers a resize at roughly 75% occupancy, and Python at roughly 66%

#### Applications

* Databases
* Network Routing
* Rabin-Karp algorithm
* Hashing is widely used in data security, for example integrity checks and password storage

References:

* http://www.globalsoftwaresupport.com/use-prime-numbers-hash-functions/
* http://alrightchiu.github.io/SecondRound/hash-tableintrojian-jie.html#collision

Implemented in Python; see the [gist](https://gist.github.com/travishen/f51365915ef7f178623a2cc9b2ede886)

```python
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
from functools import reduce

class HashTable(object):
    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * self.size
        self.values = [None] * self.size

    def __repr__(self):
        return 'HashTable(size={})'.format(self.size)

    def put(self, key, value):
        index = self.hash(key)

        for _ in range(self.size):  # probe each slot at most once
            if self.keys[index] is None:  # empty slot
                self.keys[index] = key
                self.values[index] = value
                return
            if self.keys[index] == key:  # update
                self.values[index] = value
                return
            index = (index + 1) % self.size  # collision: linear probing

        raise RuntimeError('HashTable is full')

    def get(self, key):
        index = self.hash(key)

        for _ in range(self.size):  # follow the same probe sequence as put
            if self.keys[index] is None:
                return None
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size

        return None

    def hash(self, key):
        if isinstance(key, Iterable):  # e.g. a string: sum the character codes
            total = reduce(lambda prev, n: prev + ord(n), key, 0)
        else:
            total = key

        return total % self.size
```

|        | Average case | Worst case |
|--------|--------------|------------|
| insert | O(1)         | O(N)       |
| delete | O(1)         | O(N)       |
| search | O(1)         | O(N)       |


# Ternary Search Tree, TST
* Takes less memory than comparable tree data structures
* Stores only strings, not NULL or other objects
* A parent node can have 3 children: `left(less)`, `middle(equal)`, `right(greater)`
* Can serve as a hashmap and can also be used for sorting
* Can outperform a hashmap, because a key is resolved incrementally (for `cat`, if there is no `c` at the root level the search stops right there)

![image](https://storage.googleapis.com/ssivart/super9-blog/ternary-search-tree.png)

### Applications
* autocomplete
* Spell checking
* Near-neighbor search
* WWW packet routing
* Longest prefix matching
* Google Search

Implemented in Python; see the [gist](https://gist.github.com/travishen/cae7587e6d870d3f189fdcd70b96a8cc)
```python
class Node(object):
    def __init__(self, char):
        self.char = char
        self.left = self.middle = self.right = None
        self.value = None

class TernarySearchTree(object):
    def __init__(self):
        self.root = None

    def __repr__(self):
        return 'TernarySearchTree()'

    def put(self, key, value):
        if value is None:  # None is what selects the getter in recursive()
            raise ValueError('value must not be None')
        self.root = self.recursive(key, value)(self.root, 0)

    def get(self, key):
        node = self.recursive(key)(self.root, 0)
        if node:
            return node.value
        return -1  # key path not found

    def recursive(self, key, value=None):

        def putter(node, index):
            char = key[index]

            if node is None:
                node = Node(char)
            if char < node.char:
                node.left = putter(node.left, index)
            elif char > node.char:
                node.right = putter(node.right, index)
            elif index < len(key) - 1:
                node.middle = putter(node.middle, index+1)
            else:
                node.value = value

            return node

        def getter(node, index):
            char = key[index]

            if node is None:
                return None

            if char < node.char:
                return getter(node.left, index)
            elif char > node.char:
                return getter(node.right, index)
            elif index < len(key) - 1:
                return getter(node.middle, index+1)
            else:
                return node

        if value is not None:  # falsy values such as 0 or '' are still stored
            return putter
        else:
            return getter
```
--------------------------------------------------------------------------------
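Autocomplete is listed among the TST applications, so here is a minimal, self-contained sketch of prefix lookup on a ternary search tree. It is an illustration under stated assumptions rather than part of the gist: the names `TSTNode`, `tst_insert`, `keys_with_prefix` and the `is_end` flag are hypothetical, and keys and prefixes are assumed to be non-empty strings.

```python
class TSTNode:
    """Hypothetical node type for this sketch."""

    def __init__(self, char):
        self.char = char
        self.left = self.middle = self.right = None
        self.is_end = False  # True when a stored key ends at this node


def tst_insert(node, key, index=0):
    """Insert key below node and return the (possibly new) subtree root."""
    char = key[index]
    if node is None:
        node = TSTNode(char)
    if char < node.char:
        node.left = tst_insert(node.left, key, index)
    elif char > node.char:
        node.right = tst_insert(node.right, key, index)
    elif index < len(key) - 1:
        node.middle = tst_insert(node.middle, key, index + 1)
    else:
        node.is_end = True
    return node


def _find(node, key):
    """Return the node matching the last character of key, or None."""
    index = 0
    while node is not None:
        char = key[index]
        if char < node.char:
            node = node.left
        elif char > node.char:
            node = node.right
        elif index < len(key) - 1:
            node = node.middle
            index += 1
        else:
            return node
    return None


def _collect(node, prefix, out):
    """In-order walk that appends every key stored below node to out."""
    if node is None:
        return
    _collect(node.left, prefix, out)
    if node.is_end:
        out.append(prefix + node.char)
    _collect(node.middle, prefix + node.char, out)
    _collect(node.right, prefix, out)


def keys_with_prefix(root, prefix):
    """Return all stored keys that start with prefix, in sorted order."""
    node = _find(root, prefix)
    if node is None:
        return []
    out = [prefix] if node.is_end else []
    _collect(node.middle, prefix, out)
    return out


root = None
for word in ["cat", "car", "cart", "dog", "ca"]:
    root = tst_insert(root, word)

print(keys_with_prefix(root, "ca"))  # ['ca', 'car', 'cart', 'cat']
print(keys_with_prefix(root, "do"))  # ['dog']
print(keys_with_prefix(root, "x"))   # []
```

The lookup first walks to the node for the last prefix character and then collects only the subtree under its `middle` link, so the cost depends on the prefix length and the number of matches rather than on the total number of stored keys.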