├── LICENSE
└── README.md


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 ssivart
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
   1 | # TOC
   2 | * [Introduction](#introduction)
   3 | * [Array](#陣列-array)
   4 | * [Linked List & Double Linked List](#連結串列-linked-list--雙向連結串列-double-linked-list)
   5 | * [Stack](#堆疊-stack)
   6 | * [Queue](#佇列-queue)
   7 | * [Binary Search Tree](#二元搜尋樹-binary-search-tree)
   8 | * [Balancing Binary Search Tree, AVL Tree](#平衡二元搜尋樹-balancing-binary-search-tree-avl-tree)
   9 | * [Red-Black Tree](#紅黑樹-red-black-tree)
  10 | * [Binary Heap](#二元堆積-binary-heap)
  11 | * [Associative Array / Map / Dictionary](#關聯陣列對映字典-associative-array-map-dictionary)
  12 | * [Ternary Search Tree](#三元搜尋樹-ternary-search-tree-tst)
  13 | 
  14 | # Introduction
  15 | 
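  16 | ### What is a data structure, and why use one?
  17 | 
  18 | A data structure is a way of storing and organizing data in a computer. A good one lets us **store data efficiently** and carry out every operation on it as efficiently as possible
  19 | 
  20 | An algorithm's running time depends on the data structures it uses, so choosing a suitable data structure lowers the algorithm's time complexity. For example:
  21 | 
  22 | * Without a suitable data structure, a shortest-path algorithm runs in O(N^2); with a heap / priority queue the running time drops sharply, to about O(N*logN) on sparse graphs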
  23 | 
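  24 | ### Abstract Data Types
  25 | Put simply, an ADT is a "specification" or "description" of a data structure, much like an interface in an object-oriented language: it states the behavior but does not implement the details
  26 | 
  27 | For example, the ADT of a stack:
  28 | 
  29 | * push(item): insert `item` at the top of the stack
  30 | * pop(): remove and return the element at the top of the stack
  31 | * peek(): look at the element at the top of the stack without removing it
  32 | * size(): return the number of elements in the stack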
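
As a rough illustration of the ADT idea, the sketch below separates the specification from one possible implementation. `StackADT` and `ListStack` are hypothetical names chosen for this example; they are not part of the original code.

```python
from abc import ABC, abstractmethod


class StackADT(ABC):
    """Hypothetical interface: states what a stack does, not how."""

    @abstractmethod
    def push(self, item):
        """Insert item at the top of the stack."""

    @abstractmethod
    def pop(self):
        """Remove and return the top element."""

    @abstractmethod
    def peek(self):
        """Return the top element without removing it."""

    @abstractmethod
    def size(self):
        """Return the number of elements."""


class ListStack(StackADT):
    """One possible implementation, backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]

    def size(self):
        return len(self._items)


s = ListStack()
s.push(1)
s.push(2)
print(s.peek(), s.size())  # 2 2
print(s.pop(), s.size())   # 2 1
```

A linked-list-backed class could implement the same four methods without any change to the code that uses the stack.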
  33 | 
  34 | ### How ADTs relate to data structures
  35 | Every ADT is backed by a concrete data structure that implements the behavior (methods) the ADT defines
  36 | 
  37 | | ADT                | Data Structures    |
  38 | |--------------------|--------------------|
  39 | | Stack              | array, linked list |
  40 | | Queue              | array, linked list |
  41 | | Priority Queue     | heap               |
  42 | | Dictionary/Hashmap | array              |
  43 | 
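  44 | ### Time complexity: Big O notation
  45 | Big O describes the efficiency (complexity) of an algorithm. For example, nerd A wants to share his D: drive with nerd B and has two options:
  46 | 1. Ride from **Taipei** to B's home in **Pingtung**
  47 | 2. Send it over the network, ignoring the chance of being intercepted by the FBI
  48 | 
  49 | |                          | 1GB     | 1TB      | 500TB       |
  50 | |--------------------------|---------|----------|-------------|
  51 | | Deliver the disk by bike | 600 min | 600 min  | 600 min     |
  52 | | Network transfer         | 3 min   | 3072 min | 1536000 min |
  53 | 
  54 | As the table shows, the bike option sounds silly, but whatever the size of the disk it is guaranteed to arrive within 10 hours: `O(1)`. A network transfer takes longer the larger the files are: `O(N)`. This shows how much the difference between constant time and linear time matters for efficiency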
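
The table can be reproduced with two small functions. This is only a sketch: the constants are read off the table (600 minutes per trip, 3 minutes per GB, 1 TB taken as 1024 GB), and the function names are made up for illustration.

```python
RIDE_MINUTES = 600       # fixed trip time, taken from the table
NETWORK_MIN_PER_GB = 3   # transfer rate implied by the table


def ride_minutes(size_gb):
    """O(1): the trip takes the same time whatever the disk holds."""
    return RIDE_MINUTES


def network_minutes(size_gb):
    """O(N): the time grows in proportion to the amount of data."""
    return NETWORK_MIN_PER_GB * size_gb


for label, size_gb in [("1GB", 1), ("1TB", 1024), ("500TB", 500 * 1024)]:
    print(label, ride_minutes(size_gb), network_minutes(size_gb))
```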
  55 | 
  56 | A few general rules apply when writing a complexity function:
  57 | 
  58 | * Add the costs of sequential steps: **O(a+b)**
  59 | 
  60 | ```python
  61 | def func():
  62 |   pass  # step a
  63 |   pass  # step b
  64 | ```
  65 | 
  66 | * Drop constants: ~~O(3n)~~ **O(n)**
  67 | 
  68 | ```python
  69 | def func(lst):
  70 |   for i in lst:  # O(n)
  71 |     pass  # do something ...
  72 |   for i in lst:  # O(n)
  73 |     pass  # do something ...
  74 |   for i in lst:  # O(n)
  75 |     pass  # do something ...
  76 | ```
  77 | 
  78 | * Use a different variable for each input: ~~O(N^2)~~ **O(a*b)**
  79 | 
  80 | ```python
  81 | def func(la, lb):
  82 |   for a in la:
  83 |     for b in lb:
  84 |       pass  # do something ...
  85 | ```
  86 | 
  87 | * Drop non-dominant terms: ~~O(n+n^2)~~ **O(n^2)**
  88 | 
  89 | ```
  90 | O(n^2) <= O(n+n^2) <= O(n^2 + n^2)
  91 | ```
  92 | 
  93 | ```python
  94 | # n^2 is the dominant term, so n is dropped
  95 | def func(la):
  96 | 
  97 |   for a in la:  # O(n)
  98 |     pass  # do something ...
  99 | 
 100 |   for a in la:  # O(n^2)
 101 |     for b in la:
 102 |       pass  # do something
 103 | ```
 104 | 
 105 | # 陣列 Array
 106 | 
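 107 | A collection of objects or values, each of which can be identified by an array index (index, key)
 108 | 
 109 | * Indexes start at 0
 110 | * Because of the index, an array supports **random access**
 111 | 
 112 | Pros:
 113 | 
 114 | * Random access reaches any value in the array without searching, in O(1)
 115 | * No data is lost because of a broken link
 116 | * Fast sequential access
 117 | 
 118 | Cons:
 119 | 
 120 | * Rebuilding an array or inserting into it means copying its values one by one, which is O(N)
 121 | * The size of the array must be known in advance at compile time, which makes the array not very dynamic
 122 | * An array can usually hold only one type
 123 | * It does not support the sharing that linked lists allow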
 124 | 
 125 | ### Implements
 126 | 
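 127 | |            | Behavior                              | big O |
 128 | |------------|---------------------------------------|-------|
 129 | | access     | read by index (search by value: O(N)) | O(1)  |
 130 | | insert     | insert as the first item              | O(N)  |
 131 | | append     | insert as the last item               | O(1)  |
 132 | | remove     | remove the first item                 | O(N)  |
 133 | | removeLast | remove the last item                  | O(1)  |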
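
To see where the O(N) for inserting the first item comes from, the sketch below shifts the elements by hand and counts the moves. `insert_first` and `append_last` are hypothetical helpers written for this illustration; a Python list does the same shifting internally when `list.insert(0, x)` is called.

```python
def insert_first(arr, value):
    """Insert at index 0 by shifting every element right; returns moves made."""
    arr.append(None)                      # grow the array by one slot
    moves = 0
    for i in range(len(arr) - 1, 0, -1):
        arr[i] = arr[i - 1]               # shift one element to the right
        moves += 1
    arr[0] = value
    return moves


def append_last(arr, value):
    """Append at the end: no existing element has to move."""
    arr.append(value)
    return 0


data = [10, 20, 30, 40]
print(insert_first(data, 5), data)   # 4 [5, 10, 20, 30, 40]
print(append_last(data, 50), data)   # 0 [5, 10, 20, 30, 40, 50]
```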
 134 | 
 135 | ### Implementation in Python
 136 | 
 137 | **random indexing: O(1)**
 138 | ```python
 139 | arr = [1, 2, 3]
 140 | arr[0]
 141 | ```
 142 | 
 143 | **linear search: O(n)**
 144 | ```python
 145 | largest = arr[0]  # named `largest` to avoid shadowing the built-in max()
 146 | for i in arr:
 147 |   if i > largest:
 148 |     largest = i
 149 | ```
 150 | 
 151 | # 連結串列 Linked List & 雙向連結串列 Double Linked List 
 152 | 
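 153 | * A node holds `data` and a `referenced object`
 154 | * Nodes are linked by each node keeping a reference to another node
 155 | * The reference held by the last node is NULL
 156 | 
 157 | Pros
 158 | 
 159 | * Nodes need not share the same type or memory size
 160 | * Memory is taken dynamically; no size has to be declared in advance
 161 | * Fast insertion and deletion at the head, O(1)
 162 | 
 163 | Cons
 164 | 
 165 | * No random access, only sequential access, which is O(N)
 166 | * Extra space is needed to store the references to other nodes
 167 | * Less reliable: a broken link easily loses data
 168 | * Hard to traverse backward; a double linked list solves this but takes more memory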
 169 | 
 170 | ### Implements
 171 | 
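 172 | |             | Behavior                 | big O |
 173 | |-------------|--------------------------|-------|
 174 | | search      | search                   | O(N)  |
 175 | | insert      | insert as the first item | O(1)  |
 176 | | append      | insert as the last item  | O(N)  |
 177 | | remove      | remove the first item    | O(1)  |
 178 | | removeLast  | remove the last item     | O(N)  |
 179 | 
 180 | Note: a linked list has no index, so inserting or removing the N-th item first requires walking the list to that position, which takes O(N) time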
 181 | 
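 182 | ### Implementation in Python
 183 | 
 184 | The code below is my own sample implementation; please point out any mistakes.
 185 | 
 186 | The main idea is to implement `__getitem__` for sequential access (indexing). The Double Linked List also supports access from the tail, so both `lst[0]` and `lst[-1]` can be reached in O(1)
 187 | 
 188 | For the output, see [travishen/gist/linked-list.md](https://gist.github.com/travishen/df37a04582c48d386781077742908107)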
 189 | 
 190 | ```python
 191 | from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
 192 | 
 193 | class Node:
 194 |     def __init__(self, data=None, next_node=None):
 195 |         self.data = data
 196 |         self.next_node = next_node
 197 |         
 198 |     def __repr__(self):
 199 |         return 'Node(data={!r}, next_node={!r})'.format(self.data, self.next_node)
 200 | 
 201 | class LinkedList(object):
 202 |     def __init__(self, inital_nodes=None):
 203 |         self.head = None
 204 |         self.inital_nodes = inital_nodes
 205 |         # garbage collect
 206 |         for node in self:
 207 |             del node
 208 |         if isinstance(inital_nodes, Iterable):
 209 |             for node in reversed(list(inital_nodes)):
 210 |                 self.insert(node)  # insert to head
 211 |         elif inital_nodes:
 212 |             raise NotImplementedError('Initial value must be iterable')
 213 |                 
 214 |     def __repr__(self):
 215 |         return 'LinkedList(inital_nodes={!r})'.format(self.inital_nodes)
 216 |         
 217 |     def __len__(self):        
 218 |         count = 0
 219 |         for node in self:
 220 |             count += 1
 221 |         return count
 222 |     
 223 |     def __setitem__(self, index, data):
 224 |         self.insert(data, index)
 225 |     
 226 |     def __delitem__(self, index):
 227 |         self.remove(index, by='index')
 228 |                    
 229 |     def __getitem__(self, index):
 230 |         count = 0
 231 |         current = self.head
 232 |         index = self.positive_index(index)
 233 |         while count < index and current is not None:
 234 |             current = current.next_node
 235 |             count += 1
 236 |         if current is not None and index >= 0:  # reject out-of-range negative indexes
 237 |             return current
 238 |         else:
 239 |             raise IndexError
 240 |             
 241 |     def positive_index(self, index):  # implement negative indexing
 242 |         """
 243 |         Negative indexing costs an extra O(N) pass to compute len(self)
 244 |         We can improve it with a double linked list
 245 |         """
 246 |         if index < 0:  
 247 |             index = len(self) + index
 248 |         return index
 249 |         
 250 |     def insert(self, data, index=0):
 251 |         index = self.positive_index(index)  
 252 |         if self.head is None:  # initial 
 253 |             self.head = Node(data, None)
 254 |         elif index == 0:  # insert to head
 255 |             new_node = Node(data, self.head)
 256 |             self.head = new_node
 257 |         else:  # insert after lst[index]
 258 |             last_node = self[index]
 259 |             last_node.next_node = Node(data, last_node.next_node)
 260 |         return None  # the list is modified in place; no new instance is created
 261 |         
 262 |     def search(self, data):
 263 |         for node in self:
 264 |             if node.data == data:
 265 |                 return node
 266 |         return None
 267 |     
 268 |     def remove(self, data_or_index, by='data'):
 269 |         for i, node in enumerate(self):
 270 |             if (by == 'data' and node.data == data_or_index) or (by == 'index' and i == data_or_index):
 271 |                 if i == 0:
 272 |                     self.head = node.next_node
 273 |                     node.next_node = None
 274 |                 else:
 275 |                     prev_node.next_node = node.next_node
 276 |                 break               
 277 |             prev_node = node
 278 |         return None  # the list is modified in place; no new instance is created
 279 |         
 280 | class DoubleLinkedNode(Node):
 281 |     def __init__(self, data=None, last_node=None, next_node=None):
 282 |         self.data = data
 283 |         self.next_node = next_node
 284 |         self.last_node = last_node
 285 |         if next_node:
 286 |             next_node.last_node = self
 287 |             
 288 | class DoubleLinkedList(LinkedList):
 289 |     def __init__(self, *args, **kwargs):
 290 |         self.foot = None
 291 |         super(DoubleLinkedList, self).__init__(*args, **kwargs)            
 292 |         
 293 |     def __repr__(self):
 294 |         return 'DoubleLinkedList(inital_nodes={})'.format(self.inital_nodes)
 295 |         
 296 |     def __getitem__(self, index):
 297 |         """
 298 |         Support negative indexing by walking back from the foot: lst[-1] is O(1), lst[-k] is O(k)
 299 |         """
 300 |         count = 0
 301 |         if index >= 0:
 302 |             current = self.head
 303 |             while count < index and current is not None:
 304 |                 current = current.next_node
 305 |                 count += 1
 306 |         else:
 307 |             current = self.foot
 308 |             while count > (index + 1) and current is not None:
 309 |                 current = current.last_node
 310 |                 count -= 1
 311 |         if current:
 312 |             return current
 313 |         else:
 314 |             raise IndexError
 315 |     
 316 |     def insert(self, data, index=0):
 317 |         if self.head is None:  # initial 
 318 |             self.head = self.foot = DoubleLinkedNode(data, None, None)
 319 |         elif index == 0:  # insert to head
 320 |             new_node = DoubleLinkedNode(data, None, self.head)
 321 |             self.head = new_node
 322 |         else:  # insert after lst[index]
 323 |             last_node = self[index]
 324 |             last_node.next_node = DoubleLinkedNode(data, last_node, last_node.next_node)
 325 |             if last_node.next_node.next_node is None:  # the new node is the last one: set foot
 326 |                 self.foot = last_node.next_node
 327 |         return None  # the list is modified in place; no new instance is created
 328 | ```
 329 | 
 330 | ### Real-world uses of linked lists
 331 | 
 332 | 1. Low-level memory management, taking C as an example:
 333 | 
 334 | * `malloc()`, `free()`: see [Heap Management](https://www.syslinux.org/wiki/index.php?title=Heap_Management)
 335 | * `char * char_ptr = (char*)malloc(30);`: obtains 30 bytes of heap memory
 336 | 
 337 | 2. Many Windows applications: toolbar window switching, PhotoViewer
 338 | 
 339 | 3. Blockchain technology
 340 | 
 341 | ![image](https://i.imgur.com/FcqNnmz.png) 
 342 | [[Image source]](https://codingislove.com/simple-blockchain-javascript/)
 343 | 
 344 | # 堆疊 Stack 
 345 | 
 346 | * A stack is an abstract data type whose defining property is last in, first out (LIFO)
 347 | * In high-level languages it is easy to implement with an array or a linked list
 348 | * Most programming languages are stack-oriented, because they rely on a stack to handle method calls (the call stack). See [Call Stack, Scope & Lifetime of Variables](https://www.youtube.com/watch?v=1cPSeJLspT8) and [Python Function Calls and the Stack](https://www.cs.ucsb.edu/~pconrad/cs8/topics.beta/theStack/02/)
 349 | 
 350 | ### Implements
 351 | 
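 352 | |      | Behavior                                                 | big O |
 353 | |------|----------------------------------------------------------|-------|
 354 | | push | put data on top of the stack                             | O(1)  |
 355 | | pop  | remove and return the data on top of the stack           | O(1)  |
 356 | | peek | look at the data on top of the stack without removing it | O(1)  |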
 357 | 
 358 | ### Applications
 359 | * call stack + stack memory
 360 | * Depth-first search (Depth-First-Search)
 361 | * Eulerian circuit
 362 | * Browser back button
 363 | * Photoshop undo
 364 | 
 365 | Note: any recursive algorithm can be rewritten with a stack, DFS for example. Even when we write it recursively, the program still ends up running on a stack (the call stack)
 366 | 
 367 | ```python
 368 | def factorial(n):  # n must be a non-negative integer
 369 |     if n == 0:  # declare base case to prevent stack overflow
 370 |         return 1
 371 |     return n * factorial(n-1)
 372 | ```
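
To make the note about recursion concrete, here is a minimal sketch of depth-first search written both ways. The graph and the function names are invented for this example; neighbors are pushed in reverse so that the explicit stack visits nodes in the same order as the recursive version.

```python
def dfs_recursive(graph, node, visited=None):
    """Depth-first search that relies on the call stack."""
    if visited is None:
        visited = []
    visited.append(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited)
    return visited


def dfs_with_stack(graph, start):
    """The same traversal with an explicit stack instead of recursion."""
    visited = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # push in reverse so the first neighbor is popped first
        for neighbor in reversed(graph[node]):
            if neighbor not in visited:
                stack.append(neighbor)
    return visited


graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs_recursive(graph, 'A'))   # ['A', 'B', 'D', 'C']
print(dfs_with_stack(graph, 'A'))  # ['A', 'B', 'D', 'C']
```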
 373 | 
 374 | ### Stack memory vs Heap memory
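 375 | See [Stack vs. Heap](https://medium.com/joe-tsai/stack-vs-heap-b4bd500667cd)
 376 | 
 377 | | stack memory                                           | heap memory                                     |
 378 | |--------------------------------------------------------|-------------------------------------------------|
 379 | | Limited allocation space                               | Larger allocation space                         |
 380 | | Lifetimes are regular and predictable                  | Lifetimes are irregular and unpredictable       |
 381 | | Managed automatically (freed when the function returns) | Managed by the programmer                       |
 382 | | Space declared for local variables cannot be resized   | Allocations can be resized, e.g. with realloc() |
 383 | 
 384 | PTT also has a discussion of how much memory each of them takes: [stack v.s. heap sizes](https://www.ptt.cc/man/C_and_CPP/DD8B/M.1460666895.A.07A.html)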
 385 | 
 386 | ### Implementation in Python
 387 | ```python
 388 | class Stack(object):
 389 |     def __init__(self, initial_data):
 390 |         self.stack = []
 391 |         self.initial_data = initial_data
 392 |         if isinstance(initial_data, Iterable):
 393 |             self.stack = list(initial_data)
 394 |         else:
 395 |             raise NotImplementedError('Initial value must be iterable')
 396 |             
 397 |     def __repr__(self):
 398 |         return 'Stack(initial_data={!r})'.format(self.initial_data)
 399 |     
 400 |     def __len__(self):
 401 |         return len(self.stack)
 402 |     
 403 |     def __getitem__(self, i):
 404 |         return self.stack[i]
 405 |         
 406 |     @property
 407 |     def is_empty(self):
 408 |         return len(self.stack) == 0
 409 |     
 410 |     def push(self, data):
 411 |         self.stack.append(data)
 412 |         
 413 |     def pop(self):
 414 |         if not self.is_empty:
 415 |             return self.stack.pop()
 416 |         
 417 |     def peek(self):
 418 |         return self.stack[-1]
 419 | ```
 420 | Using Lists as Stacks
 421 | ```
 422 | >>> stack = [3, 4, 5]
 423 | >>> stack.append(6)
 424 | >>> stack.append(7)
 425 | >>> stack
 426 | [3, 4, 5, 6, 7]
 427 | >>> stack.pop()
 428 | 7
 429 | >>> stack
 430 | [3, 4, 5, 6]
 431 | >>> stack.pop()
 432 | 6
 433 | >>> stack.pop()
 434 | 5
 435 | >>> stack
 436 | [3, 4]
 437 | ```
 438 | 
 439 | # 佇列 Queue 
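 440 | * A queue is an abstract data type whose defining property is first in, first out (FIFO)
 441 | * In high-level languages it is easy to implement with an array or a linked list
 442 | 
 443 | ### Applications
 444 | * Sharing a resource among multiple processes, e.g. CPU scheduling
 445 | * Asynchronous task queues, e.g. an I/O buffer
 446 | * Breadth-first search (Breadth-First-Search)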
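
As an example of the last application, the sketch below runs a breadth-first search with `collections.deque` as the queue. The graph and the `bfs` function are made up for illustration.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level, using a FIFO queue."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()          # the first one in is the first one out
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D']
```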
 447 | 
 448 | ### Implementation in Python
 449 | ```python
 450 | class Queue(object):
 451 |     def __init__(self, initial_data):
 452 |         self.queue = []
 453 |         self.initial_data = initial_data
 454 |         if isinstance(initial_data, Iterable):
 455 |             self.queue = list(initial_data)
 456 |         else:
 457 |             raise NotImplementedError('Initial value must be iterable')
 458 |             
 459 |     def __repr__(self):
 460 |         return 'Queue(initial_data={!r})'.format(self.initial_data)
 461 |     
 462 |     def __len__(self):
 463 |         return len(self.queue)
 464 |     
 465 |     def __getitem__(self, i):
 466 |         return self.queue[i]
 467 |         
 468 |     @property
 469 |     def is_empty(self):
 470 |         return len(self.queue) == 0
 471 |     
 472 |     def enqueue(self, data):
 473 |         return self.queue.append(data)
 474 |     
 475 |     def dequeue(self):
 476 |         return self.queue.pop(0)  # O(N) on a list; collections.deque.popleft() is O(1)
 477 |     
 478 |     def peek(self):
 479 |         return self.queue[0]
 480 | ```
 481 | References
 482 | 
 483 | * [The Queue implemented in multiprocessing](https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py)
 484 | * Using Lists as Queues
 485 | ```python
 486 | >>> from collections import deque
 487 | >>> queue = deque(["Eric", "John", "Michael"])
 488 | >>> queue.append("Terry")           # Terry arrives
 489 | >>> queue.append("Graham")          # Graham arrives
 490 | >>> queue.popleft()                 # The first to arrive now leaves
 491 | 'Eric'
 492 | >>> queue.popleft()                 # The second to arrive now leaves
 493 | 'John'
 494 | >>> queue                           # Remaining queue in order of arrival
 495 | deque(['Michael', 'Terry', 'Graham'])
 496 | ```
 497 | 
 498 | # 二元搜尋樹 Binary Search Tree
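 499 | The main advantage is that the time complexity can be brought down to O(logN)
 500 | 
 501 | * Each node has at most two children
 502 | * Children are distinguished as left and right
 503 | * Nodes in the left subtree are smaller than the root; nodes in the right subtree are larger
 504 | * Node values are unique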
 505 | 
 506 | |        | Average case | Worst case |
 507 | |--------|--------------|------------|
 508 | | insert | O(logN)      | O(N)       |
 509 | | delete | O(logN)      | O(N)       |
 510 | | search | O(logN)      | O(N)       |
 511 | 
 512 | Implemented in Python with insert, remove and search; for the output, see [gist](https://gist.github.com/travishen/c4cc5797f8905b2a5b90f2545c374a26)
 513 | ```python
 514 | class Node(object):
 515 |     def __init__(self, data):
 516 |         self._left, self._right = None, None
 517 |         self.data = int(data)
 518 |         
 519 |     def __repr__(self):
 520 |         return 'Node({})'.format(self.data)
 521 |    
 522 |     @property
 523 |     def left(self):
 524 |         return self._left
 525 |     
 526 |     @left.setter
 527 |     def left(self, node):
 528 |         self._left = node
 529 |     
 530 |     @property
 531 |     def right(self):
 532 |         return self._right
 533 |     
 534 |     @right.setter
 535 |     def right(self, node):
 536 |         self._right = node
 537 |     
 538 | class BinarySearchTree(object):        
 539 |     def __init__(self, root=None):
 540 |         self.root = root
 541 |         self.search_mode = 'in_order'
 542 |         
 543 |             
 544 |     # O(logN) time complexity if balanced; it can degrade to O(N)
 545 |     def insert(self, data, **kwargs):
 546 |         """Insert from root"""
 547 |         self.root = BinarySearchTree.insert_node(self.root, data, **kwargs)
 548 |         
 549 |     # O(logN) time complexity if balanced; it can degrade to O(N)
 550 |     def remove(self, data):
 551 |         """Remove from root"""
 552 |         self.root = BinarySearchTree.remove_node(self.root, data)
 553 |     
 554 |     @staticmethod
 555 |     def insert_node(node, data, **kwargs):
 556 |         node_constructor = kwargs.get('node_constructor', None) or Node
 557 |         if node:
 558 |             if data < node.data:
 559 |                 if node.left is None:
 560 |                     node.left = node_constructor(data)
 561 |                 else:
 562 |                     BinarySearchTree.insert_node(node.left, data, **kwargs)
 563 |             elif data > node.data:
 564 |                 if node.right is None:
 565 |                     node.right = node_constructor(data)
 566 |                 else:
 567 |                     BinarySearchTree.insert_node(node.right, data, **kwargs)
 568 |         else:
 569 |             node = node_constructor(data)  # empty tree: the new node becomes the root
 570 |         return node
 571 |          
 572 |     @staticmethod
 573 |     def remove_node(node, data):
 574 | 
 575 |         if not node:
 576 |             return None
 577 |         
 578 |         if data < node.data:
 579 |             node.left = BinarySearchTree.remove_node(node.left, data)
 580 |         elif data > node.data:
 581 |             node.right = BinarySearchTree.remove_node(node.right, data)
 582 |         else:
 583 |             if not node.left and not node.right:  # leaf: no children at all
 584 |                 del node
 585 |                 return None
 586 |             if not node.left:  # only a right child: it takes the node's place
 587 |                 tmp = node.right
 588 |                 del node
 589 |                 return tmp
 590 |             if not node.right:  # only a left child: it takes the node's place
 591 |                 tmp = node.left
 592 |                 del node
 593 |                 return tmp
 594 |             predecessor = BinarySearchTree.get_max_node(node.left)
 595 |             node.data = predecessor.data
 596 |             node.left = BinarySearchTree.remove_node(node.left, predecessor.data)
 597 |         return node
 598 |             
 599 |     def get_min(self):
 600 |         return self.get_min_node(self.root)
 601 |     
 602 |     @staticmethod
 603 |     def get_min_node(node):
 604 |         if node.left:
 605 |             return BinarySearchTree.get_min_node(node.left)
 606 |         return node
 607 |         
 608 |     def get_max(self):
 609 |         return self.get_max_node(self.root)
 610 |     
 611 |     @staticmethod
 612 |     def get_max_node(node):
 613 |         if node.right:
 614 |             return BinarySearchTree.get_max_node(node.right)
 615 |         return node
 616 |              
 617 |     def search_decorator(func):
 618 |         def interface(*args, **kwargs):
 619 |             res = func(*args, **kwargs)
 620 |             if isinstance(res, Node):
 621 |                 return res
 622 |             elif 'data' in kwargs:
 623 |                 for node in res:
 624 |                     if node.data == kwargs['data']:
 625 |                         return node   
 626 |             return res
 627 |         return interface
 628 |     
 629 |     @staticmethod
 630 |     @search_decorator
 631 |     def in_order(root, **kwargs):
 632 |         """left -> root -> right"""
 633 |         f = BinarySearchTree.in_order
 634 |         res = []
 635 |         if root:
 636 |             left = f(root.left, **kwargs)
 637 |             if isinstance(left, Node):
 638 |                 return left
 639 |             right = f(root.right, **kwargs)
 640 |             if isinstance(right, Node):
 641 |                 return right
 642 |             res = left + [root] + right
 643 |         return res
 644 | 
 645 |     @staticmethod
 646 |     @search_decorator
 647 |     def pre_order(root, **kwargs):
 648 |         """root -> left -> right"""
 649 |         f = BinarySearchTree.pre_order
 650 |         res = []
 651 |         if root:
 652 |             left = f(root.left, **kwargs)
 653 |             if isinstance(left, Node):
 654 |                 return left
 655 |             right = f(root.right, **kwargs)
 656 |             if isinstance(right, Node):
 657 |                 return right
 658 |             res = [root] + left + right      
 659 |         return res
 660 | 
 661 |     @staticmethod
 662 |     @search_decorator
 663 |     def post_order(root, **kwargs):
 664 |         """left -> right -> root"""
 665 |         f = BinarySearchTree.post_order
 666 |         res = []
 667 |         if root:
 668 |             left = f(root.left, **kwargs)
 669 |             if isinstance(left, Node):
 670 |                 return left
 671 |             right = f(root.right, **kwargs)
 672 |             if isinstance(right, Node):
 673 |                 return right
 674 |             res = left + right + [root]
 675 |         return res
 676 |     
 677 |     def traversal(self, 
 678 |                   order:"in_order|pre_order|post_order"=None,
 679 |                   data=None):
 680 |         order = order or self.search_mode
 681 |         if order == 'in_order':
 682 |             return BinarySearchTree.in_order(self.root, data=data)
 683 |         elif order == 'pre_order':
 684 |             return BinarySearchTree.pre_order(self.root, data=data)
 685 |         elif order == 'post_order':
 686 |             return BinarySearchTree.post_order(self.root, data=data)
 687 |         else:
 688 |             raise NotImplementedError()
 689 |             
 690 |     def search(self, data, *args, **kwargs):
 691 |         return self.traversal(*args, data=data, **kwargs)
 692 | ```
 693 | 
 694 | ### Real-world uses of BSTs
 695 | 
 696 | * OS file system
 697 | * Machine learning: decision trees
 698 | 
 699 | # 平衡二元搜尋樹 Balancing Binary Search Tree, AVL Tree
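 700 | * Guarantees O(logN) time complexity
 701 | * Every insert and delete must check the balance; an unbalanced tree needs an extra rotation
 702 | * Checking the balance: 
 703 |   - `left subtree height - right subtree height > 1`: rotate to right 
 704 |   - `left subtree height - right subtree height < -1`: rotate to left 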
 705 |   - ![image](https://storage.googleapis.com/ssivart/super9-blog/not-balancing-tree.png)
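
The balance check and a single right rotation can be sketched as follows. This is a stand-alone illustration, separate from the implementation further down: `Node`, `height`, `balance_factor` and `rotate_right` are hypothetical names, and an empty subtree is given height -1.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in edges; an empty subtree counts as -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Left subtree height minus right subtree height."""
    return height(node.left) - height(node.right)


def rotate_right(node):
    """Lift the left child to the top and return the new subtree root."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


# A left-heavy chain: 3 -> 2 -> 1
root = Node(3, left=Node(2, left=Node(1)))
print(balance_factor(root))            # 2, so rotate to right
root = rotate_right(root)
print(root.key, balance_factor(root))  # 2 0
```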
 706 | 
 707 | |        | Average case | Worst case |
 708 | |--------|--------------|------------|
 709 | | insert | O(logN)      | O(logN)    |
 710 | | delete | O(logN)      | O(logN)    |
 711 | | search | O(logN)      | O(logN)    |
 712 | 
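 713 | Not a good fit for sorting; the time complexity is O(N*logN)
 714 | 
 715 | * Inserting N items: O(N*logN)
 716 | * In-order iteration: O(N)
 717 | 
 718 | This builds on the BST above through inheritance; if you find a bug, please help point it out. For the output, see [gist](https://gist.github.com/travishen/c4cc5797f8905b2a5b90f2545c374a26)
 719 | 
 720 | * Whenever a node's left or right is set, update that node's height
 721 | * Each insert frame on the call stack checks whether its node is balanced and rotates it if not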
 722 | 
 723 | ```python
 724 | class HNode(Node):    
 725 |     def __init__(self, *args, **kwargs):
 726 |         super(HNode, self).__init__(*args, **kwargs)
 727 |         self._height = 0
 728 |         
 729 |     def __repr__(self):
 730 |         return 'HNode({})'.format(self.data)
 731 |     
 732 |     @property
 733 |     def height(self):
 734 |         return self._height
 735 |     
 736 |     def set_height(self):        
 737 |         if self.left is None and self.right is None:
 738 |             self._height = 0
 739 |         else:
 740 |             self._height = max(self.left_height, self.right_height) + 1
 741 |         return self._height
 742 | 
 743 | 
 744 |     @Node.left.setter
 745 |     def left(self, node):
 746 |         self._left = node
 747 |         self.set_height()
 748 |             
 749 |     @Node.right.setter
 750 |     def right(self, node):
 751 |         self._right = node
 752 |         self.set_height()
 753 |         
 754 |     @property
 755 |     def sub_diff(self):
 756 |         return self.left_height - self.right_height 
 757 |     
 758 |     @property
 759 |     def left_height(self):
 760 |         if self.left:
 761 |             return self.left.height
 762 |         return -1
 763 |     
 764 |     @property
 765 |     def right_height(self):
 766 |         if self.right:
 767 |             return self.right.height
 768 |         return -1
 769 |     
 770 |     @property
 771 |     def is_balance(self):
 772 |         return abs(self.sub_diff) <= 1        
 773 |         
 774 |     def balance(self, data):
 775 |         
 776 |         if self.sub_diff > 1:
 777 |             if data < self.left.data:  # left left heavy
 778 |                 return self.rotate('right')
 779 |             if data > self.left.data:  # left right heavy
 780 |                 self.left = self.left.rotate('left')
 781 |                 return self.rotate('right')
 782 |             
 783 |         if self.sub_diff < -1:
 784 |             if data > self.right.data:
 785 |                 return self.rotate('left')  # right right heavy
 786 |             if data < self.right.data:  # right left heavy
 787 |                 self.right = self.right.rotate('right')
 788 |                 return self.rotate('left')
 789 |             
 790 |         return self
 791 |         
 792 |     def rotate(self, to:"left|right"):
 793 |         if to == 'right':
 794 |             tmp = self.left
 795 |             tmp_right = tmp.right
 796 |             # update
 797 |             tmp.right = self
 798 |             self.left = tmp_right        
 799 |             print('Node {} right rotate to {}!'.format(self, tmp))
 800 |             return tmp  # return new root
 801 |         if to == 'left':
 802 |             tmp = self.right
 803 |             tmp_left = tmp.left
 804 |             # update
 805 |             tmp.left = self
 806 |             self.right = tmp_left
 807 |             print('Node {} left rotate to {}!'.format(self, tmp))
 808 |             return tmp  # return new root
 809 |         raise NotImplementedError()
 810 |             
 811 | class AVLTree(BinarySearchTree):    
 812 |     def __init__(self, *args, **kwargs):
 813 |         super(AVLTree, self).__init__(*args, **kwargs)
 814 |         
 815 |     def insert(self, data):    
 816 |         AVLTree.insert_node(self.root, data, tree=self)  # pass self as keyword argument to update self.root
 817 |         self.update_height()
 818 |         
 819 |     def remove(self, data):
 820 |         AVLTree.remove_node(self.root, data, tree=self)  # pass self as keyword argument to update self.root
 821 |         self.update_height()
 822 |     
 823 |     def rotate_decorator(func):
 824 |         def interface(*args, **kwargs):
 825 |             node = func(*args, **kwargs)
 826 |             
 827 |             data = args[1]
 828 |             tree = kwargs.get('tree')
 829 |                         
 830 |             new_root = node.balance(data)
 831 |             
 832 |             if node == tree.root:
 833 |                 tree.root = new_root
 834 |                     
 835 |         return interface
 836 |     
 837 |     def update_height(self):
 838 |         for n in self.traversal(order='in_order'):
 839 |             n.set_height()
 840 |     
 841 |     @property
 842 |     def is_balance(self):
 843 |         return self.root.is_balance
 844 |     
 845 |     @rotate_decorator
 846 |     def insert_node(*args, **kwargs):
 847 |         return BinarySearchTree.insert_node(*args, node_constructor=HNode, **kwargs)
 848 |    
 849 |     @rotate_decorator
 850 |     def remove_node(*args, **kwargs):
 851 |         return BinarySearchTree.remove_node(*args)  # remove_node() takes no keyword arguments, so 'tree' is not forwarded
 852 | ```
 853 | 
 854 | # 紅黑樹 Red-Black Tree
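 855 | * Compared with an AVL tree, a red-black tree gives up some balance in exchange for fewer rotations on insert/delete, so overall performance is better (faster insertion and deletion)
 856 | * Instead of using a height attribute on each node to decide whether to rotate, as the AVL tree does, it uses red/black colors
 857 |   - The root and the terminal nodes (NULL) are black
 858 |   - The parent and the children of a red node are black
 859 |   - Every path from a node down to its NULL leaves contains the same number of black nodes
 860 |   - Every new node is red by default; if that violates the rules above:
 861 |     - rotate, or
 862 |     - recolor nodes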
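
The color rules can be checked mechanically. The sketch below is a validator only (it does not insert or rebalance, and it does not check the BST ordering); `RBNode`, `black_height` and `is_red_black` are hypothetical names for this example.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the number of black nodes on every path below node.

    NULL leaves count as black. Raises ValueError if a rule is broken.
    """
    if node is None:
        return 1
    for child in (node.left, node.right):
        if node.color == RED and child is not None and child.color == RED:
            raise ValueError("red node {} has a red child".format(node.key))
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("unequal black counts below {}".format(node.key))
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    """True if the root is black and the color rules hold everywhere."""
    if root is not None and root.color != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True


valid = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
invalid = RBNode(2, BLACK, RBNode(1, RED, RBNode(0, RED)), RBNode(3, RED))
print(is_red_black(valid))    # True
print(is_red_black(invalid))  # False: red node 1 has a red child
```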
 863 | 
 864 | ![image](https://storage.googleapis.com/ssivart/super9-blog/red-black-tree.png)
 865 | 
 866 | |        | Average case | Worst case |
 867 | |--------|--------------|------------|
 868 | | insert | O(logN)      | O(logN)    |
 869 | | delete | O(logN)      | O(logN)    |
 870 | | search | O(logN)      | O(logN)    |
 871 | 
A Python implementation on GitHub: [Red-Black-Tree](https://github.com/stanislavkozlovski/Red-Black-Tree/blob/master/rb_tree.py)
 873 | 
# Priority Queue
* Unlike a Stack or Queue, items are removed in order of their priority rather than their insertion order
* Usually implemented with a heap
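
As a small illustration of the idea, the sketch below uses Python's standard `heapq` module as a priority queue. The task names and priority numbers are made up for the example, and it assumes that a lower number means a higher priority, because `heapq` is a min heap.

```python
import heapq

tasks = []  # heapq operates on a plain list
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))

order = []
while tasks:
    priority, name = heapq.heappop(tasks)  # always pops the smallest priority
    order.append(name)

print(order)  # ['fix bug', 'write tests', 'update docs']
```

Storing `(priority, item)` tuples works because tuples compare element by element, so the priority decides the order first.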
 877 | 
# Binary Heap
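* A binary tree data structure, usually stored in a one-dimensional array
* Comes in two flavors, `min` and `max`, depending on the ordering:
  - max heap: each parent's value (or key) is greater than or equal to its children's
  - min heap: each parent's value (or key) is less than or equal to its children's
* Must be a complete or nearly complete binary tree (every level is full except possibly the last, which fills from the left)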
 884 | 
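Notes:
* The heap data structure is unrelated to heap memory
* Its strength is reading the highest-priority (max heap) or lowest-priority (min heap) item, which is always the root, in O(1) time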
 888 | 
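|               | time complexity                                        |
|---------------|--------------------------------------------------------|
| insert        | O(1) to append + O(logN) to restore the heap           |
| delete (root) | O(1) to remove the root + O(logN) to restore the heap  |

Deleting an arbitrary item first needs an O(N) search to locate it.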
 893 |  
### Applications
* Heap Sort
* Prim's Algorithm
* Dijkstra's Algorithm
 898 |  
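### Heapsort
* A comparison sort
* Its main advantage is a guaranteed O(NlogN) time complexity, even in the worst case
* It is an in-place algorithm; the downside is that the heap has to be built from scratch for every sort, which adds O(N) time
* Indexing in a one-dimensional array that starts at 0: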
 904 | 
 905 | ![image](https://storage.googleapis.com/ssivart/super9-blog/heap-indexing.png)
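
The 0-based indexing can be written out as three small helpers. This is only a sketch; the function names are hypothetical and chosen for the example.

```python
def parent(i):
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def is_max_heap(items):
    """True if every item is no larger than its parent."""
    return all(items[parent(i)] >= items[i] for i in range(1, len(items)))


heap = [9, 7, 8, 3, 5]  # a max heap laid out level by level

print(left_child(1), right_child(1))  # 3 4
print(parent(4))                      # 1
print(is_max_heap(heap))              # True
print(is_max_heap([1, 7, 8]))         # False
```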
 906 | 
See this article for a walkthrough of the operations: [Comparison Sort: Heap Sort(堆積排序法)](http://alrightchiu.github.io/SecondRound/comparison-sort-heap-sortdui-ji-pai-xu-fa.html)
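
For comparison with the class-based heap in this section, here is a minimal sketch of heapsort done fully in place on a plain list. It is not the code from the linked article; `sift_down` and `heap_sort` are hypothetical names. It builds a max heap bottom-up and then repeatedly swaps the current maximum to the end of the shrinking heap.

```python
def sift_down(items, root, end):
    """Move items[root] down until the max-heap property holds in items[:end]."""
    while True:
        largest = root
        left, right = 2 * root + 1, 2 * root + 2
        if left < end and items[left] > items[largest]:
            largest = left
        if right < end and items[right] > items[largest]:
            largest = right
        if largest == root:
            return
        items[root], items[largest] = items[largest], items[root]
        root = largest


def heap_sort(items):
    """Sort `items` in place in ascending order and return it."""
    n = len(items)
    # Build the max heap bottom-up, starting from the last parent: O(N)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(items, i, n)
    # Move the current max to the end, then shrink the heap: O(NlogN)
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(items, 0, end)
    return items


print(heap_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```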
 909 | 
For a Python implementation of a Max Binary Heap, see this [gist](https://gist.github.com/travishen/1230001923ddfac2e6bf5c752f4daa12)
```python
class Heap(object):
    """Max Binary Heap"""

    def __init__(self, capacity=10):
        self._default = object()  # sentinel marking an empty slot
        self.capacity = capacity
        self.heap = [self._default] * self.capacity
        self.size = 0  # number of stored items, so len() is O(1)

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return self.heap[i]

    def insert(self, item):
        """O(1) + O(logN) time complexity"""
        if self.capacity == len(self):  # full
            return

        index = len(self)  # first free slot
        self.heap[index] = item
        self.size += 1

        # Fix up from the slot just written; heap.index(item) would find the
        # wrong slot when the heap already holds an equal item.
        self.fix_up(index)

    def fix_up(self, index):
        """
        O(logN) time complexity
        Violation repaired:
            1. child value > parent value
        """
        parent_index = (index - 1) // 2
        if index > 0 and self.heap[index] > self.heap[parent_index]:
            self.swap(index, parent_index)
            self.fix_up(parent_index)  # recursive

    def fix_down(self, index):
        """
        O(logN) time complexity
        Violation repaired:
            1. child value > parent value
        """
        left_child_index = 2 * index + 1
        right_child_index = 2 * index + 2
        largest_index = index

        if left_child_index < len(self) and self.heap[left_child_index] > self.heap[largest_index]:
            largest_index = left_child_index

        if right_child_index < len(self) and self.heap[right_child_index] > self.heap[largest_index]:
            largest_index = right_child_index

        if index != largest_index:
            self.swap(index, largest_index)
            self.fix_down(largest_index)  # recursive

    def heap_sort(self):
        """
        O(NlogN) time complexity
        Polls every item, so the heap ends up empty; the items are returned
        in descending order instead of being discarded.
        """
        return [self.poll() for _ in range(len(self))]

    def swap(self, i1, i2):
        self.heap[i1], self.heap[i2] = self.heap[i2], self.heap[i1]

    def poll(self):
        """Remove and return the max item, or None if the heap is empty."""
        if len(self) == 0:
            return None

        max_ = self.max_
        last = len(self) - 1

        self.swap(0, last)  # swap first and last
        self.heap[last] = self._default
        self.size -= 1
        self.fix_down(0)

        return max_

    @property
    def max_(self):
        return self.heap[0]
```
 991 | 
[Python built-in heapq](https://gist.github.com/travishen/295ff80289bf54869d9842d285faa1a5)
 993 | 
# Associative Array / Map / Dictionary
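* Pairs of keys and values (key-value)
* Compared with tree-based data structures, its weakness is that keys are not kept in order, so sorting is hard
* Main operations:
  - Add, delete, and update values
  - Look up a known key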
1000 |   
1001 | ![image](https://storage.googleapis.com/ssivart/super9-blog/assosiate-array.png)
1002 | 
1003 | ### hash function
1004 | * division method: modulo operator
1005 | 
> h(x) = x % m

> x: the key (as an integer), m: number of buckets
1009 | 
1010 | #### Collision
A collision occurs when multiple keys map to the same bucket (slot); resolving collisions raises the time complexity
1012 | 
1013 | ```
1014 | h(26) = 26 mod 6 = 2
1015 | h(50) = 50 mod 6 = 2
1016 | ```
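
The same arithmetic in Python, as a minimal sketch of the division method (`division_hash` is a hypothetical helper name, and the keys are assumed to be integers):

```python
def division_hash(key, buckets):
    """Division method: map an integer key to a bucket index."""
    return key % buckets


print(division_hash(26, 6))  # 2
print(division_hash(50, 6))  # 2, the same bucket as 26, so the keys collide
print(division_hash(27, 6))  # 3
```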
1017 | 
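Solutions:

* chaining: store multiple entries in the same slot using a linked list
* open addressing: find another empty slot
  - linear probing: step through slots one at a time (offsets 1, 2, 3, ...)
  - quadratic probing: step by squares (offsets 1, 4, 9, 16, ...)
  - rehashing (double hashing): use a second hash function to decide the next slot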
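
To make the difference between the two probing strategies concrete, the sketch below lists the slots each one would visit. The helper names are hypothetical, and the table size of 11 and home slot of 2 are arbitrary values picked for the example.

```python
def linear_probe(home, size, attempts):
    """Slots visited by linear probing: home, home + 1, home + 2, ..."""
    return [(home + i) % size for i in range(attempts)]


def quadratic_probe(home, size, attempts):
    """Slots visited by quadratic probing: home, home + 1, home + 4, home + 9, ..."""
    return [(home + i * i) % size for i in range(attempts)]


print(linear_probe(2, 11, 5))     # [2, 3, 4, 5, 6]
print(quadratic_probe(2, 11, 5))  # [2, 3, 6, 0, 7]
```

Quadratic probing spreads colliding keys further apart, which tends to reduce the clustering that linear probing produces.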
1025 |   
Second Round explains both in detail:
1027 | * [Hash Table：Open Addressing](http://alrightchiu.github.io/SecondRound/hash-tableopen-addressing.html)
1028 | * [Hash Table：Chaining](http://alrightchiu.github.io/SecondRound/hash-tablechaining.html)
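
A minimal sketch of the chaining strategy, assuming integer keys and using a Python list per bucket in place of a linked list. `ChainedHashTable` is a hypothetical class written for this example and is not taken from the linked articles.

```python
class ChainedHashTable(object):
    """Hypothetical chaining table: each bucket holds (key, value) pairs."""

    def __init__(self, size=6):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[key % self.size]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:  # key already stored: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        for existing, value in self.buckets[key % self.size]:
            if existing == key:
                return value
        return default


table = ChainedHashTable(size=6)
table.put(26, 'a')
table.put(50, 'b')  # collides with 26: both map to bucket 2

print(table.buckets[2])  # [(26, 'a'), (50, 'b')]
print(table.get(50))     # b
print(table.get(8))      # None (bucket 2, but the key is not in the chain)
```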
1029 | 
1030 | #### Dynamic resizing
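> load factor: n / m (n: number of stored keys, m: number of buckets)

* The load factor affects access performance, so the array has to be resized dynamically as it fills up
* For example, Java's `HashMap` resizes once the table is more than about 75% full, and Python's `dict` at about 66%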
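
The sketch below shows one way a load-factor threshold can drive resizing. It is an illustration only and does not reproduce how Java or Python implement it: `ResizingTable`, the doubling policy, and the default threshold of 0.75 are assumptions made for the example.

```python
class ResizingTable(object):
    """Hypothetical linear-probing table that doubles when it gets too full."""

    def __init__(self, size=8, max_load=0.75):
        self.size = size
        self.max_load = max_load  # must stay below 1 so probing always ends
        self.count = 0
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def load_factor(self):
        return self.count / self.size

    def put(self, key, value):
        if (self.count + 1) / self.size > self.max_load:
            self._resize(self.size * 2)
        self._insert(key, value)

    def get(self, key, default=None):
        index = hash(key) % self.size
        while self.slots[index] is not None:
            if self.slots[index][0] == key:
                return self.slots[index][1]
            index = (index + 1) % self.size
        return default

    def _insert(self, key, value):
        index = hash(key) % self.size
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % self.size  # linear probing
        if self.slots[index] is None:
            self.count += 1
        self.slots[index] = (key, value)

    def _resize(self, new_size):
        old = [slot for slot in self.slots if slot is not None]
        self.size = new_size
        self.slots = [None] * new_size
        self.count = 0
        for key, value in old:
            self._insert(key, value)  # every key is rehashed for the new size


table = ResizingTable(size=8, max_load=0.75)
for n in range(7):
    table.put(n, str(n))

print(table.size)           # 16 (the 7th insert would exceed 75% of 8 slots)
print(table.load_factor())  # 0.4375
```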
1035 | 
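#### Applications

* Databases
* Network Routing
* Rabin-Karp algorithm
* Hashing is also widely used in cryptography (e.g. integrity checks and password storage), though a hash, unlike encryption, cannot be reversed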
1042 | 
References:
1044 | 
1045 | * http://www.globalsoftwaresupport.com/use-prime-numbers-hash-functions/
1046 | * http://alrightchiu.github.io/SecondRound/hash-tableintrojian-jie.html#collision
1047 | 
For a Python implementation, see this [gist](https://gist.github.com/travishen/f51365915ef7f178623a2cc9b2ede886)
1049 | 
```python
from collections.abc import Iterable  # `collections.Iterable` was removed in Python 3.10


class HashTable(object):
    """Open-addressing hash table with linear probing (no resizing or deletion)."""

    def __init__(self, size=10):
        self.size = size  # was hard-coded to 10, ignoring the argument
        self.keys = [None] * self.size
        self.values = [None] * self.size

    def __repr__(self):
        return 'HashTable(size={})'.format(self.size)

    def put(self, key, value):
        index = self.hash(key)

        for _ in range(self.size):  # probe each slot at most once
            if self.keys[index] is None:  # empty slot: insert
                self.keys[index] = key
                self.values[index] = value
                return
            if self.keys[index] == key:  # existing key: update
                self.values[index] = value
                return
            index = (index + 1) % self.size  # collision: linear probing

        raise OverflowError('hash table is full')

    def get(self, key):
        index = self.hash(key)

        for _ in range(self.size):  # follow the same probe sequence as put()
            if self.keys[index] is None:  # reached an empty slot: key is absent
                return None
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size

        return None

    def hash(self, key):
        if isinstance(key, Iterable):  # e.g. a string: sum the code points
            total = sum(ord(char) for char in key)
        else:
            total = key

        return total % self.size
```
1088 | 
1089 | |        | Average case | Worst case |
1090 | |--------|--------------|------------|
1091 | | insert | O(1)         | O(N)       |
1092 | | delete | O(1)         | O(N)       |
1093 | | search | O(1)         | O(N)       |
1094 | 
1095 | 
# Ternary Search Tree, TST
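* Typically uses less memory than other tree structures for strings, such as a standard trie
* Stores only strings, never NULL or other objects
* Each parent can have 3 children: `left(less)`, `middle(equal)`, `right(greater)`
* Can serve as a hashmap and, unlike one, also supports sorting
* Can outperform a hashmap on lookups because keys are examined one character at a time (e.g. a search for `cat` stops as soon as there is no `c` to follow from the root)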
1102 | 
1103 | ![image](https://storage.googleapis.com/ssivart/super9-blog/ternary-search-tree.png)
1104 | 
### Applications
* Autocomplete
* Spell checking
* Near-neighbor search
* WWW packet routing
* Longest prefix matching
* Google Search
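
Autocomplete is a good example of why the `left`/`middle`/`right` layout is useful. The sketch below is self-contained and separate from the implementation that follows: `TSTNode`, `insert`, `find`, `collect`, and `autocomplete` are hypothetical names, the word list is made up, and prefixes are assumed to be non-empty.

```python
class TSTNode(object):
    def __init__(self, char):
        self.char = char
        self.left = self.middle = self.right = None
        self.is_end = False  # marks the last character of a stored word


def insert(node, word, index=0):
    char = word[index]
    if node is None:
        node = TSTNode(char)
    if char < node.char:
        node.left = insert(node.left, word, index)
    elif char > node.char:
        node.right = insert(node.right, word, index)
    elif index < len(word) - 1:
        node.middle = insert(node.middle, word, index + 1)
    else:
        node.is_end = True
    return node


def find(node, prefix):
    """Return the node matching the last character of `prefix`, or None."""
    index = 0
    while node is not None:
        char = prefix[index]
        if char < node.char:
            node = node.left
        elif char > node.char:
            node = node.right
        elif index < len(prefix) - 1:
            node, index = node.middle, index + 1
        else:
            return node
    return None


def collect(node, path, words):
    """In-order walk that appends every word below `node` in sorted order."""
    if node is None:
        return
    collect(node.left, path, words)
    if node.is_end:
        words.append(path + node.char)
    collect(node.middle, path + node.char, words)
    collect(node.right, path, words)


def autocomplete(root, prefix):
    node = find(root, prefix)
    if node is None:
        return []
    words = [prefix] if node.is_end else []
    collect(node.middle, prefix, words)
    return words


root = None
for word in ['cat', 'car', 'card', 'dog', 'cab']:
    root = insert(root, word)

print(autocomplete(root, 'ca'))  # ['cab', 'car', 'card', 'cat']
print(autocomplete(root, 'do'))  # ['dog']
print(autocomplete(root, 'x'))   # []
```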
1112 | 
For a Python implementation, see this [gist](https://gist.github.com/travishen/cae7587e6d870d3f189fdcd70b96a8cc)
```python
class Node(object):
    def __init__(self, char):
        self.char = char
        self.left = self.middle = self.right = None
        self.value = None


class TernarySearchTree(object):
    def __init__(self):
        self.root = None

    def __repr__(self):
        return 'TernarySearchTree()'

    def put(self, key, value):
        self.root = self.recursive(key, value, put=True)(self.root, 0)

    def get(self, key):
        node = self.recursive(key)(self.root, 0)
        if node:
            # A prefix that was never stored as a key has a node but value None
            return node.value
        return -1  # no node matches the key

    def recursive(self, key, value=None, put=False):
        """Return the putter or getter closure for `key`."""

        def putter(node, index):
            char = key[index]

            if node is None:
                node = Node(char)
            if char < node.char:
                node.left = putter(node.left, index)
            elif char > node.char:
                node.right = putter(node.right, index)
            elif index < len(key) - 1:
                node.middle = putter(node.middle, index + 1)
            else:
                node.value = value  # last character: store the value here

            return node

        def getter(node, index):
            char = key[index]

            if node is None:
                return None

            if char < node.char:
                return getter(node.left, index)
            elif char > node.char:
                return getter(node.right, index)
            elif index < len(key) - 1:
                return getter(node.middle, index + 1)
            else:
                return node

        # An explicit flag replaces `if value:`, which picked the getter for
        # falsy values such as 0 or '' and overwrote the root in put()
        if put:
            return putter
        else:
            return getter
```
1175 | 


--------------------------------------------------------------------------------