├── .gitignore ├── LICENSE ├── README.md ├── _config.yml ├── basics ├── AlgorithmsBasics.html └── AlgorithmsBasics.ipynb ├── cellular-automata ├── CellularAutomata.html └── CellularAutomata.ipynb ├── chaotic-systems ├── ChaoticSystems.html └── ChaoticSystems.ipynb ├── divide-and-conquer ├── DivideAndConquer.html └── DivideAndConquer.ipynb ├── docs ├── CODE_OF_CONDUCT.md └── CONTRIBUTING.md ├── dynamic-programming ├── DynamicProgramming.html └── DynamicProgramming.ipynb ├── graphs ├── Graphs.html └── Graphs.ipynb ├── probabilistic-algorithms ├── ProbabilisticAlgorithms.html └── ProbabilisticAlgorithms.ipynb └── similarity-functions ├── SimilarityFunctions.html └── SimilarityFunctions.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | *-checkpoint.ipynb 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Andres Segura 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python Algorithms Course 2 | ![version](https://img.shields.io/badge/version-Pro-blue) 3 | ![release](https://img.shields.io/badge/release-1.2.3-blue) 4 | ![language](https://img.shields.io/badge/language-Python_3.7%7C3.8-brightgreen) 5 | ![last-update](https://img.shields.io/badge/last_update-10/18/2023-orange) 6 | ![license](https://img.shields.io/badge/license-MIT-orange) 7 | 8 | Free hands-on course with the implementation (in Python) and description of several computational, mathematical and statistical algorithms. 9 | 10 | Although it is not intended to have the formal rigor of a book, we tried to be as faithful as possible to the original algorithms and methods, adding variants only when they were necessary for didactic purposes. 11 | 12 | ## Quick Start 13 | The best way to get the most out of this course is to carefully read each selected problem, try to think of a possible solution (language independent), and then look at the proposed Python code and try to reproduce it in your favorite IDE. If you already know Python, you can go straight to programming your own solution and then compare it with the one proposed in the course. 14 | 15 | If you want to play with these notebooks online without having to install any library or configure hardware, you can use the following service: 16 | - Open In Colab 17 | 18 | ## What is an algorithm?
19 | In mathematics and computer science, an algorithm is a finite sequence of well-defined, computer-implementable instructions, typically to solve a class of problems or to perform a computation. 20 | 21 | ## Contents 22 |
23 | 1. Algorithm's Basics 24 | 33 |
34 |
35 | 2. Divide and Conquer 36 | 42 |
43 |
44 | 3. Graphs 45 | 54 |
55 |
56 | 4. Dynamic Programming 57 | 65 |
66 |
67 | 5. Probabilistic Algorithms 68 | 76 |
77 |
78 | 6. Similarity Functions 79 | 84 |
85 |
86 | 7. Chaotic Systems 87 | 91 |
92 |
93 | 8. Cellular Automata 94 | 98 |
99 | 100 | ## Python Dependencies 101 | ``` console 102 | conda install -c anaconda numpy 103 | conda install -c anaconda pymc 104 | conda install -c anaconda networkx 105 | ``` 106 | 107 | ## Bibliography 108 | - G. Brassard, P. Bratley. (2006). *Fundamentals of Algorithmics*. Englewood Cliffs, New Jersey: Prentice-Hall, Inc. 109 | - R.C.T. Lee, S.S. Tseng, R.C. Chang, Y.T. Tsai. (2005). *Introduction to the Design and Analysis of Algorithms. A Strategic Approach*. Asia: McGraw-Hill Education. 110 | - K. Rosen. (2012). *Discrete Mathematics and Its Applications* (7th ed.). New York, NY: McGraw-Hill Education. 111 | 112 | ## Contributing and Feedback 113 | Any feedback or suggestions are greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc.). If you want to contribute to the course, you can do so through a pull request. 114 | 115 | ## Documentation 116 | Please read the [contributing](https://github.com/ansegura7/Algorithms/blob/main/docs/CONTRIBUTING.md) and [code of conduct](https://github.com/ansegura7/Algorithms/blob/main/docs/CODE_OF_CONDUCT.md) documentation. 117 | 118 | ## Author 119 | - Created by Andrés Segura-Tinoco 120 | - Created on May 17, 2019 121 | 122 | ## License 123 | This project is licensed under the terms of the MIT license.
124 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | title: Python Algorithms Course 2 | description: Free hands-on course with the implementation (in Python) and description of several computational, mathematical and statistical algorithms 3 | show_downloads: false 4 | google_analytics: 5 | theme: jekyll-theme-cayman 6 | -------------------------------------------------------------------------------- /cellular-automata/CellularAutomata.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 8. Cellular Automata" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "- **Created by Andrés Segura Tinoco**\n", 15 | "- **Created on Jul 08, 2019**\n", 16 | "- **Updated on Mar 19, 2021**" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "A **cellular automaton** (abbrev. CA) is a discrete model studied in computer science, mathematics, physics, complexity science, theoretical biology and microstructure modeling." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "A **cellular automaton** consists of a regular grid of cells, each in one of a finite number of states, such as on and off. The grid can be in any finite number of dimensions. For each cell, a set of cells called its neighborhood is defined relative to the specified cell. An initial state (time t = 0) is selected by assigning a state for each cell. A new generation is created (advancing t by 1), according to some fixed rule (generally, a mathematical function) that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood. 
Typically, the rule for updating the state of cells is the same for each cell and does not change over time, and is applied to the whole grid simultaneously, though exceptions are known, such as the stochastic cellular automaton and asynchronous cellular automaton." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# Import Python libraries\n", 40 | "import numpy as np\n", 41 | "import matplotlib.pyplot as plt" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## 8.1. Linear Cellular Automata (LCA)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### 8.1.1. Circular loop through an Automaton" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 2, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "# Get automata cells \n", 65 | "def get_cells_range(automaton, ix_start, ix_end, ix_max):\n", 66 | " cells = ''\n", 67 | " for i in range(ix_start, ix_end + 1):\n", 68 | " ix = i % ix_max\n", 69 | " cells += str(automaton[ix])\n", 70 | " return cells" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 3, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "name": "stdout", 80 | "output_type": "stream", 81 | "text": [ 82 | "( -1 , 1 ) = 102\n", 83 | "( 0 , 2 ) = 020\n", 84 | "( 1 , 3 ) = 200\n", 85 | "( 2 , 4 ) = 001\n", 86 | "( 3 , 5 ) = 010\n", 87 | "( 4 , 6 ) = 100\n", 88 | "( 5 , 7 ) = 003\n", 89 | "( 6 , 8 ) = 031\n", 90 | "( 7 , 9 ) = 310\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "# Test get_cells_range function\n", 96 | "automaton = [0, 2, 0, 0, 1, 0, 0, 3, 1]\n", 97 | "n_cell = len(automaton)\n", 98 | "r = 1\n", 99 | "for i in range(n_cell):\n", 100 | " ix_start = i - r\n", 101 | " ix_end = i + r\n", 102 | " print('(', ix_start, ',', ix_end, ') =', get_cells_range(automaton, ix_start, ix_end, n_cell))" 103 | ] 104 | }, 105 | { 
106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "### 8.1.2. Apply evolution rule: 30" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "# Evolve the LCA\n", 119 | "def evolve_lca_automaton(init_state, max_gen, rule, r):\n", 120 | " universe = [init_state]\n", 121 | " n_cell = len(init_state)\n", 122 | " \n", 123 | " for g in range(max_gen):\n", 124 | " new_automaton = []\n", 125 | " automaton = universe[g]\n", 126 | " \n", 127 | " for i in range(n_cell):\n", 128 | " ix_start = i - r\n", 129 | " ix_end = i + r\n", 130 | " gen = get_cells_range(automaton, ix_start, ix_end, n_cell)\n", 131 | " cell = rule[gen]\n", 132 | " new_automaton.append(cell)\n", 133 | " universe.append(new_automaton)\n", 134 | " \n", 135 | " return universe" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 5, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [ 144 | "# Creates an LCA rule\n", 145 | "def create_lca_rule(byte, b_size = 8):\n", 146 | " rule = {}\n", 147 | " bits = bin(byte)[2:].zfill(b_size)\n", 148 | " \n", 149 | " for i in range(0, b_size):\n", 150 | " bin_key = bin(i)[2:].zfill(3)\n", 151 | " bin_value = bits[(b_size - i - 1)]\n", 152 | " rule[bin_key] = int(bin_value)\n", 153 | " \n", 154 | " return rule" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 6, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "{'000': 0,\n", 166 | " '001': 1,\n", 167 | " '010': 1,\n", 168 | " '011': 1,\n", 169 | " '100': 1,\n", 170 | " '101': 0,\n", 171 | " '110': 0,\n", 172 | " '111': 0}" 173 | ] 174 | }, 175 | "execution_count": 6, 176 | "metadata": {}, 177 | "output_type": "execute_result" 178 | } 179 | ], 180 | "source": [ 181 | "# Rule: bin(30) = 00011110\n", 182 | "rule = create_lca_rule(30)\n", 183 | "rule" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code",
188 | "execution_count": 7, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "# Initial state of cellular automaton\n", 193 | "n_cell = 43\n", 194 | "init_state = list(np.zeros(n_cell).astype(int))\n", 195 | "init_state[n_cell // 2] = 1" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 8, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "# Evolve the linear cellular automaton\n", 205 | "max_generations = 21\n", 206 | "r = 1\n", 207 | "universe = evolve_lca_automaton(init_state, max_generations, rule, r)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 9, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "# Plot the Automaton evolution\n", 217 | "def plot_automata(universe):\n", 218 | " fig, ax = plt.subplots(figsize = (12, 12))\n", 219 | " ax.set_title(\"Automaton Evolution \", fontsize = 20)\n", 220 | " plt.imshow(universe, cmap='Greys', interpolation='nearest')\n", 221 | " plt.xlabel(\"Cells\", fontsize = 11)\n", 222 | " plt.ylabel(\"Generations\", fontsize = 11)\n", 223 | " plt.show()" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 10, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "image/png": "\n", 234 | "text/plain": [ 235 | "
" 236 | ] 237 | }, 238 | "metadata": { 239 | "needs_background": "light" 240 | }, 241 | "output_type": "display_data" 242 | } 243 | ], 244 | "source": [ 245 | "# Plotting automaton\n", 246 | "plot_automata(universe)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "### 8.1.3. Apply evolution rule: 90" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 11, 259 | "metadata": {}, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "{'000': 0,\n", 265 | " '001': 1,\n", 266 | " '010': 0,\n", 267 | " '011': 1,\n", 268 | " '100': 1,\n", 269 | " '101': 0,\n", 270 | " '110': 1,\n", 271 | " '111': 0}" 272 | ] 273 | }, 274 | "execution_count": 11, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "# Rule: bin(90) = 01011010\n", 281 | "rule = create_lca_rule(90)\n", 282 | "rule" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 12, 288 | "metadata": {}, 289 | "outputs": [], 290 | "source": [ 291 | "# Initial state of the linear cellular automaton\n", 292 | "n_cell = 257\n", 293 | "init_state = list(np.zeros(n_cell).astype(int))\n", 294 | "init_state[n_cell // 2] = 1" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 13, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "# Evolve the linear cellular automaton\n", 304 | "max_generations = 128\n", 305 | "r = 1\n", 306 | "universe = evolve_lca_automaton(init_state, max_generations, rule, r)" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 14, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "data": { 316 | "image/png": "\n", 317 | "text/plain": [ 318 | "
" 319 | ] 320 | }, 321 | "metadata": { 322 | "needs_background": "light" 323 | }, 324 | "output_type": "display_data" 325 | } 326 | ], 327 | "source": [ 328 | "# Plotting automaton\n", 329 | "plot_automata(universe)" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "## 8.2. Reversible Linear Cellular Automata (RLCA)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "#### Util Functions: converts byte list to binary string and vice versa" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 15, 349 | "metadata": {}, 350 | "outputs": [], 351 | "source": [ 352 | "# Convert a byte list to a bits string\n", 353 | "def byte_list_to_bit_string(byte_list):\n", 354 | " bit_string = ''\n", 355 | " for byte in byte_list:\n", 356 | " bit_string += bin(byte)[2:].zfill(8)\n", 357 | " return bit_string\n", 358 | "\n", 359 | "# Convert a bits string to a byte list\n", 360 | "def bit_string_to_byte_list(bit_string):\n", 361 | " byte_list = []\n", 362 | " n = len(bit_string) // 8\n", 363 | " for i in range(n):\n", 364 | " ix_start = i * 8\n", 365 | " ix_end = 8 + i * 8\n", 366 | " byte = bit_string[ix_start:ix_end]\n", 367 | " byte_list.append(int(byte, 2))\n", 368 | " return byte_list\n", 369 | "\n", 370 | "# Update a string by index\n", 371 | "def update_string(s, i, v):\n", 372 | " ns = s[:i] + v + s[i+1:]\n", 373 | " return ns" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "### 8.2.1. Reversible Linear Cellular Automata functions" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "**Totalistic:** A special class of cellular automata are totalistic cellular automata. 
The state of each cell in a totalistic cellular automaton is represented by a number (usually an integer value drawn from a finite set), and the value of a cell at time t depends only on the sum of the values of the cells in its neighborhood (possibly including the cell itself) at time t − 1." 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 16, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "# Creates a RLCA totalistic rule\n", 397 | "def create_rlca_rule(byte, r = 2):\n", 398 | " rule = {}\n", 399 | " r_size = 2 * r + 1\n", 400 | " bits = bin(byte)[2:].zfill(r_size)\n", 401 | " b_size = len(bits)\n", 402 | " \n", 403 | " for i in range(0, b_size):\n", 404 | " bin_value = bits[(b_size - i - 1)]\n", 405 | " rule[i] = int(bin_value)\n", 406 | " \n", 407 | " return rule" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 17, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "# Get automata cells value\n", 417 | "def get_cells_value(automaton, ix, r, n):\n", 418 | " value = 0\n", 419 | " ix_start = max(ix - r, 0)\n", 420 | " ix_end = min(ix + r + 1, n)\n", 421 | " \n", 422 | " for i in range(ix_start, ix_end):\n", 423 | " if i != ix:\n", 424 | " value += int(automaton[i])\n", 425 | " \n", 426 | " return value" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 18, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "# Apply Linear Cellular Automata rule to current automaton\n", 436 | "def apply_rlca_rule(automaton, rule, r, forward):\n", 437 | " n_cell = len(automaton)\n", 438 | " new_automaton = []\n", 439 | " curr_bit_string = (automaton + '.')[:-1]\n", 440 | " \n", 441 | " if forward:\n", 442 | " curr_range = range(0, n_cell, 1)\n", 443 | " else:\n", 444 | " curr_range = range(n_cell - 1, -1, -1)\n", 445 | " \n", 446 | " for i in curr_range:\n", 447 | " gen_ix = get_cells_value(curr_bit_string, i, r, n_cell)\n", 448 | " cell = 
rule[gen_ix]\n", 449 | " new_cell = str(int(not(int(curr_bit_string[i]) ^ cell)))\n", 450 | " \n", 451 | " if new_cell != curr_bit_string[i]:\n", 452 | " curr_bit_string = update_string(curr_bit_string, i, new_cell)\n", 453 | " \n", 454 | " # Code new automaton and return it\n", 455 | " new_automaton = bit_string_to_byte_list(curr_bit_string)\n", 456 | " return new_automaton" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 19, 462 | "metadata": {}, 463 | "outputs": [], 464 | "source": [ 465 | "# Evolve the RLCA\n", 466 | "def evolve_rlca_automaton(init_state, max_gen, rule, r, forward):\n", 467 | " universe = [init_state]\n", 468 | " n_cell = len(init_state)\n", 469 | " \n", 470 | " for g in range(max_gen):\n", 471 | " automaton = universe[g]\n", 472 | " automaton_bits = byte_list_to_bit_string(automaton)\n", 473 | " new_automaton = apply_rlca_rule(automaton_bits, rule, r, forward)\n", 474 | " universe.append(new_automaton)\n", 475 | " \n", 476 | " return universe" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "### 8.2.2. 
Run RLCA: Forward way" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 20, 489 | "metadata": {}, 490 | "outputs": [], 491 | "source": [ 492 | "# Evolution params\n", 493 | "max_generations = 128\n", 494 | "r = 3" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 21, 500 | "metadata": {}, 501 | "outputs": [ 502 | { 503 | "data": { 504 | "text/plain": [ 505 | "{0: 0, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}" 506 | ] 507 | }, 508 | "execution_count": 21, 509 | "metadata": {}, 510 | "output_type": "execute_result" 511 | } 512 | ], 513 | "source": [ 514 | "# Totalistic Rule: bin(14) = 0001110\n", 515 | "key = create_rlca_rule(14, r)\n", 516 | "key" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 22, 522 | "metadata": {}, 523 | "outputs": [], 524 | "source": [ 525 | "# Initial state of the linear cellular automaton\n", 526 | "n_cell = 257\n", 527 | "init_state = list(np.zeros(n_cell).astype(int))\n", 528 | "init_state[n_cell // 2] = 1" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": 23, 534 | "metadata": {}, 535 | "outputs": [], 536 | "source": [ 537 | "# Evolve the reversible linear cellular automata with the totalistic rule\n", 538 | "forward = True\n", 539 | "universe = evolve_rlca_automaton(init_state, max_generations, key, r, forward)" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 24, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "data": { 549 | "image/png": "\n", 550 | "text/plain": [ 551 | "
" 552 | ] 553 | }, 554 | "metadata": { 555 | "needs_background": "light" 556 | }, 557 | "output_type": "display_data" 558 | } 559 | ], 560 | "source": [ 561 | "# Plotting RLCA\n", 562 | "plot_automata(universe)" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": {}, 568 | "source": [ 569 | "### 8.2.3. Run RLCA: Backward way" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": 25, 575 | "metadata": {}, 576 | "outputs": [], 577 | "source": [ 578 | "# Initial state of the linear cellular automaton\n", 579 | "init_state2 = universe[max_generations - 1]" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 26, 585 | "metadata": {}, 586 | "outputs": [], 587 | "source": [ 588 | "# Evolve the reversible linear cellular automata with the totalistic rule\n", 589 | "forward = False\n", 590 | "universe2 = evolve_rlca_automaton(init_state2, max_generations, key, r, forward)" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": 27, 596 | "metadata": {}, 597 | "outputs": [ 598 | { 599 | "data": { 600 | "image/png": "\n", 601 | "text/plain": [ 602 | "
" 603 | ] 604 | }, 605 | "metadata": { 606 | "needs_background": "light" 607 | }, 608 | "output_type": "display_data" 609 | } 610 | ], 611 | "source": [ 612 | "# Plotting RLCA\n", 613 | "plot_automata(universe2)" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": {}, 619 | "source": [ 620 | "### 8.2.4. Validation of the Reversibility of the Automata" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": 28, 626 | "metadata": {}, 627 | "outputs": [ 628 | { 629 | "data": { 630 | "text/plain": [ 631 | "True" 632 | ] 633 | }, 634 | "execution_count": 28, 635 | "metadata": {}, 636 | "output_type": "execute_result" 637 | } 638 | ], 639 | "source": [ 640 | "universe[0] == universe2[max_generations - 1]" 641 | ] 642 | }, 643 | { 644 | "cell_type": "markdown", 645 | "metadata": {}, 646 | "source": [ 647 | "
\n", 648 | "


" 649 | ] 650 | } 651 | ], 652 | "metadata": { 653 | "kernelspec": { 654 | "display_name": "Python 3", 655 | "language": "python", 656 | "name": "python3" 657 | }, 658 | "language_info": { 659 | "codemirror_mode": { 660 | "name": "ipython", 661 | "version": 3 662 | }, 663 | "file_extension": ".py", 664 | "mimetype": "text/x-python", 665 | "name": "python", 666 | "nbconvert_exporter": "python", 667 | "pygments_lexer": "ipython3", 668 | "version": "3.8.5" 669 | }, 670 | "varInspector": { 671 | "cols": { 672 | "lenName": 16, 673 | "lenType": 16, 674 | "lenVar": 40 675 | }, 676 | "kernels_config": { 677 | "python": { 678 | "delete_cmd_postfix": "", 679 | "delete_cmd_prefix": "del ", 680 | "library": "var_list.py", 681 | "varRefreshCmd": "print(var_dic_list())" 682 | }, 683 | "r": { 684 | "delete_cmd_postfix": ") ", 685 | "delete_cmd_prefix": "rm(", 686 | "library": "var_list.r", 687 | "varRefreshCmd": "cat(var_dic_list()) " 688 | } 689 | }, 690 | "types_to_exclude": [ 691 | "module", 692 | "function", 693 | "builtin_function_or_method", 694 | "instance", 695 | "_Feature" 696 | ], 697 | "window_display": false 698 | } 699 | }, 700 | "nbformat": 4, 701 | "nbformat_minor": 2 702 | } 703 | -------------------------------------------------------------------------------- /docs/CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 
11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement. 63 | 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 
99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 129 | -------------------------------------------------------------------------------- /docs/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | Welcome, and thank you for your interest in helping to improve and accelerate/ease the adoption of the Python Algorithms Course. 3 | 4 | There are many ways in which you can contribute, beyond writing code. The goal of this document is to provide a high-level overview of how you can get involved, and hopefully not feel intimidated. 
5 | 6 | ## Asking Questions and Providing Feedback 7 | Have a question? Rather than emailing the author, open an issue. 8 | 9 | The community will be eager to assist you. Your well-worded question will serve as a resource to others searching for help. 10 | 11 | ## Reporting Issues and Ideas 12 | Have you identified an oversight or problem? Have a feature request? We want to hear about it! Here's how you can make reporting your issue as effective as possible. 13 | 14 | > **Note:** If you already know what you want to change, feel free to just fork/clone the repo, change it, and submit a pull request. No need to add overhead by creating an issue! 15 | 16 | ### Look For an Existing Issue 17 | Before you create a new issue, please do a search in [open issues](https://github.com/ansegura7/Algorithms/issues) to see if the issue or feature request has already been filed. 18 | 19 | If you cannot find an existing issue that describes your bug or feature, create a new issue using the guidelines below. 20 | 21 | ### Writing Good Bug Reports and Feature Requests 22 | File a single issue per problem and feature request. Do not enumerate multiple bugs or feature requests in the same issue. 23 | 24 | Below is some information you can provide. The more you can provide, the more likely someone will be successful at understanding and incorporating it. However, be mindful of the cost/benefit of documenting versus simply implementing the change. 25 | 26 | Please include the following with each issue: 27 | - **Title** - Concise and clear to quickly identify the topic. 28 | - **Problem** - Summary of the issue/idea/feature. 29 | - **Possible Solution** - If a solution seems clear, share it as an option. 30 | - **Examples** - "What was expected" vs "What actually occurred". 31 | - **Context** - External factors that restrict possible solutions (things that can't be changed). 32 | > Example: Please upgrade the vehicle from 30mph to 60mph. Context: budget is $5000. 33 | 34 | # Thank You!
35 | Your contributions to the Python Algorithms Course, large or small, make great projects like this possible. Thank you for taking the time to contribute! 36 | -------------------------------------------------------------------------------- /dynamic-programming/DynamicProgramming.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 4. Dynamic Programming" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "- **Created by Andrés Segura Tinoco**\n", 15 | "- **Created on Jan 26, 2020**\n", 16 | "- **Updated on May 18, 2021**" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "**Dynamic programming** is an efficient technique for solving many combinatorial optimization problems in polynomial time." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "Dynamic programming is both a mathematical optimization method and a computer programming method. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner [1]. There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping sub-problems." 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "### Principle of Optimality\n", 38 | "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision [2]. (See Bellman, 1957, Chap. III.3.)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "## 4.1.
Binomial Coefficient" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "In mathematics, the **binomial coefficients** are the positive integers that occur as coefficients in the binomial theorem [3]. Commonly, a binomial coefficient is indexed by a pair of integers $ n ≥ k ≥ 0 $ and is written $ \\tbinom {n}{k} $. It is the coefficient of the $ x^k $ term in the polynomial expansion of the binomial power $ (1 + x)^n $, and it is given by the formula:\n", 53 | "\n", 54 | "$$ \\tbinom {n}{k} = \\frac {n!}{k!(n-k)!} \\tag{1}$$" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 1, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "# Load the Python libraries\n", 64 | "import timeit\n", 65 | "import math\n", 66 | "import pandas as pd" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 2, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "# Example values\n", 76 | "n = 25\n", 77 | "k = 15" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "### 4.1.1. 
Formula approach" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 3, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "# Binomial coefficient from the mathematical formula (integer division keeps the result exact)\n", 94 | "def bin_coef_1(n, k):\n", 95 | "    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 4, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "3268760\n", 108 | ">> elapsed time 0.3633999999999027 ms\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "start_time = timeit.default_timer()\n", 114 | "print(bin_coef_1(n, k))\n", 115 | "print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### 4.1.2. Simple approach" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 5, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "# The natural recursive solution\n", 132 | "def bin_coef_2(n, k):\n", 133 | "    if k == 0 or k == n:\n", 134 | "        return 1\n", 135 | "    return bin_coef_2(n - 1, k - 1) + bin_coef_2(n - 1, k)" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 6, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "3268760\n", 148 | ">> elapsed time 1602.7969 ms\n" 149 | ] 150 | } 151 | ], 152 | "source": [ 153 | "start_time = timeit.default_timer()\n", 154 | "print(bin_coef_2(n, k))\n", 155 | "print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "### 4.1.3.
Dynamic Programming" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 7, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "# Solution with dynamic programming (supported by a table)\n", 172 | "def bin_coef_3(n, k):\n", 173 | "    # After processing row i, v[j] holds C(i, j) for every j <= min(i, k)\n", 174 | "    v = [1] * (k + 1)\n", 175 | "\n", 176 | "    for i in range(n + 1):\n", 177 | "        for j in range(k, 0, -1):\n", 178 | "            if j < i:\n", 179 | "                v[j] = v[j - 1] + v[j]\n", 180 | "\n", 181 | "    return v[k]" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 8, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "3268760\n", 194 | ">> elapsed time 1.0575999999997698 ms\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "start_time = timeit.default_timer()\n", 200 | "print(bin_coef_3(n, k))\n", 201 | "print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "This approach has a time complexity of $ \\Theta(nk) $ and a space complexity of $ \\Theta(k) $." 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "## 4.2. World Championship problem" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 9, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "# Example values\n", 225 | "n = 10\n", 226 | "p = 0.55\n", 227 | "q = 1 - p" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "### 4.2.1.
Simple approach" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 10, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "# The natural recursive solution (reads the global p and q)\n", 244 | "def WCP(i, j):\n", 245 | "    if i == 0:\n", 246 | "        return 1\n", 247 | "    elif j == 0:\n", 248 | "        return 0\n", 249 | "    return p * WCP(i - 1, j) + q * WCP(i, j - 1)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 11, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "name": "stdout", 259 | "output_type": "stream", 260 | "text": [ 261 | "0.6710359124216079\n", 262 | ">> elapsed time 152.84509999999952 ms\n" 263 | ] 264 | } 265 | ], 266 | "source": [ 267 | "start_time = timeit.default_timer()\n", 268 | "print(WCP(n, n))\n", 269 | "print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### 4.2.2. Dynamic Programming" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 12, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "# Solution with dynamic programming (supported by a table)\n", 286 | "def WCP2(n, p):\n", 287 | "    n = n + 1\n", 288 | "    q = 1 - p\n", 289 | "    prob = [[0] * n for i in range(n)]\n", 290 | "\n", 291 | "    for s in range(n):\n", 292 | "        prob[0][s] = 1\n", 293 | "        for k in range(1, s):\n", 294 | "            prob[k][s - k] = p * prob[k - 1][s - k] + q * prob[k][s - k - 1]\n", 295 | "\n", 296 | "    for s in range(1, n):\n", 297 | "        for k in range(0, n - s):\n", 298 | "            prob[s + k][n - k - 1] = p * prob[s + k - 1][n - k - 1] + q * prob[s + k][n - k - 2]\n", 299 | "\n", 300 | "    return prob[n - 1][n - 1]" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 13, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "0.6710359124216079\n", 313 | ">> elapsed time
3.256300000000323 ms\n" 314 | ] 315 | } 316 | ], 317 | "source": [ 318 | "start_time = timeit.default_timer()\n", 319 | "print(WCP2(n, p))\n", 320 | "print('>> elapsed time', (timeit.default_timer() - start_time) * 1000, 'ms')" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "This approach has a time complexity of $ \\Theta(n^2) $ and a space complexity of $ \\Theta(n^2) $." 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "## 4.3. Coin Change problem" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "The **coin-change problem** or change-making problem addresses the question of finding the minimum number of coins (of certain denominations) that add up to a given amount of money. It is a special case of the integer knapsack problem, and has applications wider than just currency [4]." 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "#### Minimum number of coins for every amount with Dynamic Programming\n", 349 | "Version with an unlimited supply of coins."
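As a hedged aside, the recurrence behind this problem can also be kept in a single row instead of a full table. The sketch below is not the notebook's implementation: `min_coins` and its parameter names are hypothetical, and it assumes an unlimited supply of each denomination.

```python
import math


def min_coins(amount, denominations):
    """Fewest coins for every amount from 0 to `amount` (math.inf if impossible)."""
    # best[j] is the fewest coins found so far that add up to j
    best = [0] + [math.inf] * amount
    for coin in denominations:
        for j in range(coin, amount + 1):
            # Either keep the previous answer or use one more `coin`
            best[j] = min(best[j], 1 + best[j - coin])
    return best


print(min_coins(8, [1, 4, 6]))  # [0, 1, 2, 3, 1, 2, 1, 2, 2]
```

For the denominations 1, 4 and 6 the returned list matches the last row of the table computed in this section, while using only one row of memory.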
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 14, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "def calc_coin_change(N, d):\n", 359 | "    n = len(d)\n", 360 | "    matrix = [[0] * (N + 1) for i in range(n)]\n", 361 | "\n", 362 | "    for i in range(0, n):\n", 363 | "        for j in range(1, N + 1):\n", 364 | "            if i == 0 and j < d[i]:\n", 365 | "                matrix[i][j] = math.inf\n", 366 | "            elif i == 0:\n", 367 | "                matrix[i][j] = 1 + matrix[0][j - d[0]]\n", 368 | "            elif j < d[i]:\n", 369 | "                matrix[i][j] = matrix[i - 1][j]\n", 370 | "            else:\n", 371 | "                matrix[i][j] = min(matrix[i - 1][j], 1 + matrix[i][j - d[i]])\n", 372 | "\n", 373 | "    return matrix" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 15, 379 | "metadata": {}, 380 | "outputs": [ 381 | { 382 | "data": { 454 | "text/plain": [ 455 | "   0  1  2  3  4  5  6  7  8\n", 456 | "1  0  1  2  3  4  5  6  7  8\n", 457 | "4  0  1  2  3  1  2  3  4  2\n", 458 | "6  0  1  2  3  1  2  1  2  2" 459 | ] 460 | }, 461 | "execution_count": 15, 462 | "metadata": {}, 463 | "output_type": "execute_result" 464 | } 465 | ], 466 | "source": [ 467 | "# Example values\n", 468 | "N = 8\n", 469 | "d = [1, 4, 6]\n", 470 | "\n", 471 | "# Showing results\n", 472 | "dp_table = calc_coin_change(N, d)\n", 473 | "pd.DataFrame(dp_table, index=d)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "This approach has a time complexity of $ \\Theta(nN) $ and a space complexity of $ \\Theta(n(N + 1)) $." 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "#### Calculate the list of coins needed to give change\n", 488 | "Greedy approach" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 16, 494 | "metadata": {}, 495 | "outputs": [], 496 | "source": [ 497 | "def get_coins_list(c, d, N, verbose=False):\n", 498 | "    coins_list = []\n", 499 | "    i = len(d) - 1\n", 500 | "    j = N\n", 501 | "\n", 502 | "    while i > -1 and j > -1:\n", 503 | "        if verbose:\n", 504 | "            print(i, j)\n", 505 | "\n", 506 | "        if i - 1 >= 0 and c[i][j] == c[i - 1][j]:\n", 507 | "            i = i - 1\n", 508 | "        elif j - d[i] >= 0 and c[i][j] == 1 + c[i][j - d[i]]:\n", 509 | "            coins_list.append(d[i])\n", 510 | "            j = j - d[i]\n", 511 | "        else:\n", 512 | "            break\n", 513 | "\n", 514 | "    return coins_list" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 17, 520 | "metadata": {}, 521 | "outputs": [ 522 | { 523 | "name": "stdout", 524 | "output_type": "stream", 525 | "text": [ 526 | "0 -> []\n", 527 | "1 -> [1]\n", 528 | "2 -> [1, 1]\n", 529 | "3 -> [1, 1, 1]\n", 530 | "4 -> [4]\n", 531 | "5 -> [4, 1]\n", 532 | "6 -> [6]\n", 533 | "7 -> [6, 1]\n", 534 | "8 -> [4, 4]\n" 535 | ] 536 | } 537 | ], 538 | "source": [ 539 | "# List of coins for each scenario\n", 540 | "for j in range(0, N + 1):\n", 541 | "    print(j, '->', get_coins_list(dp_table, d, j))" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": {}, 547 | "source": [ 548 | "This traceback has a time complexity of $ \\Theta(n + c[n, N]) $ and a space complexity of $ \\Theta(n(N + 1)) $." 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "## 4.4. The Knapsack problem" 556 | ] 557 | }, 558 | { 559 | "cell_type": "markdown", 560 | "metadata": {}, 561 | "source": [ 562 | "The **knapsack problem** or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit **W** and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items [5]. The implementation in this section covers the 0/1 variant, in which each item is either taken once or left out."
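A minimal sketch of the 0/1 knapsack recurrence, assuming each item may be taken at most once and that only the best total value is needed (not the list of chosen items). The name `knapsack_01` and its parameters are hypothetical, not part of the notebook.

```python
def knapsack_01(weights, values, capacity):
    """Best total value for the 0/1 knapsack using one row of size capacity + 1."""
    # best[j] is the best value achievable with total weight at most j
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Walk the capacities downwards so that each item is counted at most once
        for j in range(capacity, weight - 1, -1):
            best[j] = max(best[j], best[j - weight] + value)
    return best[capacity]


print(knapsack_01([1, 2, 5, 6, 7], [1, 6, 18, 22, 28], 11))  # 40
```

Iterating the capacities from high to low is the design choice that keeps an item from being reused; iterating upwards would instead solve the unbounded variant.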
563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": {}, 568 | "source": [ 569 | "#### Get the best combination of items with Dynamic Programming" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": 18, 575 | "metadata": {}, 576 | "outputs": [], 577 | "source": [ 578 | "def calc_best_knapsack(w, v, W):\n", 579 | "    n = len(v)\n", 580 | "    matrix = [[0] * (W + 1) for i in range(n)]\n", 581 | "\n", 582 | "    for i in range(0, n):\n", 583 | "        for j in range(1, W + 1):\n", 584 | "            if i == 0 and j < w[i]:\n", 585 | "                matrix[i][j] = 0  # the first item does not fit, so the knapsack stays empty\n", 586 | "            elif i == 0:\n", 587 | "                matrix[i][j] = v[i]\n", 588 | "            elif j < w[i]:\n", 589 | "                matrix[i][j] = matrix[i - 1][j]\n", 590 | "            else:\n", 591 | "                matrix[i][j] = max(matrix[i - 1][j], matrix[i - 1][j - w[i]] + v[i])\n", 592 | "\n", 593 | "    return matrix" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": 19, 599 | "metadata": {}, 600 | "outputs": [ 601 | { 602 | "data": { 716 | "text/plain": [ 717 | "           0  1  2  3  4   5   6   7   8   9  10  11\n", 718 | "w:1, v:1   0  1  1  1  1   1   1   1   1   1   1   1\n", 719 | "w:2, v:6   0  1  6  7  7   7   7   7   7   7   7   7\n", 720 | "w:5, v:18  0  1  6  7  7  18  19  24  25  25  25  25\n", 721 | "w:6, v:22  0  1  6  7  7  18  22  24  28  29  29  40\n", 722 | "w:7, v:28  0  1  6  7  7  18  22  28  29  34  35  40" 723 | ] 724 | }, 725 | "execution_count": 19, 726 | "metadata": {}, 727 | "output_type": "execute_result" 728 | } 729 | ], 730 | "source": [ 731 | "# Example values\n", 732 | "w = [1, 2, 5, 6, 7]\n", 733 | "v = [1, 6, 18, 22, 28]\n", 734 | "max_weight = 11\n", 735 | "\n", 736 | "# Run algorithm\n", 737 | "dp_table = calc_best_knapsack(w, v, max_weight)\n", 738 | "df_index = [\"w:\" + str(w[i]) + \", v:\" + str(v[i]) for i in range(len(v))]\n", 739 | "pd.DataFrame(dp_table, index=df_index)" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "This approach has a time complexity of $ \\Theta(nW) $ and a space complexity of $ \\Theta(n(W + 1)) $." 747 | ] 748 | }, 749 | { 750 | "cell_type": "markdown", 751 | "metadata": {}, 752 | "source": [ 753 | "#### Calculate the list of items needed to fill the backpack\n", 754 | "Greedy approach" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 20, 760 | "metadata": {}, 761 | "outputs": [], 762 | "source": [ 763 | "def get_items_list(values, v, w, W, verbose=False):\n", 764 | "    item_list = []\n", 765 | "    i = len(w) - 1\n", 766 | "    j = W\n", 767 | "\n", 768 | "    while i > -1 and j > -1:\n", 769 | "        if verbose:\n", 770 | "            print(i, j)\n", 771 | "\n", 772 | "        if i - 1 >= 0 and values[i][j] == values[i - 1][j]:\n", 773 | "            i = i - 1\n", 774 | "        elif i - 1 >= 0 and j - w[i] >= 0 and values[i][j] == values[i - 1][j - w[i]] + v[i]:\n", 775 | "            item = { \"w\": w[i], \"v\": v[i] }\n", 776 | "            item_list.append(item)\n", 777 | "            j = j - w[i]\n", 778 | "            i = i - 1\n", 779 | "        elif i == 0 and values[i][j] == v[i]:\n", 780 | "            item = { \"w\": w[i], \"v\": v[i] }\n", 781 | "            item_list.append(item)\n", 782 | "            break\n", 783 | "        else:\n", 784 | "            break\n", 785 | "\n", 786 | "    return item_list" 787 | ] 788 | }, 789 | { 790 | "cell_type": "code", 791 | "execution_count": 21, 792 | "metadata": {}, 793 | "outputs": [ 794 | { 795 | "name": "stdout", 796 | "output_type": "stream", 797 | "text": [ 798 | "0 -> []\n", 799 | "1 -> [{'w': 1, 'v': 1}]\n", 800 | "2 -> [{'w': 2, 'v': 6}]\n", 801 | "3 -> [{'w': 2, 'v': 6}, {'w': 1, 'v': 1}]\n", 802 | "4 -> [{'w': 2, 'v': 6}, {'w': 1, 'v': 1}]\n", 803 | "5 -> [{'w': 5, 'v': 18}]\n", 804 | "6 -> [{'w': 6, 'v': 22}]\n", 805 | "7 -> [{'w': 7, 'v': 28}]\n", 806 | "8 -> [{'w': 7, 'v': 28}, {'w': 1, 'v': 1}]\n", 807 | "9 -> [{'w': 7, 'v': 28}, {'w': 2, 'v': 6}]\n", 808 | "10 -> [{'w': 7, 'v': 28}, {'w': 2, 'v': 6}, {'w': 1, 'v': 1}]\n", 809 | "11 -> [{'w': 6, 'v': 22}, {'w': 5, 'v': 18}]\n" 810 | ] 811 | } 812 | ], 813 | "source": [ 814 | "# List of items for each knapsack capacity\n", 815 | "for j in range(0, max_weight + 1):\n", 816 | "    print(j, '->', get_items_list(dp_table, v, w, j))" 817 | ] 818 | }, 819 | { 820 | "cell_type": "markdown", 821 | "metadata": {}, 822 | "source": [ 823 | "This traceback has a time complexity of $ \\Theta(n + W) $ and a space complexity of $ \\Theta(n(W + 1)) $." 824 | ] 825 | }, 826 | { 827 | "cell_type": "markdown", 828 | "metadata": {}, 829 | "source": [ 830 | "## 4.5. Longest Common Subsequence (LCS) problem" 831 | ] 832 | }, 833 | { 834 | "cell_type": "markdown", 835 | "metadata": {}, 836 | "source": [ 837 | "The **longest common subsequence** (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences).
It differs from the longest common substring problem: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences [6].\n", 838 | "\n", 839 | "The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files." 840 | ] 841 | }, 842 | { 843 | "cell_type": "markdown", 844 | "metadata": {}, 845 | "source": [ 846 | "#### Get the Longest Common Subsequence with Dynamic Programming" 847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "execution_count": 22, 852 | "metadata": {}, 853 | "outputs": [], 854 | "source": [ 855 | "def calc_lcs(a, b):\n", 856 | "    n = len(a)\n", 857 | "    m = len(b)\n", 858 | "    matrix = [[0] * (m + 1) for i in range(n + 1)]\n", 859 | "\n", 860 | "    for i in range(1, n + 1):\n", 861 | "        for j in range(1, m + 1):\n", 862 | "            if a[i - 1] == b[j - 1]:\n", 863 | "                matrix[i][j] = 1 + matrix[i - 1][j - 1]\n", 864 | "            else:\n", 865 | "                matrix[i][j] = max(matrix[i - 1][j], matrix[i][j - 1])\n", 866 | "\n", 867 | "    return matrix" 868 | ] 869 | }, 870 | { 871 | "cell_type": "code", 872 | "execution_count": 23, 873 | "metadata": {}, 874 | "outputs": [ 875 | { 876 | "data": { 999 | "text/plain": [ 1000 | "   -  M  Z  J  A  W  X  U\n", 1001 | "-  0  0  0  0  0  0  0  0\n", 1002 | "X  0  0  0  0  0  0  1  1\n", 1003 | "M  0  1  1  1  1  1  1  1\n", 1004 | "J  0  1  1  2  2  2  2  2\n", 1005 | "Y  0  1  1  2  2  2  2  2\n", 1006 | "A  0  1  1  2  3  3  3  3\n", 1007 | "U  0  1  1  2  3  3  3  4\n", 1008 | "Z  0  1  2  2  3  3  3  4" 1009 | ] 1010 | }, 1011 | "execution_count": 23, 1012 | "metadata": {}, 1013 | "output_type": "execute_result" 1014 | } 1015 | ], 1016 | "source": [ 1017 | "# Example values\n", 1018 | "a = ['X', 'M', 'J', 'Y', 'A', 'U', 'Z']\n", 1019 | "b = ['M', 'Z', 'J', 'A', 'W', 'X', 'U']\n", 1020 | "\n", 1021 | "# Run algorithm\n", 1022 | "dp_table = calc_lcs(a, b)\n", 1023 | "pd.DataFrame(dp_table, index=['-'] + a, columns=['-'] + b)" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "markdown", 1028 | "metadata": {}, 1029 | "source": [ 1030 | "This approach has a time complexity of $ \\Theta(nm) $ and a space complexity of $ \\Theta((n + 1)(m + 1)) $." 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "#### Calculate the Longest Common Subsequence\n", 1038 | "Greedy approach" 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "code", 1043 | "execution_count": 24, 1044 | "metadata": {}, 1045 | "outputs": [], 1046 | "source": [ 1047 | "def get_lcs(matrix, a, b, verbose=False):\n", 1048 | "    lc_seq = []\n", 1049 | "    i = len(a)\n", 1050 | "    j = len(b)\n", 1051 | "\n", 1052 | "    while i > -1 and j > -1:\n", 1053 | "        if verbose:\n", 1054 | "            print(i, j)\n", 1055 | "\n", 1056 | "        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:\n", 1057 | "            lc_seq.append(a[i - 1])\n", 1058 | "            i = i - 1\n", 1059 | "            j = j - 1\n", 1060 | "        elif j > 0 and (i == 0 or matrix[i][j - 1] >= matrix[i - 1][j]):\n", 1061 | "            j = j - 1\n", 1062 | "        elif i > 0 and (j == 0 or matrix[i][j - 1] < matrix[i - 1][j]):\n", 1063 | "            i = i - 1\n", 1064 | "        else:\n", 1065 | "            break\n", 1066 | "\n", 1067 | "    return list(reversed(lc_seq))" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": 25, 1073 | "metadata": {}, 1074 | "outputs": [ 1075 | { 1076 | "data": { 1077 | "text/plain": [ 1078 | "['M', 'J', 'A', 'U']" 1079 | ] 1080 | }, 1081 | "execution_count": 25, 1082 | "metadata": {}, 1083 | "output_type": "execute_result" 1084 | } 1085 | ], 1086 | "source": [ 1087 | "# This function gets the longest common subsequence\n", 1088 | "get_lcs(dp_table, a, b)" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "markdown", 1093 | "metadata": {}, 1094 | "source": [ 1095 | "This traceback has a time complexity of $ \\Theta(n + m) $ and a space complexity of $ \\Theta((n + 1)(m + 1)) $." 1096 | ] 1097 | }, 1098 | { 1099 | "cell_type": "markdown", 1100 | "metadata": {}, 1101 | "source": [ 1102 | "## 4.6. Sequence Alignment problem" 1103 | ] 1104 | }, 1105 | { 1106 | "cell_type": "markdown", 1107 | "metadata": {}, 1108 | "source": [ 1109 | "In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences [7]. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.\n", 1110 | "\n", 1111 | "Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data." 1112 | ] 1113 | }, 1114 | { 1115 | "cell_type": "markdown", 1116 | "metadata": {}, 1117 | "source": [ 1118 | "### 4.6.1. The Needleman-Wunsch algorithm" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "The Needleman-Wunsch algorithm was one of the first applications of dynamic programming to compare biological sequences. The algorithm essentially divides a large problem (e.g.
the full sequence) into a series of smaller problems, and it uses the solutions to the smaller problems to find an optimal solution to the larger problem [8]." 1126 | ] 1127 | }, 1128 | { 1129 | "cell_type": "code", 1130 | "execution_count": 26, 1131 | "metadata": {}, 1132 | "outputs": [], 1133 | "source": [ 1134 | "# Function to measure the performance of an alignment\n", 1135 | "def s(x, y):\n", 1136 | " if x == '-' or y == '-':\n", 1137 | " # Payment for Gap\n", 1138 | " return -1\n", 1139 | " elif x == y:\n", 1140 | " # Payment for Match\n", 1141 | " return 1\n", 1142 | " # Payment for Mismatch\n", 1143 | " return -1" 1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": 27, 1149 | "metadata": {}, 1150 | "outputs": [], 1151 | "source": [ 1152 | "# Needleman–Wunsch algorithm to calculate the sequence alignment\n", 1153 | "def calc_seq_align(a, b):\n", 1154 | " m = len(a)\n", 1155 | " n = len(b)\n", 1156 | " matrix = [[0] * (m + 1) for i in range(n + 1)]\n", 1157 | " \n", 1158 | " for i in range(n + 1):\n", 1159 | " matrix[i][0] = i * s('-', b[i - 1])\n", 1160 | " \n", 1161 | " for j in range(m + 1):\n", 1162 | " matrix[0][j] = j * s(a[j - 1], '-')\n", 1163 | " \n", 1164 | " for i in range(1, n + 1):\n", 1165 | " for j in range(1, m + 1):\n", 1166 | " matrix[i][j] = max(matrix[i - 1][j - 1] + s(a[j - 1], b[i - 1]),\n", 1167 | " matrix[i - 1][j] + s('-', b[i - 1]),\n", 1168 | " matrix[i][j - 1] + s(a[j - 1], '-'))\n", 1169 | " \n", 1170 | " return matrix" 1171 | ] 1172 | }, 1173 | { 1174 | "cell_type": "code", 1175 | "execution_count": 28, 1176 | "metadata": {}, 1177 | "outputs": [ 1178 | { 1179 | "data": { 1180 | "text/html": [ 1181 | "
\n", 1182 | "\n", 1195 | "\n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | " \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | " \n", 1274 | " \n", 1275 | " \n", 1276 | " \n", 1277 | " \n", 1278 | " \n", 1279 | " \n", 1280 | " \n", 1281 | " \n", 1282 | " \n", 1283 | " \n", 1284 | " \n", 1285 | " \n", 1286 | " \n", 1287 | " \n", 1288 | " \n", 1289 | " \n", 1290 | " \n", 1291 | " \n", 1292 | " \n", 1293 | " \n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | " \n", 1298 | " \n", 1299 | " \n", 1300 | " \n", 1301 | " \n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | "
-GCATGCUA
-0-1-2-3-4-5-6-7-8
G-110-1-2-3-4-5-6
A-20010-1-2-3-4
T-3-1-10210-1-2
T-4-2-2-1110-1-2
A-5-3-3-1000-10
C-6-4-2-2-1-110-1
A-7-5-3-1-2-2001
\n", 1309 | "
" 1310 | ], 1311 | "text/plain": [ 1312 | " - G C A T G C U A\n", 1313 | "- 0 -1 -2 -3 -4 -5 -6 -7 -8\n", 1314 | "G -1 1 0 -1 -2 -3 -4 -5 -6\n", 1315 | "A -2 0 0 1 0 -1 -2 -3 -4\n", 1316 | "T -3 -1 -1 0 2 1 0 -1 -2\n", 1317 | "T -4 -2 -2 -1 1 1 0 -1 -2\n", 1318 | "A -5 -3 -3 -1 0 0 0 -1 0\n", 1319 | "C -6 -4 -2 -2 -1 -1 1 0 -1\n", 1320 | "A -7 -5 -3 -1 -2 -2 0 0 1" 1321 | ] 1322 | }, 1323 | "execution_count": 28, 1324 | "metadata": {}, 1325 | "output_type": "execute_result" 1326 | } 1327 | ], 1328 | "source": [ 1329 | "# Example values\n", 1330 | "a = list('GCATGCUA')\n", 1331 | "b = list('GATTACA')\n", 1332 | "\n", 1333 | "# Run algorithm\n", 1334 | "dp_table = calc_seq_align(a, b)\n", 1335 | "pd.DataFrame(dp_table, index=['-'] + b, columns=['-'] + a)" 1336 | ] 1337 | }, 1338 | { 1339 | "cell_type": "markdown", 1340 | "metadata": {}, 1341 | "source": [ 1342 | "With time complexity of $ \\Theta(nm) $ and a space complexity of $ \\Theta((n + 1)(m + 1)) $." 1343 | ] 1344 | }, 1345 | { 1346 | "cell_type": "markdown", 1347 | "metadata": {}, 1348 | "source": [ 1349 | "#### Calculate the Sequence Alignment result\n", 1350 | "Greedy approach" 1351 | ] 1352 | }, 1353 | { 1354 | "cell_type": "code", 1355 | "execution_count": 29, 1356 | "metadata": {}, 1357 | "outputs": [], 1358 | "source": [ 1359 | "def get_seq_align(matrix, a, b, verbose=False):\n", 1360 | " alignmentA = \"\"\n", 1361 | " alignmentB = \"\"\n", 1362 | " j = len(a)\n", 1363 | " i = len(b)\n", 1364 | " \n", 1365 | " while i > -1 and j > -1:\n", 1366 | " if verbose:\n", 1367 | " print(i, j)\n", 1368 | " \n", 1369 | " if i > 0 and j > 0 and matrix[i][j] == matrix[i - 1][j - 1] + s(a[j - 1], b[i - 1]):\n", 1370 | " alignmentA = a[j - 1] + alignmentA\n", 1371 | " alignmentB = b[i - 1] + alignmentB\n", 1372 | " i = i - 1\n", 1373 | " j = j - 1\n", 1374 | " elif i > 0 and matrix[i][j] == matrix[i - 1][j] + s('-', b[i - 1]):\n", 1375 | " alignmentA = \"-\" + alignmentA\n", 1376 | " alignmentB = b[i - 1] + 
alignmentB\n", 1377 | " i = i - 1\n", 1378 | " elif j > 0 and matrix[i][j] == matrix[i][j - 1] + s(a[j - 1], '-'):\n", 1379 | " alignmentA = a[j - 1] + alignmentA\n", 1380 | " alignmentB = \"-\" + alignmentB\n", 1381 | " j = j - 1\n", 1382 | " else:\n", 1383 | " break\n", 1384 | " \n", 1385 | " return (alignmentA, alignmentB)" 1386 | ] 1387 | }, 1388 | { 1389 | "cell_type": "code", 1390 | "execution_count": 30, 1391 | "metadata": {}, 1392 | "outputs": [ 1393 | { 1394 | "data": { 1395 | "text/plain": [ 1396 | "('GCA-TGCUA', 'G-ATTAC-A')" 1397 | ] 1398 | }, 1399 | "execution_count": 30, 1400 | "metadata": {}, 1401 | "output_type": "execute_result" 1402 | } 1403 | ], 1404 | "source": [ 1405 | "# This function gets the Sequence Alignment\n", 1406 | "get_seq_align(dp_table, a, b)" 1407 | ] 1408 | }, 1409 | { 1410 | "cell_type": "markdown", 1411 | "metadata": {}, 1412 | "source": [ 1413 | "The traceback has a time complexity of $ \\Theta(n + m) $; its space complexity of $ \\Theta((n + 1)(m + 1)) $ is that of the table it reads." 1414 | ] 1415 | }, 1416 | { 1417 | "cell_type": "markdown", 1418 | "metadata": {}, 1419 | "source": [ 1420 | "## 4.7. All-Pairs Shortest Path" 1421 | ] 1422 | }, 1423 | { 1424 | "cell_type": "markdown", 1425 | "metadata": {}, 1426 | "source": [ 1427 | "The all-pairs shortest path problem consists of finding the shortest distance between every pair of vertices in a given graph. It can be solved with n applications of Dijkstra's algorithm (one per source vertex, assuming non-negative edge weights) or all at once with the Floyd-Warshall algorithm [9]." 1428 | ] 1429 | }, 1430 | { 1431 | "cell_type": "markdown", 1432 | "metadata": {}, 1433 | "source": [ 1434 | "- A single run of Dijkstra's algorithm (array-based implementation) has a time complexity of $ \\Theta(n^2) $ and a space complexity of $ \\Theta(n) $, so running it from each of the n vertices takes $ \\Theta(n^3) $ time.\n", 1435 | "- The Floyd-Warshall algorithm has a time complexity of $ \\Theta(n^3) $ and a space complexity of $ \\Theta(n^2) $." 
1436 | ] 1437 | }, 1438 | { 1439 | "cell_type": "markdown", 1440 | "metadata": {}, 1441 | "source": [ 1442 | "Please click here to see an example of both algorithms in the Graphs section. The second one is solved with dynamic programming." 1443 | ] 1444 | }, 1445 | { 1446 | "cell_type": "markdown", 1447 | "metadata": {}, 1448 | "source": [ 1449 | "## Reference" 1450 | ] 1451 | }, 1452 | { 1453 | "cell_type": "markdown", 1454 | "metadata": {}, 1455 | "source": [ 1456 | "[1] Wikipedia - Dynamic Programming. \n", 1457 | "[2] Wikipedia - Principle of Optimality. \n", 1458 | "[3] Wikipedia - Binomial coefficient. \n", 1459 | "[4] Wikipedia - Change-making problem. \n", 1460 | "[5] Wikipedia - Knapsack problem. \n", 1461 | "[6] Wikipedia - Longest common subsequence problem. \n", 1462 | "[7] Wikipedia - Sequence alignment problem. \n", 1463 | "[8] Wikipedia - Needleman-Wunsch algorithm. \n", 1464 | "[9] Wikipedia - All-Pairs Shortest Path. " 1465 | ] 1466 | }, 1467 | { 1468 | "cell_type": "markdown", 1469 | "metadata": {}, 1470 | "source": [ 1471 | "---\n", 1472 | "« Home" 1473 | ] 1474 | } 1475 | ], 1476 | "metadata": { 1477 | "kernelspec": { 1478 | "display_name": "Python 3", 1479 | "language": "python", 1480 | "name": "python3" 1481 | }, 1482 | "language_info": { 1483 | "codemirror_mode": { 1484 | "name": "ipython", 1485 | "version": 3 1486 | }, 1487 | "file_extension": ".py", 1488 | "mimetype": "text/x-python", 1489 | "name": "python", 1490 | "nbconvert_exporter": "python", 1491 | "pygments_lexer": "ipython3", 1492 | "version": "3.8.5" 1493 | }, 1494 | "varInspector": { 1495 | "cols": { 1496 | "lenName": 16, 1497 | "lenType": 16, 1498 | "lenVar": 40 1499 | }, 1500 | "kernels_config": { 1501 | "python": { 1502 | "delete_cmd_postfix": "", 1503 | "delete_cmd_prefix": "del ", 1504 | "library": "var_list.py", 1505 | "varRefreshCmd": "print(var_dic_list())" 1506 | }, 1507 | "r": { 1508 | "delete_cmd_postfix": ") ", 1509 | "delete_cmd_prefix": "rm(", 1510 | "library": 
"var_list.r", 1511 | "varRefreshCmd": "cat(var_dic_list()) " 1512 | } 1513 | }, 1514 | "types_to_exclude": [ 1515 | "module", 1516 | "function", 1517 | "builtin_function_or_method", 1518 | "instance", 1519 | "_Feature" 1520 | ], 1521 | "window_display": false 1522 | } 1523 | }, 1524 | "nbformat": 4, 1525 | "nbformat_minor": 2 1526 | } 1527 | -------------------------------------------------------------------------------- /similarity-functions/SimilarityFunctions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 6. Similarity Functions" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "- **Created by Andrés Segura Tinoco**\n", 15 | "- **Created on May 20, 2019**\n", 16 | "- **Updated on Mar 19, 2021**" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "In statistics and related fields, a **similarity measure** or similarity function is a real-valued function that quantifies the similarity between two objects. In short, a similarity function quantifies how much alike two data objects are [1]." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## 6.1. 
Common similarity functions" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# Load the Python libraries\n", 40 | "from math import sqrt\n", 41 | "from decimal import Decimal\n", 42 | "from scipy import stats as ss\n", 43 | "import sklearn.metrics.pairwise as sm\n", 44 | "import math" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "\\begin{align}\n", 52 | " similarity(X, Y) = d(X, Y) = \\sqrt{\\sum_{i=1}^n (X_i - Y_i)^2} \\tag{1}\n", 53 | "\\end{align}" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# (1) Euclidean distance function\n", 63 | "def euclidean_distance(x, y):\n", 64 | " return sqrt(sum(pow(a-b,2) for a, b in zip(x, y)))" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "\\begin{align}\n", 72 | " similarity(X, Y) = d(X, Y) = \\sum_{i=1}^n |X_i - Y_i| \\tag{2}\n", 73 | "\\end{align}" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "# (2) Manhattan distance function\n", 83 | "def manhattan_distance(x, y):\n", 84 | " return sum(abs(a-b) for a,b in zip(x,y))" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "\\begin{align}\n", 92 | " similarity(X, Y) = d(X, Y) = (\\sum_{i=1}^n |X_i - Y_i|^p)^\\frac{1}{p} \\tag{3}\n", 93 | "\\end{align}" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 4, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "# (3) Minkowski distance function\n", 103 | "def _nth_root(value, n_root):\n", 104 | " root_value = 1/float(n_root)\n", 105 | " return round(Decimal(value) ** Decimal(root_value),3)\n", 106 | "\n", 107 | "def minkowski_distance(x, y, p = 3):\n", 108 | " return 
float(_nth_root(sum(pow(abs(a-b), p) for a,b in zip(x, y)), p))" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "\\begin{align}\n", 116 | " similarity(X, Y) = cos(\\theta) = \\frac{\\vec{X}.\\vec{Y}}{\\|\\vec{X}\\|.\\|\\vec{Y}\\|} = \\frac{\\sum_{i=1}^n X_i.Y_i}{\\sqrt{\\sum_{i=1}^n X_i^2}.\\sqrt{\\sum_{i=1}^n Y_i^2}} \\tag{4}\n", 117 | "\\end{align}" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 5, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "# (4) Cosine similarity function\n", 127 | "def _square_rooted(x):\n", 128 | " return round(sqrt(sum([a*a for a in x])),3)\n", 129 | "\n", 130 | "def cosine_similarity(x, y):\n", 131 | " numerator = sum(a*b for a,b in zip(x,y))\n", 132 | " denominator = _square_rooted(x) * _square_rooted(y)\n", 133 | " return round(numerator/float(denominator),3)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "\\begin{align}\n", 141 | " similarity(X, Y) = \\frac{cov(X, Y)}{\\sigma_X . \\sigma_Y} = \\frac{\\sum_{i=1}^n (X_i - \\bar{X}).(Y_i - \\bar{Y})}{\\sqrt{\\sum_{i=1}^n (X_i - \\bar{X})^2 . 
\\sum_{i=1}^n (Y_i - \\bar{Y})^2}} \\tag{5}\n", 142 | "\\end{align}" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 6, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "# (5) Pearson similarity function\n", 152 | "def _avg(x):\n", 153 | " assert len(x) > 0\n", 154 | " return float(sum(x)) / len(x)\n", 155 | "\n", 156 | "def pearson_similarity(x, y):\n", 157 | " assert len(x) == len(y)\n", 158 | " n = len(x)\n", 159 | " assert n > 0\n", 160 | " avg_x = _avg(x)\n", 161 | " avg_y = _avg(y)\n", 162 | " diffprod = 0\n", 163 | " xdiff2 = 0\n", 164 | " ydiff2 = 0\n", 165 | " for idx in range(n):\n", 166 | " xdiff = x[idx] - avg_x\n", 167 | " ydiff = y[idx] - avg_y\n", 168 | " diffprod += xdiff * ydiff\n", 169 | " xdiff2 += xdiff * xdiff\n", 170 | " ydiff2 += ydiff * ydiff\n", 171 | "\n", 172 | " return diffprod / math.sqrt(xdiff2 * ydiff2)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "\\begin{align}\n", 180 | " similarity(X, Y) = J(X, Y) = \\frac{|X \\cap Y|}{|X \\cup Y|} = \\frac{|X \\cap Y|}{|X| + |Y| - |X \\cap Y|} \\tag{6}\n", 181 | "\\end{align}" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 7, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# (6) Jaccard similarity function\n", 191 | "def jaccard_similarity(x, y):\n", 192 | " intersection_cardinality = len(set(x) & set(y))\n", 193 | " union_cardinality = len(set(x) | set(y))\n", 194 | " return intersection_cardinality / float(union_cardinality)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "## 6.2. 
Manual examples" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 8, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "# Vectors\n", 211 | "x = [-4.593481, -5.478033, 1.127111, 1.252885, -2.286953] # Messi\n", 212 | "y = [-4.080334, -3.406618, 4.334073, -0.485612, -2.817897] # CR\n", 213 | "z = [-4.048185, -5.546171, 0.505673, 0.616553, -1.730906] # Neymar" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### Euclidean distance" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 9, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/plain": [ 231 | "4.259455195846412" 232 | ] 233 | }, 234 | "execution_count": 9, 235 | "metadata": {}, 236 | "output_type": "execute_result" 237 | } 238 | ], 239 | "source": [ 240 | "euclidean_distance(x, y)" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 10, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "data": { 250 | "text/plain": [ 251 | "1.1841800466723797" 252 | ] 253 | }, 254 | "execution_count": 10, 255 | "metadata": {}, 256 | "output_type": "execute_result" 257 | } 258 | ], 259 | "source": [ 260 | "euclidean_distance(x, z)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "### Manhattan distance" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 11, 273 | "metadata": {}, 274 | "outputs": [ 275 | { 276 | "data": { 277 | "text/plain": [ 278 | "8.060965" 279 | ] 280 | }, 281 | "execution_count": 11, 282 | "metadata": {}, 283 | "output_type": "execute_result" 284 | } 285 | ], 286 | "source": [ 287 | "manhattan_distance(x, y)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": 12, 293 | "metadata": {}, 294 | "outputs": [ 295 | { 296 | "data": { 297 | "text/plain": [ 298 | "2.4272509999999996" 299 | ] 300 | }, 301 | "execution_count": 12, 302 
| "metadata": {}, 303 | "output_type": "execute_result" 304 | } 305 | ], 306 | "source": [ 307 | "manhattan_distance(x, z)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "### Minkowski distance" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": 13, 320 | "metadata": {}, 321 | "outputs": [ 322 | { 323 | "data": { 324 | "text/plain": [ 325 | "3.619" 326 | ] 327 | }, 328 | "execution_count": 13, 329 | "metadata": {}, 330 | "output_type": "execute_result" 331 | } 332 | ], 333 | "source": [ 334 | "minkowski_distance(x, y)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 14, 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "data": { 344 | "text/plain": [ 345 | "0.941" 346 | ] 347 | }, 348 | "execution_count": 14, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "minkowski_distance(x, z)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "### Cosine similarity" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 15, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "0.842" 373 | ] 374 | }, 375 | "execution_count": 15, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "cosine_similarity(x, y)" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 16, 387 | "metadata": {}, 388 | "outputs": [ 389 | { 390 | "data": { 391 | "text/plain": [ 392 | "0.99" 393 | ] 394 | }, 395 | "execution_count": 16, 396 | "metadata": {}, 397 | "output_type": "execute_result" 398 | } 399 | ], 400 | "source": [ 401 | "cosine_similarity(x, z)" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": {}, 407 | "source": [ 408 | "### Pearson similarity" 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | 
"execution_count": 17, 414 | "metadata": {}, 415 | "outputs": [ 416 | { 417 | "data": { 418 | "text/plain": [ 419 | "0.8214001476231276" 420 | ] 421 | }, 422 | "execution_count": 17, 423 | "metadata": {}, 424 | "output_type": "execute_result" 425 | } 426 | ], 427 | "source": [ 428 | "pearson_similarity(x, y)" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 18, 434 | "metadata": {}, 435 | "outputs": [ 436 | { 437 | "data": { 438 | "text/plain": [ 439 | "0.9888645775446726" 440 | ] 441 | }, 442 | "execution_count": 18, 443 | "metadata": {}, 444 | "output_type": "execute_result" 445 | } 446 | ], 447 | "source": [ 448 | "pearson_similarity(x, z)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "### Jaccard similarity" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 19, 461 | "metadata": {}, 462 | "outputs": [], 463 | "source": [ 464 | "a = [0, 1, 2, 3, 4, 5]\n", 465 | "b = [-1, 1, 2, 0, 3, 5]" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 20, 471 | "metadata": {}, 472 | "outputs": [ 473 | { 474 | "data": { 475 | "text/plain": [ 476 | "0.7142857142857143" 477 | ] 478 | }, 479 | "execution_count": 20, 480 | "metadata": {}, 481 | "output_type": "execute_result" 482 | } 483 | ], 484 | "source": [ 485 | "jaccard_similarity(a, b)" 486 | ] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": [ 492 | "## 6.3. 
Sklearn examples" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": 21, 498 | "metadata": {}, 499 | "outputs": [ 500 | { 501 | "data": { 502 | "text/plain": [ 503 | "4.259455195846413" 504 | ] 505 | }, 506 | "execution_count": 21, 507 | "metadata": {}, 508 | "output_type": "execute_result" 509 | } 510 | ], 511 | "source": [ 512 | "corr = sm.euclidean_distances([x], [y])\n", 513 | "float(corr[0][0])" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 22, 519 | "metadata": {}, 520 | "outputs": [ 521 | { 522 | "data": { 523 | "text/plain": [ 524 | "8.060965" 525 | ] 526 | }, 527 | "execution_count": 22, 528 | "metadata": {}, 529 | "output_type": "execute_result" 530 | } 531 | ], 532 | "source": [ 533 | "corr = sm.manhattan_distances([x], [y])\n", 534 | "float(corr[0][0])" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 23, 540 | "metadata": {}, 541 | "outputs": [ 542 | { 543 | "data": { 544 | "text/plain": [ 545 | "0.841904969009294" 546 | ] 547 | }, 548 | "execution_count": 23, 549 | "metadata": {}, 550 | "output_type": "execute_result" 551 | } 552 | ], 553 | "source": [ 554 | "corr = sm.cosine_similarity([x], [y])\n", 555 | "float(corr[0][0])" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 24, 561 | "metadata": {}, 562 | "outputs": [ 563 | { 564 | "data": { 565 | "text/plain": [ 566 | "0.8214001476231275" 567 | ] 568 | }, 569 | "execution_count": 24, 570 | "metadata": {}, 571 | "output_type": "execute_result" 572 | } 573 | ], 574 | "source": [ 575 | "corr, p_value = ss.pearsonr(x, y)\n", 576 | "corr" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "## Reference" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "[1] Wikipedia - Similarity measure. 
" 591 | ] 592 | }, 593 | { 594 | "cell_type": "markdown", 595 | "metadata": {}, 596 | "source": [ 597 | "---\n", 598 | "« Home" 599 | ] 600 | } 601 | ], 602 | "metadata": { 603 | "kernelspec": { 604 | "display_name": "Python 3", 605 | "language": "python", 606 | "name": "python3" 607 | }, 608 | "language_info": { 609 | "codemirror_mode": { 610 | "name": "ipython", 611 | "version": 3 612 | }, 613 | "file_extension": ".py", 614 | "mimetype": "text/x-python", 615 | "name": "python", 616 | "nbconvert_exporter": "python", 617 | "pygments_lexer": "ipython3", 618 | "version": "3.8.5" 619 | }, 620 | "varInspector": { 621 | "cols": { 622 | "lenName": 16, 623 | "lenType": 16, 624 | "lenVar": 40 625 | }, 626 | "kernels_config": { 627 | "python": { 628 | "delete_cmd_postfix": "", 629 | "delete_cmd_prefix": "del ", 630 | "library": "var_list.py", 631 | "varRefreshCmd": "print(var_dic_list())" 632 | }, 633 | "r": { 634 | "delete_cmd_postfix": ") ", 635 | "delete_cmd_prefix": "rm(", 636 | "library": "var_list.r", 637 | "varRefreshCmd": "cat(var_dic_list()) " 638 | } 639 | }, 640 | "types_to_exclude": [ 641 | "module", 642 | "function", 643 | "builtin_function_or_method", 644 | "instance", 645 | "_Feature" 646 | ], 647 | "window_display": false 648 | } 649 | }, 650 | "nbformat": 4, 651 | "nbformat_minor": 2 652 | } 653 | --------------------------------------------------------------------------------