├── 0_prerequisites
│   ├── 00_overview.ipynb
│   ├── 01_introduction_to_colab.ipynb
│   ├── 02_conditionals.ipynb
│   ├── 03_loops.ipynb
│   ├── 04_functions.ipynb
│   ├── 05_parameter_passing.ipynb
│   ├── 06_strings.ipynb
│   ├── 07_recursive_functions.ipynb
│   └── 08_debugging.ipynb
├── 1_fundamental_data_structures
│   ├── 10_overview.ipynb
│   ├── 11_arrays.ipynb
│   ├── 12_sets.ipynb
│   ├── 13_classes.ipynb
│   ├── 14_maps.ipynb
│   └── 15_linked_lists.ipynb
├── 2_further_data_structures
│   ├── 20_overview.ipynb
│   ├── 21_doubly_linked_lists.ipynb
│   ├── 22_stacks.ipynb
│   ├── 23_queues.ipynb
│   ├── 24_priority_queues.ipynb
│   └── 25_choosing_the_appropriate_data_structure.ipynb
├── 3_hash_tables
│   ├── 30_overview.ipynb
│   ├── 31_hash_functions.ipynb
│   ├── 32_hash_collisions.ipynb
│   ├── 33_hash_collision_resolution.ipynb
│   ├── 34_hash_table_implementation.ipynb
│   └── 35_hash_tables_in_practice.ipynb
├── 4_introduction_to_algorithms
│   ├── 40_overview.ipynb
│   ├── 41_big_o.ipynb
│   ├── 42_calculating_complexity.ipynb
│   ├── 43_analyzing_complexity.ipynb
│   ├── 44_big_o_math_deep_dive.ipynb
│   ├── 45_calculating_summary_statistics.ipynb
│   └── 46_numerical_algorithms.ipynb
├── 5_sorting_algorithms
│   ├── 50_overview.ipynb
│   ├── 51_bubble_sort.ipynb
│   ├── 52_insertion_sort.ipynb
│   ├── 53_selection_sort.ipynb
│   ├── 54_merge_sort.ipynb
│   ├── 55_quicksort.ipynb
│   └── 56_comparing_sorting_algorithms.ipynb
├── 6_search_algorithms
│   ├── 60_overview.ipynb
│   ├── 61_linear_search.ipynb
│   └── 62_binary_search.ipynb
├── 7_graphs
│   ├── 70_overview.ipynb
│   ├── 71_graph_properties.ipynb
│   ├── 72_graph_representations.ipynb
│   ├── 73_undirected_directed_weighted_graphs.ipynb
│   ├── 74_graph_types.ipynb
│   ├── 75_breadth_first_search.ipynb
│   └── 76_depth_first_search.ipynb
├── 8_trees
│   ├── 80_overview.ipynb
│   ├── 81_tree_properties.ipynb
│   ├── 82_tree_traversal_methods.ipynb
│   ├── 83_binary_trees.ipynb
│   ├── 84_binary_search_trees.ipynb
│   └── 85_tries.ipynb
├── CONTRIBUTING.md
├── LICENSE
└── README.md

/0_prerequisites/00_overview.ipynb:
--------------------------------------------------------------------------------
 1 | { 2 | 
"cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "1scBnc9XYdcB" 7 | }, 8 | "source": [ 9 | "# Prerequisites" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "qTWh0qEqwF2I" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "ue2Thbv0Yevl" 25 | }, 26 | "source": [ 27 | "Before starting with the core *Data Structures and Algorithms* material, we recommend completing a few prerequisite lessons. If you already feel comfortable with any of these topics, you can skip the corresponding lessons." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "oBCJWyKGRrAy" 34 | }, 35 | "source": [ 36 | "The following topics are by no means an exhaustive list of prerequisites, but feeling comfortable with these topics is necessary for the core material to come.\n", 37 | "\n", 38 | "- Functions\n", 39 | "- Strings\n", 40 | "- Parameter passing\n", 41 | "- Recursive functions" 42 | ] 43 | } 44 | ], 45 | "metadata": { 46 | "colab": { 47 | "collapsed_sections": [], 48 | "name": "prerequisites_overview.ipynb", 49 | "private_outputs": true, 50 | "provenance": [ ] 51 | }, 52 | "kernelspec": { 53 | "display_name": "Python 3", 54 | "name": "python3" 55 | } 56 | }, 57 | "nbformat": 4, 58 | "nbformat_minor": 0 59 | } 60 | -------------------------------------------------------------------------------- /0_prerequisites/02_conditionals.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "Pic8r3n4AWWu" 7 | }, 8 | "source": [ 9 | "# Conditionals" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "40klIjfB3g0H" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "8WWPuZ6JxE5L" 25 | }, 26 | "source": [ 27 
| "A well-implemented program may have points where functionality needs to branch down multiple paths, depending on the input. Rather than build a completely separate program for each possible path, we can add ways to allow the code to branch down these paths via **conditionals**, or code constructs designed to change program flow based on some criteria." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "24FzSdqzmP3q" 34 | }, 35 | "source": [ 36 | "### If statements" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "0udwObR3YNs9" 43 | }, 44 | "source": [ 45 | "Just like in English, an **if statement** can be used to express a conditional, which will allow additional (or different) code to be executed when the conditional's criteria are met. Generally, an `if` statement is a block of code paired with a condition; the block executes only if that condition evaluates to `True`. Take this code, for example, which uses the [Fahrenheit](https://en.wikipedia.org/wiki/Fahrenheit) temperature scale:" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "id": "n_uFmpISmfeO" 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "freezer_temperature = 40 # Fahrenheit\n", 57 | "if freezer_temperature \u003e 32:\n", 58 | " print('My ice cream is melting!')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "raGs97Lbm5gu" 65 | }, 66 | "source": [ 67 | "In this code snippet, we print `'My ice cream is melting!'` if the temperature of the freezer is higher than the freezing point of water (32 degrees Fahrenheit). Our condition is `freezer_temperature \u003e 32`, which evaluates to `True` since `freezer_temperature` is 40. If the freezer temperature is 32 or lower, nothing is printed; the ice cream is safe. 
Programs can have more than one `if` statement, as well; let's try that with this example, which uses the [Celsius](https://en.wikipedia.org/wiki/Celsius) temperature scale:" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "id": "XzCjGPwsC4-I" 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "freezer_temperature = -300 # Celsius\n", 79 | "if freezer_temperature \u003e 0:\n", 80 | " print('My ice cream is melting!')\n", 81 | "if freezer_temperature \u003c -273.15: # Absolute zero, in Celsius.\n", 82 | " print('The freezer has achieved a temperature below absolute zero!')" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": { 88 | "id": "cQ_fFf0wDHO_" 89 | }, 90 | "source": [ 91 | "In this case, since the freezer temperature is not above 0, the program bypasses the first `if` statement (which checks to see if the temperature is greater than 0). The next `if` statement checks to see if the current temperature is below absolute zero, and prints a warning statement if it is. `if` statements are not mutually exclusive, however; multiple consecutive `if` statements can be activated during program flow. This example uses the [Rankine](https://en.wikipedia.org/wiki/Rankine_scale) temperature scale to show how multiple consecutive `if` statements can activate if their criteria are both met, even if they reference the same variable:" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": { 98 | "id": "l0l-ET14DkSd" 99 | }, 100 | "outputs": [], 101 | "source": [ 102 | "freezer_temperature = 474.67 # Rankine\n", 103 | "if freezer_temperature \u003c 491.67:\n", 104 | " print('My ice cream is cold, I think? 
Is almost 500 degrees cold?')\n", 105 | "if freezer_temperature \u003c 479.67:\n", 106 | " print('Yeah, I am pretty sure my ice cream is cold.')" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "id": "mo6StVKMD-l-" 113 | }, 114 | "source": [ 115 | "Here, the freezer temperature is below both thresholds set by the two `if` statements, and so both messages will print. `if` statements are often used to add extra functionality or flexibility to code or to skip over sections that should not execute in certain circumstances." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": { 121 | "id": "CoZnoYhYqFYr" 122 | }, 123 | "source": [ 124 | "### Else statements" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": { 130 | "id": "b7uCyqP5uQWp" 131 | }, 132 | "source": [ 133 | "An `if` statement alone isn't necessarily enough to create two completely mutually exclusive segments of code. Say we were building a program to let students know there was a snow day (an event in which school is cancelled due to snow making most roads impassable). If there are ten inches of snow or more outside, then we cancel school. This code may seem like it works, but try running it to see the result:" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": { 140 | "id": "ooZGcB9nFk0E" 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "inches_of_snow = 5\n", 145 | "if inches_of_snow \u003e= 10:\n", 146 | " print('School is cancelled! Snow day!')\n", 147 | "print('School is still on!')" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": { 153 | "id": "mDDB_bmx1t3e" 154 | }, 155 | "source": [ 156 | "Because that code prints the correct statement when `inches_of_snow` is 5 (less than 10), it is tempting to conclude that it works correctly. However, try changing `inches_of_snow` to 15 and re-running the code." 
 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": { 163 | "id": "Tk0adUh6GEK0" 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "inches_of_snow = 15\n", 168 | "if inches_of_snow \u003e= 10:\n", 169 | " print('School is cancelled! Snow day!')\n", 170 | "print('School is still on!')" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "id": "3bP02GIxGFYW" 177 | }, 178 | "source": [ 179 | "Our program now prints *both* statements, rather than just the correct snow day statement. This is where an `else` statement will come in handy; it allows us to move the second `print` statement into its own segment of code, so that it only runs if `inches_of_snow` is less than 10 (so the first `if` statement's condition isn't met). Generally, an **else statement** is a block of code that is paired with an `if` statement and executes if the `if` statement's condition evaluates to `False`." 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": { 186 | "id": "2-DwHjHqGJMF" 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "inches_of_snow = 15\n", 191 | "if inches_of_snow \u003e= 10:\n", 192 | " print('School is cancelled! Snow day!')\n", 193 | "else:\n", 194 | " print('School is still on!')" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": { 200 | "id": "bf80tccDHAHu" 201 | }, 202 | "source": [ 203 | "Whether `inches_of_snow` is set to 5 or 15, only one statement will print. That's the expected functionality. In a conditional, an `else` statement must always follow a corresponding `if` (or `elif`) statement; an `else` without one is a syntax error." 
 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": { 209 | "id": "yJPGGvQy6y_c" 210 | }, 211 | "source": [ 212 | "### Elif statements" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": { 218 | "id": "kzFXD49a68P4" 219 | }, 220 | "source": [ 221 | "The final type of conditional statement that Python supports is an `elif` statement. Typically, this is used for conditionals that have more than two cases, such as this one:\n" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": { 228 | "id": "vpETZEn-M_3p" 229 | }, 230 | "outputs": [], 231 | "source": [ 232 | "inches_of_snow = 5\n", 233 | "if inches_of_snow \u003e= 10:\n", 234 | " print('School is cancelled! Snow day!')\n", 235 | "elif inches_of_snow \u003e= 5:\n", 236 | " print('School is delayed two hours to allow the snow time to melt.')\n", 237 | "else:\n", 238 | " print('School is still on!')" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": { 244 | "id": "Y1RkJlAiNJb2" 245 | }, 246 | "source": [ 247 | "This code allows for three cases: `inches_of_snow` is 10 or more, at least 5 but less than 10, or less than 5. Each case produces a different outcome." 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": { 253 | "id": "GH9OWx3wjhTE" 254 | }, 255 | "source": [ 256 | "## Question 1" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": { 262 | "id": "YjLrLVfvikwZ" 263 | }, 264 | "source": [ 265 | "Consider the following code." 
266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "id": "IpMTvcMoiWud" 272 | }, 273 | "source": [ 274 | "```python\n", 275 | "value = 5\n", 276 | "if value \u003e 2:\n", 277 | " value += 4\n", 278 | "elif value \u003c 0:\n", 279 | " value = -1 * value\n", 280 | "else:\n", 281 | " value = 3\n", 282 | "if value == 9:\n", 283 | " value -= 9\n", 284 | "else:\n", 285 | " value = 3\n", 286 | "value += 5\n", 287 | "```" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": { 293 | "id": "ootP2MVBis6I" 294 | }, 295 | "source": [ 296 | "What will `value` equal after the code's completion?" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": { 302 | "id": "5NdGphqGi6vz" 303 | }, 304 | "source": [ 305 | "**a)** 9\n", 306 | "\n", 307 | "**b)** 8\n", 308 | "\n", 309 | "**c)** 6\n", 310 | "\n", 311 | "**d)** 5\n", 312 | "\n", 313 | "**e)** 0" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "id": "FkNJ2Wghi6v0" 320 | }, 321 | "source": [ 322 | "### Solution" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": { 328 | "id": "Kn3LHu90i6v0" 329 | }, 330 | "source": [ 331 | "The correct answer is **d)**.\n", 332 | "\n", 333 | "**a)** After the first `if` statement, `value` is indeed equal to 9. However, after the second `if` statement it will be set to 0 and then incremented to 5.\n", 334 | "\n", 335 | "**b)** This is almost correct. After the final `if` statement, if `value` had not been 9, it would have been set to 3 and incremented to 8. However, since `value` was equal to 9, it was set to 0 and then incremented to 5.\n", 336 | "\n", 337 | "**c)** No matter what, `value` will increase by 5 at the end of this block of code. 
To end up as 6, `value` would have to be 1 after the second `if` statement, and it can only be 0 or 3.\n", 338 | "\n", 339 | "**e)** This would be correct, but the final line of the program is `value += 5`, which will increment `value` (currently 0) to 5." 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": { 345 | "id": "2FUqsgnyqgGU" 346 | }, 347 | "source": [ 348 | "## Question 2" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": { 354 | "id": "Tw9YdiHsMorw" 355 | }, 356 | "source": [ 357 | "Consider the following code." 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": { 363 | "id": "m3-PUWTyMor8" 364 | }, 365 | "source": [ 366 | "```python\n", 367 | "number = 1\n", 368 | "if number \u003e 1:\n", 369 | " number = 1\n", 370 | " if number == 1:\n", 371 | " number = 5\n", 372 | "if number == 1:\n", 373 | " number += 4\n", 374 | "elif number == 5:\n", 375 | " number -= 4\n", 376 | "else:\n", 377 | " number = 2\n", 378 | "if number == 5:\n", 379 | " number *= 2\n", 380 | " if number \u003e 5:\n", 381 | " number -= 7\n", 382 | " else:\n", 383 | " number += 2\n", 384 | "else:\n", 385 | " number = 6\n", 386 | "```" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "id": "lhlBkYlGMor8" 393 | }, 394 | "source": [ 395 | "What is the value of `number` after this code completes?" 
396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": { 401 | "id": "cg6KiBDuMor8" 402 | }, 403 | "source": [ 404 | "**a)** 1\n", 405 | "\n", 406 | "**b)** 3\n", 407 | "\n", 408 | "**c)** 6\n", 409 | "\n", 410 | "**d)** 10" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": { 416 | "id": "2qIy2BsrMor8" 417 | }, 418 | "source": [ 419 | "### Solution" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": { 425 | "id": "xUxf2PY6Mor8" 426 | }, 427 | "source": [ 428 | "The correct answer is **b)**.\n", 429 | "\n", 430 | "**a)** `number` starts at 1, but subsequent `if` and `else` statements will change that.\n", 431 | "\n", 432 | "**c)** This is a likely outcome, since if `number` isn't 5 by the end of the code's execution, it will be set to 6. Since `number` is equal to 5 at the start of the final `if` statement, though, this `else` case is not activated.\n", 433 | "\n", 434 | "**d)** `number` is 10 towards the end of the code's execution, but the line `if number \u003e 5:` gets activated, causing `number` to decrease by 7." 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": { 440 | "id": "IDyirdfwb6Yd" 441 | }, 442 | "source": [ 443 | "## Question 3" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": { 449 | "id": "_aP2c14kb6Yp" 450 | }, 451 | "source": [ 452 | "On the website you're designing, you want to include a special message to the millionth customer beyond your normal 'Thank you for shopping!' message. Modify your `thank_customer` function to display a special message: `'Congratulations! You are our one millionth customer!'`. \n", 453 | "\n", 454 | "If you're not familiar with function syntax, don't worry! You'll add code above the existing code, but below `def thank_customer(customer_number)`." 
455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": null, 460 | "metadata": { 461 | "id": "SMyg5XEab6Yq" 462 | }, 463 | "outputs": [], 464 | "source": [ 465 | "def thank_customer(customer_number):\n", 466 | " # Fill in your code here.\n", 467 | " # customer_number will be available for you to use in your code.\n", 468 | " # Don't forget to indent by two spaces! Line your code up with these comments.\n", 469 | " print('Thank you for shopping!')\n", 470 | " " 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": { 476 | "id": "kURD_M15b6Yq" 477 | }, 478 | "source": [ 479 | "### Hint" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": { 485 | "id": "L4aqmHAXb6Yq" 486 | }, 487 | "source": [ 488 | "Keep in mind that you still want to thank the customer for shopping every time, so you won't need to use an `else` statement, here." 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": { 494 | "id": "mwCXqZaAb6Yr" 495 | }, 496 | "source": [ 497 | "### Unit Tests\n", 498 | "\n", 499 | "Run the following cell to check your answer against some unit tests." 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": { 506 | "id": "Ks9E8gUrb6Yr" 507 | }, 508 | "outputs": [], 509 | "source": [ 510 | "thank_customer(1000000)\n", 511 | "# Should print:\n", 512 | "# Congratulations! You are our one millionth customer!\n", 513 | "# Thank you for shopping!\n", 514 | "\n", 515 | "thank_customer(10)\n", 516 | "# Should print:\n", 517 | "# Thank you for shopping!\n", 518 | "\n", 519 | "thank_customer(999999)\n", 520 | "# Should print:\n", 521 | "# Thank you for shopping!" 
522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": { 527 | "id": "GtzPbNGXb6Yr" 528 | }, 529 | "source": [ 530 | "### Solution" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": { 536 | "id": "o2j_sOSDb6Yr" 537 | }, 538 | "source": [ 539 | "We chose a generic message, but you can write whatever you want. Just make sure that the `if` statement doesn't include the 'Thank you for shopping!'." 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": { 546 | "id": "SdoYVxJfb6Ys" 547 | }, 548 | "outputs": [], 549 | "source": [ 550 | "def thank_customer(customer_number):\n", 551 | " if customer_number == 1000000:\n", 552 | " print('Congratulations! You are our one millionth customer!')\n", 553 | " print('Thank you for shopping!')" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "id": "oP1jK3mkdgK-" 560 | }, 561 | "source": [ 562 | "## Question 4" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": { 568 | "id": "9YqAudhuSxJg" 569 | }, 570 | "source": [ 571 | "In Python, the **modulo** operator allows you to calculate the remainder of integer division of two numbers. For instance, `5 % 3` is equal to 2, the remainder of `5 // 3` (where `//` is the integer division operator, equal to the greatest integer less than or equal to `5/3`). We can use that to create a way to determine if a number is even or odd, as any integer % 2 is either 0 (no remainder, so it is even) or 1 (it is odd). Use this to fill in an `odd_or_even` function that, given a number, will print `'even'` or `'odd'`. \n", 572 | "\n", 573 | "If you're not familiar with function syntax, don't worry! Just fill in the code below `def odd_or_even(number)` and assume that `number` is the number to be checked." 
574 | ] 575 | }, 576 | { 577 | "cell_type": "code", 578 | "execution_count": null, 579 | "metadata": { 580 | "id": "fpem4YpVCm4i" 581 | }, 582 | "outputs": [], 583 | "source": [ 584 | "def odd_or_even(number):\n", 585 | " # Fill in your code here.\n", 586 | " # number will be available for you to use in your code.\n", 587 | " # Don't forget to indent by two spaces! Line your code up with these comments.\n", 588 | " " 589 | ] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": { 594 | "id": "nPptDV2feEs2" 595 | }, 596 | "source": [ 597 | "### Hint" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": { 603 | "id": "QyjfzTDJeDOT" 604 | }, 605 | "source": [ 606 | "It can be challenging to use the modulo operator if you're not familiar with it; here's an example of how to use it in practice." 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": { 613 | "id": "tjj-VKkQUvo8" 614 | }, 615 | "outputs": [], 616 | "source": [ 617 | "number = 55\n", 618 | "if number % 10 == 5:\n", 619 | " print('The remainder of this division is 5!')" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": { 625 | "id": "KBNIQ3UtTsBQ" 626 | }, 627 | "source": [ 628 | "### Unit Tests\n", 629 | "\n", 630 | "Run the following cell to check your answer against some unit tests." 
631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": null, 636 | "metadata": { 637 | "id": "z1KV2VTVTsBS" 638 | }, 639 | "outputs": [], 640 | "source": [ 641 | "odd_or_even(5)\n", 642 | "# Should print: odd\n", 643 | "\n", 644 | "odd_or_even(10)\n", 645 | "# Should print: even\n", 646 | "\n", 647 | "odd_or_even(0)\n", 648 | "# Should print: even" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": { 654 | "id": "opB3JcMWeZQy" 655 | }, 656 | "source": [ 657 | "### Solution" 658 | ] 659 | }, 660 | { 661 | "cell_type": "markdown", 662 | "metadata": { 663 | "id": "V7VW50KYUi2j" 664 | }, 665 | "source": [ 666 | "If you want to be explicit, you can write `elif number % 2 == 1` instead of using an `else` statement, but given that there are only two cases to check for, an `else` statement is perfectly fine." 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": null, 672 | "metadata": { 673 | "id": "ef2LL4VST_9Z" 674 | }, 675 | "outputs": [], 676 | "source": [ 677 | "def odd_or_even(number):\n", 678 | " if number % 2 == 0:\n", 679 | " print('even')\n", 680 | " else:\n", 681 | " print('odd')" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": { 687 | "id": "Vj9ye95gdg1a" 688 | }, 689 | "source": [ 690 | "## Question 5" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": { 696 | "id": "hfMyidfBdWYl" 697 | }, 698 | "source": [ 699 | "You're working on a piece of code that dispenses tickets at an arcade. 
For a given machine (like [skee-ball](https://en.wikipedia.org/wiki/Skee-Ball)), you'll dispense tickets based on the player's score, as follows:\n", 700 | "\n", 701 | "\n", 702 | "* 500 points or more: 10 tickets\n", 703 | "* 400 - 499 points: 8 tickets\n", 704 | "* 350 - 399 points: 6 tickets\n", 705 | "* 300 - 349 points: 5 tickets\n", 706 | "* 200 - 299 points: 4 tickets\n", 707 | "* 100 - 199 points: 3 tickets\n", 708 | "* Less than 100 points: 2 tickets\n", 709 | "\n", 710 | "Given these restrictions, fill in the `skee_ball_tickets` function. Again, if you're not familiar with function syntax, that's fine! You can assume that `score` is going to be a positive integer." 711 | ] 712 | }, 713 | { 714 | "cell_type": "code", 715 | "execution_count": null, 716 | "metadata": { 717 | "id": "XLoDnatBlrgi" 718 | }, 719 | "outputs": [], 720 | "source": [ 721 | "def skee_ball_tickets(score):\n", 722 | " # Fill in your code here.\n", 723 | " # score will be available for you to use in your code.\n", 724 | " # Don't forget to indent by two spaces! Line your code up with these comments.\n" 725 | ] 726 | }, 727 | { 728 | "cell_type": "markdown", 729 | "metadata": { 730 | "id": "fsgodUMDaEpQ" 731 | }, 732 | "source": [ 733 | "### Hint" 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "metadata": { 739 | "id": "ae9NrhCSaEpc" 740 | }, 741 | "source": [ 742 | "This is the perfect place to use `elif`! Start from the highest value and work your way down; you don't need to use the higher end of the boundary, because the previous `if` or `elif` will cover that, for you." 
743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "metadata": { 749 | "id": "JaKSWkPnaEpc" 750 | }, 751 | "outputs": [], 752 | "source": [ 753 | "number = 5\n", 754 | "if number \u003e= 10:\n", 755 | " print('10 or more!')\n", 756 | "elif number \u003e 7:\n", 757 | " print('Almost 10!')\n", 758 | "elif number \u003e 4:\n", 759 | " print('Not quite 10!')" 760 | ] 761 | }, 762 | { 763 | "cell_type": "markdown", 764 | "metadata": { 765 | "id": "SPuxUKU5aaIn" 766 | }, 767 | "source": [ 768 | "See how only one statement is printed?" 769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": { 774 | "id": "tzmXQAk9aEpc" 775 | }, 776 | "source": [ 777 | "### Unit Tests\n", 778 | "\n", 779 | "Run the following cell to check your answer against some unit tests." 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": null, 785 | "metadata": { 786 | "id": "vJom1KjyaEpd" 787 | }, 788 | "outputs": [], 789 | "source": [ 790 | "skee_ball_tickets(1000)\n", 791 | "# Should print: 10\n", 792 | "\n", 793 | "skee_ball_tickets(10)\n", 794 | "# Should print: 2\n", 795 | "\n", 796 | "skee_ball_tickets(499)\n", 797 | "# Should print: 8" 798 | ] 799 | }, 800 | { 801 | "cell_type": "markdown", 802 | "metadata": { 803 | "id": "WP3PGajGdwhn" 804 | }, 805 | "source": [ 806 | "### Solution" 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": { 812 | "id": "GqeKLocLA9y7" 813 | }, 814 | "source": [ 815 | "You'll need to use one large block of `if`, `elif` and `else` statements, but it should work. If you wanted to be robust against errors, you could use `elif score \u003e= 0` instead of `else` to protect your code against negative scores, but it's not required." 
816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": null, 821 | "metadata": { 822 | "id": "6V6L8ynXaxj-" 823 | }, 824 | "outputs": [], 825 | "source": [ 826 | "def skee_ball_tickets(score):\n", 827 | " if score \u003e= 500:\n", 828 | " print(10)\n", 829 | " elif score \u003e= 400:\n", 830 | " print(8)\n", 831 | " elif score \u003e= 350:\n", 832 | " print(6)\n", 833 | " elif score \u003e= 300:\n", 834 | " print(5)\n", 835 | " elif score \u003e= 200:\n", 836 | " print(4)\n", 837 | " elif score \u003e= 100:\n", 838 | " print(3)\n", 839 | " else:\n", 840 | " print(2)" 841 | ] 842 | } 843 | ], 844 | "metadata": { 845 | "colab": { 846 | "collapsed_sections": [ 847 | "FkNJ2Wghi6v0", 848 | "2qIy2BsrMor8", 849 | "kURD_M15b6Yq", 850 | "mwCXqZaAb6Yr", 851 | "GtzPbNGXb6Yr", 852 | "nPptDV2feEs2", 853 | "KBNIQ3UtTsBQ", 854 | "opB3JcMWeZQy", 855 | "fsgodUMDaEpQ", 856 | "tzmXQAk9aEpc", 857 | "WP3PGajGdwhn" 858 | ], 859 | "name": "Conditionals", 860 | "provenance": [ ] 861 | }, 862 | "kernelspec": { 863 | "display_name": "Python 3", 864 | "name": "python3" 865 | } 866 | }, 867 | "nbformat": 4, 868 | "nbformat_minor": 0 869 | } 870 | -------------------------------------------------------------------------------- /1_fundamental_data_structures/10_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "SG7CsT_5oPoh" 7 | }, 8 | "source": [ 9 | "# Fundamental Data Structures" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "MJPswJzqvawo" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "gNTHr_HpoTke" 25 | }, 26 | "source": [ 27 | "\u003e A **data structure** is any object that stores data in an organized way.\n", 28 | "\n", 29 | "Different data structures have different organizations and representations. 
This unit introduces you to the most fundamental data structures:\n", 30 | "\n", 31 | "- Arrays (also known as lists)\n", 32 | "- Sets\n", 33 | "- Classes (also known as structs)\n", 34 | "- Maps (also known as dictionaries)\n", 35 | "- Linked lists" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": { 41 | "id": "RM3r7YoBR-LU" 42 | }, 43 | "source": [ 44 | "You have probably encountered data structures before, either in computer science or in life. For example:\n", 45 | "\n", 46 | "- A phone book could be stored in a **map**\n", 47 | "- A line at Disneyland could be stored in a **linked list**\n", 48 | "- The items in your grocery bag could be stored in an **array**" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": { 54 | "id": "NB_slb29v-jW" 55 | }, 56 | "source": [ 57 | "Data are any pieces of observed and recorded information. Examples of data include:\n", 58 | "\n", 59 | "- The temperature on a given day\n", 60 | "- A person's DNA sequence\n", 61 | "- The number of people attending a football match\n", 62 | "- Whether or not you use the word \"data\" on a given day" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": { 68 | "id": "4xKsb_Y8SOxE" 69 | }, 70 | "source": [ 71 | "### Data types" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "id": "ecrzd9AFSJnV" 78 | }, 79 | "source": [ 80 | "\u003e A **data type** is an attribute of data that tells the computer the values the data can take and the operations that can be performed on the data.\n", 81 | "\n", 82 | "Most languages use the following core data types:\n", 83 | "\n", 84 | "- The **integer** (`int`) type represents whole numbers, for example `1` or `-100`.\n", 85 | "- The **float** (`float`) type represents numbers that can have a fractional part, for example `-123.456` or `3.14159`.\n", 86 | "- The **string** (`str`) type represents words and sentences, for example `'cat'` or `'101 Dalmatians!'`.\n", 87 | "- The **boolean** (`bool`) type can only be 
`True` or `False`." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": { 93 | "id": "AE27St44SM2m" 94 | }, 95 | "source": [ 96 | "The most appropriate data type to use depends on the data you are working with. For example, you can divide two integers or two floats, but dividing two strings is not meaningful. For the examples of data listed earlier:\n", 97 | "\n", 98 | "- The temperature on a given day should use a **float** (assuming the temperature is recorded to decimal places).\n", 99 | "- A person's DNA sequence should use a **string** since it is a series of letters, e.g. `'CATGAGTC'`.\n", 100 | "- The number of people attending a football match should use an **integer** since this is always a whole number.\n", 101 | "- Whether or not you use the word \"data\" on a given day should use a **boolean** since it is either true or false." 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "id": "lEZ959ESwWnI" 108 | }, 109 | "source": [ 110 | "### The difference between data structures and data types" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": { 116 | "id": "kNFO1REpJ45t" 117 | }, 118 | "source": [ 119 | "Data structures are different ways of *organizing* data. Data types are different ways of *representing* data. Data structures store data, and that data is encoded using a data type." 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": { 125 | "id": "jUqrg9XDwlTX" 126 | }, 127 | "source": [ 128 | "For example, you could have:\n", 129 | "\n", 130 | "- An array of integers\n", 131 | "- A linked list of strings\n", 132 | "- A set of floats\n", 133 | "\n", 134 | "Data structures can also store other data structures! 
You could have:\n", 135 | "\n", 136 | "- An array of maps storing strings\n", 137 | "- A class with linked lists of integers\n", 138 | "- A linked list of arrays of sets containing floats" 139 | ] 140 | } 141 | ], 142 | "metadata": { 143 | "colab": { 144 | "collapsed_sections": [], 145 | "name": "fundamental_data_sructures_overview.ipynb", 146 | "private_outputs": true, 147 | "provenance": [ ] 148 | }, 149 | "kernelspec": { 150 | "display_name": "Python 3", 151 | "name": "python3" 152 | } 153 | }, 154 | "nbformat": 4, 155 | "nbformat_minor": 0 156 | } 157 | -------------------------------------------------------------------------------- /2_further_data_structures/20_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "kWKxmO4ka-lL" 7 | }, 8 | "source": [ 9 | "# Further Data Structures" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "heLEQC1EwzKe" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "BkagupgObKBr" 25 | }, 26 | "source": [ 27 | "In the *Fundamental Data Structures* unit, we introduced the following data structures:\n", 28 | "\n", 29 | "- Arrays (also known as lists)\n", 30 | "- Sets\n", 31 | "- Classes (also known as structs)\n", 32 | "- Maps (also known as dictionaries)\n", 33 | "- Linked lists" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": { 39 | "id": "TcVGgM_hw4bk" 40 | }, 41 | "source": [ 42 | "While these five data structures are the most low-level, versatile, and ubiquitous, in some contexts a more specific data structure is more applicable. For example, suppose you needed a linked list that allows you to iterate forward *and* backward. This is possible with an extension of a linked list, called a *doubly linked list*. 
And suppose you wanted to access elements of a data structure not just based on order but also based on a *priority*. This is possible with a *priority queue*, which is implemented using either an array or a linked list.\n", 43 | "\n", 44 | "The following lessons cover the following data structures, which are all based on or implemented with the fundamental data structures already covered:\n", 45 | "\n", 46 | "- Doubly linked lists (based on linked lists)\n", 47 | "- Stacks (implemented using arrays or linked lists)\n", 48 | "- Queues (implemented using arrays or linked lists)\n", 49 | "- Priority queues (based on queues)" 50 | ] 51 | } 52 | ], 53 | "metadata": { 54 | "colab": { 55 | "collapsed_sections": [], 56 | "name": "further_data_structures_overview.ipynb", 57 | "private_outputs": true, 58 | "provenance": [ ] 59 | }, 60 | "kernelspec": { 61 | "display_name": "Python 3", 62 | "name": "python3" 63 | } 64 | }, 65 | "nbformat": 4, 66 | "nbformat_minor": 0 67 | } 68 | -------------------------------------------------------------------------------- /3_hash_tables/30_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "CjP9pevm98KL" 7 | }, 8 | "source": [ 9 | "# Hash Tables" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "rNVaCvIExERY" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "0oVPWEKX-6KT" 25 | }, 26 | "source": [ 27 | "In a previous lesson, we introduced *maps* (known in Python as *dictionaries*). A map is a data structure that stores unique *keys* that map to *values*.\n", 28 | "\n", 29 | "In practice, almost all languages use a *hash table* as the backend implementation for a map. Hash tables, like maps, are used to store data such that a given value can be found by looking up a key." 
30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "id": "JLq2aPU0xHsJ" 36 | }, 37 | "source": [ 38 | "\u003e A **hash table** is a data structure that maps unique *keys* to *values*. Values are stored in *hash buckets*, computed using a *hash function*.\n", 39 | "\n", 40 | "The implementation of a hash table involves the following concepts, which are introduced in the following lessons:\n", 41 | "\n", 42 | "- Hash functions\n", 43 | "- Hash collision resolution\n", 44 | "- Resizing" 45 | ] 46 | } 47 | ], 48 | "metadata": { 49 | "colab": { 50 | "collapsed_sections": [], 51 | "name": "hash_tables_overview.ipynb", 52 | "private_outputs": true, 53 | "provenance": [ ] 54 | }, 55 | "kernelspec": { 56 | "display_name": "Python 3", 57 | "name": "python3" 58 | } 59 | }, 60 | "nbformat": 4, 61 | "nbformat_minor": 0 62 | } 63 | -------------------------------------------------------------------------------- /3_hash_tables/32_hash_collisions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "NYEjBqiKwurw" 7 | }, 8 | "source": [ 9 | "# Hash Collisions" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "jPDVz0YszDK6" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "E00OTzwskygJ" 25 | }, 26 | "source": [ 27 | "The reason hash functions are so useful is that we can store complex data types and values as integer-valued hash buckets. However, issues arise if a hash function maps two distinct values to the same bucket; in that case, we can't distinguish between these two values." 
28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "WYfGXfGS0sfD" 34 | }, 35 | "source": [ 36 | "\u003e A **hash collision** is when a hash function maps two different values to the same bucket.\n", 37 | "\n", 38 | "For a hash function $H$, a collision exists for two values $a$ and $b$ if $a \\neq b$ but $H(a) = H(b)$.\n", 39 | "\n", 40 | "When there are more input values than buckets, the hash function is guaranteed to have collisions. Even when there are fewer input values than buckets, it is still possible to have collisions." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": { 46 | "id": "4dkWITS3LWUN" 47 | }, 48 | "source": [ 49 | "## Question 1" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "id": "8WldYt9uLYNy" 56 | }, 57 | "source": [ 58 | "Which of the following statements about a hash collision are true? There may be more than one correct response.\n" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "qf82vSAZw6PC" 65 | }, 66 | "source": [ 67 | "**a)** A hash collision occurs when a hash function maps two different inputs to the same bucket.\n", 68 | "\n", 69 | "**b)** If a hash function has more input values than buckets, a collision cannot be avoided.\n", 70 | "\n", 71 | "**c)** If a hash function has fewer input values than buckets, a collision cannot occur." 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "id": "J_UCWRRJwh6l" 78 | }, 79 | "source": [ 80 | "### Solution" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": { 86 | "id": "6ia_wGE0wkew" 87 | }, 88 | "source": [ 89 | "The correct answers are **a)** and **b)**. \n", 90 | "\n", 91 | "**c)** A collision can still occur even when there are more buckets than input values." 
92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "id": "y87y3nkdDDKf" 98 | }, 99 | "source": [ 100 | "## Question 2" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": { 106 | "id": "EwsIDzrKk46r" 107 | }, 108 | "source": [ 109 | "You have data coming in that can be any lower-case character. Will the following hash function have any potential collisions? If so, for what values and buckets?" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": { 116 | "id": "rtGHi6OXfs_p" 117 | }, 118 | "outputs": [], 119 | "source": [ 120 | "import string\n", 121 | "\n", 122 | "def hash_bucket(char):\n", 123 | "  # Raise an error if the character is not a lower-case character.\n", 124 | "  if len(char) != 1 or not char.islower():\n", 125 | "    raise ValueError('Input must be a single lower-case letter.')\n", 126 | "\n", 127 | "  # string.ascii_lowercase.index returns the position of a letter in the\n", 128 | "  # alphabet. For example:\n", 129 | "  # - string.ascii_lowercase.index('a') = 0\n", 130 | "  # - string.ascii_lowercase.index('e') = 4\n", 131 | "  return string.ascii_lowercase.index(char) % 25" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": { 138 | "id": "O48dPxEJcup7" 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "#freetext" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": { 148 | "id": "sPS__ZTgdKzr" 149 | }, 150 | "source": [ 151 | "### Solution" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "id": "hUcs6qevk9HC" 158 | }, 159 | "source": [ 160 | "This hash function has 25 buckets, since the `return` statement ends with `% 25`. In general, whenever a hash function ends with `% n` it has $n$ buckets, since for an integer `x` and a positive integer `n`, `x % n` is always an integer between 0 and $n-1$.\n", 161 | "\n", 162 | "There are 26 possible input values, `'a'` through `'z'`. 
Since there are more input values than buckets, this function is guaranteed to have collisions.\n", 163 | "\n", 164 | "The only collision is between `'a'` and `'z'`, which both map to bucket 0.\n", 165 | "\n", 166 | "```python\n", 167 | "hash_bucket('a') = 0 % 25\n", 168 | "                 = 0\n", 169 | "hash_bucket('z') = 25 % 25\n", 170 | "                 = 0\n", 171 | "```" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "id": "U1SQZFw1icDh" 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "import string\n", 183 | "\n", 184 | "def hash_bucket(char):\n", 185 | "  # Raise an error if the character is not a lower-case character.\n", 186 | "  if len(char) != 1 or not char.islower():\n", 187 | "    raise ValueError('Input must be a single lower-case letter.')\n", 188 | "\n", 189 | "  # string.ascii_lowercase.index returns the position of a letter in the\n", 190 | "  # alphabet. For example:\n", 191 | "  # - string.ascii_lowercase.index('a') = 0\n", 192 | "  # - string.ascii_lowercase.index('e') = 4\n", 193 | "  return string.ascii_lowercase.index(char) % 25" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": { 200 | "id": "v2tbzmqmeSt0" 201 | }, 202 | "outputs": [], 203 | "source": [ 204 | "print(hash_bucket('a'))\n", 205 | "print(hash_bucket('z'))" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": { 211 | "id": "fa9Koe1WDFi_" 212 | }, 213 | "source": [ 214 | "## Question 3" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": { 220 | "id": "XW9-sa-nk61z" 221 | }, 222 | "source": [ 223 | "Consider the following hash function for integers."
224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": { 230 | "id": "lbwwPpCOj5rn" 231 | }, 232 | "outputs": [], 233 | "source": [ 234 | "def hash_bucket(i):\n", 235 | " return i**2 % 10" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": { 241 | "id": "PvHybKztvRkq" 242 | }, 243 | "source": [ 244 | "Below are the data you need to store using this hash function." 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": { 251 | "id": "RBgT9Wo9vtU5" 252 | }, 253 | "outputs": [], 254 | "source": [ 255 | "data = [0, 6, -6, 3, 9, -5, 2, 1]" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": { 261 | "id": "CXykbbMNvxYl" 262 | }, 263 | "source": [ 264 | "For this data and hash function, do you expect any collisions? If so, for what values and buckets?" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": { 271 | "id": "yfan7tCFu87s" 272 | }, 273 | "outputs": [], 274 | "source": [ 275 | "# TODO(you): Write code to check for collisions." 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": { 281 | "id": "Ym4wUECAdNP_" 282 | }, 283 | "source": [ 284 | "### Solution" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "id": "iWF4lrkPk9_m" 291 | }, 292 | "source": [ 293 | "This hash function has 10 buckets and there are 8 data points. Therefore, we are not *guaranteed* to have collisions, but there may be some anyway.\n", 294 | "\n", 295 | "The easiest way to check if there are collisions is to compute the hash bucket for every input value. (Note that realistically, this may not always be possible; for example when you have billions of inputs.) This can be accomplished with a `for` loop." 
296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": { 302 | "id": "h7l4OeTavobX" 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "def hash_bucket(i):\n", 307 | " return i**2 % 10\n", 308 | "\n", 309 | "data = [0, 6, -6, 3, 9, -5, 2, 1]" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": { 316 | "id": "mhei7B1Av9Lv" 317 | }, 318 | "outputs": [], 319 | "source": [ 320 | "for i in data:\n", 321 | " print(\"input: %d, hash_bucket: %d\" % (i, hash_bucket(i)))" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": { 327 | "id": "hB_fpW4-Le-0" 328 | }, 329 | "source": [ 330 | "Both 6 and -6 map to 6, and both 1 and 9 map to 1. Therefore, two buckets (6 and 1) have collisions, with both containing two data entries." 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": { 336 | "id": "UP54-GJoXXjk" 337 | }, 338 | "source": [ 339 | "## Question 4" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": { 345 | "id": "q-yPvfzwXXjk" 346 | }, 347 | "source": [ 348 | "Consider the following hash function for integers." 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "metadata": { 355 | "id": "QNYh2bz3XXjk" 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "def hash_bucket(i):\n", 360 | " return i**2 % 10" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": { 366 | "id": "rBulpw3xXXjk" 367 | }, 368 | "source": [ 369 | "Below are the data you need to store using this hash function." 
370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": { 376 | "id": "gShbHoFNXXjk" 377 | }, 378 | "outputs": [], 379 | "source": [ 380 | "data = [0, 6, -6, 3, 9, -5, 2, 1]" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": { 386 | "id": "25YI5yNtXXjk" 387 | }, 388 | "source": [ 389 | "In the previous question, we showed that both 6 and -6 map to 6, and both 1 and 9 map to 1. What changes might you make to this hash function in order to reduce the number of collisions to 0?" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "id": "6pskJM3AvH7R" 397 | }, 398 | "outputs": [], 399 | "source": [ 400 | "def hash_bucket(i):\n", 401 | "  # TODO(you): Edit this function to avoid collisions.\n", 402 | "  return i**2 % 10" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": { 408 | "id": "hfqtbc3JXXjk" 409 | }, 410 | "source": [ 411 | "### Solution" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": { 417 | "id": "t9tFWciFXXjk" 418 | }, 419 | "source": [ 420 | "This is not the only correct solution. Squaring discards the sign of the input, so `i` and `-i` always collide. Cubing preserves the sign, and Python's modulo operator returns a non-negative result for a negative input (for example, `-216 % 10` is `4`, while `216 % 10` is `6`), so 6 and -6 no longer collide. For this data, you can change `hash_bucket` to cube the number, rather than square it."
421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": { 427 | "id": "hBQW0RcFXXjk" 428 | }, 429 | "outputs": [], 430 | "source": [ 431 | "def hash_bucket(i):\n", 432 | "  return i**3 % 10\n", 433 | "\n", 434 | "data = [0, 6, -6, 3, 9, -5, 2, 1]" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": { 441 | "id": "HCDiqfqJXXjk" 442 | }, 443 | "outputs": [], 444 | "source": [ 445 | "for i in data:\n", 446 | "  print(\"input: %d, hash_bucket: %d\" % (i, hash_bucket(i)))" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": { 452 | "id": "hUSwhAHgXXjk" 453 | }, 454 | "source": [ 455 | "Now, no two inputs map to the same bucket." 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": { 461 | "id": "dIDpnruXechr" 462 | }, 463 | "source": [ 464 | "## Question 5" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": { 470 | "id": "ygltL2kkrX_C" 471 | }, 472 | "source": [ 473 | "Your coworker needs some help with a hash function, which is being used to store employee identification values. The company has 1,000 employees. Each employee's ID is a random integer between 0 and 99,999. The hash function used by your coworker has 2,000 buckets. Even though there are more hash buckets than input values, your coworker doesn't know for sure if there are any collisions." 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": { 479 | "id": "oJaF40XVOjnI" 480 | }, 481 | "source": [ 482 | "Instead of going through each value one by one, can you automate the checking of collisions? Assume:\n", 483 | "\n", 484 | "- The hash function has the signature `def hash_bucket(employee_id)`, where:\n", 485 | "  - `employee_id` is an integer\n", 486 | "  - `hash_bucket` returns an integer between 0 and 1,999\n", 487 | "\n", 488 | "- The employee identification values are stored in a list called `employee_ids`."
489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "metadata": { 495 | "id": "k_SEE_6ZfUDE" 496 | }, 497 | "outputs": [], 498 | "source": [ 499 | "def check_collisions(hash_bucket, employee_ids):\n", 500 | " \"\"\"Checks if a hash bucket function has collisions for a list of employee IDs.\n", 501 | "\n", 502 | " Args:\n", 503 | " hash_bucket: A hash function of employee ID to integer.\n", 504 | " employee_ids: A list of employee_ids.\n", 505 | "\n", 506 | " Returns:\n", 507 | " A list of buckets that have multiple entries, in the form of a tuple. The\n", 508 | " first element of the tuple is the hash bucket. The second element is a list\n", 509 | " of all the employee IDs in that bucket.\n", 510 | " \"\"\"\n", 511 | " # TODO(you): Implement\n", 512 | " print(\"This function has not been implemented.\")" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": { 518 | "id": "LKaPv9pEfYNx" 519 | }, 520 | "source": [ 521 | "### Unit Tests\n", 522 | "\n", 523 | "Run the following cell to check your answer against some unit tests." 
524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "metadata": { 530 | "id": "1LPVvqEhshE5" 531 | }, 532 | "outputs": [], 533 | "source": [ 534 | "# Get some random employee IDs in the range 0 to 99.\n", 535 | "import random\n", 536 | "# Set a seed for consistent results.\n", 537 | "# If you want random results, comment the following line out.\n", 538 | "random.seed(1)\n", 539 | "\n", 540 | "employee_ids = []\n", 541 | "n_employees = 10\n", 542 | "for _ in range(n_employees):\n", 543 | "  employee_ids.append(random.randrange(100))\n", 544 | "\n", 545 | "print(check_collisions(lambda x: x % 20, employee_ids))\n", 546 | "# Should print: [(17, [17, 97, 97, 57]), (12, [72, 32])]" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": { 552 | "id": "nPi2_i4feg9z" 553 | }, 554 | "source": [ 555 | "### Solution" 556 | ] 557 | }, 558 | { 559 | "cell_type": "markdown", 560 | "metadata": { 561 | "id": "8UOigqtOtNM2" 562 | }, 563 | "source": [ 564 | "This can be automated in a `for` loop. Note that this is not the only solution." 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": null, 570 | "metadata": { 571 | "id": "Xig98K9Rqipu" 572 | }, 573 | "outputs": [], 574 | "source": [ 575 | "def check_collisions(hash_bucket, employee_ids):\n", 576 | "  \"\"\"Checks if a hash bucket function has collisions for a list of employee IDs.\n", 577 | "\n", 578 | "  Args:\n", 579 | "    hash_bucket: A hash function of employee ID to integer.\n", 580 | "    employee_ids: A list of employee_ids.\n", 581 | "\n", 582 | "  Returns:\n", 583 | "    A list of buckets that have multiple entries, in the form of a tuple. The\n", 584 | "    first element of the tuple is the hash bucket. 
The second element is a list\n", 585 | "    of all the employee IDs in that bucket.\n", 586 | "  \"\"\"\n", 587 | "  # Since we need to check if a bucket has already been hit, it makes sense to\n", 588 | "  # use a dictionary, mapping a bucket to all of its entries.\n", 589 | "  buckets = {}\n", 590 | "\n", 591 | "  # Loop through all the employee_ids.\n", 592 | "  for employee_id in employee_ids:\n", 593 | "    bucket = hash_bucket(employee_id)\n", 594 | "    if bucket in buckets:\n", 595 | "      buckets[bucket].append(employee_id)\n", 596 | "    else:\n", 597 | "      buckets[bucket] = [employee_id]\n", 598 | "\n", 599 | "  # Return the buckets that have more than one entry.\n", 600 | "  output = []\n", 601 | "  for key, val in buckets.items():\n", 602 | "    if len(val) \u003e 1:\n", 603 | "      output.append((key, val))\n", 604 | "\n", 605 | "  return output" 606 | ] 607 | }, 608 | { 609 | "cell_type": "markdown", 610 | "metadata": { 611 | "id": "LobSE5m5eeeW" 612 | }, 613 | "source": [ 614 | "## Question 6" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "metadata": { 620 | "id": "bX3AlNJzX1rc" 621 | }, 622 | "source": [ 623 | "Design a hash function with 10 buckets that satisfies the following map. The hash function only accepts non-negative integers."
624 | ] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": { 629 | "id": "yIfYj_2ke5ab" 630 | }, 631 | "source": [ 632 | "Input | Hash Bucket\n", 633 | "----- | -----------\n", 634 | "0 | 1\n", 635 | "3 | 8\n", 636 | "4 | 6\n", 637 | "6 | 4\n", 638 | "7 | 8\n", 639 | "10 | 4" 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": null, 645 | "metadata": { 646 | "id": "u9WW5Q9Hu0cK" 647 | }, 648 | "outputs": [], 649 | "source": [ 650 | "def hash_bucket(i):\n", 651 | "  if not isinstance(i, int) or i \u003c 0:\n", 652 | "    raise ValueError(\"Input must be a non-negative integer.\")\n", 653 | "  # TODO(you): Implement\n", 654 | "  print(\"This function has not been implemented.\")" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": { 660 | "id": "pZu7Uz5-pRrm" 661 | }, 662 | "source": [ 663 | "### Hint" 664 | ] 665 | }, 666 | { 667 | "cell_type": "markdown", 668 | "metadata": { 669 | "id": "T_VFA16EpUCU" 670 | }, 671 | "source": [ 672 | "Notice that 0 is the only input that has an odd bucket. What function produces even results for all positive integers, but 1 for 0?" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "metadata": { 678 | "id": "ayJoY5wzgnFY" 679 | }, 680 | "source": [ 681 | "### Unit Tests\n", 682 | "\n", 683 | "Run the following cell to check your answer against some unit tests."
684 | ] 685 | }, 686 | { 687 | "cell_type": "code", 688 | "execution_count": null, 689 | "metadata": { 690 | "id": "QYRWTTj0gpWA" 691 | }, 692 | "outputs": [], 693 | "source": [ 694 | "inputs = [0, 3, 4, 6, 7, 10]\n", 695 | "buckets = []\n", 696 | "\n", 697 | "for i in inputs:\n", 698 | "  buckets.append(hash_bucket(i))\n", 699 | "\n", 700 | "print(buckets)\n", 701 | "# Should print: [1, 8, 6, 4, 8, 4]" 702 | ] 703 | }, 704 | { 705 | "cell_type": "markdown", 706 | "metadata": { 707 | "id": "8pOpmhKcejR4" 708 | }, 709 | "source": [ 710 | "### Solution" 711 | ] 712 | }, 713 | { 714 | "cell_type": "markdown", 715 | "metadata": { 716 | "id": "gDcl8y55X9-Z" 717 | }, 718 | "source": [ 719 | "There may be several solutions. Here is just one." 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": null, 725 | "metadata": { 726 | "id": "zy0pzMkqvJMk" 727 | }, 728 | "outputs": [], 729 | "source": [ 730 | "def hash_bucket(i):\n", 731 | "  if not isinstance(i, int) or i \u003c 0:\n", 732 | "    raise ValueError(\"Input must be a non-negative integer.\")\n", 733 | "  return 2**i % 10" 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "metadata": { 739 | "id": "PYZLfd6KvT-p" 740 | }, 741 | "source": [ 742 | "We can check our answers using a `for` loop."
743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "metadata": { 749 | "id": "-D2QQ-UPvTPh" 750 | }, 751 | "outputs": [], 752 | "source": [ 753 | "inputs = [0, 3, 4, 6, 7, 10]\n", 754 | "buckets = []\n", 755 | "\n", 756 | "for i in inputs:\n", 757 | "  buckets.append(hash_bucket(i))\n", 758 | "\n", 759 | "print(buckets)" 760 | ] 761 | }, 762 | { 763 | "cell_type": "markdown", 764 | "metadata": { 765 | "id": "yJfjAS76qk3a" 766 | }, 767 | "source": [ 768 | "## Question 7" 769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": { 774 | "id": "SJA9T5MhKcPo" 775 | }, 776 | "source": [ 777 | "Your colleague has written the following hash function, which maps a word to an integer bucket." 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": null, 783 | "metadata": { 784 | "id": "Q4INswCGKXqY" 785 | }, 786 | "outputs": [], 787 | "source": [ 788 | "def hash_bucket(word):\n", 789 | "  return len(word) % 100" 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": { 795 | "id": "c9nEi9DAZ6XO" 796 | }, 797 | "source": [ 798 | "They then use this hash function to map the words in the following sentence ([source](https://en.wikipedia.org/wiki/Collision_(computer_science))) to integer buckets.\n", 799 | "\n", 800 | "\u003e *In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value.*\n", 801 | "\n", 802 | "However, they notice that there are some collisions, even though the number of buckets (100) is much greater than the number of unique words in the sentence (22).\n", 803 | "\n", 804 | "Can you explain to your colleague why this might be a suboptimal choice of hash function for this use case?"
805 | ] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": null, 810 | "metadata": { 811 | "id": "dR10mL-cfG0s" 812 | }, 813 | "outputs": [], 814 | "source": [ 815 | "#freetext" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": { 821 | "id": "ZWeVEU1FrCUg" 822 | }, 823 | "source": [ 824 | "### Solution" 825 | ] 826 | }, 827 | { 828 | "cell_type": "markdown", 829 | "metadata": { 830 | "id": "3PKBCfjKLBex" 831 | }, 832 | "source": [ 833 | "The problem with this hash function is that while there are theoretically 100 buckets, in order for a word to be put in bucket 99, it needs to have 99 characters in it. This is basically never true in English. Therefore, while there are 100 buckets, the *distribution* of values into buckets is not uniform. Most words will fall into buckets 1-15, with almost none falling in buckets above 20." 834 | ] 835 | }, 836 | { 837 | "cell_type": "code", 838 | "execution_count": null, 839 | "metadata": { 840 | "id": "D5whysoxw2dX" 841 | }, 842 | "outputs": [], 843 | "source": [ 844 | "words = [\"In\", \"computer\", \"science\", \"a\", \"collision\", \"or\", \"clash\", \"is\",\n", 845 | " \"a\", \"situation\", \"that\", \"occurs\", \"when\", \"two\", \"distinct\",\n", 846 | " \"pieces\", \"of\", \"data\", \"have\", \"the\", \"same\", \"hash\", \"value\"]\n", 847 | "\n", 848 | "unique_words = set(words)\n", 849 | "\n", 850 | "def hash_bucket(string):\n", 851 | " return len(string) % 100\n", 852 | "\n", 853 | "for i in unique_words:\n", 854 | " print(\"'%s' hashes to: %d\" % (i, hash_bucket(i)))" 855 | ] 856 | }, 857 | { 858 | "cell_type": "markdown", 859 | "metadata": { 860 | "id": "8LcP3Z1vLc81" 861 | }, 862 | "source": [ 863 | "Words with the same length always have the same hash bucket." 
864 | ] 865 | } 866 | ], 867 | "metadata": { 868 | "colab": { 869 | "collapsed_sections": [ 870 | "J_UCWRRJwh6l", 871 | "sPS__ZTgdKzr", 872 | "Ym4wUECAdNP_", 873 | "hfqtbc3JXXjk", 874 | "LKaPv9pEfYNx", 875 | "nPi2_i4feg9z", 876 | "pZu7Uz5-pRrm", 877 | "ayJoY5wzgnFY", 878 | "8pOpmhKcejR4", 879 | "ZWeVEU1FrCUg" 880 | ], 881 | "name": "hash_collisions.ipynb", 882 | "provenance": [ ] 883 | }, 884 | "kernelspec": { 885 | "display_name": "Python 3", 886 | "name": "python3" 887 | } 888 | }, 889 | "nbformat": 4, 890 | "nbformat_minor": 0 891 | } 892 | -------------------------------------------------------------------------------- /4_introduction_to_algorithms/40_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "aPukXT6cQL6l" 7 | }, 8 | "source": [ 9 | "# Introduction To Algorithms" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "enOxEpehxPww" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "1gHBwKoeQNdF" 25 | }, 26 | "source": [ 27 | "\u003e An **algorithm** is a sequence of well-defined processes. In the context of computer science, each single process should be executable by a computer." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "62D-fqitxVZ-" 34 | }, 35 | "source": [ 36 | "An algorithm can be thought of like a cooking recipe or a gym workout, but for a computer. 
These are all processes that have step-by-step instructions.\n", 37 | "\n", 38 | "- An algorithm may have inputs (as a cooking recipe has ingredients).\n", 39 | "- An algorithm does not need to have inputs (as a gym workout may include weights but may just involve running and body weight).\n", 40 | "- An algorithm may have outputs (as a cooking recipe produces food).\n", 41 | "- An algorithm does not need to have outputs (as a gym workout does not have instantly visible results), but the processes are usually recorded (as a gym workout can be tracked)." 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": { 47 | "id": "So0VtgDXxnNn" 48 | }, 49 | "source": [ 50 | "### Example of an algorithm" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": { 56 | "id": "REzBfKRYxa9Z" 57 | }, 58 | "source": [ 59 | "In a previous lesson, we introduced a function that returns whether or not a positive integer is prime, by checking whether it has any divisors. (Remember a prime number is a positive integer greater than 1 that has no positive divisors except 1 and itself.)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": { 65 | "id": "eUhS1ov6xrES" 66 | }, 67 | "source": [ 68 | "```python\n", 69 | "def is_prime(n):\n", 70 | "  \"\"\"Returns True if n is a prime number.\"\"\"\n", 71 | "  if not isinstance(n, int) or n \u003c 1:\n", 72 | "    raise ValueError(\"Input must be a positive integer.\")\n", 73 | "\n", 74 | "  if n == 1:\n", 75 | "    return False\n", 76 | " \n", 77 | "  for i in range(2, n // 2 + 1):\n", 78 | "    if n % i == 0:\n", 79 | "      return False\n", 80 | " \n", 81 | "  return True\n", 82 | "```\n", 83 | "\n", 84 | "The function contains an algorithm that checks whether an input `n` is a prime number. The algorithm can be described as:\n", 85 | "\n", 86 | "1. Check that `n` is a positive integer. If not, raise an error, and exit the algorithm.\n", 87 | "1. If `n` is 1, return `False` and exit the algorithm.\n", 88 | "1. 
Iterate through integers between 2 and `n // 2` inclusive. If the remainder of `n` divided by *any* of these integers is zero, return `False` and exit the algorithm.\n", 89 | "1. Return `True` and exit the algorithm." 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": { 95 | "id": "1DBJoA1xxxRp" 96 | }, 97 | "source": [ 98 | "---" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": { 104 | "id": "ABX9H_hwxx6q" 105 | }, 106 | "source": [ 107 | "An algorithm is not defined by *what* it does, but by *how* it does it. The previous algorithm is an iterative algorithm to determine whether a number is prime. Below is a recursive function that does the same thing.\n", 108 | "\n", 109 | "```python\n", 110 | "def is_prime(n, i=2):\n", 111 | "  \"\"\"Returns True if n is a prime number.\"\"\"\n", 112 | "  if not isinstance(n, int) or n \u003c 1:\n", 113 | "    raise ValueError(\"Input must be a positive integer.\")\n", 114 | " \n", 115 | "  if n == 1:\n", 116 | "    return False\n", 117 | "\n", 118 | "  if 2 * i \u003e n:\n", 119 | "    return True\n", 120 | "\n", 121 | "  if n % i == 0:\n", 122 | "    return False\n", 123 | "  else:\n", 124 | "    return is_prime(n, i + 1)\n", 125 | "```\n", 126 | "\n", 127 | "This algorithm has exactly the same input and output as the iterative algorithm; however, it is not the same algorithm." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "id": "v1IjMubCx8rE" 134 | }, 135 | "source": [ 136 | "### Inputs and outputs" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": { 142 | "id": "BMoUDlqlx9ti" 143 | }, 144 | "source": [ 145 | "The example algorithm that checks whether a number is prime has an input (a positive integer) and an output (a boolean indicating whether the input is prime). In general, algorithms do not need to have an input or an output."
146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": { 151 | "id": "AriLiPcJx_vg" 152 | }, 153 | "source": [ 154 | "The following code is also an algorithm.\n", 155 | "\n", 156 | "```python\n", 157 | "print('Hello.')\n", 158 | "print('World.')\n", 159 | "print('Over.')\n", 160 | "```\n", 161 | "\n", 162 | "Almost any code that you write that has a purpose is an algorithm. The following lessons are an introduction to algorithms, how to write, interpret, and analyze them." 163 | ] 164 | } 165 | ], 166 | "metadata": { 167 | "colab": { 168 | "collapsed_sections": [], 169 | "name": "introduction_to_algorithms_overview.ipynb", 170 | "private_outputs": true, 171 | "provenance": [ ] 172 | }, 173 | "kernelspec": { 174 | "display_name": "Python 3", 175 | "name": "python3" 176 | } 177 | }, 178 | "nbformat": 4, 179 | "nbformat_minor": 0 180 | } 181 | -------------------------------------------------------------------------------- /4_introduction_to_algorithms/44_big_o_math_deep_dive.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "Y55tIGD1XuhA" 7 | }, 8 | "source": [ 9 | "# Big-O Math Deep Dive" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "ubnskqLBXx6j" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "08x79eK9vahh" 25 | }, 26 | "source": [ 27 | "*NOTE: **This lesson is entirely optional.** The content covers the mathematics behind much of the complexity analysis in the standard material. 
The questions are extremely difficult and out of scope.*" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "9H1fVKKbX7Zd" 34 | }, 35 | "source": [ 36 | "**Limit notation**\n", 37 | "\n", 38 | "When analyzing the efficiency of an algorithm, it is important to understand how the time and space requirements of the algorithm change as it handles more data. For example, suppose you have an algorithm for sorting integers that you want to deploy to production. You should know how long it takes and how much memory it requires both for small $n$ (where $n$ is the number of integers to sort) and also for increasingly large $n$, or as $n \\to \\infty$.\n", 39 | "\n", 40 | "You may have seen **limit notation** such as this before. If you haven't, don't worry. Writing $n \\to \\infty$ is just shorthand for writing \"as $n$ gets larger and larger\".\n", 41 | "\n", 42 | "\u003e A statement $S(n)$ is true as $n \\to \\infty$ if there exists an $N$ such that $S(n)$ is true for all $n \\geq N$.\n", 43 | "\n", 44 | "Remember that it is not sufficient for the statement to be true *at* a large value of $N$, but for all values of $n \\geq N$." 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "id": "V7kjFxujewwV" 51 | }, 52 | "source": [ 53 | "**Conceptual definition**\n", 54 | "\n", 55 | "In a mathematical context, big-O notation is used to compare the growth of two functions. The growth of a function is, conceptually, how the function behaves as the input increases towards infinity.\n", 56 | "\n", 57 | "\u003e $f(n) \\in O(g(n))$ if $f(n)$ grows at most as quickly as $g(n)$, as $n \\to \\infty$.\n", 58 | "\n", 59 | "$O(g(n))$ is the set of all of the functions that grow at most as quickly as $g(n)$. 
The $\\in$ notation is used to denote membership in a set, and $f(n)$ is one of the functions in the set.\n", 60 | "\n", 61 | "In most contexts in computer science, it is more common to write $f(n) = O(g(n))$ than $f(n) \\in O(g(n))$. Therefore throughout this lesson, we will use $=$ instead of $\\in$ for big-O comparisons." 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "SothqOd-Ewz9" 68 | }, 69 | "source": [ 70 | "**Mathematical definition 1**\n", 71 | "\n", 72 | "The mathematical definition of big-O is just a formalization of the conceptual definition above.\n", 73 | "\n", 74 | "\u003e $f(n) \\in O(g(n))$ as $n \\to \\infty$ if, for any given $N$, there exists a positive number $M$ such that $|f(n)| \\leq Mg(n)$ for all $n \\geq N$.\n", 75 | "\n", 76 | "Using this definition, $n^2 = O(2^n)$ because, for any $N$, you can find an $M$ such that $n^2 \\leq M \\cdot 2^n$ for all $n \\geq N$. For example:\n", 77 | "\n", 78 | "- If $N = 3$, you can choose $M = 2$ so that $n^2 \\leq 2 \\cdot 2^n$ for all $n \\geq 3$. (In fact, there are smaller values of $M$ that you could choose. $M = 1.5$ would work too, as would any value $M \\geq \\frac{9}{8}$.)\n", 79 | "- For any $N \\geq 4$, you can choose $M = 1$ so that $n^2 \\leq 1 \\cdot 2^n$ for all $n \\geq 4$." 
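
The two bullet points above can be sanity-checked numerically. The following is a minimal sketch and not part of the original lesson: the helper name `bound_holds` and the upper limit of 200 are assumptions made for illustration, and checking a finite range is only a sanity check, not a proof.

```python
from fractions import Fraction


def bound_holds(M, N, upper=200):
  """Returns True if n^2 <= M * 2^n for every integer n in [N, upper]."""
  return all(n**2 <= M * 2**n for n in range(N, upper + 1))


# N = 3: M = 2 and M = 9/8 both work, but a smaller M fails at n = 3.
print(bound_holds(2, 3))
print(bound_holds(Fraction(9, 8), 3))
print(bound_holds(Fraction(11, 10), 3))

# N = 4: M = 1 is enough.
print(bound_holds(1, 4))
```

Exact fractions are used instead of floats so that the boundary case $M = \frac{9}{8}$ at $n = 3$ is compared without rounding error.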
80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": { 85 | "id": "zvVYlFtneppg" 86 | }, 87 | "source": [ 88 | "**Mathematical definition 2**\n", 89 | "\n", 90 | "This alternate mathematical definition of big-O may be simpler to understand, but contains more rigorous notation.\n", 91 | "\n", 92 | "\u003e $f(n) \\in O(g(n))$ if $\\lim\\limits_{n \\to \\infty} \\frac{f(n)}{g(n)} \u003c \\infty$.\n", 93 | "\n", 94 | "- $\\lim\\limits_{n \\to \\infty} \\frac{f(n)}{g(n)}$ denotes the value that $\\frac{f(n)}{g(n)}$ approaches or becomes closer and closer to as $n \\to \\infty$.\n", 95 | "- $\u003c \\infty$ just means that the limit should be a finite number.\n", 96 | "\n", 97 | "Here's another way to write this:\n", 98 | "\n", 99 | "\u003e For any $N$, there exists an $M$ such that $\\frac{f(n)}{g(n)} \u003c M$ for all $n \u003e N$.\n", 100 | "\n", 101 | "This definition means that the ratio of $f(n)$ to $g(n)$ must *not* grow towards infinity as $n \\to \\infty$. For example, $\\lim\\limits_{n \\to \\infty} \\frac{n^2}{2^n} = 0$, so $n^2 = O(2^n)$.\n", 102 | "\n", 103 | "But the limit of the ratio does not need to be zero. Even if $f(n) \u003e g(n)$ for all $n$, it can still be true that $f(n) = O(g(n))$, as long as $f(n)$ does not *grow* faster than $g(n)$. For example, $100n = O(n)$, since $\\lim\\limits_{n \\to \\infty} \\frac{100n}{n} = \\lim\\limits_{n \\to \\infty} 100 = 100$. This can also be shown using the initial definition of big-O by choosing $M = 100$.\n", 104 | "\n", 105 | "As a general rule, constants can be ignored when applying big-O notation. More formally, $M f(n)$ always has the same big-O properties as $f(n)$ itself, so you can ignore $M$." 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": { 111 | "id": "MDGsWpmuxfZc" 112 | }, 113 | "source": [ 114 | "**Using derivatives to compare growth**\n", 115 | "\n", 116 | "One of the most effective ways to examine a function's growth is through its **derivative**. 
The derivative of a function $f(n)$ is itself a function, denoted $f'(n)$ that tells you the rate of change of $f(n)$. ([Here is a quick guide](https://www.mathsisfun.com/calculus/derivatives-rules.html) to derivatives of common functions.) For example, if $f(n) = n^2$ and $g(n) = 2^n$,\n", 117 | "\n", 118 | "\\begin{align*}\n", 119 | "f'(n) \u0026= 2n \\\\\n", 120 | "g'(n) \u0026= \\ln(2) \\cdot 2^n \\\\\n", 121 | "\\end{align*}\n", 122 | "\n", 123 | "where $\\ln$ is the [natural logarithm](https://en.wikipedia.org/wiki/Natural_logarithm). Remember that constants like $\\ln(2)$ can be ignored in big-O analysis. \n", 124 | "\n", 125 | "While it may be harder to tell by inspection that $g(n)$ grows faster than $f(n)$, it should be more straightforward to see that $g'(n) \u003e f'(n)$ for all large $n$, and therefore $f(n) = O(g(n))$." 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": { 131 | "id": "aKYtlMwg1sqv" 132 | }, 133 | "source": [ 134 | "## Assessment Questions" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": { 140 | "id": "RgXlyU0KSbcu" 141 | }, 142 | "source": [ 143 | "## Question 1" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": { 149 | "id": "gl0X3GW_2e10" 150 | }, 151 | "source": [ 152 | "Using derivatives, relate $f(n) = \\sqrt n$ and $g(n) = \\log_2(n)$ using big-O notation. See *Understanding Limiting Behavior* above for an example." 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": { 158 | "id": "9r1Qpr4JSsVm" 159 | }, 160 | "source": [ 161 | "### Hint" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "id": "u-rzcxVrStlN" 168 | }, 169 | "source": [ 170 | "Remember that $\\sqrt n = n^{\\frac{1}{2}}$." 
171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "id": "Yl7NrtHw3a7G" 177 | }, 178 | "source": [ 179 | "### Solution" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": { 185 | "id": "17t14S4B3hnw" 186 | }, 187 | "source": [ 188 | "Using [this guide for calculating derivatives](https://www.mathsisfun.com/calculus/derivatives-rules.html):\n", 189 | "\n", 190 | "\\begin{align*}\n", 191 | "f'(n) \u0026= \\frac{1}{2\\sqrt n} \\\\\n", 192 | "g'(n) \u0026= \\frac{1}{n \\ln(2)} \\\\\n", 193 | "\\end{align*}\n", 194 | "\n", 195 | "For large $n$, $\\sqrt{n} \u003c n$, so $\\frac{1}{\\sqrt n} \u003e \\frac{1}{n}$. As always, constants like $\\frac{1}{2}$ and $\\frac{1}{\\ln(2)}$ can be ignored. Therefore, since $f'(n) \u003e g'(n)$ for all large $n$, $g(n) = O(f(n))$." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": { 201 | "id": "yv91ZRZ7cjU4" 202 | }, 203 | "source": [ 204 | "## Question 2" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": { 210 | "id": "g2YN_Pz-Di5_" 211 | }, 212 | "source": [ 213 | "Use any method to show that $f(n) = n^{\\frac{3}{2}}$ grows faster than $g(n) = n\\log_2(n)$." 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "id": "5MvEqxeXcvPO" 220 | }, 221 | "source": [ 222 | "### Solution" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": { 228 | "id": "rKVHqbD8LS7Q" 229 | }, 230 | "source": [ 231 | "There are a few ways to do this. The first way is a heuristic. As per the common complexities table in the Lesson Overview, $\\sqrt{n}$ grows faster than $\\log_2(n)$, so if both expressions are multiplied by $n$, it follows that $n^{\\frac{3}{2}}$ grows faster than $n \\log_2(n)$.\n", 232 | "\n", 233 | "A second, more formal approach uses derivatives. 
As seen in the lesson that defines big-O notation, if we can show that $f'(n) \u003e g'(n)$ for all large values of $n$, then $f$ grows faster than $g$. Given $f$ and $g$ defined as in the question, we have\n", 234 | " \n", 235 | "\\begin{align*}\n", 236 | "f'(n) \u0026= \\frac{3}{2} n^{\\frac{1}{2}} \\\\\n", 237 | "\u0026= O(\\sqrt{n}). \\\\\n", 238 | "\\end{align*}\n", 239 | "\n", 240 | "Using [log laws](https://en.wikipedia.org/wiki/List_of_logarithmic_identities#Using_simpler_operations), $\\log_2(n) = \\frac{\\ln(n)}{\\ln(2)}$. Thus we can rewrite $g$ as\n", 241 | "\n", 242 | "\\begin{align*}\n", 243 | "g(n) \u0026= \\frac{1}{\\ln(2)} n \\ln(n). \\\\\n", 244 | "\\end{align*}\n", 245 | "\n", 246 | "Using the product rule to calculate the derivative of $g$ yields\n", 247 | "\n", 248 | "\\begin{align*}\n", 249 | "g'(n) \u0026= \\frac{1}{\\ln(2)} \\left( \\ln(n) \\cdot 1 + \\frac{1}{n} \\cdot n \\right) \\\\\n", 250 | "\u0026= \\frac{1}{\\ln(2)}(\\ln(n)+1) \\\\\n", 251 | "\u0026= \\log_2(n) + \\frac{1}{\\ln(2)} \\\\\n", 252 | "\u0026= O(\\log_2(n)). \\\\\n", 253 | "\\end{align*}\n", 254 | "\n", 255 | "As shown via derivatives in an exercise from the big-O definition lesson, $\\sqrt{n}$ grows faster than $\\log_2(n)$, so $\\sqrt{n} \u003e \\log_2(n)$ for all sufficiently large $n$. Therefore, since $f'(n) \u003e g'(n)$ for large $n$, $f$ grows faster than $g$." 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": { 261 | "id": "18iplSYkSiS4" 262 | }, 263 | "source": [ 264 | "## Question 3" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "id": "k5rFE0NMSufk" 271 | }, 272 | "source": [ 273 | "[Advanced] Relate $f(n) = 100n^{100}$ and $g(n) = 2^n$ using big-O notation."
274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": { 279 | "id": "FS1F2TMSTaiG" 280 | }, 281 | "source": [ 282 | "### Hint" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": { 288 | "id": "jdo01UPtTcEt" 289 | }, 290 | "source": [ 291 | "What happens if you take the derivative over and over again?" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": { 297 | "id": "BFmgw8HlS0du" 298 | }, 299 | "source": [ 300 | "### Solution" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": { 306 | "id": "NH9jFNS2UDkZ" 307 | }, 308 | "source": [ 309 | "This can be simplified by recognizing that constants can be ignored, so we can compare $f(n) = n^{100}$ to $g(n) = 2^n$. We covered in the Lesson Overview that $n^2 = O(2^n)$, so how does that change as the exponent of $n$ changes?\n", 310 | "\n", 311 | "These functions exemplify why we can't always solely use a visualization. See below for a graph of both functions on a log scale for values up to 100. For this range, it appears as if $n^{100}$ is growing much faster than $2^n$. If we expand the $x$-axis to larger values of $n$, we may hit computational issues. 
($100^{100} = 10^{200}$ is already a very large number, equal to a [googol](https://en.wikipedia.org/wiki/Googol) squared.)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": { 318 | "id": "uYuKV9yyUssn" 319 | }, 320 | "outputs": [], 321 | "source": [ 322 | "import matplotlib.pyplot as plt\n\nN = 100\n", 323 | "n = [i for i in range(1, N+1)]\n", 324 | "\n", 325 | "f = [i**100 for i in n]\n", 326 | "g = [2**i for i in n]\n", 327 | "\n", 328 | "plt.plot(n, f, color='blue', label='n^100')\n", 329 | "plt.plot(n, g, color='red', label='2^n')\n", 330 | "plt.yscale('log')\n", 331 | "plt.legend()\n", 332 | "plt.show()" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": { 338 | "id": "9oWnGOMHV1Oq" 339 | }, 340 | "source": [ 341 | "For this comparison, let's try taking the derivative:\n", 342 | "\n", 343 | "\\begin{align*}\n", 344 | "f'(n) \u0026= 100n^{99} \\\\\n", 345 | "g'(n) \u0026= \\ln(2) \\cdot 2^n \\\\\n", 346 | "\\end{align*}\n", 347 | "\n", 348 | "And the second derivative:\n", 349 | "\n", 350 | "\\begin{align*}\n", 351 | "f''(n) \u0026= 9900n^{98} \\\\\n", 352 | "g''(n) \u0026= \\ln(2)^2 \\cdot 2^n \\\\\n", 353 | "\\end{align*}\n", 354 | "\n", 355 | "Let $f^{(m)}(n)$ be the $m^{\\textrm{th}}$ derivative of $f(n)$. By repeating this, we will see that:\n", 356 | "\n", 357 | "\\begin{align*}\n", 358 | "f^{(101)}(n) \u0026= 0 \\\\\n", 359 | "g^{(101)}(n) \u0026= \\ln(2)^{101} \\cdot 2^n \\\\\n", 360 | "\\end{align*}\n", 361 | "\n", 362 | "So while all derivatives of $g(n)$ grow exponentially, the derivatives of $f(n)$ eventually have no growth. We can deduce from this that while $f(n)$ may have larger values than $g(n)$ for some $n$, eventually $g(n)$ will grow faster than $f(n)$, so $n^{100} = O(2^n)$."
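
Python integers have arbitrary precision, so the point where $2^n$ overtakes $n^{100}$ can be found exactly rather than read off a graph. This is a small sketch and not part of the original lesson; the function name `first_crossover` and its default starting point are assumptions chosen for illustration.

```python
def first_crossover(power=100, start=2):
  """Returns the smallest n >= start with n**power < 2**n.

  Uses exact integer arithmetic, so there is no overflow or rounding.
  """
  n = start
  while n**power >= 2**n:
    n += 1
  return n


n0 = first_crossover()
print(n0)
```

Running this shows that the crossover lies far to the right of the plotted range ($n$ up to 100), which is why the graph alone suggests the wrong conclusion.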
363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "id": "CEAl8qdBSkFr" 369 | }, 370 | "source": [ 371 | "## Question 4" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": { 377 | "id": "EBAFxUfOZK_b" 378 | }, 379 | "source": [ 380 | "[Advanced] Relate $f(n)$ to $g(n)$ where:\n", 381 | "\n", 382 | "\\begin{align*}\n", 383 | "f(n) \u0026= 0.135 \\cdot 2^{2n+3} + 10^{100} \\sqrt n - 56 \\\\\n", 384 | "g(n) \u0026= 10n^{1000} + 9n^{999} + 8\\pi \\\\\n", 385 | "\\end{align*}" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": { 391 | "id": "L-jwW9WvTwzT" 392 | }, 393 | "source": [ 394 | "### Hint" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": { 400 | "id": "-t02mDBkTtU5" 401 | }, 402 | "source": [ 403 | "When a function is the sum of many functions, it only grows as fast as its fastest growing term. This can be formally stated as follows:\n", 404 | "\n", 405 | "\u003e If $f(n) = \\sum\\limits_{i=1}^k f_i(n)$ for a fixed number of terms $k$, and there exists some $m$ such that $f_i(n) = O(f_m(n))$ for all $i$, then $f(n) = O(f_m(n))$."
406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": { 411 | "id": "yCNkp-OtS1dN" 412 | }, 413 | "source": [ 414 | "### Solution" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": { 420 | "id": "H5VswXqTboMP" 421 | }, 422 | "source": [ 423 | "First, let's drastically simplify this problem by ignoring the multiplicative and additive constants.\n", 424 | "\n", 425 | "\\begin{align*}\n", 426 | "f(n) \u0026= 0.135 \\cdot 2^{2n+3} + 10^{100} \\sqrt n - 56 \\\\\n", 427 | "\u0026= O(2^{2n+3} + \\sqrt n) \\\\\n", 428 | "\u0026= O(2^3 \\cdot (2^2)^n + \\sqrt n) \\\\\n", 429 | "\u0026= O(4^n + \\sqrt n) \\\\\n", 430 | "g(n) \u0026= 10n^{1000} + 9n^{999} + 8\\pi \\\\\n", 431 | "\u0026= O(n^{1000} + n^{999}) \\\\\n", 432 | "\\end{align*}\n", 433 | "\n", 434 | "Now we can compare $f(n) = 4^n + \\sqrt n$ to $g(n) = n^{1000} + n^{999}$.\n", 435 | "\n", 436 | "Using the hint, if $f(n)$ can be broken down into a sum of other functions, then we only need to know which of those functions grows the fastest. Once we find that function $f_m(n)$, we know that $f(n) = O(f_m(n))$.\n", 437 | "\n", 438 | "Using similar approaches to above (either by derivatives or a visualization), we can show that $\\sqrt n = O(n^2)$, and as we saw in the Lesson Overview, $n^2 = O(2^n)$. It should then make intuitive sense that $2^n = O(4^n)$ since $4^n = (2^n)^2$, therefore $\\sqrt n = O(n^2) = O(2^n) = O(4^n)$. Using the above logic, $f(n) = O(4^n)$.\n", 439 | "\n", 440 | "Since $n^{1000} = n \\cdot n^{999}$, it should make sense that $n^{999} = O(n^{1000})$, therefore $g(n) = O(n^{1000})$. As we saw in Question 3, exponential growth beats any polynomial growth in the long run, so $n^{1000} = O(4^n)$. Since the fastest growing term of $f(n)$ is $4^n$ and the fastest growing term of $g(n)$ is $n^{1000}$, it follows that $g(n) = O(f(n))$."
441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": { 446 | "id": "kJrgzaA-UwTB" 447 | }, 448 | "source": [ 449 | "## Question 5" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": { 455 | "id": "mRyUPOYH-rFg" 456 | }, 457 | "source": [ 458 | "Show that if $f(n) \\to \\infty$ as $n \\to \\infty$, then $f(n) + K = O(f(n))$ for a constant $K$.\n", 459 | "\n", 460 | "This is a formalization of what you have already seen above, namely that additive constants can be ignored for big-O comparisons. For example, $O(n^2 + 3) = O(n^2)$." 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": { 466 | "id": "Jd4TSqSp1AIc" 467 | }, 468 | "source": [ 469 | "### Solution" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": { 475 | "id": "AhjMf98W_JHn" 476 | }, 477 | "source": [ 478 | "For this example, it is probably easiest to use the second definition of big-O. We need to show that if $f(n) \\to \\infty$ as $n \\to \\infty$ then $\\lim\\limits_{n \\to \\infty} \\frac{f(n) + K}{f(n)} \u003c \\infty$.\n", 479 | "\n", 480 | "\\begin{align*}\n", 481 | "\\lim_{n \\to \\infty} \\frac{f(n) + K}{f(n)} \u0026= \\lim_{n \\to \\infty} \\left( \\frac{f(n)}{f(n)} + \\frac{K}{f(n)} \\right) \\\\\n", 482 | "\u0026= \\lim_{n \\to \\infty} \\left( 1 + \\frac{K}{f(n)} \\right) \\\\\n", 483 | "\u0026= \\lim_{n \\to \\infty} 1 + \\lim_{n \\to \\infty} \\frac{K}{f(n)} \\\\\n", 484 | "\u0026= 1 + K \\lim_{n \\to \\infty} \\frac{1}{f(n)} \\\\\n", 485 | "\\end{align*}\n", 486 | "\n", 487 | "Since $f(n) \\to \\infty$ as $n \\to \\infty$, $\\frac{1}{f(n)} \\to 0$ as $n \\to \\infty$. Therefore the above expression reduces to 1, which is $\u003c \\infty$." 
488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": { 493 | "id": "8OFnqW7eUzfo" 494 | }, 495 | "source": [ 496 | "## Question 6" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": { 502 | "id": "S4kjXOzJZJ_N" 503 | }, 504 | "source": [ 505 | "Show that $Kf(n) = O(f(n))$ for a constant $K$.\n", 506 | "\n", 507 | "Again, this is a formalization of what you have seen above, that multiplicative constants can be ignored for big-O comparisons. For example, $O(3n^2) = O(n^2)$." 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": { 513 | "id": "E-BG70vtVG5r" 514 | }, 515 | "source": [ 516 | "### Solution" 517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "metadata": { 522 | "id": "BRRfyltcZe3m" 523 | }, 524 | "source": [ 525 | "Again, this is most easily shown using the second mathematical definition above. We need to show that $\\lim\\limits_{n \\to \\infty} \\frac{Kf(n)}{f(n)} \u003c \\infty$.\n", 526 | "\n", 527 | "\\begin{align*}\n", 528 | "\\lim_{n \\to \\infty} \\frac{Kf(n)}{f(n)} \u0026= K \\lim\\limits_{n \\to \\infty} \\frac{f(n)}{f(n)} \\\\\n", 529 | "\u0026= K \\lim_{n \\to \\infty} 1 \\\\\n", 530 | "\u0026= K \\\\\n", 531 | "\\end{align*}\n", 532 | "\n", 533 | "Since $K$ is a constant, it is $\u003c \\infty$." 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "metadata": { 539 | "id": "y8-ka488U3fp" 540 | }, 541 | "source": [ 542 | "## Question 7" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": { 548 | "id": "DQ46OXERaEX9" 549 | }, 550 | "source": [ 551 | "Show that if $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) f_2(n) = O(g_1(n) g_2(n))$.\n", 552 | "\n", 553 | "This is an important result since it shows that if you can break up a function into distinct parts, you can analyze the big-O notation of each part independently, then take the product. For example, $O(n^2 2^n) = O(n^2) O(2^n)$."
554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "id": "-57LozH3VHqr" 560 | }, 561 | "source": [ 562 | "### Solution" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": { 568 | "id": "4FizvQKja7mb" 569 | }, 570 | "source": [ 571 | "This is easiest to show when using the first definition of big-O.\n", 572 | "\n", 573 | "- $f_1(n) = O(g_1(n))$ therefore there exists $M_1$ such that $|f_1(n)| \\leq M_1g_1(n)$ for all $n \\geq N_1$.\n", 574 | "- $f_2(n) = O(g_2(n))$ therefore there exists $M_2$ such that $|f_2(n)| \\leq M_2g_2(n)$ for all $n \\geq N_2$.\n", 575 | "\n", 576 | "We need to show that for any given $N$, there exists an $M$ such that $|f_1(n) f_2(n)| \\leq Mg_1(n)g_2(n)$ for all $n \\geq N$.\n", 577 | "\n", 578 | "Since the [absolute value of a product is the product of absolute values](https://proofwiki.org/wiki/Absolute_Value_of_Product), we know that:\n", 579 | "\n", 580 | "$$|f_1(n) f_2(n)| = |f_1(n)||f_2(n)|$$\n", 581 | "\n", 582 | "Plugging in the inequalities above, we have that for $n \\geq \\max(N_1, N_2)$:\n", 583 | "\n", 584 | "\\begin{align*}\n", 585 | "|f_1(n)||f_2(n)| \u0026\\leq M_1g_1(n) M_2 g_2(n) \\\\\n", 586 | "\u0026= M_1M_2 g_1(n)g_2(n)\n", 587 | "\\end{align*}\n", 588 | "\n", 589 | "We can therefore choose $M = M_1M_2$ and $N = \\max(N_1, N_2)$ to satisfy the inequality to show that $f_1(n) f_2(n) = O(g_1(n) g_2(n))$." 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": { 595 | "id": "8RnxdDplU5io" 596 | }, 597 | "source": [ 598 | "## Question 8" 599 | ] 600 | }, 601 | { 602 | "cell_type": "markdown", 603 | "metadata": { 604 | "id": "qYlLZIdl03NM" 605 | }, 606 | "source": [ 607 | "Show that if $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$ then $f_1(n) + f_2(n) = O(g_1(n) + g_2(n))$.\n", 608 | "\n", 609 | "This is another result that you have already seen above. 
It shows that if a function can be split into a sum of other functions, it only grows as quickly as its fastest growing component. For example, $O(n^2 + 2^n) = O(2^n)$." 610 | ] 611 | }, 612 | { 613 | "cell_type": "markdown", 614 | "metadata": { 615 | "id": "B25QNhhXVIe9" 616 | }, 617 | "source": [ 618 | "### Solution" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": { 624 | "id": "i64KEVBX1CBp" 625 | }, 626 | "source": [ 627 | "This is a similar proof to the solution to Question 7.\n", 628 | "\n", 629 | "- $f_1(n) = O(g_1(n))$ therefore there exists $M_1$ such that $|f_1(n)| \\leq M_1g_1(n)$ for all $n \\geq N_1$.\n", 630 | "- $f_2(n) = O(g_2(n))$ therefore there exists $M_2$ such that $|f_2(n)| \\leq M_2g_2(n)$ for all $n \\geq N_2$.\n", 631 | "\n", 632 | "By the [triangle inequality](https://en.wikipedia.org/wiki/Triangle_inequality):\n", 633 | "\n", 634 | "$$|f_1(n) + f_2(n)| \\leq |f_1(n)| + |f_2(n)|$$\n", 635 | "\n", 636 | "Plugging in the inequalities above, we have that for $n \\geq \\max(N_1, N_2)$:\n", 637 | "\n", 638 | "\\begin{align*}\n", 639 | "|f_1(n)| + |f_2(n)| \u0026\\leq M_1g_1(n) + M_2g_2(n) \\\\\n", 640 | "\u0026\\leq 2\\max(M_1, M_2) \\max(g_1(n), g_2(n)) \\\\\n", 641 | "\\end{align*}\n", 642 | "\n", 643 | "We have now found the constant $M = 2\\max(M_1, M_2)$ to show that $f_1(n) + f_2(n) = O(\\max(g_1(n), g_2(n)))$. Since $\\max(g_1(n), g_2(n)) \\leq g_1(n) + g_2(n)$ when $g_1$ and $g_2$ are non-negative, it also follows that $f_1(n) + f_2(n) = O(g_1(n) + g_2(n))$."
644 | ] 645 | } 646 | ], 647 | "metadata": { 648 | "colab": { 649 | "collapsed_sections": [ 650 | "9r1Qpr4JSsVm", 651 | "Yl7NrtHw3a7G", 652 | "5MvEqxeXcvPO", 653 | "FS1F2TMSTaiG", 654 | "BFmgw8HlS0du", 655 | "L-jwW9WvTwzT", 656 | "yCNkp-OtS1dN", 657 | "Jd4TSqSp1AIc", 658 | "E-BG70vtVG5r", 659 | "-57LozH3VHqr", 660 | "B25QNhhXVIe9" 661 | ], 662 | "name": "big_o_math_deep_dive.ipynb", 663 | "private_outputs": true, 664 | "provenance": [ ] 665 | }, 666 | "kernelspec": { 667 | "display_name": "Python 3", 668 | "name": "python3" 669 | } 670 | }, 671 | "nbformat": 4, 672 | "nbformat_minor": 0 673 | } 674 | -------------------------------------------------------------------------------- /5_sorting_algorithms/50_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "nF3_3MhcKUiI" 7 | }, 8 | "source": [ 9 | "# Sorting Algorithms" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "5MsUJ7Z_yKJ3" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "DupfH8QDzKXw" 25 | }, 26 | "source": [ 27 | "\u003e A **sorting algorithm** is an algorithm that sorts an array of integers from lowest to highest." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "vYg2OuntyM94" 34 | }, 35 | "source": [ 36 | "```python\n", 37 | "sort([3, 2, 4, 1])\n", 38 | "# [1, 2, 3, 4]\n", 39 | "```\n", 40 | "\n", 41 | "Sorting an array of integers is one of the most common and fundamental problems in computer science. Many sorting algorithms exist and are used in different contexts. The following lessons cover these five commonly used sorting algorithms:\n", 42 | "\n", 43 | "1. Bubble sort\n", 44 | "1. Insertion sort\n", 45 | "1. Selection sort\n", 46 | "1. Merge sort\n", 47 | "1. 
Quicksort" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": { 53 | "id": "aunXxoM9yPhU" 54 | }, 55 | "source": [ 56 | "### In-place and out-of-place sorting" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": { 62 | "id": "FrUNgRhOz2jO" 63 | }, 64 | "source": [ 65 | "Most sorting algorithms are implemented **in-place**. This means that the elements of the array are moved around within the input array itself, without creating a copy. The alternative method is **out-of-place** sorting, which does not alter the input array and instead returns a sorted copy." 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "id": "lBitTpA8ySH2" 72 | }, 73 | "source": [ 74 | "The code snippets below demonstrate two example methods, `sort_in_place` and `sort_out_of_place`, that sort an array `arr` in-place and out-of-place respectively.\n", 75 | "\n", 76 | "Suppose you have written a function called `sort_in_place` that sorts an array in-place:\n", 77 | "\n", 78 | "```python\n", 79 | "arr = [3, 2, 4, 1]\n", 80 | "sort_in_place(arr)\n", 81 | "print(arr)\n", 82 | "# [1, 2, 3, 4]\n", 83 | "```\n", 84 | "\n", 85 | "With in-place sorting, the value of `arr` is altered when it is passed through the sorting algorithm. Consequently, the original ordering is lost.\n", 86 | "\n", 87 | "Now, suppose you have written a function called `sort_out_of_place` that sorts an array out-of-place:\n", 88 | "\n", 89 | "```python\n", 90 | "arr = [3, 2, 4, 1]\n", 91 | "arr_sorted = sort_out_of_place(arr)\n", 92 | "print(arr) # [3, 2, 4, 1]\n", 93 | "print(arr_sorted) # [1, 2, 3, 4]\n", 94 | "```\n", 95 | "\n", 96 | "With out-of-place sorting, the value of `arr` is not changed when it is passed through the sorting algorithm. Instead, the sorted array is assigned to a new variable `arr_sorted`.\n", 97 | "\n", 98 | "In-place sorting algorithms have an $O(1)$ space complexity, since no new variables are allocated. 
Out-of-place sorting algorithms have an $O(n)$ space complexity, since the input array is copied." 99 | ] 100 | } 101 | ], 102 | "metadata": { 103 | "colab": { 104 | "collapsed_sections": [], 105 | "name": "sorting_algorithms_overview.ipynb", 106 | "private_outputs": true, 107 | "provenance": [ ] 108 | }, 109 | "kernelspec": { 110 | "display_name": "Python 3", 111 | "name": "python3" 112 | } 113 | }, 114 | "nbformat": 4, 115 | "nbformat_minor": 0 116 | } 117 | -------------------------------------------------------------------------------- /5_sorting_algorithms/52_insertion_sort.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "5_USMGQXaApK" 7 | }, 8 | "source": [ 9 | "# Insertion Sort" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "WPswkEc6aCNq" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "RvmsLMdEK4xZ" 25 | }, 26 | "source": [ 27 | "**Insertion sort** is a sorting algorithm that repeatedly removes the first element of the input array and searches for the right place to put it in a sorted output array." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "Q0w06rRJ5OMC" 34 | }, 35 | "source": [ 36 | "\u003e The average case time complexity of insertion sort is $O(n^2)$." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "V67mX01z5RFx" 43 | }, 44 | "source": [ 45 | "### Algorithm" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "id": "RQWtqHL75Xni" 52 | }, 53 | "source": [ 54 | "An example implementation of insertion sort is outlined here." 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "id": "8KIeigzQ6ZJD" 61 | }, 62 | "source": [ 63 | "0. 
**Initialize** an output array that will eventually contain all of the elements of the input array, sorted.\n", 64 | "\n", 65 | "1. **Insert** the first element of the input array to the output array, inserting it such that the output array maintains ordering. (Ensure to remove the moved element from the input array.)\n", 66 | "\n", 67 | "2. **Repeat** the insertion step until the input array is empty." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "id": "JIbfTJmj5c3T" 74 | }, 75 | "source": [ 76 | "**Example**" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": { 82 | "id": "XI_dlrDC6SoG" 83 | }, 84 | "source": [ 85 | "The following table demonstrates sorting the array [2, 1, 4, 5] using insertion sort." 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": { 91 | "id": "U9_rtBHc5g30" 92 | }, 93 | "source": [ 94 | "**Iteration** | **Input array** | **Output array**\n", 95 | "--- | --- | ---\n", 96 | "0 | [2, 1, 4, 5] | []\n", 97 | "1 | [1, 4, 5] | [2]\n", 98 | "2 | [4, 5] | [1, 2]\n", 99 | "3 | [5] | [1, 2, 4]\n", 100 | "4 | [] | [1, 2, 4, 5]" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": { 106 | "id": "ZYkuukkoUimg" 107 | }, 108 | "source": [ 109 | "### Space complexity" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "id": "xEuyZNDpUlmo" 116 | }, 117 | "source": [ 118 | "Insertion sort can be implemented either in-place or out-of-place." 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": { 124 | "id": "FusegFLDUny2" 125 | }, 126 | "source": [ 127 | "Insertion sort may appear like an out-of-place sorting algorithm since it creates a new output array. However, it does not *copy* the input array, it *moves* elements from the input array to the output array. Insertion sort therefore does not create any *new* storage, so is $O(1)$." 
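
The three steps of the algorithm outlined above can be sketched in a few lines. This is an illustrative sketch rather than the lesson's reference implementation: the function name `insertion_sort_moving` is hypothetical, and the insertion step is delegated to the standard library's `bisect.insort` so that the sketch does not give away the exercise that follows.

```python
import bisect


def insertion_sort_moving(input_arr):
  """Moves every element of input_arr into a new sorted output array."""
  # Step 0: initialize the output array.
  output_arr = []

  # Step 2: repeat the insertion step until the input array is empty.
  while input_arr:
    # Step 1: remove the first element and insert it in sorted position.
    el = input_arr.pop(0)
    bisect.insort(output_arr, el)

  return output_arr


arr = [2, 1, 4, 5]
print(insertion_sort_moving(arr))  # [1, 2, 4, 5]
print(arr)  # [] because every element was moved out of the input array
```

Note that `list.pop(0)` is itself a linear-time operation in Python, so this sketch mirrors the table above rather than aiming for efficiency.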
128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "id": "sMFaqHjnm10W" 134 | }, 135 | "source": [ 136 | "## Question" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": { 142 | "id": "zFDbri71aeTR" 143 | }, 144 | "source": [ 145 | "Most of the heavy lifting in insertion sort comes from finding the right place to insert an element into the sorted array, so let's start there. Write a function that takes in a sorted array along with a new element, and inserts the element in the appropriate place in the array. Your function should not `return` anything; instead, it should modify the input array `arr`." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "id": "qU_qi5V8Nt1u" 153 | }, 154 | "outputs": [], 155 | "source": [ 156 | "def insert_into_sorted_array(arr, el):\n", 157 | " \"\"\"Inserts an integer el into arr, an array of sorted integers.\"\"\"\n", 158 | " # TODO(you): Implement\n", 159 | " print('This function has not been implemented.')" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": { 165 | "id": "xQOwLQyinlJ9" 166 | }, 167 | "source": [ 168 | "### Hint" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "id": "CADUlmjLnmAE" 175 | }, 176 | "source": [ 177 | "`arr.insert(idx, el)` inserts the element `el` at index `idx` of an array `arr`." 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": { 183 | "id": "VH_yY0iqno-F" 184 | }, 185 | "source": [ 186 | "### Unit Tests\n", 187 | "\n", 188 | "Run the following cell to check your answer against some unit tests."
189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "id": "ezASxlDIM7cK" 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "arr = []\n", 200 | "\n", 201 | "insert_into_sorted_array(arr, 4)\n", 202 | "print(arr) # Should print: [4]\n", 203 | "\n", 204 | "insert_into_sorted_array(arr, 2)\n", 205 | "print(arr) # Should print: [2, 4]\n", 206 | "\n", 207 | "insert_into_sorted_array(arr, 2)\n", 208 | "print(arr) # Should print: [2, 2, 4]\n", 209 | "\n", 210 | "insert_into_sorted_array(arr, 1)\n", 211 | "print(arr) # Should print: [1, 2, 2, 4]" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": { 217 | "id": "exda7mOlMnxh" 218 | }, 219 | "source": [ 220 | "### Solution" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": null, 226 | "metadata": { 227 | "id": "zCDTKwbFM4B3" 228 | }, 229 | "outputs": [], 230 | "source": [ 231 | "def insert_into_sorted_array(arr, el):\n", 232 | " \"\"\"Inserts an integer el into arr, an array of sorted integers.\"\"\"\n", 233 | "\n", 234 | " # Iterate through arr and insert el at the first position whose value is\n", 235 | " # greater than or equal to el. If such a value is found, exit the function.\n", 236 | " for i in range(len(arr)):\n", 237 | " if el \u003c= arr[i]:\n", 238 | " arr.insert(i, el)\n", 239 | " return\n", 240 | "\n", 241 | " # If no element was found in arr whose value is greater than or equal to el,\n", 242 | " # insert el at the end of arr.\n", 243 | " arr.append(el)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": { 249 | "id": "IvxChCt7m_KT" 250 | }, 251 | "source": [ 252 | "## Question" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": { 258 | "id": "q-z56wFNQaMs" 259 | }, 260 | "source": [ 261 | "What is the big-O time complexity of `insert_into_sorted_array` in the best, average, and worst case?"
262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": { 268 | "id": "dHNuCT-pn3Ye" 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "def insert_into_sorted_array(arr, el):\n", 273 | " \"\"\"Inserts an integer el into arr, an array of sorted integers.\"\"\"\n", 274 | "\n", 275 | " # Iterate through arr and insert el at the first position whose value is\n", 276 | " # greater than or equal to el. If such a value is found, exit the function.\n", 277 | " for i in range(len(arr)):\n", 278 | " if el \u003c= arr[i]:\n", 279 | " arr.insert(i, el)\n", 280 | " return\n", 281 | "\n", 282 | " # If no element was found in arr whose value is greater than or equal to el,\n", 283 | " # insert el at the end of arr.\n", 284 | " arr.append(el)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": { 291 | "id": "LAcIXU9D5uPk" 292 | }, 293 | "outputs": [], 294 | "source": [ 295 | "#freetext" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": { 301 | "id": "cxyAfkJsnKUg" 302 | }, 303 | "source": [ 304 | "### Solution" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": { 310 | "id": "h2HBefN3Q52X" 311 | }, 312 | "source": [ 313 | "Each individual operation in the function is treated as $O(1)$ here, so the complexity is determined by the number of loop iterations before `return` is called. Let $n$ be the length of the input array.\n", 314 | "\n", 315 | "In the trivial case, `len(arr) == 0`, and the function requires 0 iterations, so it is $O(1)$.\n", 316 | "\n", 317 | "In the best non-trivial case, the element being added is less than or equal to the first element of the sorted array, so `el \u003c= arr[0]`. This case requires only 1 iteration, so it is $O(1)$.\n", 318 | "\n", 319 | "In the worst case, the element being added is greater than the maximum element of the sorted array, so `el \u003e arr[i]` for all `i`.
This case requires $n$ iterations *plus* the final append, which is equivalent to $n+1$ iterations, which is $O(n)$.\n", 320 | "\n", 321 | "Remember that the average case complexity is the mean complexity averaged over all possible insertion indices. If `el` is inserted at index 0, it requires 1 iteration. If `el` is inserted at index 1, it requires 2 iterations. If `el` is inserted at index 2, it requires 3 iterations, and so on. If `el` is inserted at index $n-1$, it requires $n$ iterations. If `el` is appended to the end of the array, it effectively requires $n+1$ iterations. Assuming every insertion index is equally likely, this is equivalent to taking the mean of the integers from 1 to $n+1$. We have\n", 322 | "\n", 323 | "\\begin{align*}\n", 324 | "\\frac{1}{n+1} \\sum\\limits_{i=1}^{n+1} i \u0026= \\frac{1}{n+1} \\frac{(n+2)(n+1)}{2} \\\\\n", 325 | "\u0026= \\frac{n+2}{2} \\\\\n", 326 | "\u0026= \\frac{1}{2} n + 1 \\\\\n", 327 | "\u0026= O(n),\n", 328 | "\\end{align*}\n", 329 | "\n", 330 | "where the first equality comes from the [formula for an arithmetic sum](https://en.wikipedia.org/wiki/1_%2B_2_%2B_3_%2B_4_%2B_%E2%8B%AF).\n", 331 | "\n", 332 | "Therefore, the average case complexity, like the worst case, is $O(n)$, while the best case is $O(1)$. For this function, the average case number of iterations, $\\frac{n+2}{2}$, is exactly halfway between the best case number of iterations, 1, and the worst case number of iterations, $n+1$." 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": { 338 | "id": "FjR0u4dpnAqA" 339 | }, 340 | "source": [ 341 | "## Question" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": { 347 | "id": "bdW-Xk6IwfQe" 348 | }, 349 | "source": [ 350 | "Use `insert_into_sorted_array` to implement insertion sort."
351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": { 357 | "id": "Ock5ZiAsoPOu" 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "#persistent\n", 362 | "def insert_into_sorted_array(arr, el):\n", 363 | " \"\"\"Inserts an integer el into arr, an array of sorted integers.\"\"\"\n", 364 | "\n", 365 | " # Iterate through arr and insert el at the first position whose value is\n", 366 | " # greater than or equal to el. If such a value is found, exit the function.\n", 367 | " for i in range(len(arr)):\n", 368 | " if el \u003c= arr[i]:\n", 369 | " arr.insert(i, el)\n", 370 | " return\n", 371 | "\n", 372 | " # If no element was found in arr whose value is greater than or equal to el,\n", 373 | " # insert el at the end of arr.\n", 374 | " arr.append(el)" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": null, 380 | "metadata": { 381 | "id": "a4uqASBZwpE5" 382 | }, 383 | "outputs": [], 384 | "source": [ 385 | "def insertion_sort(arr):\n", 386 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 387 | " # TODO(you): Implement\n", 388 | " print('This function has not been implemented.')" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": { 394 | "id": "CMv4NirTGTAh" 395 | }, 396 | "source": [ 397 | "### Hint" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": { 403 | "id": "1cb3w4_dGT3T" 404 | }, 405 | "source": [ 406 | "Use the `pop` method to repeatedly remove the first element of the input array. Call `insert_into_sorted_array` to add each popped element to the output array so that the output array stays sorted." 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": { 412 | "id": "GwaSDtjvoTGx" 413 | }, 414 | "source": [ 415 | "### Unit Tests\n", 416 | "\n", 417 | "Run the following cell to check your answer against some unit tests."
418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "id": "7eIU8jQG5ac-" 425 | }, 426 | "outputs": [], 427 | "source": [ 428 | "print(insertion_sort([2, 1, 4, 5, 2, 3, 7, 6]))\n", 429 | "# Should print: [1, 2, 2, 3, 4, 5, 6, 7]" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": { 435 | "id": "6LThk8pDnLHQ" 436 | }, 437 | "source": [ 438 | "### Solution" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": { 444 | "id": "ESnayYJH5ALb" 445 | }, 446 | "source": [ 447 | "Insertion sort loops through each element in the input array and inserts it into the output array by repeatedly calling `insert_into_sorted_array`." 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "metadata": { 454 | "id": "TLAS8xDu5AwW" 455 | }, 456 | "outputs": [], 457 | "source": [ 458 | "def insertion_sort(arr):\n", 459 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 460 | " output = []\n", 461 | "\n", 462 | " while len(arr) \u003e 0:\n", 463 | " i = arr.pop(0)\n", 464 | " insert_into_sorted_array(output, i)\n", 465 | " \n", 466 | " return output" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": { 472 | "id": "ASIzt5_qnCai" 473 | }, 474 | "source": [ 475 | "## Question" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": { 481 | "id": "hqBvzLexLn_l" 482 | }, 483 | "source": [ 484 | "What is the big-O time complexity of `insertion_sort`, in the best, average, and worst case? " 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": { 490 | "id": "_N7EGLkgWkTs" 491 | }, 492 | "source": [ 493 | "Remember that `insert_into_sorted_array` is $O(n)$ in the worst and average case, and $O(1)$ in the best case." 
494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": { 500 | "id": "KueIJ7V1olei" 501 | }, 502 | "outputs": [], 503 | "source": [ 504 | "def insertion_sort(arr):\n", 505 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 506 | " output = []\n", 507 | "\n", 508 | " while len(arr) \u003e 0:\n", 509 | " i = arr.pop(0)\n", 510 | " insert_into_sorted_array(output, i)\n", 511 | " \n", 512 | " return output" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": { 519 | "id": "eHWpBvvr59fn" 520 | }, 521 | "outputs": [], 522 | "source": [ 523 | "#freetext" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": { 529 | "id": "YwNkyGpmnLxh" 530 | }, 531 | "source": [ 532 | "### Solution" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": { 538 | "id": "_OnjQQTAPE0N" 539 | }, 540 | "source": [ 541 | "As per a previous question and the hint, `insert_into_sorted_array` is $O(n)$ in the average and worst case, and $O(1)$ in the best case. The implementation of `insertion_sort` is essentially $n$ calls of `insert_into_sorted_array`. Therefore, `insertion_sort` is $O(n)$ in the best case, and $O(n^2)$ in the average and worst case." 
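
One way to check these bounds empirically is to count comparisons. The sketch below is only an illustration under stated assumptions: `count_comparisons` is a hypothetical helper that inlines the logic of `insertion_sort` and `insert_into_sorted_array`, works on a copy of its input, and counts only the `el <= output[i]` comparisons, treating `pop`, `insert`, and `append` as constant-time as the analysis above does.

```python
def count_comparisons(arr):
  """Returns how many `el <= output[i]` comparisons insertion sort makes."""
  remaining = list(arr)  # Work on a copy so the caller's list is untouched.
  output = []
  comparisons = 0

  while len(remaining) > 0:
    el = remaining.pop(0)
    for i in range(len(output)):
      comparisons += 1
      if el <= output[i]:
        output.insert(i, el)
        break
    else:
      # No break occurred: el is greater than everything in output.
      output.append(el)

  return comparisons


n = 8
print(count_comparisons(list(range(n, 0, -1))))  # Descending input.
print(count_comparisons(list(range(1, n + 1))))  # Ascending input.
```

Because this implementation scans the output array from the front, a descending input is its best case ($n-1$ comparisons in total), while an already sorted input is its worst case ($\frac{n(n-1)}{2}$ comparisons).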
542 | ] 543 | } 544 | ], 545 | "metadata": { 546 | "colab": { 547 | "collapsed_sections": [], 548 | "name": "insertion_sort.ipynb", 549 | "private_outputs": true, 550 | "provenance": [ ] 551 | }, 552 | "kernelspec": { 553 | "display_name": "Python 3", 554 | "name": "python3" 555 | } 556 | }, 557 | "nbformat": 4, 558 | "nbformat_minor": 0 559 | } 560 | -------------------------------------------------------------------------------- /5_sorting_algorithms/53_selection_sort.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "IobycsEITFLC" 7 | }, 8 | "source": [ 9 | "# Selection Sort" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "lCSaoIwgTHKD" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "KpG9GNclTPvD" 25 | }, 26 | "source": [ 27 | "**Selection sort** is a simple sorting algorithm that works by repeatedly moving the minimum element from the input (unsorted) array into the output (sorted) array." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "C0fGNNtYAW5s" 34 | }, 35 | "source": [ 36 | "\u003e The average case time complexity of selection sort is $O(n^2)$." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "Qznmd_VUAe65" 43 | }, 44 | "source": [ 45 | "### Algorithm" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "id": "VqY6E9XKLKfB" 52 | }, 53 | "source": [ 54 | "An implementation of the selection sort algorithm is outlined here." 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "id": "uLZcJgv5JB3j" 61 | }, 62 | "source": [ 63 | "0. 
**Select** the minimum element of the input array and move it to the end of the output array. (Be sure to remove the selected element from the input array.)\n", 66 | "\n", 67 | "2. **Repeat** the selection step until the input array is empty.\n", 68 | "\n", 69 | "Selection sort can be implemented either in-place or out-of-place. Selection sort may appear like an out-of-place sorting algorithm since it creates a new output array. However, it does not *copy* the input array; it *moves* elements from the input array to the output array. Selection sort therefore does not create any *new* storage, so its space complexity is $O(1)$." 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "id": "9kyGPkPwAiPa" 76 | }, 77 | "source": [ 78 | "**Example**" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "id": "GCEvSI61JXiE" 85 | }, 86 | "source": [ 87 | "The following table demonstrates sorting the array [2, 1, 4, 5] using selection sort.\n", 88 | "\n" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": { 94 | "id": "r5N6uCyNAk6L" 95 | }, 96 | "source": [ 97 | "**Iteration** | **Input array** | **Output array**\n", 98 | "--- | --- | ---\n", 99 | "0 | [2, 1, 4, 5] | []\n", 100 | "1 | [2, 4, 5] | [1]\n", 101 | "2 | [4, 5] | [1, 2]\n", 102 | "3 | [5] | [1, 2, 4]\n", 103 | "4 | [] | [1, 2, 4, 5]" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": { 109 | "id": "MxSN2DYZUA4p" 110 | }, 111 | "source": [ 112 | "## Question" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": { 118 | "id": "NjiqbTp5Toxq" 119 | }, 120 | "source": [ 121 | "An important step in the selection sort algorithm is finding the index of the minimum element of the array. Write this `minimum_index` function.\n", 122 | "\n", 123 | "If there is more than one minimum element, return the lowest index. For example, `minimum_index([2, 3, 1, 4, 1])` should return `2`, since 1, the lowest element, appears at indices 2 and 4."
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": { 130 | "id": "GFTVZaT1UKvM" 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "def minimum_index(arr):\n", 135 | " \"\"\"Returns the index of the minimum value within a numerical array.\"\"\"\n", 136 | " # TODO(you): Implement" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": { 142 | "id": "0p5lNYV1UP3h" 143 | }, 144 | "source": [ 145 | "### Hint" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": { 151 | "id": "j7O-g53CUR4P" 152 | }, 153 | "source": [ 154 | "Use the following code scaffolding." 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": { 161 | "id": "PetYqtGnUUYO" 162 | }, 163 | "outputs": [], 164 | "source": [ 165 | "def minimum_index(arr):\n", 166 | " \"\"\"Returns the index of the minimum value within a numerical array.\"\"\"\n", 167 | " # Initialize the minimum index.\n", 168 | " min_index = -1\n", 169 | " # Initialize the minimum value. Infinity is the standard initialization for\n", 170 | " # such functions, since every integer is less than infinity.\n", 171 | " min_value = float(\"Inf\")\n", 172 | "\n", 173 | " # Iterate through the input array list.\n", 174 | " for i in range(len(arr)):\n", 175 | " # Reset min_index and min_value if you find a value lower than min_value.\n", 176 | " # TODO(you): Complete\n", 177 | " \n", 178 | " return min_index" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": { 184 | "id": "KImv111tU5oz" 185 | }, 186 | "source": [ 187 | "### Unit Tests\n", 188 | "\n", 189 | "Run the following cell to check your answer against some unit tests." 
190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "id": "NXAKffhuU7Ag" 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "print(minimum_index([2, 3, 1, 4, 5]))\n", 201 | "# Should print: 2\n", 202 | "print(minimum_index([2, 3, 1, 4, 1]))\n", 203 | "# Should print: 2" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": { 209 | "id": "iTUONZpRUDR1" 210 | }, 211 | "source": [ 212 | "### Solution" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": { 219 | "id": "1_h-gxH3UEx7" 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "def minimum_index(arr):\n", 224 | " \"\"\"Returns the index of the minimum value within a numerical array.\"\"\"\n", 225 | " # Initialize the minimum index.\n", 226 | " min_index = -1\n", 227 | " # Initialize the minimum value. Infinity is the standard initialization for\n", 228 | " # such functions, since every integer is less than infinity.\n", 229 | " min_value = float(\"Inf\")\n", 230 | "\n", 231 | " # Iterate through the input array list.\n", 232 | " for i in range(len(arr)):\n", 233 | " # Reset min_index and min_value if you find a value lower than min_value.\n", 234 | " if arr[i] \u003c min_value:\n", 235 | " min_index = i\n", 236 | " min_value = arr[i]\n", 237 | " \n", 238 | " return min_index" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": { 244 | "id": "q1pPyRrkOAWN" 245 | }, 246 | "source": [ 247 | "## Question" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": { 253 | "id": "0d3P2Q_yOB6n" 254 | }, 255 | "source": [ 256 | "What is the best, worst, and average case time complexity of `minimum_index`?" 
257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": { 263 | "id": "8f2CZs0dBHDc" 264 | }, 265 | "outputs": [], 266 | "source": [ 267 | "#freetext" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": { 273 | "id": "W50WlY8fOKSH" 274 | }, 275 | "source": [ 276 | "### Solution" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": { 282 | "id": "U_ubkurlOLgY" 283 | }, 284 | "source": [ 285 | "All of the single-line operations in `minimum_index` are $O(1)$. Therefore the time complexity of `minimum_index` is the number of iterations of the `for` loop. In all cases, the loop iterates over all $n$ elements of `arr`, so time complexity is $O(n)$ in the best, worst, and average case." 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": { 291 | "id": "6jwo8NfUkQCe" 292 | }, 293 | "source": [ 294 | "## Question" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": { 300 | "id": "T09Yu1LUSO_m" 301 | }, 302 | "source": [ 303 | "Implement `selection_sort` using the `minimum_index` function." 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": { 310 | "id": "08vlmsSXV4uM" 311 | }, 312 | "outputs": [], 313 | "source": [ 314 | "def minimum_index(arr):\n", 315 | " \"\"\"Returns the index of the minimum value within a numerical array.\"\"\"\n", 316 | " # Initialize the minimum index.\n", 317 | " min_index = -1\n", 318 | " # Initialize the minimum value. 
Infinity is the standard initialization for\n", 319 | " # such functions, since every integer is less than infinity.\n", 320 | " min_value = float(\"Inf\")\n", 321 | "\n", 322 | " # Iterate through the input array list.\n", 323 | " for i in range(len(arr)):\n", 324 | " # Reset min_index and min_value if you find a value lower than min_value.\n", 325 | " if arr[i] \u003c min_value:\n", 326 | " min_index = i\n", 327 | " min_value = arr[i]\n", 328 | " \n", 329 | " return min_index" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": { 336 | "id": "2FJ9nEsUZnnQ" 337 | }, 338 | "outputs": [], 339 | "source": [ 340 | "def selection_sort(arr):\n", 341 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 342 | " # TODO(you): Implement\n", 343 | " print('This function has not been implemented.')" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "id": "coGp5guolIgj" 350 | }, 351 | "source": [ 352 | "### Unit Tests\n", 353 | "\n", 354 | "Run the following cell to check your answer against some unit tests." 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": { 361 | "id": "z57qIOHDlJ3i" 362 | }, 363 | "outputs": [], 364 | "source": [ 365 | "print(selection_sort([2, 1, 4, 5, 2, 3, 7, 6]))\n", 366 | "# Should print: [1, 2, 2, 3, 4, 5, 6, 7]" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": { 372 | "id": "bm6Givxy4v4O" 373 | }, 374 | "source": [ 375 | "### Solution" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": { 381 | "id": "lLJzaOjmZzEL" 382 | }, 383 | "source": [ 384 | "This is by no means the only possible implementation. If your solution works and has the same time complexity, then it is completely valid!" 
385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": { 391 | "id": "E_-5t42iZ0KA" 392 | }, 393 | "outputs": [], 394 | "source": [ 395 | "def selection_sort(arr):\n", 396 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 397 | " output = []\n", 398 | "\n", 399 | " while len(arr) \u003e 0:\n", 400 | " # Find the index of the minimum value within the input array.\n", 401 | " min_index = minimum_index(arr)\n", 402 | " # Remove the minimum value from the input array.\n", 403 | " min_value = arr.pop(min_index)\n", 404 | " # Add the minimum value to the output array.\n", 405 | " output.append(min_value)\n", 406 | " \n", 407 | " return output" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": { 413 | "id": "n7jWRLsUkcuw" 414 | }, 415 | "source": [ 416 | "## Question" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "id": "oin-LiWBLYCi" 423 | }, 424 | "source": [ 425 | "What is the best and worst case time complexity of selection sort?" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": { 432 | "id": "6kkOHUNRBPd_" 433 | }, 434 | "outputs": [], 435 | "source": [ 436 | "#freetext" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": { 442 | "id": "udH9IoSDP_-K" 443 | }, 444 | "source": [ 445 | "### Hint" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": { 451 | "id": "xDOw2bn9QBgm" 452 | }, 453 | "source": [ 454 | "Remember that `minimum_index` is $O(n)$. How many iterations does `selection_sort` have in the worst case?" 
455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": { 460 | "id": "wjjRW3L8QL-Z" 461 | }, 462 | "source": [ 463 | "### Solution" 464 | ] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": { 469 | "id": "2O0KwY9OQNBY" 470 | }, 471 | "source": [ 472 | "Unlike bubble sort, every step of selection sort must be performed regardless of what the input is. For example, even if the input array is already sorted, selection sort still repeatedly selects and moves the minimum of the input array to the output array. This indicates that the best and worst case time complexities should be the same." 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": { 478 | "id": "qs5u8uBmBdmj" 479 | }, 480 | "source": [ 481 | "There are only two lines in `selection_sort` that contribute to the time complexity. All other lines are $O(1)$.\n", 482 | "\n", 483 | "```python\n", 484 | "while len(arr) \u003e 0:\n", 485 | " min_index = minimum_index(arr)\n", 486 | "```\n", 487 | "\n", 488 | "The `while` loop has $n$ iterations, since the initial length of `arr` is $n$ and at each iteration 1 element is popped out. This is true in the best *and* the worst case. As per a previous question and the hint, `minimum_index` is $O(n)$. Therefore, since `selection_sort` contains $n$ calls of an $O(n)$ function, the best and worst case time complexity is $O(n^2)$." 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": { 494 | "id": "eAhNW3ilkURx" 495 | }, 496 | "source": [ 497 | "## Question" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": { 503 | "id": "FmLezDohhKyh" 504 | }, 505 | "source": [ 506 | "What is the average case time complexity of selection sort?" 
507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "metadata": { 513 | "id": "8GhGj6zqBfds" 514 | }, 515 | "outputs": [], 516 | "source": [ 517 | "#freetext" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": { 523 | "id": "satMptTRkunS" 524 | }, 525 | "source": [ 526 | "### Solution" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": { 532 | "id": "QfV0ARoyimG8" 533 | }, 534 | "source": [ 535 | "Since the best and worst case time complexities are both $O(n^2)$, the time complexity must be $O(n^2)$ in *all* cases. Therefore, the average case time complexity is also $O(n^2)$." 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": { 541 | "id": "fH7H_OvZo2Rj" 542 | }, 543 | "source": [ 544 | "## Question" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": { 550 | "id": "GQjJNvdwlnkc" 551 | }, 552 | "source": [ 553 | "Your friend Novell works for a publishing company called *A2ZBooks*, and his first task is to create a dictionary. Instead of building the dictionary from scratch, Novell has the clever idea to take all of the unique words in all of the books published by *A2ZBooks*, and just put them in order." 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "id": "bqhoudSDBiey" 560 | }, 561 | "source": [ 562 | "Novell has created an array of all the words used in all the books. This array is called `words`. He first converts all of the words to lower-case. He then uses some nifty code to create an array of unique words, in just one line of code. 
Finally, since he knows that sorting algorithms can be used just as effectively on strings as on integers, he uses your selection sort algorithm above to sort the unique words.\n", 563 | "\n", 564 | "```python\n", 565 | "# Lower-case every word in words.\n", 566 | "lower_case_words = [word.lower() for word in words]\n", 567 | "# Create a list of the unique words.\n", 568 | "unique_words = list(set(lower_case_words))\n", 569 | "# Sort the words using selection sort.\n", 570 | "sorted_words = selection_sort(unique_words)\n", 571 | "```\n", 572 | "\n", 573 | "Novell is convinced that this approach makes sense, but the `selection_sort` call throws a `TypeError`. Why is this? Can you adapt your code above to make it work for strings, while also still working for integers and floats? Below is the `selection_sort` code for reference.\n", 574 | "\n", 575 | "(If your solution above already works for strings, then you have already completed this question!)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": null, 581 | "metadata": { 582 | "id": "aOPlKJncvPSm" 583 | }, 584 | "outputs": [], 585 | "source": [ 586 | "def minimum_index(arr):\n", 587 | " \"\"\"Returns the index of the minimum value within an array.\"\"\"\n", 588 | " # Initialize the minimum index.\n", 589 | " min_index = -1\n", 590 | " # Initialize the minimum value. 
Infinity is the standard initialization for\n", 591 | " # such functions, since every integer is less than infinity.\n", 592 | " min_value = float(\"Inf\")\n", 593 | "\n", 594 | " # Iterate through the input array list.\n", 595 | " for i in range(len(arr)):\n", 596 | " # Reset min_index and min_value if you find a value lower than min_value.\n", 597 | " if arr[i] \u003c min_value:\n", 598 | " min_index = i\n", 599 | " min_value = arr[i]\n", 600 | " \n", 601 | " return min_index\n", 602 | "\n", 603 | "\n", 604 | "def selection_sort(arr):\n", 605 | " # TODO(you): Make this function also work for strings.\n", 606 | " \"\"\"Sorts an array in ascending order.\"\"\"\n", 607 | " output = []\n", 608 | "\n", 609 | " while len(arr) \u003e 0:\n", 610 | " # Find the index of the minimum value within the input array.\n", 611 | " min_index = minimum_index(arr)\n", 612 | " # Remove the minimum value from the input array.\n", 613 | " min_value = arr.pop(min_index)\n", 614 | " # Add the minimum value to the output array.\n", 615 | " output.append(min_value)\n", 616 | " \n", 617 | " return output" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": { 623 | "id": "4HqZ7GhMpX8I" 624 | }, 625 | "source": [ 626 | "### Unit Tests\n", 627 | "\n", 628 | "Run the following cell to check your answer against some unit tests." 
629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": null, 634 | "metadata": { 635 | "id": "uq49LezLpZge" 636 | }, 637 | "outputs": [], 638 | "source": [ 639 | "print(selection_sort(['cat', 'ant', 'bee', 'aardvark']))\n", 640 | "# Should print: ['aardvark', 'ant', 'bee', 'cat']" 641 | ] 642 | }, 643 | { 644 | "cell_type": "markdown", 645 | "metadata": { 646 | "id": "8n0iIzadT5Yk" 647 | }, 648 | "source": [ 649 | "### Solution" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": { 655 | "id": "9d2mhNhZvajY" 656 | }, 657 | "source": [ 658 | "*Almost* every line of `selection_sort` works just as well for floats and strings as it does for integers. The only exception is that `min_value` is initialized as `float(\"Inf\")` in `minimum_index`, and Python cannot compare this to a string.\n", 659 | "\n", 660 | "To fix this, we would need to find an equivalent \"positive infinity\" word that is greater than every possible string, similarly to how `float(\"Inf\")` is greater than any float or integer. Unfortunately, such a string does not exist. Therefore, the solution is a bit more nuanced.\n", 661 | "\n", 662 | "Instead of initializing `min_index` and `min_value` at `-1` and `float(\"Inf\")` respectively, we will initialize them at `0` and `arr[0]` respectively, with the caveat that if `arr` is empty, we `return -1`. Then, we iterate through the remaining elements of `arr`." 
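
The incompatibility described above can be reproduced in isolation. This is a small sketch; `is_comparable` is a hypothetical helper, not part of the lesson's code, that reports whether Python allows the comparison `a < b` for a given pair of values.

```python
def is_comparable(a, b):
  """Returns True if `a < b` can be evaluated without a TypeError."""
  try:
    a < b
  except TypeError:
    return False
  return True


# An integer can be compared with infinity, but a string cannot.
print(is_comparable(3, float("Inf")))
print(is_comparable('cat', float("Inf")))
# Two strings compare fine, which is why seeding from arr[0] works.
print(is_comparable('ant', 'cat'))
```

Seeding `min_value` with an element of `arr` means every comparison is between two values of the array's own type, so no sentinel value is needed.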
663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": null, 668 | "metadata": { 669 | "id": "pkd8AJz0y6tL" 670 | }, 671 | "outputs": [], 672 | "source": [ 673 | "def minimum_index(arr):\n", 674 | " \"\"\"Returns the index of the minimum value within an array.\"\"\"\n", 675 | " # This is necessary to ensure that arr[0] exists.\n", 676 | " if len(arr) == 0:\n", 677 | " return -1\n", 678 | "\n", 679 | " # Initialize the minimum index.\n", 680 | " min_index = 0\n", 681 | " # Initialize the minimum value.\n", 682 | " min_value = arr[0]\n", 683 | "\n", 684 | " # Iterate through the input array list.\n", 685 | " for i in range(1, len(arr)):\n", 686 | " # Reset min_index and min_value if you find a value lower than min_value.\n", 687 | " if arr[i] \u003c min_value:\n", 688 | " min_index = i\n", 689 | " min_value = arr[i]\n", 690 | " \n", 691 | " return min_index" 692 | ] 693 | }, 694 | { 695 | "cell_type": "markdown", 696 | "metadata": { 697 | "id": "QTcSl2r5z350" 698 | }, 699 | "source": [ 700 | "Now, `selection_sort` works for strings." 
701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": null, 706 | "metadata": { 707 | "id": "tl37MPgTzl9h" 708 | }, 709 | "outputs": [], 710 | "source": [ 711 | "def selection_sort(arr):\n", 712 | " \"\"\"Sorts an array in ascending order.\"\"\"\n", 713 | " output = []\n", 714 | "\n", 715 | " while len(arr) \u003e 0:\n", 716 | " # Find the index of the minimum value within the input array.\n", 717 | " min_index = minimum_index(arr)\n", 718 | " # Remove the minimum value from the input array.\n", 719 | " min_value = arr.pop(min_index)\n", 720 | " # Add the minimum value to the output array.\n", 721 | " output.append(min_value)\n", 722 | " \n", 723 | " return output" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": { 729 | "id": "ABf41GbWz7F4" 730 | }, 731 | "source": [ 732 | "Since the new `selection_sort` now works for strings as well as integers and floats, it is a more robust algorithm. It is therefore probably a better implementation, even though it contains some extra logic." 
733 | ] 734 | } 735 | ], 736 | "metadata": { 737 | "colab": { 738 | "collapsed_sections": [], 739 | "name": "selection_sort.ipynb", 740 | "private_outputs": true, 741 | "provenance": [ ] 742 | }, 743 | "kernelspec": { 744 | "display_name": "Python 3", 745 | "name": "python3" 746 | } 747 | }, 748 | "nbformat": 4, 749 | "nbformat_minor": 0 750 | } 751 | -------------------------------------------------------------------------------- /5_sorting_algorithms/56_comparing_sorting_algorithms.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "GFSd1QNJ3tDI" 7 | }, 8 | "source": [ 9 | "# Comparing Sorting Algorithms" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "s2eAqNmxMmQt" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "PVYbsADyMnMd" 25 | }, 26 | "source": [ 27 | "Now that you have seen the most commonly used sorting algorithms, it is important to know the advantages and disadvantages of each algorithm, in terms of:\n", 28 | "\n", 29 | "- Time complexity\n", 30 | "- Space complexity\n", 31 | "- Code complexity" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": { 37 | "id": "d3OpZr-FNHOJ" 38 | }, 39 | "source": [ 40 | "The questions in this lesson ask you to compare the sorting algorithms you have seen, in order to know which algorithm is appropriate to use for different problems." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": { 46 | "id": "7f4riG-1zuCG" 47 | }, 48 | "source": [ 49 | "## Question" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "id": "N_P2X41TzzUE" 56 | }, 57 | "source": [ 58 | "Fill out the following table of time and space complexities. The complexities should be expressed using big-O notation." 
59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "FmG4A-jH4BHf" 65 | }, 66 | "source": [ 67 | "**Algorithm** | **Average time** | **Worst case time** | **Best case time** | **Space**\n", 68 | "--- | --- | --- | --- | ---\n", 69 | "Bubble sort | | | |\n", 70 | "Insertion sort | | | |\n", 71 | "Merge sort | | | |\n", 72 | "Quicksort | | | |\n", 73 | "Selection sort | | | |" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": { 80 | "id": "5dCh0GgB4Ble" 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "#freetext" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": { 90 | "id": "a-1AbJxY1cn2" 91 | }, 92 | "source": [ 93 | "### Solution" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": { 99 | "id": "9rB0Xoxd1e9V" 100 | }, 101 | "source": [ 102 | "Some of these complexities, such as the [space complexity of quicksort](https://stackoverflow.com/questions/12573330/why-does-quicksort-use-ologn-extra-space), depend on the implementation."
103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": { 108 | "id": "CKnJGSOt4FC-" 109 | }, 110 | "source": [ 111 | "**Algorithm** | **Average time** | **Worst case time** | **Best case time** | **Space (all cases)**\n", 112 | "--- | --- | --- | --- | ---\n", 113 | "Bubble sort | $O(n^2)$ | $O(n^2)$ | $O(n)$ | $O(1)$\n", 114 | "Insertion sort | $O(n^2)$ | $O(n^2)$ | $O(n)$ | $O(1)$\n", 115 | "Merge sort | $O(n\\log(n))$ | $O(n\\log(n))$ | $O(n\\log(n))$ | $O(n)$\n", 116 | "Quicksort | $O(n\\log(n))$ | $O(n^2)$ | $O(n\\log(n))$ | $O(\\log(n))$\n", 117 | "Selection sort | $O(n^2)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "id": "KyaJoO3A7vsC" 124 | }, 125 | "source": [ 126 | "## Question" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": { 132 | "id": "gYB2WjkZ7xQq" 133 | }, 134 | "source": [ 135 | "Identify each of the following sorting algorithms, based on the state of the array after each iteration, as one of the following:\n", 136 | "\n", 137 | "- Bubble sort\n", 138 | "- Insertion sort\n", 139 | "- Merge sort\n", 140 | "- Quicksort\n", 141 | "- Selection sort" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "id": "-LtgMUz34KeV" 148 | }, 149 | "source": [ 150 | "The input array is [4, 2, 5, 4, 8, 1].\n", 151 | "\n", 152 | "**Iteration** | **Algorithm 1** | **Algorithm 2** | **Algorithm 3** | **Algorithm 4** | **Algorithm 5**\n", 153 | "--- | --- | --- | --- | --- | ---\n", 154 | "0 | [] | [4] [2] [5] [4] [8] [1] | [2, 4, 4, 5, 1, 8] | [] | [2, 4, 1, 4, 5, 8]\n", 155 | "1 | [4] | [2, 4] [4, 5] [1, 8] | [2, 4, 4, 1, 5, 8] | [1] | [1, 2, 4, 4, 5, 8]\n", 156 | "2 | [2, 4] | [2, 4, 4, 5] [1, 8] | [2, 4, 1, 4, 5, 8] | [1, 2] |\n", 157 | "3 | [2, 4, 5] | [1, 2, 4, 4, 5, 8] | [2, 1, 4, 4, 5, 8] | [1, 2, 4] |\n", 158 | "4 | [2, 4, 4, 5] | | | [1, 2, 4, 4] |\n", 159 | "5 | [2, 4, 4, 5, 8] | | | [1, 2, 4, 4, 5] |\n", 160 | "6 
| [1, 2, 4, 4, 5, 8] | | | [1, 2, 4, 4, 5, 8] |" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": { 166 | "id": "b5k7W2QrKvZB" 167 | }, 168 | "source": [ 169 | "### Solution" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": { 175 | "id": "mA_1rbxJKwNg" 176 | }, 177 | "source": [ 178 | "1. Insertion sort\n", 179 | "\n", 180 | " A key indicator here is that the array at iteration 0 is empty, so the algorithm is either insertion or selection sort. At each iteration, the left-most element of the input array is moved to the output array and inserted at its correct sorted position.\n", 181 | "\n", 182 | "1. Merge sort\n", 183 | "\n", 184 | " Merge sort looks different from the other algorithms in that it divides the input into singleton arrays, then merges them back together.\n", 185 | "\n", 186 | "1. Bubble sort\n", 187 | "\n", 188 | " The array length remains the same at each iteration, as elements are swapped. A notable feature of bubble sort is that at each iteration, the largest element not yet in place is moved to its final position on the right. For example, 8 is moved to the right in iteration 0, then 5 in iteration 1, then 4 in iteration 2, and so on.\n", 189 | "\n", 190 | "1. Selection sort\n", 191 | "\n", 192 | " Like insertion sort, selection sort is characterized by adding elements to an output array one by one. While insertion sort moves the left-most element of the input array, selection sort moves the minimum element of the input array.\n", 193 | "\n", 194 | "1. Quicksort\n", 195 | "\n", 196 | " This is one of the trickiest to categorize, since the pivot at each iteration is not easy to spot. However, quicksort can be deduced by the process of elimination: the array length stays the same at each iteration, as in bubble sort, but the array is sorted in far fewer iterations."
197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": { 202 | "id": "QiHsnRUxy0nq" 203 | }, 204 | "source": [ 205 | "## Question" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": { 211 | "id": "OLism73ipj_F" 212 | }, 213 | "source": [ 214 | "One of the clients you work with is a grocery chain, *QuickShop*, and the IT department there relies on sorting algorithms to sort its employees (e.g. by name, start date, salary) and products (e.g. by price, units sold, expiration date). " 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": { 220 | "id": "jBRZnuGnhtb_" 221 | }, 222 | "source": [ 223 | "The IT department at *QuickShop* currently uses bubble sort to do this, but the head of IT is pushing for the company to use merge sort instead, since merge sort is $O(n\\log(n))$, and therefore more time efficient than bubble sort, which is $O(n^2)$. The code for both of these algorithms is given here." 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": { 229 | "id": "aF4bpTmiErMP" 230 | }, 231 | "source": [ 232 | "**Bubble sort**" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": { 239 | "id": "iIYvBwfqq1-6" 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "#persistent\n", 244 | "def bubble_sort(arr):\n", 245 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 246 | " output = arr.copy()\n", 247 | " n = len(arr)\n", 248 | "\n", 249 | " for i in range(n-1):\n", 250 | " no_swaps = True # stays True until a swap is made in this pass\n", 251 | "\n", 252 | " # Only need to check the sub-array output[:(n-i)].\n", 253 | " for j in range(n-i-1):\n", 254 | " if output[j+1] \u003c output[j]:\n", 255 | " # Swap the elements if out of order.\n", 256 | " output[j], output[j+1] = output[j+1], output[j]\n", 257 | " no_swaps = False\n", 258 | "\n", 259 | " # If no pairs were swapped in this pass, the array is sorted.\n", 260 | " if no_swaps:\n", 261 | "
break\n", 262 | " \n", 263 | " return output" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": { 269 | "id": "8id9WCZxEtA5" 270 | }, 271 | "source": [ 272 | "**Merge sort**" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": { 279 | "id": "1CIFofI2q5J6" 280 | }, 281 | "outputs": [], 282 | "source": [ 283 | "#persistent\n", 284 | "def merge(arr1, arr2):\n", 285 | " \"\"\"Merges two sorted arrays into one such that the final array is sorted.\"\"\"\n", 286 | " \n", 287 | " output = []\n", 288 | " i, j = 0, 0\n", 289 | " \n", 290 | " # Loop through arr1 and arr2 for indices that are in common.\n", 291 | " while i \u003c len(arr1) and j \u003c len(arr2):\n", 292 | " # Add the smaller element to the output.\n", 293 | " if arr1[i] \u003c arr2[j]:\n", 294 | " output.append(arr1[i])\n", 295 | " i += 1\n", 296 | " else:\n", 297 | " output.append(arr2[j])\n", 298 | " j += 1\n", 299 | " \n", 300 | " # Add the rest of arr1, if any, (which is already sorted) to output.\n", 301 | " while i \u003c len(arr1):\n", 302 | " output.append(arr1[i])\n", 303 | " i += 1\n", 304 | " \n", 305 | " # Add the rest of arr2, if any, (which is already sorted) to output.\n", 306 | " while j \u003c len(arr2):\n", 307 | " output.append(arr2[j])\n", 308 | " j += 1\n", 309 | " \n", 310 | " return output\n", 311 | "\n", 312 | "\n", 313 | "def merge_sort(arr):\n", 314 | " \"\"\"Sorts an array of integers in ascending order.\"\"\"\n", 315 | " if len(arr) \u003c 2:\n", 316 | " return arr\n", 317 | "\n", 318 | " # Split the array into two.\n", 319 | " midpoint = len(arr) // 2\n", 320 | " left = arr[:midpoint]\n", 321 | " right = arr[midpoint:]\n", 322 | "\n", 323 | " # Merge the arrays recursively.\n", 324 | " return merge(merge_sort(left), merge_sort(right))" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": { 330 | "id": "ZTZOJ4u3EoH4" 331 | }, 332 | "source": [ 333 | "In order to transition from bubble sort 
to merge sort, the head of IT has to demonstrate to the Chief Technology Officer (CTO) that merge sort is indeed faster than bubble sort, not just theoretically but in practice. To do this, the head of IT has written some code to generate a random list of length `n` and report the runtime for bubble sort and merge sort." 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": { 340 | "id": "mv1lI5UIq-x2" 341 | }, 342 | "outputs": [], 343 | "source": [ 344 | "import random\n", 345 | "import time\n", 346 | "\n", 347 | "\n", 348 | "def compare_runtimes(n, a=0, b=999):\n", 349 | " \"\"\"Runtimes of bubble sort and merge sort on one random n-length array.\"\"\"\n", 350 | " random_list = [random.randint(a, b) for _ in range(n)]\n", 351 | "\n", 352 | " bubble_start = time.process_time()\n", 353 | " bubble_sorted = bubble_sort(random_list)\n", 354 | " bubble_time = time.process_time() - bubble_start\n", 355 | "\n", 356 | " merge_start = time.process_time()\n", 357 | " merge_sorted = merge_sort(random_list)\n", 358 | " merge_time = time.process_time() - merge_start\n", 359 | "\n", 360 | " return (bubble_time, merge_time)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": { 366 | "id": "f9DNiow1FbQJ" 367 | }, 368 | "source": [ 369 | "This function can then be used to compare the runtime of bubble sort and merge sort for a random array." 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": { 376 | "id": "O0YgHvtjsySc" 377 | }, 378 | "outputs": [], 379 | "source": [ 380 | "bubble_time, merge_time = compare_runtimes(100)\n", 381 | "\n", 382 | "print('Bubble sort takes %f seconds. Merge sort takes %f seconds.' %\n", 383 | " (bubble_time, merge_time))" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": { 389 | "id": "s9DrATcgF6Kx" 390 | }, 391 | "source": [ 392 | "To make this more robust, the head of IT wants to generalize this.
She wants to have `n_iters` random runs of `compare_runtimes` and calculate the mean runtime of all of the runs for bubble sort and merge sort. Can you modify `compare_runtimes` to accept an `n_iters` parameter and return the mean runtime over `n_iters` iterations?" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": { 399 | "id": "2meFZhtCJKV9" 400 | }, 401 | "outputs": [], 402 | "source": [ 403 | "import random\n", 404 | "import time\n", 405 | "\n", 406 | "\n", 407 | "def mean(arr):\n", 408 | " \"\"\"Mean of an array.\"\"\"\n", 409 | " return 1.0 * sum(arr) / len(arr)\n", 410 | "\n", 411 | "\n", 412 | "def compare_runtimes(n, n_iters, a=0, b=999):\n", 413 | " \"\"\"Mean runtime of n_iters random n-arrays of bubble sort and merge sort.\"\"\"\n", 414 | " # TODO(you): Modify this function to return the mean over n_iters iterations,\n", 415 | " # instead of just one iteration.\n", 416 | " random_list = [random.randint(a, b) for _ in range(n)]\n", 417 | "\n", 418 | " bubble_start = time.process_time()\n", 419 | " bubble_sorted = bubble_sort(random_list)\n", 420 | " bubble_time = time.process_time() - bubble_start\n", 421 | "\n", 422 | " merge_start = time.process_time()\n", 423 | " merge_sorted = merge_sort(random_list)\n", 424 | " merge_time = time.process_time() - merge_start\n", 425 | "\n", 426 | " return (bubble_time, merge_time)" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": { 432 | "id": "oqgCJtM4y_b7" 433 | }, 434 | "source": [ 435 | "### Solution" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": { 441 | "id": "biy5xO5nK3VC" 442 | }, 443 | "source": [ 444 | "You can do this by:\n", 445 | "\n", 446 | "1. Moving the original function code to within a `for _ in range(n_iters)` loop.\n", 447 | "2. Instead of returning the runtime, append it to a list of bubble/merge runtimes.\n", 448 | "3. Return the mean of each runtime array." 
449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": null, 454 | "metadata": { 455 | "id": "7Na2KsbQK3z7" 456 | }, 457 | "outputs": [], 458 | "source": [ 459 | "import random\n", 460 | "import time\n", 461 | "\n", 462 | "\n", 463 | "def mean(arr):\n", 464 | " \"\"\"Mean of an array.\"\"\"\n", 465 | " return 1.0 * sum(arr) / len(arr)\n", 466 | "\n", 467 | "\n", 468 | "def compare_runtimes(n, n_iters, a=0, b=999):\n", 469 | " \"\"\"Mean runtime of n_iters random n-arrays of bubble sort and merge sort.\"\"\"\n", 470 | " bubble_times = []\n", 471 | " merge_times = []\n", 472 | "\n", 473 | " for _ in range(n_iters):\n", 474 | " random_list = [random.randint(a, b) for _ in range(n)]\n", 475 | "\n", 476 | " bubble_start = time.process_time()\n", 477 | " bubble_sorted = bubble_sort(random_list)\n", 478 | " bubble_times.append(time.process_time() - bubble_start)\n", 479 | "\n", 480 | " merge_start = time.process_time()\n", 481 | " merge_sorted = merge_sort(random_list)\n", 482 | " merge_times.append(time.process_time() - merge_start)\n", 483 | "\n", 484 | " return (mean(bubble_times), mean(merge_times))" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": null, 490 | "metadata": { 491 | "id": "9rJ2nt4RMuey" 492 | }, 493 | "outputs": [], 494 | "source": [ 495 | "# Mean runtime over 1000 iterations for 100-arrays, for bubble and merge sort.\n", 496 | "compare_runtimes(100, 1000)" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": { 502 | "id": "XUeS7GLfNQLC" 503 | }, 504 | "source": [ 505 | "Based on the above code, it seems like merge sort outperforms bubble sort on average, at least for arrays of length 100." 
506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": { 511 | "id": "vhrCH5jEy3Xc" 512 | }, 513 | "source": [ 514 | "## Question" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": { 520 | "id": "te0WIsTRM5XG" 521 | }, 522 | "source": [ 523 | "The head of IT can now show the CTO how the average runtime of bubble sort compares to merge sort for a given array length. Now, she wants to be able to show how this changes for different array lengths. She has decided that a visualization is the most appropriate way to show this." 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": { 529 | "id": "YYp3-Rfw4yXa" 530 | }, 531 | "source": [ 532 | "Given the new `compare_runtimes` function you helped her with in the previous question, she has written some code to plot how the mean runtime changes as the input array length changes. (The first two code cells are the solution to the previous problem. The third code cell contains the new code.)" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": null, 538 | "metadata": { 539 | "id": "I_a8Q7wUyxe6" 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "#persistent\n", 544 | "import random\n", 545 | "import time\n", 546 | "\n", 547 | "\n", 548 | "def mean(arr):\n", 549 | " \"\"\"Mean of an array.\"\"\"\n", 550 | " return 1.0 * sum(arr) / len(arr)\n", 551 | "\n", 552 | "\n", 553 | "def compare_runtimes(n, n_iters, a=0, b=999):\n", 554 | " \"\"\"Mean runtime of n_iters random n-arrays of bubble sort and merge sort.\"\"\"\n", 555 | " bubble_times = []\n", 556 | " merge_times = []\n", 557 | "\n", 558 | " for _ in range(n_iters):\n", 559 | " random_list = [random.randint(a, b) for _ in range(n)]\n", 560 | "\n", 561 | " bubble_start = time.process_time()\n", 562 | " bubble_sorted = bubble_sort(random_list)\n", 563 | " bubble_times.append(time.process_time() - bubble_start)\n", 564 | "\n", 565 | " merge_start = time.process_time()\n", 566 | " 
merge_sorted = merge_sort(random_list)\n", 567 | " merge_times.append(time.process_time() - merge_start)\n", 568 | "\n", 569 | " return (mean(bubble_times), mean(merge_times))" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "metadata": { 576 | "id": "8WZtfFqNyxfJ" 577 | }, 578 | "outputs": [], 579 | "source": [ 580 | "# Mean runtime over 1000 iterations for 100-arrays, for bubble and merge sort.\n", 581 | "compare_runtimes(100, 1000)" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "metadata": { 588 | "id": "GxZyhHIcPJdn" 589 | }, 590 | "outputs": [], 591 | "source": [ 592 | "# This code cell takes a few seconds to run.\n", 593 | "\n", 594 | "import matplotlib.pyplot as plt\n", 595 | "\n", 596 | "\n", 597 | "mean_bubble_times = []\n", 598 | "mean_merge_times = []\n", 599 | "x_range = range(10, 50)\n", 600 | "n_iters = 100\n", 601 | "\n", 602 | "for n in x_range:\n", 603 | " mean_bubble_time, mean_merge_time = compare_runtimes(n, n_iters)\n", 604 | " mean_bubble_times.append(mean_bubble_time)\n", 605 | " mean_merge_times.append(mean_merge_time)\n", 606 | "\n", 607 | "\n", 608 | "plt.plot(x_range, mean_bubble_times, label='bubble')\n", 609 | "plt.plot(x_range, mean_merge_times, label='merge')\n", 610 | "plt.xlabel('length of array')\n", 611 | "plt.ylabel('CPU runtime (seconds)')\n", 612 | "plt.legend()\n", 613 | "plt.show()" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": { 619 | "id": "9tvZ-LwgQVOf" 620 | }, 621 | "source": [ 622 | "This is exactly the opposite of what the head of IT wanted to see! Bubble sort, with a time complexity of $O(n^2)$, seems to be consistently quicker than merge sort, which has a time complexity of $O(n\\log(n))$. The head of IT has asked you to help her \"fix\" this graph. Can you help her?"
623 | ] 624 | }, 625 | { 626 | "cell_type": "markdown", 627 | "metadata": { 628 | "id": "0ny_NkaYzAM8" 629 | }, 630 | "source": [ 631 | "### Solution" 632 | ] 633 | }, 634 | { 635 | "cell_type": "markdown", 636 | "metadata": { 637 | "id": "DCc-6JkPQtBB" 638 | }, 639 | "source": [ 640 | "In general, for two functions of *n*, *f* and *g*, if *f* has a lower time complexity than *g*, that does *not* necessarily imply that it has a faster runtime for a specific value of *n*. What it implies is that as *n* increases, the runtime of *f* will grow less quickly than the runtime of *g*." 641 | ] 642 | }, 643 | { 644 | "cell_type": "markdown", 645 | "metadata": { 646 | "id": "YRwwfGQ35lBI" 647 | }, 648 | "source": [ 649 | "You might be able to see that in the graph presented to you by the head of IT, even though bubble sort appears to have a faster runtime than merge sort for all array lengths, the gap appears to be closing as $n$ increases. Let's see what happens when we increase `x_range` to values up to 100. 
" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": null, 655 | "metadata": { 656 | "id": "7JFlHG8ASFLa" 657 | }, 658 | "outputs": [], 659 | "source": [ 660 | "# This code cell takes a few seconds to run.\n", 661 | "\n", 662 | "mean_bubble_times = []\n", 663 | "mean_merge_times = []\n", 664 | "x_range = range(10, 100)\n", 665 | "n_iters = 100\n", 666 | "\n", 667 | "for n in x_range:\n", 668 | " mean_bubble_time, mean_merge_time = compare_runtimes(n, n_iters)\n", 669 | " mean_bubble_times.append(mean_bubble_time)\n", 670 | " mean_merge_times.append(mean_merge_time)" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": null, 676 | "metadata": { 677 | "id": "rge3KszxSFLd" 678 | }, 679 | "outputs": [], 680 | "source": [ 681 | "import matplotlib.pyplot as plt\n", 682 | "\n", 683 | "\n", 684 | "plt.plot(x_range, mean_bubble_times, label='bubble')\n", 685 | "plt.plot(x_range, mean_merge_times, label='merge')\n", 686 | "plt.xlabel('length of array')\n", 687 | "plt.ylabel('CPU runtime (seconds)')\n", 688 | "plt.legend()\n", 689 | "plt.show()" 690 | ] 691 | }, 692 | { 693 | "cell_type": "markdown", 694 | "metadata": { 695 | "id": "UglKq8nhSPGz" 696 | }, 697 | "source": [ 698 | "Finally! Merge sort seems to be outperforming bubble sort in terms of runtime, for larger values of $n$. Let's push this to 200. 
(Increasing to more than 200 takes a long time to run.)" 699 | ] 700 | }, 701 | { 702 | "cell_type": "code", 703 | "execution_count": null, 704 | "metadata": { 705 | "id": "RENOcGdcSZaB" 706 | }, 707 | "outputs": [], 708 | "source": [ 709 | "# This code cell takes a few seconds to run.\n", 710 | "\n", 711 | "mean_bubble_times = []\n", 712 | "mean_merge_times = []\n", 713 | "x_range = range(10, 200)\n", 714 | "n_iters = 100\n", 715 | "\n", 716 | "for n in x_range:\n", 717 | " mean_bubble_time, mean_merge_time = compare_runtimes(n, n_iters)\n", 718 | " mean_bubble_times.append(mean_bubble_time)\n", 719 | " mean_merge_times.append(mean_merge_time)" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": null, 725 | "metadata": { 726 | "id": "xBEJLv1fSZaH" 727 | }, 728 | "outputs": [], 729 | "source": [ 730 | "import matplotlib.pyplot as plt\n", 731 | "\n", 732 | "\n", 733 | "plt.plot(x_range, mean_bubble_times, label='bubble')\n", 734 | "plt.plot(x_range, mean_merge_times, label='merge')\n", 735 | "plt.xlabel('length of array')\n", 736 | "plt.ylabel('CPU runtime (seconds)')\n", 737 | "plt.legend()\n", 738 | "plt.show()" 739 | ] 740 | }, 741 | { 742 | "cell_type": "markdown", 743 | "metadata": { 744 | "id": "3yNah1IkSpjM" 745 | }, 746 | "source": [ 747 | "This graph shows clearly that merge sort has a lower runtime than bubble sort at large values of $n$. The \"crossover\" (where the runtimes are approximately equal) occurs at about $n=60$, although the exact value varies with the machine and the run. From then on, the runtime of bubble sort grows significantly more quickly than that of merge sort.\n", 748 | "\n", 749 | "When the head of IT shows the CTO of *QuickShop* this graph, the leadership is convinced that choosing merge sort over bubble sort will save time and computational resources going forward, so the company decides to go ahead with the transition."
750 | ] 751 | } 752 | ], 753 | "metadata": { 754 | "colab": { 755 | "collapsed_sections": [], 756 | "name": "comparing_sorting_algorithms.ipynb", 757 | "private_outputs": true, 758 | "provenance": [ ] 759 | }, 760 | "kernelspec": { 761 | "display_name": "Python 3", 762 | "name": "python3" 763 | } 764 | }, 765 | "nbformat": 4, 766 | "nbformat_minor": 0 767 | } 768 | -------------------------------------------------------------------------------- /6_search_algorithms/60_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "oTveQpgkF5MB" 7 | }, 8 | "source": [ 9 | "# Search Algorithms" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "yMrRImm-ycDx" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "ZDhOz33HH8BH" 25 | }, 26 | "source": [ 27 | "\u003e An algorithm that searches an array of integers for a given integer is called a **search algorithm**." 
28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "BaFdfJTEyfGK" 34 | }, 35 | "source": [ 36 | "Depending on the implementation, search algorithms can return either:\n", 37 | "\n", 38 | "- A boolean indicating whether the search element is found or not\n", 39 | "\n", 40 | " ```python\n", 41 | " def is_in(arr, v):\n", 42 | " \"\"\"Return True if v is in arr, False otherwise.\"\"\"\n", 43 | " # ...\n", 44 | " \n", 45 | " is_in([1, 2, 4, 5], 4)\n", 46 | " # True\n", 47 | " is_in([1, 2, 4, 5], 3)\n", 48 | " # False\n", 49 | " ```\n", 50 | "\n", 51 | "- An integer indicating the index of where the search element is found (and -1 or an error if the search element is not found)\n", 52 | "\n", 53 | " ```python\n", 54 | " def search(arr, v):\n", 55 | " \"\"\"Returns the index of v in arr if v is contained in arr, otherwise -1.\"\"\"\n", 56 | " # ...\n", 57 | "\n", 58 | " search([1, 2, 4, 5], 4)\n", 59 | " # 2\n", 60 | " search([1, 2, 4, 5], 3)\n", 61 | " # -1\n", 62 | " ```\n", 63 | "\n", 64 | "The lessons in this unit cover the following search algorithms, which are by far the most commonly used:\n", 65 | "\n", 66 | "1. Linear search\n", 67 | "1. 
Binary search" 68 | ] 69 | } 70 | ], 71 | "metadata": { 72 | "colab": { 73 | "collapsed_sections": [], 74 | "name": "search_algorithms_overview.ipynb", 75 | "private_outputs": true, 76 | "provenance": [ ] 77 | }, 78 | "kernelspec": { 79 | "display_name": "Python 3", 80 | "name": "python3" 81 | } 82 | }, 83 | "nbformat": 4, 84 | "nbformat_minor": 0 85 | } 86 | -------------------------------------------------------------------------------- /6_search_algorithms/61_linear_search.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "fkgw_3aiIDZZ" 7 | }, 8 | "source": [ 9 | "# Linear Search" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "qVe_nsOAKQds" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "x4iesNCiKUaw" 25 | }, 26 | "source": [ 27 | "Linear search is an algorithm that searches an array for a given value. If the array contains the given value, the algorithm returns the lowest index at which the value is found. If not, the algorithm usually returns -1 or an error." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "6EChE4XBOU9U" 34 | }, 35 | "source": [ 36 | "For example, searching for `2` in the array `[5, 2, 1, 2]` returns 1 (since the first instance of `2` is at index 1), whereas searching for `2` in the array `[1, 3]` returns -1 (since `2` is not in the array).\n", 37 | "\n", 38 | "Linear search is one of the simplest searching algorithms to implement, and, as the name suggests, is $O(n)$ (linear time) in the average case." 
39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": { 44 | "id": "RlzeISxrAPOU" 45 | }, 46 | "source": [ 47 | "### Algorithm" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": { 53 | "id": "vy1CagncLq9B" 54 | }, 55 | "source": [ 56 | "Suppose an array `arr` is being searched for a value `v`. This is an implementation of linear search that returns -1 if `v` is not contained in `arr`." 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": { 62 | "id": "QIJ36oSzARD5" 63 | }, 64 | "source": [ 65 | "0. *Initialize:* Set $i = 0$.\n", 66 | "1. *Repeat:* Inspect the element with index $i$.\n", 67 | " - If the element is equal to `v`, return $i$ and exit.\n", 68 | " - Otherwise, increment $i$ by 1.\n", 69 | "2. *Exit:* If index $i$ does not exist in `arr`, that means you have iterated through the entire array and not found `v`, so return -1." 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "id": "BzXugkDzKjkS" 76 | }, 77 | "source": [ 78 | "## Question 1" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "id": "MNJ6MvzrKk4T" 85 | }, 86 | "source": [ 87 | "Write an iterative algorithm that implements linear search." 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": { 94 | "id": "DesT9ts7LJSG" 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "def linear_search(arr, v):\n", 99 | " \"\"\"Searches a list of integers arr for a value v.\"\"\"\n", 100 | " # TODO(you): Implement" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": { 106 | "id": "cA5aoJDILFiF" 107 | }, 108 | "source": [ 109 | "### Hint" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "id": "6W7aIv_ILGRH" 116 | }, 117 | "source": [ 118 | "Remember to return the *index* of the found element, and -1 if the element is not found." 
119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": { 124 | "id": "IGy277zmK-hL" 125 | }, 126 | "source": [ 127 | "### Unit Tests\n", 128 | "\n", 129 | "Run the following cell to check your answer against some unit tests." 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": { 136 | "id": "PnU5BhL4Ls6I" 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "print(linear_search([1, 2, 3, 5, 6, 8, 9], 2))\n", 141 | "# Should print: 1\n", 142 | "\n", 143 | "print(linear_search([1, 3, 5, 6, 8, 9], 7))\n", 144 | "# Should print: -1" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": { 150 | "id": "rWcDSx2vK2Ko" 151 | }, 152 | "source": [ 153 | "### Solution" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": { 160 | "id": "YQFm5FKRKupT" 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "def linear_search(arr, v):\n", 165 | "  \"\"\"Searches a list of integers arr for a value v.\"\"\"\n", 166 | "  for i in range(len(arr)):\n", 167 | "    if arr[i] == v:\n", 168 | "      return i\n", 169 | "\n", 170 | "  return -1" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "id": "Bc7UaUFAL5pJ" 177 | }, 178 | "source": [ 179 | "## Question 2" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": { 185 | "id": "CjijCU60PIKu" 186 | }, 187 | "source": [ 188 | "What is the best case time complexity of `linear_search`? In what case does this occur?"
189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "id": "-vY28LIeXuxq" 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "#freetext" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "id": "Fjj416UZMeBC" 206 | }, 207 | "source": [ 208 | "### Solution" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": { 214 | "id": "9eysDwUsMfwd" 215 | }, 216 | "source": [ 217 | "In the best case, `linear_search` finds `v` in the very first iteration of the `for` loop, at index 0. In this case, the algorithm requires 1 iteration, so has a time complexity of $O(1)$." 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": { 223 | "id": "pWXluPLt74SS" 224 | }, 225 | "source": [ 226 | "## Question 3" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": { 232 | "id": "xQYemZYS75tp" 233 | }, 234 | "source": [ 235 | "What is the worst case time complexity of `linear_search`? In what case does this occur?" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": { 242 | "id": "DLDs81yDXzMY" 243 | }, 244 | "outputs": [], 245 | "source": [ 246 | "#freetext" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": { 252 | "id": "DNoeZqqz7-_2" 253 | }, 254 | "source": [ 255 | "### Solution" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": { 261 | "id": "fvRETK4V7_yp" 262 | }, 263 | "source": [ 264 | "In the worst case, `linear_search` needs to check every element of `arr` before either concluding that `v` is not in `arr` or that `v` is the last element of `arr`. In this case, the algorithm requires all $n$ iterations of the `for` loop (where $n$ is the length of `arr`), so has a time complexity of $O(n)$." 
265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "id": "e7_WHZm7WCrO" 271 | }, 272 | "source": [ 273 | "## Question 4" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": { 279 | "id": "U2ZZnjMQWGFA" 280 | }, 281 | "source": [ 282 | "Your local public library has a scanning system that keeps track of all books at the library. Each book has a unique book number (for example, all copies of *The Bell Jar* by Sylvia Plath have the number 1842659).\n", 283 | "\n", 284 | "The library has asked you to write a function to count the number of copies of a given book number that are currently loaned out. The books that are loaned out are stored in an array. Use the principles of `linear_search` to write a `count_occurrences` function that counts the number of times an integer appears in an array." 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": { 291 | "id": "nfOddifSnXMR" 292 | }, 293 | "outputs": [], 294 | "source": [ 295 | "def count_occurrences(arr, v):\n", 296 | "  \"\"\"Returns the number of times the value v appears in arr.\"\"\"\n", 297 | "  # TODO(you): Implement" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": { 303 | "id": "qGci-Y1KnfAQ" 304 | }, 305 | "source": [ 306 | "### Hint" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": { 312 | "id": "wp1bRGtznf2R" 313 | }, 314 | "source": [ 315 | "Modify your `linear_search` function, but instead of returning the index found, increment a counter."
316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "id": "vFL3Smjeni6T" 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "def linear_search(arr, v):\n", 327 | "  \"\"\"Searches a list of integers arr for a value v.\"\"\"\n", 328 | "  for i in range(len(arr)):\n", 329 | "    if arr[i] == v:\n", 330 | "      return i\n", 331 | "\n", 332 | "  return -1" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": { 338 | "id": "I8MO9m3-A1J-" 339 | }, 340 | "source": [ 341 | "### Unit Tests\n", 342 | "\n", 343 | "Run the following cell to check your answer against some unit tests." 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "id": "-7Ef6ht9A2e2" 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "print(count_occurrences([1, 2, 1, 3, 1, 4, 1, 5], 1))\n", 355 | "# Should print: 4\n", 356 | "\n", 357 | "print(count_occurrences([1, 2, 1, 3, 1, 4, 1, 5], 6))\n", 358 | "# Should print: 0" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": { 364 | "id": "y7hXLXrvWEaR" 365 | }, 366 | "source": [ 367 | "### Solution" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": null, 373 | "metadata": { 374 | "id": "Lfhe_5Q6nrnS" 375 | }, 376 | "outputs": [], 377 | "source": [ 378 | "def count_occurrences(arr, v):\n", 379 | "  \"\"\"Returns the number of times the value v appears in arr.\"\"\"\n", 380 | "  count = 0\n", 381 | "\n", 382 | "  for i in arr:\n", 383 | "    if i == v:\n", 384 | "      count += 1\n", 385 | "\n", 386 | "  return count" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "id": "FQfcdx5Mko4V" 393 | }, 394 | "source": [ 395 | "## Question 5" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": { 401 | "id": "yu2ZPAWokp54" 402 | }, 403 | "source": [ 404 | "What optimizations can be made to the `linear_search` function if the input array is known to
be pre-sorted (from lowest to highest)? Write a new function `linear_search_sorted` that accepts a pre-sorted array as an input." 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": { 411 | "id": "SO9hRlvVkn9a" 412 | }, 413 | "outputs": [], 414 | "source": [ 415 | "def linear_search_sorted(arr, v):\n", 416 | "  \"\"\"Searches a sorted list of integers arr for a value v.\"\"\"\n", 417 | "  # TODO(you): Below is the linear_search code. Optimize it for a sorted input.\n", 418 | "  for i in range(len(arr)):\n", 419 | "    if arr[i] == v:\n", 420 | "      return i\n", 421 | "\n", 422 | "  return -1" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": { 428 | "id": "kfmFzBbX9hHd" 429 | }, 430 | "source": [ 431 | "### Hint" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": { 437 | "id": "Bvu0rNsvk7en" 438 | }, 439 | "source": [ 440 | "If the input array is sorted, then as soon as the `for` loop hits a value greater than the search value `v` (if not already found), the algorithm should exit."
441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": { 446 | "id": "c8E_AV7Yk6d3" 447 | }, 448 | "source": [ 449 | "### Solution" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": { 456 | "id": "znqe2Sp1lWkZ" 457 | }, 458 | "outputs": [], 459 | "source": [ 460 | "def linear_search_sorted(arr, v):\n", 461 | "  \"\"\"Searches a sorted list of integers arr for a value v.\"\"\"\n", 462 | "  for i in range(len(arr)):\n", 463 | "    if arr[i] == v:\n", 464 | "      return i\n", 465 | "    # Add the following if statement.\n", 466 | "    if arr[i] \u003e v:\n", 467 | "      return -1\n", 468 | "\n", 469 | "  return -1" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": { 475 | "id": "B9NwbF3DtF6X" 476 | }, 477 | "source": [ 478 | "## Question 6" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": { 484 | "id": "D9uOku6ctGyV" 485 | }, 486 | "source": [ 487 | "Linear search can be adapted to work on a linked list of integers." 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": { 493 | "id": "nisoFZv-YCyj" 494 | }, 495 | "source": [ 496 | "Below is an implementation of a linked list from a previous lesson. Add a `search` method that uses linear search to search the linked list for a given value. If the value is in the linked list, return the index. If the value is not in the linked list, return -1."
497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": { 503 | "id": "dV9DdYR84gk0" 504 | }, 505 | "outputs": [], 506 | "source": [ 507 | "class LinkedListElement:\n", 508 | "\n", 509 | "  def __init__(self, value):\n", 510 | "    self.value = value\n", 511 | "    self.next = None" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": null, 517 | "metadata": { 518 | "id": "-EAkbR1akdKS" 519 | }, 520 | "outputs": [], 521 | "source": [ 522 | "class LinkedList:\n", 523 | "\n", 524 | "  def __init__(self):\n", 525 | "    self.first = None\n", 526 | "\n", 527 | "  def search(self, v):\n", 528 | "    pass  # TODO(you): Implement" 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": { 534 | "id": "MVe3gLeft41N" 535 | }, 536 | "source": [ 537 | "### Hint" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": { 543 | "id": "-hKM5xqDt71a" 544 | }, 545 | "source": [ 546 | "Use the following code scaffolding.\n", 547 | "\n", 548 | "```python\n", 549 | "def search(self, v):\n", 550 | "  elem = self.first\n", 551 | "  while elem is not None:\n", 552 | "    ...\n", 553 | "    elem = elem.next\n", 554 | "  return -1\n", 555 | "```\n", 556 | "\n", 557 | "In order to return the index, you may need to introduce a `counter` variable." 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": { 563 | "id": "5XP2wGfbvVr3" 564 | }, 565 | "source": [ 566 | "### Unit Tests\n", 567 | "\n", 568 | "Run the following cell to check your answer against some unit tests."
569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "execution_count": null, 574 | "metadata": { 575 | "id": "o4ohicR6vXLf" 576 | }, 577 | "outputs": [], 578 | "source": [ 579 | "lle1 = LinkedListElement(1)\n", 580 | "lle1.next = LinkedListElement(2)\n", 581 | "\n", 582 | "lle2 = LinkedListElement(3)\n", 583 | "lle2.next = LinkedListElement(5)\n", 584 | "lle2.next.next = LinkedListElement(6)\n", 585 | "lle2.next.next.next = LinkedListElement(8)\n", 586 | "lle2.next.next.next.next = LinkedListElement(9)\n", 587 | "\n", 588 | "lle1.next.next = lle2\n", 589 | "\n", 590 | "ll = LinkedList()\n", 591 | "ll.first = lle1\n", 592 | "print(ll.search(2))\n", 593 | "# Should print: 1\n", 594 | "\n", 595 | "lle1.next = lle2\n", 596 | "print(ll.search(7))\n", 597 | "# Should print: -1" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": { 603 | "id": "uv8BAF4VuZnG" 604 | }, 605 | "source": [ 606 | "### Solution" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": { 613 | "id": "eTpnb_1Kubp6" 614 | }, 615 | "outputs": [], 616 | "source": [ 617 | "class LinkedList:\n", 618 | "\n", 619 | "  def __init__(self):\n", 620 | "    self.first = None\n", 621 | "\n", 622 | "  def search(self, v):\n", 623 | "    elem = self.first\n", 624 | "    counter = 0\n", 625 | "    while elem is not None:\n", 626 | "      if elem.value == v:\n", 627 | "        return counter\n", 628 | "      counter += 1\n", 629 | "      elem = elem.next\n", 630 | "    return -1" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": { 636 | "id": "usWcDHY1TYkx" 637 | }, 638 | "source": [ 639 | "## Question 7" 640 | ] 641 | }, 642 | { 643 | "cell_type": "markdown", 644 | "metadata": { 645 | "id": "7fHt3EJ7TaXd" 646 | }, 647 | "source": [ 648 | "Linear search can also be implemented using recursion. Below is an implementation of a recursive linear search. However, it is producing some weird results. What is the bug in this code? Can you fix it?"
649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": null, 654 | "metadata": { 655 | "id": "Eq8cLn1lThC-" 656 | }, 657 | "outputs": [], 658 | "source": [ 659 | "def linear_search_recursive(arr, v, index=0):\n", 660 | "  \"\"\"Searches a list of integers arr for a value v using recursion.\"\"\"\n", 661 | "  if len(arr) == 0:\n", 662 | "    return -1\n", 663 | "\n", 664 | "  if arr[0] == v:\n", 665 | "    return index\n", 666 | "\n", 667 | "  return linear_search_recursive(arr[1:], v, index)\n", 668 | "\n", 669 | "\n", 670 | "print(linear_search_recursive(\n", 671 | "    [1, 2, 3, 5, 6, 8, 9], 2))  # returns 0, should return 1\n", 672 | "print(linear_search_recursive(\n", 673 | "    [1, 2, 3, 5, 6, 8, 9], 5))  # returns 0, should return 3" 674 | ] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": { 679 | "id": "KHBUQXL7CFWv" 680 | }, 681 | "source": [ 682 | "### Hint" 683 | ] 684 | }, 685 | { 686 | "cell_type": "markdown", 687 | "metadata": { 688 | "id": "ALuC7Li3CGUO" 689 | }, 690 | "source": [ 691 | "What is the purpose of the `index` parameter? Why is it necessary? What is the value of `index` at each recursion?" 692 | ] 693 | }, 694 | { 695 | "cell_type": "markdown", 696 | "metadata": { 697 | "id": "9WxJ3aSLToLq" 698 | }, 699 | "source": [ 700 | "### Unit Tests\n", 701 | "\n", 702 | "Run the following cell to check your answer against some unit tests."
703 | ] 704 | }, 705 | { 706 | "cell_type": "code", 707 | "execution_count": null, 708 | "metadata": { 709 | "id": "3iBsiNYAToLs" 710 | }, 711 | "outputs": [], 712 | "source": [ 713 | "print(linear_search_recursive([1, 2, 3, 5, 6, 8, 9], 2))\n", 714 | "# Should print: 1\n", 715 | "\n", 716 | "print(linear_search_recursive([1, 2, 3, 5, 6, 8, 9], 5))\n", 717 | "# Should print: 3\n", 718 | "\n", 719 | "print(linear_search_recursive([1, 3, 5, 6, 8, 9], 7))\n", 720 | "# Should print: -1" 721 | ] 722 | }, 723 | { 724 | "cell_type": "markdown", 725 | "metadata": { 726 | "id": "GpeBSP2nToLw" 727 | }, 728 | "source": [ 729 | "### Solution" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": { 735 | "id": "ftb5Vb87pAHC" 736 | }, 737 | "source": [ 738 | "This is a very subtle bug. Identifying and fixing it relies on understanding the utility of the `index` parameter. Recursive functions often rely on an index or counter parameter that is altered at each recursive step. \n", 739 | "\n", 740 | "In the case of `linear_search_recursive`, when `v` is found in `arr`, `index` is returned. Since `index` is initialized at 0 and never changed, this implementation will always return 0 if `v` is found and -1 if `v` is not found.\n", 741 | "\n", 742 | "In order to fix this, `index` must be incremented at each recursion." 
743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "metadata": { 749 | "id": "Z8YIbomiToLx" 750 | }, 751 | "outputs": [], 752 | "source": [ 753 | "def linear_search_recursive(arr, v, index=0):\n", 754 | "  \"\"\"Searches a list of integers arr for a value v using recursion.\"\"\"\n", 755 | "  if len(arr) == 0:\n", 756 | "    return -1\n", 757 | "\n", 758 | "  if arr[0] == v:\n", 759 | "    return index\n", 760 | "\n", 761 | "  # Change index to index + 1.\n", 762 | "  return linear_search_recursive(arr[1:], v, index + 1)" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "metadata": { 768 | "id": "jUCnC4qXNnDE" 769 | }, 770 | "source": [ 771 | "## Question 8" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": { 777 | "id": "QSvDTlaHO7eb" 778 | }, 779 | "source": [ 780 | "[Advanced] Why is the average case time complexity of linear search linear?" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": null, 786 | "metadata": { 787 | "id": "Uy8G658HYvOZ" 788 | }, 789 | "outputs": [], 790 | "source": [ 791 | "#freetext" 792 | ] 793 | }, 794 | { 795 | "cell_type": "markdown", 796 | "metadata": { 797 | "id": "cRvrTBmt8h-t" 798 | }, 799 | "source": [ 800 | "### Hint" 801 | ] 802 | }, 803 | { 804 | "cell_type": "markdown", 805 | "metadata": { 806 | "id": "Ce1zG-MY8iyv" 807 | }, 808 | "source": [ 809 | "Consider two cases separately:\n", 810 | "\n", 811 | "- If `v` is in `arr`\n", 812 | "- If `v` is not in `arr`\n", 813 | "\n", 814 | "Calculate the average case time complexity under each case, and show that both are $O(n)$. If the average case time complexity for both cases is $O(n)$, the average case time complexity averaged across all cases must also be $O(n)$."
815 | ] 816 | }, 817 | { 818 | "cell_type": "markdown", 819 | "metadata": { 820 | "id": "TtghL7B_O50_" 821 | }, 822 | "source": [ 823 | "### Solution" 824 | ] 825 | }, 826 | { 827 | "cell_type": "markdown", 828 | "metadata": { 829 | "id": "rXBfjokNPXfv" 830 | }, 831 | "source": [ 832 | "Let's first assume that `v` is in `arr`. In the average case, `v` has an equal probability of being at any index, namely $\\frac{1}{n}$.\n", 833 | "\n", 834 | "If `v` is at index 0, `linear_search` takes 1 iteration. If `v` is at index 1, `linear_search` takes 2 iterations. In general, if `v` is at index $i$, `linear_search` takes $i+1$ iterations. Since the probability of `v` being at any specific index is $\\frac{1}{n}$, the average number of iterations is\n", 835 | "\n", 836 | "$$ \\frac{1}{n} (1 + 2 + \\dots + n). $$\n", 837 | "\n", 838 | "Using the formula that $\\sum\\limits_{i=1}^n i = \\frac{n(n+1)}{2}$, this is\n", 839 | "\n", 840 | "\\begin{align*}\n", 841 | "\\frac{1}{n} \\frac{n(n+1)}{2} \u0026= \\frac{n+1}{2} \\\\\n", 842 | "\u0026= \\frac{n}{2} + \\frac{1}{2} \\\\\n", 843 | "\u0026= O(n). \\\\\n", 844 | "\\end{align*}\n", 845 | "\n", 846 | "Therefore, if `v` is in `arr`, the average case time complexity is $O(n)$.\n", 847 | "\n", 848 | "Now, assume that `v` is not in `arr`. This is the worst case of linear search and, as shown in a previous question, its time complexity is $O(n)$. Since the average case time complexity is $O(n)$ in both cases (whether `v` is in `arr` or not), the average case time complexity of linear search is $O(n)$."
849 | ] 850 | } 851 | ], 852 | "metadata": { 853 | "colab": { 854 | "collapsed_sections": [ 855 | "cA5aoJDILFiF", 856 | "IGy277zmK-hL", 857 | "rWcDSx2vK2Ko", 858 | "Fjj416UZMeBC", 859 | "DNoeZqqz7-_2", 860 | "qGci-Y1KnfAQ", 861 | "I8MO9m3-A1J-", 862 | "y7hXLXrvWEaR", 863 | "kfmFzBbX9hHd", 864 | "c8E_AV7Yk6d3", 865 | "MVe3gLeft41N", 866 | "5XP2wGfbvVr3", 867 | "uv8BAF4VuZnG", 868 | "KHBUQXL7CFWv", 869 | "9WxJ3aSLToLq", 870 | "GpeBSP2nToLw", 871 | "cRvrTBmt8h-t", 872 | "TtghL7B_O50_" 873 | ], 874 | "name": "linear_search.ipynb", 875 | "private_outputs": true, 876 | "provenance": [ ] 877 | }, 878 | "kernelspec": { 879 | "display_name": "Python 3", 880 | "name": "python3" 881 | } 882 | }, 883 | "nbformat": 4, 884 | "nbformat_minor": 0 885 | } 886 | -------------------------------------------------------------------------------- /7_graphs/70_overview.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "7ty-Y67T_n4N" 7 | }, 8 | "source": [ 9 | "# Graphs" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "2uy_htdAyoj4" 16 | }, 17 | "source": [ 18 | "## Lesson Overview" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "pco7lYEp_rbl" 25 | }, 26 | "source": [ 27 | "In computer science, a graph is a collection of objects and connections, if they exist, between those objects." 
28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "T7x0qn6GytV-" 34 | }, 35 | "source": [ 36 | "While this concept is simple, graphs have remarkably powerful applications in the real world:\n", 37 | "\n", 38 | "- Social networks are graphs, in which a person is connected to another person if they are friends or if they follow each other.\n", 39 | "\n", 40 | "- Flight maps are graphs, in which each airport is connected to another airport if there is a direct flight between the two airports.\n", 41 | "\n", 42 | "- The teams in the NBA form a graph, in which each team is connected to the other teams in the same division.\n", 43 | "\n", 44 | "In the following lessons, you will learn how to represent and manipulate graphs in code, search graphs for specific values, and practice using graphs to model real-world examples." 45 | ] 46 | } 47 | ], 48 | "metadata": { 49 | "colab": { 50 | "collapsed_sections": [], 51 | "name": "graphs_overview.ipynb", 52 | "private_outputs": true, 53 | "provenance": [ ] 54 | }, 55 | "kernelspec": { 56 | "display_name": "Python 3", 57 | "name": "python3" 58 | } 59 | }, 60 | "nbformat": 4, 61 | "nbformat_minor": 0 62 | } 63 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # How to Contribute 2 | 3 | We'd love to accept your patches and contributions to this project. There are 4 | just a few small guidelines you need to follow. 5 | 6 | ## Contributor License Agreement 7 | 8 | Contributions to this project must be accompanied by a Contributor License 9 | Agreement (CLA). You (or your employer) retain the copyright to your 10 | contribution; this simply gives us permission to use and redistribute your 11 | contributions as part of the project. Head over to 12 | to see your current agreements on file or 13 | to sign a new one.
14 | 15 | You generally only need to submit a CLA once, so if you've already submitted one 16 | (even if it was for a different project), you probably don't need to do it 17 | again. 18 | 19 | ## Code Reviews 20 | 21 | All submissions, including submissions by project members, require review. We 22 | use GitHub pull requests for this purpose. Consult 23 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more 24 | information on using pull requests. 25 | 26 | ## Community Guidelines 27 | 28 | This project follows 29 | [Google's Open Source Community Guidelines](https://opensource.google/conduct/). 30 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 
26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. 
For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. 
You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. 
(Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Applied Data Structures & Algorithms 2 | 3 | ## Mission 4 | 5 | This curriculum is a supplemental educational program that is designed to 6 | complement courses at the university level covering Data Structures & 7 | Algorithms. The lessons within this curriculum are designed to guide students 8 | through application exercises to reinforce learnings from the classroom. 9 | 10 | ## Licensing Information 11 | 12 | All course content (Colabs, slides, guides, and materials) are open sourced 13 | under the 14 | [CC-BY-4.0 International license](https://creativecommons.org/licenses/by/4.0/). 15 | All code contained in this course is open sourced under the 16 | [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). 17 | --------------------------------------------------------------------------------