Trie data structure

├── .gitignore
├── trie
    ├── trie.html
    ├── trie_recursive.js
    └── README.markdown
├── LICENSE
├── selection_sort
    ├── selection_sort_imperative.rb
    └── README.markdown
├── binary_search
    ├── binary_search_iterative.py
    └── README.markdown
├── stack
    ├── stack_functional.pl
    └── README.markdown
├── singly_linked_list
    ├── singly_linked_list_object_oriented.php
    └── README.markdown
└── README.markdown


/.gitignore:
--------------------------------------------------------------------------------
*.pyc

--------------------------------------------------------------------------------
/trie/trie.html:
--------------------------------------------------------------------------------
Trie data structure

This is an example of the trie data structure.


Open your web browser's developer tools to examine the JavaScript code in a debugger.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/selection_sort/selection_sort_imperative.rb:
--------------------------------------------------------------------------------
#
# Selection sort algorithm, imperative implementation.
#
# The selection sort algorithm is a simple technique that takes a set
# of unordered items and orders them according to some characteristic
# that can be compared between any two items.
# Common examples of sort order are "highest to lowest" or
# "alphabetically."
#
# Author:: Meitar "maymay" Moscovitz
# License:: GPL-3.0
#
# :main:selection_sort_imperative.rb
#

# Alphabetizes the given +items+ using selection sort. The +items+ array
# is emptied as a side effect, and the newly ordered items are returned
# in a new array.
def alphabetize_using_selection_sort(items)

  # Make a new array to hold the ordered data as we select them.
  ordered_items = Array.new

  # As we select items, we'll be removing those items from the
  # original input. When we have no more items in our original input,
  # we know we have ordered all of them, so we'll be done.
  while items.length > 0

    # We have to remember which item in the data we've selected, so we
    # start by "selecting" the item in the first position.
    selection = 0

    # For each of the items we still need to order (and its position),
    items.each_with_index do |item, index|

      # check if that item is "less than" (alphabetically earlier than)
      # the item we've currently "selected,"
      if item < items[selection]
        # and if it is, remember that item's position.
        selection = index
      end

    end

    # Once we've read through all the items, we have the position of
    # the item that meets the criteria we want, so we can put that
    # item at the end of our new array,
    ordered_items.push(items[selection])
    # and remove that item from the original data.
    items.delete_at(selection)

  end

  # Now we have a new, ordered array. :)
  return ordered_items

end

print 'Enter a few words to alphabetize: '
words = gets.chomp.split(' ')
puts alphabetize_using_selection_sort(words)

--------------------------------------------------------------------------------
/trie/trie_recursive.js:
--------------------------------------------------------------------------------
/**
 * Trie data structure, recursive implementation.
 *
 * A trie (pronounced "try," shortened name from the word "reTRIEval")
 * is a kind of "tree" data structure useful for quickly retrieving
 * a result based on a given starting input (the prefix).
 *
 * For example, when you start typing into a search box and are shown
 * auto-completed search suggestions almost instantly, there's likely
 * a trie somewhere under the hood.
 *
 * @author Meitar "maymay" Moscovitz
 *
 * @see {@link https://en.wikipedia.org/wiki/Trie}
 */

// Store these words in a trie.
var words = ['cat', 'caper', 'dark', 'dapper'];

// The trie object itself.
var trie = {}; // It begins empty, since there's no data in the trie.

// Each word needs to be added to the trie.
words.forEach(function (word) {
    addToTrie(word, trie);
});

/**
 * Takes a string and adds it to the given trie.
 *
 * Strings are not stored in a trie as whole strings.
 * Instead, they are broken into their constituent characters
 * and each character is added to the trie as part of the trie's
 * tree structure.
 *
 * @param {string} string
 * @param {Object} trie
 *
 * @return {Object}
 */
function addToTrie(string, trie) {
    // An empty string has no letters to add, so there's nothing to do.
    if (!string) {
        return trie;
    }

    // Get the first letter of our string.
    var letter = string[0];

    // Check to see if the first letter is in our trie.
    if (!trie[letter]) {
        // If it's not, we need to add a node to our trie, and name
        // that node after the letter we've just checked. That "letter"
        // is now an object in our data structure.
        trie[letter] = {};
    }

    // Next we need to move on to the "rest" of the string.
    // That is, we no longer care about the first letter because
    // we've already made sure that letter ("prefix") exists in
    // our trie structure, so it's already stored.
    string = string.slice(1);

    // Since we have to do the above for each letter in the string
    // that we're adding to the trie, we can just call this same
    // function again and again (i.e., we can call it recursively).
    // This time, though, we're one level deeper.
    if (string) { // But only *actually* recurse if we have data left.
        addToTrie(string, trie[letter]);
    }

    // When we're no longer recursing, we return the trie we made.
    return trie;
}

console.log(JSON.stringify(trie, null, 4));

--------------------------------------------------------------------------------
/binary_search/binary_search_iterative.py:
--------------------------------------------------------------------------------
"""
Binary search algorithm, iterative implementation.

A binary search is a method to quickly find a given item in a sorted
list. It's important that the input list is sorted, or the binary search
algorithm won't be useful at all. Sorted lists are things like
alphabets (because "A" is always before "B," and so on), or number
lines (because "1" is always before "2," and so on).
"""

from math import floor  # We're gonna need to round down.

def binary_search_iterative(stuff, item):
    """
    Return the index of ``item`` in ``stuff`` using binary search.

    Args:
        stuff (list): A Python list that has already been sorted.
        item: The value of the item to search for (not its index).

    Returns:
        The index of ``item`` or ``None`` if ``item`` is not found.

    >>> binary_search_iterative([1, 2, 3, 4, 5, 6, 7, 8], 8)
    Guess number 1 is 4
    Guess number 2 is 6
    Guess number 3 is 7
    Guess number 4 is 8
    7

    Notice that the final return value is 7, not 8, because Python
    list indexes start counting from position number 0, not number 1.
    """

    # We will be keeping track of a range of positions instead of
    # looking at position 0 and then looking at position 1, so we need
    # to keep track of the lowest and highest positions of that range,
    # not just the current spot in the list of stuff we're looking at.

    # The low position always starts at 0.
    low = 0

    # The high position starts at the last index of the stuff we're
    # looking through, which is one less than its length.
    high = len(stuff) - 1

    # We also keep track of how many guesses we're making to find it.
    guess_number = 0  # (This is just for our own edification.)

    # Each guess shrinks the range by half (hence, "*binary* search"),
    # so eventually we either find the item we're looking for or `low`
    # passes `high` and the range is empty. After each guess, we loop
    # (that is, we iterate) over the same list in a narrower range.
    while low <= high:
        # The middle spot of our range is always going to be the
        # current value of low plus high, divided by two.
        mid = int(floor((low + high) / 2))  # round down, since the sum may be odd.

        guess = stuff[mid]  # That middle spot will be our next guess,
        guess_number = guess_number + 1  # so let's count our guesses.

        # How many guesses have we made so far?
        print("Guess number " + str(guess_number) + " is " + str(guess))

        if guess == item:  # If this is the correct guess,
            return mid  # then we've found the item! Yay! :)

        # If we haven't found the item yet, then our guess was either
        # too low or too high. Thankfully, our list of stuff is sorted,
        # so if the guess was too high
        if guess > item:
            # then we know the item we're looking for is in the lower
            # half of our range. This means we can set the high end of
            # our range to the middle position of our last guess. We
            # also subtract 1 since we know our last guess was wrong.
            high = mid - 1
        else:
            # On the other hand, if the guess was too low, then we
            # know the item is in the higher half of our range, so we
            # set the low end of our range to the middle position of
            # our last guess, plus 1 (again, because it was wrong).
            low = mid + 1

    # If we still haven't found the item, then it's not in the list!
    return None

if __name__ == "__main__":
    num = int(input('Enter a number between 1 and 100: '))
    binary_search_iterative(range(1, 101), num)

--------------------------------------------------------------------------------
/stack/stack_functional.pl:
--------------------------------------------------------------------------------
#
# Stack data structure, functional implementation.
#
use strict;
use warnings;

=pod

=head1 Name

C - Example of a stack implementation in
functional programming style.

=head1 Description

A stack is nothing more than a set of items that can be accessed in
only one very specific way. Specifically, the last item added to the
stack is the only one that can be immediately accessed. This is just
like stacking books one atop the other; to open the book on the bottom
of the stack, you must first move all the books above it out of the way.

Since a stack is such a basic data structure, most programming
languages have built-in variable types that can be treated as though
the variable were a stack. In Perl, the array is such a type.
In Perl, these include arrays and hashes. 25 | The thing that makes a stack "a stack" is just the fact that you will 26 | always restrict yourself to working with the last item added before 27 | any other items in the stack. 28 | 29 | Most languages also have built-in functions to do this for you, but 30 | we're going to (mostly) ignore that fact for the sake of our own 31 | education. Instead, we'll use one such variable type (a Perl array) 32 | and write our own functions to make use of it like it is a stack. 33 | 34 | =head1 Functions 35 | 36 | One of the important things we need to be able to do with a stack is 37 | to add items to it. The other important thing is to remove that item. 38 | These operations will be implemented as two separate functions (or 39 | "subroutines" in Perl lingo). 40 | 41 | =over 42 | 43 | =item add_to_stack() 44 | 45 | Adds an item to a given array and returns a copy of the array: 46 | 47 | @new_array = add_to_stack(@old_array, 'Item'); 48 | 49 | Programmers call this operation a "push," and most programming 50 | languages have a built-in C function.. 51 | 52 | =cut 53 | 54 | sub add_to_stack { 55 | # Copy all arguments over to a local variable. 56 | my @args = @_; 57 | 58 | # Simply return them all, since what we were given was an array 59 | # and then the argument to add to it. :) 60 | return @args; 61 | } 62 | 63 | =pod 64 | 65 | =item remove_top_item_from_stack() 66 | 67 | Given an array, returns a reference to a new array without the last 68 | item in it, and the last item itself. 69 | 70 | ($item, $array_reference) = remove_top_item_from_stack(@stack); 71 | 72 | Programmers call this operation a "pop," and most programming 73 | languages have a built-in C function. 74 | 75 | =back 76 | 77 | =cut 78 | 79 | sub remove_top_item_from_stack { 80 | my @args = @_; 81 | 82 | # Get the very last argument passed to this function. That is the 83 | # "top" of our stack. 
    my $top_item = $args[-1];

    # Get a copy of all the arguments passed to this function except
    # the very last one. This is our new stack.
    my @shortened_stack = splice @args, 0, -1; # `splice` is built-in

    return ($top_item, \@shortened_stack); # Return a reference!
}

# First, make an empty array, which will be where we stack some books.
my @stack_of_books = ();

# Now we add a book to it.
my @bigger_stack = add_to_stack(@stack_of_books, 'The Giver');

# Let's stack a few more books on top of the last one.
@bigger_stack = add_to_stack(@bigger_stack, q(Harry Potter and the Sorcerer's Stone));
@bigger_stack = add_to_stack(@bigger_stack, 'Jurassic Park');
@bigger_stack = add_to_stack(@bigger_stack, 'Lord of the Rings: The Fellowship of the Ring');

# Now we only remove the top book, one at a time.
my ($book, $stack) = remove_top_item_from_stack(@bigger_stack);
print "The top book is: $book\n";

($book, $stack) = remove_top_item_from_stack(@{$stack}); # De-reference the array.
print "The next book is: $book\n";

1; # Return success.

__END__

=pod

=head1 Author

Meitar "maymay" Moscovitz

=cut

--------------------------------------------------------------------------------
/stack/README.markdown:
--------------------------------------------------------------------------------
# Stack

A stack is a (data) structure made of a sequence of items placed one atop the other. You almost certainly already have an intuitive understanding of how stacks work, because they work the same way outside of a computer as they do in a computer (and don't let anyone tell you any different). Another word for "stack" that you might hear is "LIFO," which stands for "last in, first out."

It's easy to visualize stacks (or LIFOs) because they're so common.
For example, the last time you moved house, you might have had to pack your things into boxes. Maybe you pack your silverware first, put it in a box, and place it on the floor:

```
______________
|            |
| Silverware |
|____________|
```

Next, you grab your bed linens, put them in a box, and chuck that box on top of the first one:

```
______________
|            |
| Bed Linens |
|____________|
______________
|            |
| Silverware |
|____________|
```

Finally, you pack up your toiletries in the same way:

```
______________
|            |
| Toiletries |
|____________|
______________
|            |
| Bed Linens |
|____________|
______________
|            |
| Silverware |
|____________|
```

Phew, that was a lot of work! You're hungry, so you order some takeout but then, *PLOT TWIST*, the restaurant forgot to include utensils. Dismayed, you look back at your packed things and realize your forks are all the way at the bottom of your stack of boxes. Since the boxes are in a stack, you can't get to the one on the bottom without first moving the ones above it.

The important thing to notice is that the box you packed *last* is the one you have to move *first*; there's just no way to get to the boxes under it without first moving the top ones. In other words, the last item into the pile is the first one you need to move out. (Hence a stack's jargon name, LIFO, or "last in, first out.") Another way to say the same thing is that emptying a stack must proceed in exactly the reverse of the order in which it was filled.

Although this fact about stack structures can be frustrating in the physical world, this symmetry can be used to do some awesomely powerful things, and is a useful characteristic for more complex algorithms. But don't let that intimidate you.
Underneath it all, a "stack" is just a set of items positioned in a way that only lets you access them in a specific order: the reverse of the order in which you stacked them in the first place. This explicit *constraint* is the main reason for using stacks.

It might be counter-intuitive to think of constraints as useful, so let's explore a more practical example of how a "stack" might be used. One common use of stacks is in navigation. Imagine you're hiking on a mountain trail. There are many forks in the trail, places where the path splits and you can follow one of two walkways. Your goal is to hike to the top of the mountain and then return to the trailhead that you set off from. The challenge is that the hike will last all day and the forks will look different at night than they do in the daylight.

That's a situation in which you can use a stack to find your way. Before you begin hiking, you pack a deck of index cards and a pen. Then, as you hike *up* the mountain, you write "Left" or "Right" on a blank index card each time you reach a fork in the trail. You place each new card on top of the last card, creating a stack. When you then hike *down* the mountain, each time you reach a fork you go the *opposite* direction of the one written on the top card, and then put that card away. This technique works because the number of index cards you have will always equal the number of "Left" or "Right" choices you made, and to backtrack your own route you always need to take the opposite direction of the one you took when first navigating.

There are many other real-world problems that can be solved by doing one thing in one direction and the opposite in the other direction. Regardless of how complex those problems are, they can all use this neat feature of stacks to do at least some of their problem solving.

> :construction: TK-TODO: Maybe some more examples? Brace pair matching?
> Link to [singly linked list](../singly_linked_list/README.markdown), which is a sort of (reverse) stack?

## Further reading

* [Wikipedia](https://en.wikipedia.org/wiki/Stack_%28abstract_data_type%29)

--------------------------------------------------------------------------------
/trie/README.markdown:
--------------------------------------------------------------------------------
# Trie

A trie (pronounced "try") is a nested data structure useful for quickly retrieving (hence the name, as it's used for re*trie*val) a result based on a given starting input (called the *prefix*). You probably use tries every day without thinking much about it. For example, when you start typing into a search box and are immediately shown [auto-completed search suggestions](https://support.google.com/websearch/answer/106230?hl=en "Google Search Help: Search using autocomplete"), it's likely that the app you're using has built a trie out of the search queries it has received before!

A word or phrase is not added to a trie as a single, complete entry. Instead, the word or phrase is first broken up into its individual characters. The word "cake" would be stored as a set of four separate items: the letters `c`, `a`, `k`, and `e`. The magic is in choosing where in the trie to insert each character.

If we only have one word in our trie, such as the word `cake`, we would have a structure that might look like this in JavaScript:

```js
{
    'c': {
        'a': {
            'k': {
                'e': {
                    // and we're done, so this object is empty!
                }
            }
        }
    }
}
```

As you can see, we have a containing object (the outermost braces), which is our trie itself. Inside that, we have a `c`, and inside that `c` we have an `a`, and so on. One character is nested "inside" the preceding character until we reach the end of the word (at `e`, in this case).
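
The same nesting can also be built with a plain loop instead of recursion. This is only an illustrative sketch: `insertWord` is a hypothetical helper, not part of this repository (the repository's `trie_recursive.js` does the same job recursively with `addToTrie`). It walks one level deeper for each character, creating an empty object whenever that character isn't there yet:

```js
// Hypothetical helper (not from trie_recursive.js): iteratively add
// `word` to a trie made of nested plain objects.
function insertWord(trie, word) {
    var node = trie; // Start at the outermost object, the trie itself.
    for (var i = 0; i < word.length; i++) {
        var letter = word[i];
        // Create an empty node for this letter if it isn't there yet.
        if (!Object.prototype.hasOwnProperty.call(node, letter)) {
            node[letter] = {};
        }
        // Step one level deeper, "inside" the letter we just handled.
        node = node[letter];
    }
    return trie;
}

var trie = insertWord({}, 'cake');
console.log(JSON.stringify(trie)); // {"c":{"a":{"k":{"e":{}}}}}
```

Calling `insertWord(trie, 'cute')` afterwards would reuse the existing `c` object and only add `u`, `t`, and `e` beneath it.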

The important thing to notice is that the first level is only the letter "c" and not the whole word "cake." This means that when we add another word that *begins with the same letter* (it has the "same prefix"), we can add it to the trie by reusing the outermost `c` character and only adding characters where the new word differs from what's already stored. For instance, if we add the word "cute" to our trie, our JavaScript representation might now look like this:

```js
{
    'c': {
        'a': {
            'k': {
                'e': {
                    // this object has spelled "cake"
                }
            }
        },
        'u': {
            't': {
                'e': {
                    // and this one has spelled "cute"
                    // but we used the same "c" from the beginning!
                }
            }
        }
    }
}
```

The above trie contains both the word "cake" and the word "cute." However, since both words start with the letter "c," there is only one `c` inside the trie; `c` is the common prefix for both `cake` and `cute`. If we then add the word "cat" to the above trie, our JavaScript object should look like this:

```js
{
    'c': {
        'a': {
            'k': {
                'e': {
                    // "cake" again
                }
            },
            't': {
                // "cat", sharing the "ca" prefix with "cake"
            }
        },
        'u': {
            't': {
                'e': {
                    // "cute", sharing only the first letter, "c"
                }
            }
        }
    }
}
```

This time, notice that when we added the word "cat" we only had to add the one letter that was not already part of an existing prefix. In this case, that was simply the letter `t`, because `cat` shares a common two-character prefix (`ca`) with `cake`, which we had already added to our trie.

How might this structure make tasks such as "check if we have seen the word `cake` before" much faster and easier to accomplish?
Well, rather than having to check whether the word "cake" appears in our (possibly very long) list of words, we simply have to check whether we have any words that begin with `c`, and if we do, whether that `c` "contains" an `a`, and if so, whether that `a` contains a `k`, and so on. If at any point we do *not* have the next character of our lookup word, we can stop looking, because we can't possibly have the word "cake" in our trie if we've never seen some *prefix* of it! That is, we can't possibly have the word "cake" in our list of words if we've never seen a word that begins with `ca`. (And we can't possibly have `cat` in our list then, either.) Strictly speaking, with this structure a successful walk only tells us that some stored word *begins with* "cake"; to tell complete words apart from prefixes (for example, "cap" from "caper"), tries usually also mark the node where each word ends.

Here's another example of a trie, this time with four words in it ("cat," "caper," "dark," and "dapper," the same words that `trie_recursive.js` adds):

```js
{
    "c": {
        "a": {
            "t": {},
            "p": {
                "e": {
                    "r": {}
                }
            }
        }
    },
    "d": {
        "a": {
            "r": {
                "k": {}
            },
            "p": {
                "p": {
                    "e": {
                        "r": {}
                    }
                }
            }
        }
    }
}
```

## Further reading

* [Wikipedia](https://en.wikipedia.org/wiki/Trie)

--------------------------------------------------------------------------------
/binary_search/README.markdown:
--------------------------------------------------------------------------------
# Binary search

The binary search algorithm is a method to quickly find a given item in a list (called a "search space") that has already been sorted. The word "binary" in its name can be misleading: it refers only to halving the search space (dividing the search space by two) at each attempt to find the thing you're looking for. You don't need to know anything about binary math to use it!

In order to be useful, the list or search space you're looking through must already be sorted.
Some examples of things that are sorted include number lines (because `1` always comes before `2`, and so on) or alphabets (because `A` always comes before `B`, and so on). For instance, the dictionary is a *sorted* list of words; it's sorted because it's alphabetized, and you are certain to find "Aardvark" well before you encounter the word "Zebra" if you simply read it from beginning to end (i.e., if you read the dictionary linearly).

When you perform a binary search, you start in the middle of the search space instead of at its beginning or its end. If the item you're looking for is not the item in the exact middle, you check to see if it's supposed to be listed before or after that middle item. This is why it's so important for the list to be sorted before you try to do a binary search in it. If you try to apply the binary search algorithm to a search space that is not sorted, it won't work because you won't know which half of the list to keep looking in.

Here's a simple visualization showing how halving the search space works. If there are eight items in the search space, and they're sorted, then you only need up to three guesses to find any item (and then one more "guess" to actually pick it out):

```
Search space:       |__1__|__2__|__3__|__4__|__5__|__6__|__7__|_*8__| "Guess a number from 1 to 8." (It's 8.)
Before first guess: |_______________________________________________|
After first guess:  |xxxxxxxxxxxxxxxxxxxxxxx|_______________________| "If it's not four, is it bigger or smaller than four?"
After second guess: |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|___________| "If it's not six, is it bigger or smaller than six?"
After third guess:  |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|_____| "If it's not seven, is it bigger or smaller than seven?"
Fourth "guess":     |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|FOUND| "It's eight!"
```

Here's another example:

```
Search space:       |__1__|__2__|_*3__|__4__|__5__|__6__|__7__|__8__| "Guess a number from 1 to 8." (It's 3.)
Before first guess: |_______________________________________________|
After first guess:  |_______________________|xxxxxxxxxxxxxxxxxxxxxxx| "If it's not four, is it bigger or smaller than four?"
After second guess: |xxxxx|_____|_____|xxxxxxxxxxxxxxxxxxxxxxxxxxxxx| "If it's not two, is it bigger or smaller than two?"
After third guess:  |xxxxxxxxxxx|_____|xxxxxxxxxxxxxxxxxxxxxxxxxxxxx| "It's bigger than two but smaller than four, so..."
Fourth "guess":     |xxxxxxxxxxx|FOUND|xxxxxxxxxxxxxxxxxxxxxxxxxxxxx| "that means it must be three!"
```

Notice that the search space shrinks by half after each guess. If you guess `4` but the answer is `8`, then you can safely discard `1`, `2`, and `3` along with `4` itself, because your first guess, `4`, was too low and those numbers are even lower. Rather than guessing `5` next, find the middle of the remaining search space and guess that item (`6`). Then check if your guess is correct, or if it's too high or too low. Keep guessing in the middle of the remaining search space until there's only one possibility left. Now you can see why a *sorted* input is so important for binary search to work!

The important pattern to notice is that the number of possible correct answers (the "search space") is reduced by half after each guess:

* 8 possible answers *before* the first guess.
* 4 possible answers *after* the first guess.
* 2 possible answers after the second guess.
* 1 possible answer after the third guess.

Notice that in the second example, however, we didn't even need to use a third guess, because there was only one possible answer after the second guess. Three guesses is the *most* it would take to narrow down a set of eight items, not the fewest.
Even that worst case is much faster than the worst case of a *linear* search. With a linear search, it would have taken us 8 guesses to find the number 8 in the first example if we started guessing at 1, because our next guess would have been 2, then 3, and so on.

Binary search is also sometimes called *logarithmic search* because the maximum number of guesses it needs can be calculated by taking a logarithm of the length of the list. [A logarithm is the opposite of an exponent](https://www.khanacademy.org/math/algebra-home/alg-exp-and-log/alg-graphs-of-logarithmic-functions/v/comparing-exponential-logarithmic-functions). 2 raised to the power of 3 (that is, three 2s multiplied together, 2 × 2 × 2) equals 8 (2³ = 8). We can retrieve the exponent (the 3, which is the most guesses we'd need to search an 8-item list) by applying the logarithm function to the number 8 (the size of our search space) with a base of 2 (because we're halving the space at each guess): log₂8 = 3.

Using logarithmic math, you can easily work out the most guesses it would take you to find a given item in an arbitrarily large search space. For instance, if you had to guess a number between 1 and 100, log₂100 ≈ 6.64, so it would take you no more than 7 guesses (counting the final one that picks the item out) if you always halved the search space at each guess (i.e., if you "used binary search").

## Further reading

1. [Khan Academy](https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/binary-search)
1. [RosettaCode](http://rosettacode.org/wiki/Binary_search)
1. [Working with Exponents and Logarithms](https://www.mathsisfun.com/algebra/exponents-logarithms.html)
1. [Binary Search - Wikipedia](https://en.wikipedia.org/wiki/Binary_search_algorithm)
1. [Logarithm - Wikipedia](https://en.wikipedia.org/wiki/Logarithm)

--------------------------------------------------------------------------------
/singly_linked_list/singly_linked_list_object_oriented.php:
--------------------------------------------------------------------------------
| Elem 2 | --> | Elem 3 |
 * ——————————     ——————————     ——————————
 *
 * These diagrams are somewhat incomplete because they don't show the
 * list itself or clearly explain what the "links" (the arrows) are.
 * More accurate diagrams would show each list element's own structure,
 * perhaps something like this:
 *
 * ——————————————————————————————————————————————————
 * |               Singly Linked List               |
 * |   First element: "Elem 1"                      |
 * |                                                |
 * |   ——————————    ——————————    ——————————       |
 * |   | Elem 1 |    | Elem 2 |    | Elem 3 |       |
 * |   |        |    |        |    |        |       |
 * |   | Title: |    | Title: |    | Title: |       |
 * |   |-Opening|    | -Main  |    |-Ending |       |
 * |   |        |    | Theme  |    | Credits|       |
 * |   |        |    |        |    |        |       |
 * |   | Next:  |    | Next:  |    | Next:  |       |
 * |   | -Elem 2|    | -Elem 3|    | - NONE |       |
 * |   ——————————    ——————————    ——————————       |
 * |                                                |
 * ——————————————————————————————————————————————————
 *
 * The list itself is made up of its elements. We can interact with a
 * given element by accessing that element directly, but to reach that
 * element in the first place, we need to find it in the list.
 *
 * To do this, we will define two PHP classes, which correspond to the
 * types of containing boxes in the diagram above: one for the list
 * itself, and one for the list's elements.
 *
 * @file singly_linked_list_object_oriented.php Example implementation in Object-Oriented Programming style.
 *
 * @author Meitar "maymay" Moscovitz
 */

/**
 * Singly linked list.
 *
 * The singly linked list itself is the logical container for all the
 * list's elements.
The list needs to know which element is the first 54 | * element. It does *not* need to know the order of the list items 55 | * themselves, because each element is responsible for knowing which 56 | * one comes after it. 57 | */ 58 | class Singly_Linked_List { 59 | 60 | /** 61 | * The first item. 62 | * 63 | * When first created, the Singly_Linked_List will have its 64 | * `$first` member variable set to `null` so that we know there 65 | * are no elements in the list. 66 | * 67 | * @var null|Singly_Linked_List_Element 68 | */ 69 | public $first = null; 70 | 71 | /** 72 | * Adds an element to the end of the list. 73 | * 74 | * We need to be able to add elements to the list. To do that, we 75 | * make a method that finds the end of the list and then creates a new 76 | * element there. 77 | * 78 | * @param mixed $value 79 | */ 80 | public function appendElement ($value = null) { 81 | if ($this->first instanceof Singly_Linked_List_Element) { 82 | // If there are already elements in the list, we add the 83 | // new element to the end of the list by making the last 84 | // element's `$next` variable point to the new element. 85 | $last_element = $this->getLastElement(); 86 | $last_element->next = new Singly_Linked_List_Element($value); 87 | } else { 88 | // Otherwise, if the first element is still null (which 89 | // is how we initialized it), we have an empty list, so 90 | // this element can go in the first spot. 91 | $this->first = new Singly_Linked_List_Element($value); 92 | } 93 | } 94 | 95 | /** 96 | * Finds the element at the end of the list. 97 | * 98 | * @return null|Singly_Linked_List_Element The last element, or `null` if the list is empty. 99 | */ 100 | public function getLastElement () { 101 | // Start at the beginning of the list. 102 | $current_element = $this->first; 103 | if (null === $current_element) { return null; } // An empty list has no last element. 104 | // Examine each element to see if it knows about a next one. 
105 | while (null !== $current_element->next) { 106 | // If it does, check that element next.
107 | $current_element = $current_element->next; 108 | } 109 | 110 | // When the examined element's `$next` member variable doesn't 111 | // know about a next element, it means that element is the 112 | // last one in the list, so we return it. 113 | return $current_element; 114 | } 115 | 116 | } 117 | 118 | /** 119 | * An element in a Singly_Linked_List. 120 | * 121 | * Each element contains some arbitrary value and a pointer to the 122 | * following element. 123 | */ 124 | class Singly_Linked_List_Element { 125 | 126 | /** 127 | * The value of the element. 128 | * 129 | * @var mixed 130 | */ 131 | public $value; 132 | 133 | /** 134 | * The next element in the list, or `null` if this is the last element. 135 | * 136 | * @var null|Singly_Linked_List_Element 137 | */ 138 | public $next; 139 | 140 | /** 141 | * Creates an element in a Singly Linked List. 142 | * 143 | * When we create a new element, it should contain whatever was 144 | * given to it. If nothing was given, it should be empty (`null`). 145 | * 146 | * @param mixed $value The value of this element. 147 | * @param null|Singly_Linked_List_Element $next The next element in the list, or `null` to indicate the final element. 148 | */ 149 | public function __construct ($value = null, $next = null) { 150 | $this->value = $value; 151 | $this->next = $next; 152 | } 153 | 154 | } 155 | 156 | // Let's put these "items" into a singly linked list. 157 | $items = array('hairbrush', 'stuffed animal', 'blanket', 'cooking pot'); 158 | 159 | // Create the list. 160 | $singly_linked_list = new Singly_Linked_List(); 161 | 162 | // We put each item into its own list element, one after the other.
163 | foreach ($items as $item) { 164 | $singly_linked_list->appendElement($item); 165 | } 166 | 167 | var_dump($singly_linked_list); 168 | -------------------------------------------------------------------------------- /singly_linked_list/README.markdown: -------------------------------------------------------------------------------- 1 | # Singly linked list 2 | 3 | A singly linked list is a data structure made of a sequence of items, or elements, each of which points to the next element in the sequence. The last element in the list is the only exception to this rule, since it cannot point to a next element by virtue of being the last one. The name "linked list" can be confusing because there is no literal "linking" going on; the elements are not joined together in any physical way. Instead, the name "linked" refers merely to a given element's own knowledge of where the next element is. 4 | 5 | You might already be familiar with the way a singly linked list works, because it's similar to a treasure hunt. Each clue in a treasure hunt points you toward the next clue, which in turn points you to the next one, until finally you reach the treasure at the end. The treasure hunt is like a singly linked list, since it is made up of a sequence of clues. The clues are like the individual elements in the list. To start the treasure hunt, you need to know the location of the first clue. That first clue knows the location of the second clue, and so on down the chain. 6 | 7 | Here's a simple visual example of a treasure hunt: 8 | 9 | ``` 10 | TREASURE HUNT! Start at the square marked "1" to find "the spot!" 11 | ————————————————————— 12 | | |1| | | | | | | | | <-- First clue: "Go down two and right three." 13 | ————————————————————— 14 | |3| | | | | | | | | | <-- Third clue: "Go down two and right seven." 15 | ————————————————————— 16 | | | | | |2| | | | | | <-- Second clue: "Go up one and left four." 
17 | ————————————————————— 18 | | | | | | | | |x| | | <-- "x" marks the spot! 19 | ————————————————————— 20 | ``` 21 | 22 | Singly linked lists work the same way as this treasure hunt. The list *itself* must know the location of its first element, but nothing more. That's because the first *element* knows where the second element is, and so on down the chain. 23 | 24 | The key insight about linked lists is that their contents (the elements) can be physically positioned any which way you like, much like the clues to a treasure hunt are often scattered far apart from one another. The elements don't all have to be next to each other, as though they were ducks in a row. Computers use this property of a linked list to store the list's data in whatever regions of memory they have available, which lets them make better use of limited available space. 25 | 26 | This same property also makes it faster to perform certain operations on the list, such as adding elements ("inserting") to the start of the list. To understand why, imagine that we want to add a new clue to the beginning of a treasure hunt we made for a friend. Since the existing first clue already just points to the location of the second clue, all we have to do is write a new clue that points to the current location of the first clue. Then, instead of telling our friend where they can find the old first clue, we just tell them where they can find the new first clue. That new first clue points to the second, which already points to the third. 27 | 28 | Singly linked lists are an important building block of more complex data structures, but they are somewhat limited on their own. Since the list itself only knows about its beginning, we can only find its end by reading all the items in the list. Think about the treasure hunt again: to find the treasure, your friend has to start at the clue you give them, and then follow each clue to the location of the next one. 
This also means it's not possible to go *backwards* through the list (to find the previous clue in the treasure hunt from your current clue). 29 | 30 | Singly linked lists are often drawn as diagrams that may look something like this: 31 | 32 | ``` 33 | —————————— —————————— —————————— 34 | | Elem 1 | --> | Elem 2 | --> | Elem 3 | 35 | —————————— —————————— —————————— 36 | ``` 37 | 38 | These diagrams are incomplete at best because they don't show the list itself or clearly explain what the "links" (the arrows) are. They also don't show what the list is actually a list of. This is because the only idea that the notion of a "singly linked list" conveys is the way the list items themselves remember which is next. That said, a more complete and technically accurate diagram of a linked list that was, say, a music album's track listing, might look more like this: 39 | 40 | ``` 41 | —————————————————————————————————————————————————— 42 | | Singly Linked List | 43 | | First element: "Elem 1" | 44 | | | 45 | | —————————— —————————— —————————— | 46 | | | Elem 1 | | Elem 2 | | Elem 3 | | 47 | | | | | | | | | 48 | | | Title: | | Title: | | Title: | | 49 | | |-Opening| | -Main | |-Ending | | 50 | | | | | Theme | | Credits| | 51 | | | | | | | | | 52 | | | Next: | | Next: | | Next: | | 53 | | | -Elem 2| | -Elem 3| | - NONE | | 54 | | —————————— —————————— —————————— | 55 | | | 56 | —————————————————————————————————————————————————— 57 | ``` 58 | 59 | Notice that the arrows are gone, and have been replaced by a label called "Next." In "Elem(ent) 1," the "Next" label references "Elem 2." That's how the first element "links" itself to the second. The second element does the same to link itself to the third, but the third element's "Next" label contains a special marker, indicating that it's the final element in the list. 60 | 61 | Notice also that the elements contain a second label, called "Title" in this example. This is the actual "contents" of the item itself. 
The diagram above is intended to be an album's track listing, so the diagram actually represents how a computer might store a list like the following in its own memory: 62 | 63 | ``` 64 | 1. Opening (then play track 2) 65 | 2. Main Theme (then play track 3) 66 | 3. Ending Credits (okay, stop playing) 67 | ``` 68 | 69 | Again, the important thing here is that the list items themselves note which one to read next, and *the point* of doing that is so I could physically write the items down out-of-order without changing the order in which I should play them back aloud, like this: 70 | 71 | ``` 72 | 3. Ending Credits (okay, stop playing) 73 | 1. Opening (then play track 2) 74 | 2. Main Theme (then play track 3) 75 | ``` 76 | 77 | This may seem contrived to a human able to think abstractly, but is very useful to a computer that needs to read data from physical devices with limited space on them, such as hard drives, CDs, or memory chips (RAM). Being able to write the items down in whatever chunk of physical memory happens to be available, even if those chunks are far away from each other, and even if those chunks have other things already written on them, is a bit like writing the above track listing down on the same piece of paper that already has doodles on it, and that you kept last week's shopping list on. As long as there's enough space *somewhere* on the paper for each of the new items to fit individually, you can re-use the paper and still keep all the lists, doodles, and so on logically separated and contained. 
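If you'd like to poke at the "treasure hunt" idea outside of PHP, here's a minimal Python sketch of the same structure. The class and method names (`Element`, `SinglyLinkedList`, `prepend`, `append`, `values`) are made up for this illustration and are not part of the repository's PHP sample. It shows why adding a new first "clue" takes a single step no matter how long the list is, while adding to the end means following every link until you reach the element whose "Next" is empty.

```python
class Element:
    """One element: a value plus a reference to the next element (or None)."""

    def __init__(self, value, next_element=None):
        self.value = value
        self.next = next_element


class SinglyLinkedList:
    """The list itself only remembers where its first element is."""

    def __init__(self):
        self.first = None

    def prepend(self, value):
        # One step, however long the list is: the new element points at
        # the old first element, and then it becomes the first element.
        self.first = Element(value, self.first)

    def append(self, value):
        new_element = Element(value)
        if self.first is None:
            # Empty list: the new element goes in the first spot.
            self.first = new_element
            return
        # Otherwise, follow each "next" link until we reach the element
        # that doesn't know about a next one: that's the last element.
        current = self.first
        while current.next is not None:
            current = current.next
        current.next = new_element

    def values(self):
        # Read the list from the front, following the links in order.
        result = []
        current = self.first
        while current is not None:
            result.append(current.value)
            current = current.next
        return result


tracks = SinglyLinkedList()
tracks.append("Main Theme")
tracks.append("Ending Credits")
tracks.prepend("Opening")  # a new first "clue"; the other elements are untouched
print(tracks.values())  # ['Opening', 'Main Theme', 'Ending Credits']
```

Notice that `prepend` never looks at the rest of the list, while `append` has to read every element first, exactly like your friend having to follow every clue to reach the treasure.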
78 | 79 | ## Further reading 80 | 81 | * [Wikipedia](https://en.wikipedia.org/wiki/Linked_list) 82 | -------------------------------------------------------------------------------- /README.markdown: -------------------------------------------------------------------------------- 1 | # Data Structures and Algorithms "for people without computer science degrees" 2 | 3 | A compendium for self-education about "data structures and algorithms," created by and for "people without computer science degrees." 4 | 5 | 1. [Motivation](#motivation) 6 | 1. [How to use this repository](#how-to-use-this-repository) 7 | 1. [Running the code](#running-the-code) 8 | * [JavaScript](#javascript) 9 | * [Java](#java) 10 | * [PHP](#php) 11 | * [Perl](#perl) 12 | * [Python](#python) 13 | * [Ruby](#ruby) 14 | 1. [About code comments](#about-code-comments) 15 | 1. [Exercise suggestions](#exercise-suggestions) 16 | 17 | ## Motivation 18 | 19 | Core computer science concepts, such as "data structures and algorithms," are taught using a classist, fucked-up pedagogical approach that makes me viscerally, incoherently angry. Nevertheless, I would like to know what the fuck people mean when they say things like "data structure" or "algorithm" and refer to specific structures or specific algorithms. Despite 20 years of practical programming experience, working in a variety of Information Technology sectors, I still feel completely lost when attempting to navigate this area of specialized knowledge. 20 | 21 | THIS IS NOT A PERSONAL FAILURE. Moreover, there are *explicit and intentional* reasons that benefit mostly white men that explain the consistency and reliability with which education around computer science fails so many people like myself. 22 | 23 | This repository exists, therefore, as a place for me to compile my own notes about this esoteric but critically important branch of the knowledge tree.
It is my hope that it becomes useful, over time, to others who are feeling similarly despondent about the over-the-top doucheyness with which "CS courses" and "techies" talk about this area of their craft. 24 | 25 | We understand that this doucheyness is a form of power-hoarding, i.e., gatekeeping. In the spirit of the original hacker ethos, "information should be free," we recognize this social gatekeeping behavior as a form of censorship and are determined to use the Internet to route around it. 26 | 27 | TL;DR: Fuck your CS courses and your classist academia. We're going to learn this thing without your goddamn "help." 28 | 29 | ## How to use this repository 30 | 31 | This is not a book. You don't have to read it "in order." In fact, I'm not even sure there's any particular order to read it in. I've also avoided any kind of "easier to harder" gradation, because these have not been useful to me when I've encountered them. 32 | 33 | Instead, I encourage you to jump directly to the subsections that interest you. If you get frustrated, try a different section. Although you will certainly find yourself becoming more proficient the more you practice, each section is written to be accessible by novice coders with no prior exposure to the algorithm or data structure in question and assumes no prior knowledge of other algorithms or data structures. 34 | 35 | ### Running the code 36 | 37 | I strongly encourage you to actually run and play around with the code itself. Do not merely read the source code, although do *also* read the source code. The source code files are *thoroughly* commented and are written with the intent of being educational. However, do absolutely download the files, or type them verbatim into your text editor, and execute them yourself. 38 | 39 | Better yet, *debug* them yourself. This doesn't mean that there are bugs in the code. There aren't. (At least, I hope there aren't!) 
Instead, it means using your language's built-in debugging tools to inspect the code at each stage of its operation. This section will explain how to do that in more detail. 40 | 41 | #### [JavaScript](https://en.wikipedia.org/wiki/JavaScript) 42 | 43 | The JavaScript code samples are all compatible with [NodeJS](https://nodejs.org/) and all modern Web browser consoles because they conform to [ECMAScript 5](https://en.wikipedia.org/wiki/ECMAScript#5th_Edition). You can run them in a shell at a command line, or you can copy-and-paste them into the JavaScript console in your browser's developer tools. 44 | 45 | To run them in a shell, execute them with `node` like this: 46 | 47 | ```sh 48 | # to run the recursive implementation example of the trie data structure 49 | node trie/trie_recursive.js 50 | ``` 51 | 52 | NodeJS supports the [Chrome DevTools Protocol](https://devtools.chrome.com/), through which you can attach any supported debugger, including the command-line debugging interface built into NodeJS itself. To debug the JavaScript code samples with the built-in NodeJS debugger, execute them like so: 53 | 54 | ```sh 55 | node inspect trie/trie_recursive.js 56 | ``` 57 | 58 | You can use the debugger to run one line of the code at a time, and it will allow you to inspect the values of all the variables during program execution. Once in the debugger, type `help` to get help. (The NodeJS debugger's help isn't very extensive, so feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 59 | 60 | Alternatively, open the `.html` file in your web browser (probably just by double-clicking it). The code will run automatically, though you may need to open your developer tools to see its output.
To learn more about your browser's developer tools, refer to your browser's documentation: 61 | 62 | * [Google Chrome DevTools Overview](https://developers.google.com/web/tools/chrome-devtools/) provides everything you need to know about using the DevTools in Google Chrome, but for our purposes, focus on the following articles: 63 | * Using the [Google Chrome Console](https://developers.google.com/web/tools/chrome-devtools/console/) 64 | * [Inspect and Debug JavaScript: Set Breakpoints](https://developers.google.com/web/tools/chrome-devtools/javascript/add-breakpoints) 65 | * [Inspect and Debug JavaScript: Step Through Code](https://developers.google.com/web/tools/chrome-devtools/javascript/step-code) 66 | * [Mozilla Firefox Developer Tools portal](https://developer.mozilla.org/en-US/docs/Tools) provides everything you need to know about using the Developer Tools in Mozilla Firefox, but for our purposes, focus on the following articles: 67 | * [Opening the Mozilla Firefox Web Console](https://developer.mozilla.org/en-US/docs/Tools/Web_Console/Opening_the_Web_Console) 68 | * [Opening the Firefox JavaScript Debugger](https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Open_the_debugger) 69 | * [Set a breakpoint - Firefox Developer Tools](https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Set_a_breakpoint) 70 | * [Step through code - Firefox Developer Tools](https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Step_through_code) 71 | 72 | #### [Java](https://java.com/) 73 | 74 | The Java code samples are all compatible with [Java SE](https://en.wikipedia.org/wiki/Java_Platform%2C_Standard_Edition "Java Platform, Standard Edition") 8. They must be compiled before they can be run.
To compile them, invoke the Java compiler like this at a command shell: 75 | 76 | ```sh 77 | # to compile the recursive implementation of the binary search example 78 | javac binary_search/BinarySearchRecursive.java 79 | ``` 80 | 81 | Once compiled, the code samples can be run by informing the Java application launcher where to find them, and which class's code to execute. Do so like this: 82 | 83 | ```sh 84 | java -classpath binary_search BinarySearchRecursive 85 | # ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^ 86 | # Tell Java to look for Name of the class whose `main()` 87 | # compiled code (`.class`) method should be executed. 88 | # files in the 89 | # following directory. 90 | ``` 91 | 92 | Java's standard debugger is called `jdb` (on both [macOS/*nix](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdb.html) and [Windows](https://docs.oracle.com/javase/8/docs/technotes/tools/windows/jdb.html)). To debug the Java code samples, first compile them with the `-g` option to include debugging information: 93 | 94 | ```sh 95 | javac -g binary_search/BinarySearchRecursive.java 96 | # ^^ 97 | # Make sure you include this `-g` switch! 98 | ``` 99 | 100 | Once compiled with debugging information included, you can use the Java debugger to inspect the values of all variables during program execution, run the code one line at a time, and more. Like the `java` launcher, `jdb` takes the name of the class to run, not the `.java` source file: 101 | 102 | ```sh 103 | jdb -classpath binary_search BinarySearchRecursive 104 | ``` 105 | 106 | Once in the debugger, type `help` to get help. (The `jdb` help is pretty thorough but can be terse, so feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 107 | 108 | #### [PHP](https://php.net/) 109 | 110 | The PHP code samples are all compatible with PHP 5.6 and newer.
To run them, execute them like this at a command shell: 111 | 112 | ```sh 113 | # to run the object-oriented implementation of the singly linked list data structure example 114 | php singly_linked_list/singly_linked_list_object_oriented.php 115 | ``` 116 | 117 | PHP's standard debugger is called [`phpdbg`](http://phpdbg.com/). To debug the PHP code samples, execute them like so: 118 | 119 | ```sh 120 | phpdbg singly_linked_list/singly_linked_list_object_oriented.php 121 | ``` 122 | 123 | You can use the debugger to run one line of code at a time, and it will allow you to inspect the values of all the variables during program execution. Once in the debugger, type `help` to get help. (The `phpdbg` help is verbose but not very intuitive, so feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 124 | 125 | #### [Perl](https://perl.org/) 126 | 127 | The Perl code samples are all compatible with Perl 5.16 and newer. To run them, execute them like this at a command shell: 128 | 129 | ```sh 130 | # to run the functional-style implementation of the stack data structure example 131 | perl stack/stack_functional.pl 132 | ``` 133 | 134 | [Perl has a built-in debugger](http://perldoc.perl.org/perldebug.html#The-Perl-Debugger). To debug the code samples, execute them like so: 135 | 136 | ```sh 137 | perl -d stack/stack_functional.pl # debug the functional stack example 138 | ``` 139 | 140 | You can use the debugger to run one line of the code at a time, and it will allow you to inspect the values of all the variables during program execution. Once in the debugger, type `h` to get help. (The Perl debugger's help is pretty thorough, but can be terse, so feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 
141 | 142 | #### [Python](https://python.org/) 143 | 144 | The Python code samples are all compatible with both [Python 2.7](https://docs.python.org/2.7/) and [Python 3](https://docs.python.org/3/) versions. To run them, execute them like this at a command shell: 145 | 146 | ```sh 147 | # to run the iterative implementation of the binary search algorithm example 148 | python binary_search/binary_search_iterative.py 149 | ``` 150 | 151 | Python's standard debugging module is called [`pdb`](https://docs.python.org/3/library/pdb.html). To debug the code samples, execute them like so: 152 | 153 | ```sh 154 | python -m pdb binary_search/binary_search_iterative.py # debug the binary_search_iterative.py example 155 | ``` 156 | 157 | You can use the debugger to run one line of the code at a time, and it will allow you to inspect the values of all the variables during program execution. Once in the debugger, type `help` to get help. (The `pdb` help is pretty good, but feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 158 | 159 | #### [Ruby](https://www.ruby-lang.org/) 160 | 161 | The Ruby code samples are all compatible with Ruby 2.0 and newer. To run them, execute them like this at a command shell: 162 | 163 | ```sh 164 | # to run the imperative-style implementation of the selection sort algorithm example 165 | ruby selection_sort/selection_sort_imperative.rb 166 | ``` 167 | 168 | Ruby's standard debugging library is called [`Debug`](http://ruby-doc.org/stdlib-2.0.0/libdoc/debug/rdoc/DEBUGGER__.html). To debug the Ruby code samples, execute them like so: 169 | 170 | ```sh 171 | ruby -r debug selection_sort/selection_sort_imperative.rb 172 | ``` 173 | 174 | You can use the debugger to run one line of code at a time, and it will allow you to inspect the values of all the variables during program execution. Once in the debugger, type `help` to get help. 
(Ruby's debugger help is somewhat limited, so feel free to hop into the [Better Angels's public chat room](https://gitter.im/betterangels/better-angels) if you need help from a human.) 175 | 176 | ### About code comments 177 | 178 | In addition to containing detailed inline code comments, each example is also formally documented using the best practices of the language in which the example code is written. Formal documentation means that the files, classes, class members, methods, functions, arguments of each function, and other relevant implementation details are accessible by tools that automatically generate a programmer's manual for how to use the class, method, or function implemented by the example. Each language has its own de facto standard tool for this: 179 | 180 | * [JSDoc](http://usejsdoc.org/) is used for documenting the JavaScript code samples. 181 | * [Javadoc](https://docs.oracle.com/javase/8/docs/technotes/guides/javadoc/index.html) is used for documenting the Java code samples. 182 | * [PHPDoc](https://phpdoc.org/) is used for documenting the PHP code samples. 183 | * [Plain Old Documentation (POD)](http://perldoc.perl.org/perlpod.html) format is used for documenting the Perl code samples. 184 | * [Google-style Python docstrings](https://google.github.io/styleguide/pyguide.html?showone=Comments#Comments) are used for documenting the Python code samples. Additionally, the Python code samples embed [`doctest`s](https://en.wikipedia.org/wiki/Doctest) to show example usage and output. 185 | * [RDoc](http://rdoc.sourceforge.net/doc/index.html) is used for documenting the Ruby code samples. 186 | 187 | I've done this to get novice programmers into the habit of reading (and hopefully writing) such auto-generatable documentation.
Since each code sample is self-contained and relatively small, this also provides a good opportunity to practice installing, using, and tweaking the formatting or output of such automatic code documentation tools, if you want to do that. (I recommend it.) 188 | 189 | ## Exercise suggestions 190 | 191 | In addition to running the code samples, I suggest you try one or more of the following exercises with each sample. It's more fun if you can find a friend to do them with. This is called "pair programming" (or just "pairing" for short), and it works a little bit like the way a pilot and co-pilot collaborate when flying a plane: one person has their hands on the keyboard (the "driver") and the other person suggests things to try (the "navigator"). Switch up who's driving and who's navigating as often as you feel comfortable, but try to make sure one of you isn't monopolizing one role or the other. (You might be surprised how much you can learn from navigating rather than driving, or vice versa, if you're not used to it.) 192 | 193 | * Re-implement one of the algorithms or construct one of the data structures using a different programming language than the ones in the repository. For instance, if you know Ruby but only see a Python code sample, rewrite the code so it does the same thing in Ruby. 194 | * Re-implement the same algorithm using a different programming style. For instance, if you see a code sample implementing an algorithm recursively, implement an iterative variant, or vice versa. 195 | -------------------------------------------------------------------------------- /selection_sort/README.markdown: -------------------------------------------------------------------------------- 1 | # Selection sort 2 | 3 | The selection sort algorithm is a simple technique that takes a set of unordered items and orders them according to some characteristic that can be compared against each other item. 
Common examples of sort order are "highest to lowest" or "alphabetically." For instance, you might have been keeping a table of sports teams and the number of games they've each won: 4 | 5 | | Team | Wins | 6 | |---------|------| 7 | | Pirates | 4 | 8 | | Robots | 5 | 9 | | Aliens | 8 | 10 | | Ninjas | 6 | 11 | 12 | Before you can do interesting things with this data, you'll often need to sort it. (For instance, [binary search](../binary_search/README.markdown) requires a *sorted* list.) You could sort this data alphabetically by team name, so that you can use it like a league catalog: 13 | 14 | | Team | Wins | 15 | |---------|------| 16 | | Aliens | 8 | 17 | | Ninjas | 6 | 18 | | Pirates | 4 | 19 | | Robots | 5 | 20 | 21 | Or perhaps you want it to be a leaderboard. In that case, it needs to show the team with the most wins first, and the team with the fewest wins last: 22 | 23 | | Team | Wins | 24 | |---------|------| 25 | | Aliens | 8 | 26 | | Ninjas | 6 | 27 | | Robots | 5 | 28 | | Pirates | 4 | 29 | 30 | In either case, you can use selection sort to take your unordered data and order it as you want. The algorithm works rather intuitively: we examine the data a number of times, each time "selecting" the item that matches our criteria (hence the name). Let's say we want to make a leaderboard. In that case, each time we examine the data, we select the team with the most wins. To go from the unordered data to a leaderboard would take these steps: 31 | 32 | 1. Make a new, empty table alongside our original table: 33 | ``` 34 | Step 1: 35 | 36 | ORIGINAL TABLE NEW TABLE 37 | 38 | | Team | Wins | | Team | Wins | 39 | |---------|------| |---------|------| 40 | | Pirates | 4 | 41 | | Robots | 5 | 42 | | Aliens | 8 | 43 | | Ninjas | 6 | 44 | ``` 45 | 1. Take your finger and put it next to the first row of the original table. This is to keep track of where we are.
46 | ``` 47 | Step 2: 48 | 49 | ORIGINAL TABLE NEW TABLE 50 | 51 | | Team | Wins | | Team | Wins | 52 | |---------|------| |---------|------| 53 | --> | Pirates | 4 | 54 | | Robots | 5 | 55 | | Aliens | 8 | 56 | | Ninjas | 6 | 57 | ``` 58 | 1. Note that team's win count. (It's `4`.) Let's remember this number as our "highest number so far." 59 | ``` 60 | Step 3: 61 | 62 | ORIGINAL TABLE NEW TABLE 63 | 64 | | Team | Wins | | Team | Wins | 65 | |---------|------| |---------|------| 66 | -->*| Pirates | 4 |HI: 4 67 | | Robots | 5 | 68 | | Aliens | 8 | 69 | | Ninjas | 6 | 70 | ``` 71 | 1. For all of the remaining rows in our table, compare that team's win count with the highest number we remember, replacing it with the new team's win count if it is higher than what we read before: 72 | ``` 73 | Step 4.1: 74 | 75 | ORIGINAL TABLE NEW TABLE 76 | 77 | | Team | Wins | | Team | Wins | 78 | |---------|------| |---------|------| 79 | | Pirates | 4 | 80 | -->*| Robots | 5 |HI: 5 81 | | Aliens | 8 | 82 | | Ninjas | 6 | 83 | 84 | Step 4.2: 85 | 86 | ORIGINAL TABLE NEW TABLE 87 | 88 | | Team | Wins | | Team | Wins | 89 | |---------|------| |---------|------| 90 | | Pirates | 4 | 91 | | Robots | 5 | 92 | -->*| Aliens | 8 |HI: 8 93 | | Ninjas | 6 | 94 | 95 | Step 4.3: 96 | 97 | ORIGINAL TABLE NEW TABLE 98 | 99 | | Team | Wins | | Team | Wins | 100 | |---------|------| |---------|------| 101 | | Pirates | 4 | 102 | | Robots | 5 | 103 | *| Aliens | 8 |HI: 8 104 | --> | Ninjas | 6 | 105 | ``` 106 | 1. When we reach the end of the table, move the row with the highest win count from the original table to the new one: 107 | ``` 108 | ORIGINAL TABLE NEW TABLE 109 | 110 | | Team | Wins | | Team | Wins | 111 | |---------|------| |---------|------| 112 | | Pirates | 4 | | Aliens | 8 | 113 | | Robots | 5 | 114 | | Ninjas | 6 | 115 | ``` 116 | 1. Repeat the process from step 2 to the end of the table again. 
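Here's one way the steps above might look in Python. This is a minimal sketch, not the repository's Ruby implementation; the function name `selection_sort_two_tables` is made up for this example, and each row is assumed to be a `(team, wins)` pair. It deliberately mirrors the "two tables" walkthrough by moving rows from an original list into a new one, rather than using the in-place swapping variant you'll often see elsewhere.

```python
def selection_sort_two_tables(original):
    """Sort (team, wins) rows from most to fewest wins, two-table style."""
    original = list(original)  # work on a copy so the caller's table isn't emptied
    new_table = []
    while original:
        # Steps 2-3: put our finger on the first row; its win count is
        # the "highest number so far."
        highest_index = 0
        # Step 4: compare every remaining row with the highest so far.
        for index in range(1, len(original)):
            if original[index][1] > original[highest_index][1]:
                highest_index = index
        # Step 5: move the selected row from the original table to the new one.
        new_table.append(original.pop(highest_index))
        # Step 6: repeat until the original table is empty.
    return new_table


teams = [("Pirates", 4), ("Robots", 5), ("Aliens", 8), ("Ninjas", 6)]
print(selection_sort_two_tables(teams))
# [('Aliens', 8), ('Ninjas', 6), ('Robots', 5), ('Pirates', 4)]
```

To build the alphabetical "league catalog" instead, you would compare team names rather than win counts, and select the *smallest* name on each pass.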

Here's a visualization of each remaining step in the process:

```
    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
-->*| Pirates | 4    |HI: 4| Aliens  | 8    |
    | Robots  | 5    |
    | Ninjas  | 6    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
    | Pirates | 4    |     | Aliens  | 8    |
-->*| Robots  | 5    |HI: 5
    | Ninjas  | 6    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
    | Pirates | 4    |     | Aliens  | 8    |
    | Robots  | 5    |
-->*| Ninjas  | 6    |HI: 6


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
    | Pirates | 4    |     | Aliens  | 8    |
    | Robots  | 5    |     | Ninjas  | 6    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
-->*| Pirates | 4    |HI: 4| Aliens  | 8    |
    | Robots  | 5    |     | Ninjas  | 6    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
    | Pirates | 4    |     | Aliens  | 8    |
-->*| Robots  | 5    |HI: 5| Ninjas  | 6    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
    | Pirates | 4    |     | Aliens  | 8    |
                           | Ninjas  | 6    |
                           | Robots  | 5    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
-->*| Pirates | 4    |HI: 4| Aliens  | 8    |
                           | Ninjas  | 6    |
                           | Robots  | 5    |


    ORIGINAL TABLE         NEW TABLE

    | Team    | Wins |     | Team    | Wins |
    |---------|------|     |---------|------|
                           | Aliens  | 8    |
                           | Ninjas  | 6    |
                           | Robots  | 5    |
                           | Pirates | 4    |
```

That's it. There's no magic! This process might even be how you would have sorted the data in the table "by hand."

## Space

One thing to notice about selection sort is that the total number of rows across both tables always equals the number of rows we started with in the original table. This is because, after looking through all the remaining items, we remove the "selected" item from the original data and add it to the new data.^{[1](#footnote-1)} This makes selection sort very space-efficient: at no point do we have more than 4 rows across the two tables (the same number of rows we started with).

In computer science lingo, this fact about selection sort is described as "using `O(n)` space" (`O(n)` is pronounced "Oh of N"). The `n` means "however many items there are," and the capital letter O with parentheses is a (very academic) way of notating algorithm efficiency, called "Big-O notation." We use `n` instead of `4` here because if we had five rows instead of four, the same fact ("the total number of rows across both tables always equals the number of rows we started with") would still be true. With five rows, there would always be five total rows, so `n` is a stand-in for whatever that number happens to be for a given dataset.

## Time

Another thing to notice about selection sort is that to find the team with the most wins, you have to take one step (checking one row of the original table) for every row in that table. Of course, how much *real time* a single "step" takes (1 second, 1 microsecond, etc.) depends on how fast your computer is. The faster your computer, the faster it can accomplish whatever it needs to do in each step.

Even a very slow computer might be fast enough if you only have four items to work on.
With so few items, you may not even need a computer! But the whole point of using computers in the first place is to work on datasets that are too big to work on by hand. What if we had many more than just four teams? What if we had one hundred teams, or one thousand teams, or ten thousand teams?

That's a pretty subtle wrinkle, so let's work it out.

In our example, there are four rows, so we say the *n*umber equals four, or `n = 4`. In other words, it takes `4` steps to be sure we've found the very highest number (the team with the most wins), because we need to look at the number of wins of *each* team at least once. Then, to find *each* next-highest number (so that we end up ordering the teams in a leaderboard), each pass takes one step fewer than the pass before it: `n - 1` steps for the second pass, `n - 2` for the third, and so on.

1. On the first pass, when `n` begins at `4`, it takes four steps to find the most-winning team.
    * After we find that team and remove it from the original table, we still have to look at each *remaining* team's row at least once. Since we've removed one, there are now three left.
1. Next, we repeat the process we went through when `n` was `4` to find the team with the second-most wins, except this time `n` is `3`.
    * So far, we've taken seven steps: 4 to find the most-winning team, and another 3 to find the second-most-winning team (`4 + 3`).
1. When we start looking for the third-most-winning team, there are only two teams remaining, but we still have to look at both of them, which takes two steps.
    * Now we've taken `4 + 3 + 2` steps.
1. At this point, we have only one team left and one more step to take, which means the whole process takes `4 + 3 + 2 + 1` total steps.

Let's look at that pattern again, more visually.
Each circle represents a step:

```
o o o o  <-- 4 steps/items to start, remove one
o o o    <-- + 3 steps/items remaining, remove one
o o      <-- + 2 steps/items remaining, remove one
o        <-- + 1 step remaining, remove it
---------
10 total steps
```

But what if we had ten teams? Let's draw that again, but starting with ten instead of four:

```
o o o o o o o o o o  <-- 10 steps
o o o o o o o o o    <-- + 9 steps
o o o o o o o o      <-- + 8 steps
o o o o o o o        <-- + 7 steps
o o o o o o          <-- + 6 steps
o o o o o            <-- + 5 steps
o o o o              <-- + 4 steps
o o o                <-- + 3 steps
o o                  <-- + 2 steps
o                    <-- + 1 step
--------------------
55 total steps
```

This results in a clear (triangular) pattern. Here's a table showing that same pattern for every value of `n` from `1` to `10`, plus a few larger values:

| Rows (`n`) | Total steps |
|------------|-------------|
| 1          | 1           |
| 2          | 3           |
| 3          | 6           |
| 4          | 10          |
| 5          | 15          |
| 6          | 21          |
| 7          | 28          |
| 8          | 36          |
| 9          | 45          |
| 10         | 55          |
| …          | …           |
| 100        | 5,050       |
| …          | …           |
| 1,000      | 500,500     |
| …          | …           |
| 10,000     | 50,005,000  |

The thing to pay attention to is that the more items (rows) we have, the bigger the *difference* between one total and the next. Going from `n - 1` rows to `n` rows adds exactly `n` steps (for example, going from 9 rows to 10 rows takes us from 45 to 55 steps, an increase of 10). That may already make intuitive sense, because each time we add a row, we add `n` steps. The wrinkle is that these are `n` *additional* steps, not a *total* of `n` steps.
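
As a rough check on the table above, here is a minimal Ruby sketch of the two-table process described earlier. It is not the repository's `selection_sort_imperative.rb`; the method name `selection_sort_with_steps`, the hash-shaped rows, and the order the teams are listed in are all assumptions made for illustration. The sketch counts one step each time it checks a remaining row, so the returned count should match the triangular totals.

```ruby
# Hypothetical helper (not from this repository): sorts rows by :wins,
# highest first, by repeatedly selecting the row with the most wins from the
# "original table" and moving it to the "new table".
# Returns [new_table, steps], where each row checked counts as one step.
def selection_sort_with_steps(rows)
  original = rows.dup # work on a copy so the caller's array is untouched
  sorted = []
  steps = 0

  until original.empty?
    highest_index = 0
    # Check every remaining row once; each check is one step.
    original.each_with_index do |row, index|
      steps += 1
      highest_index = index if row[:wins] > original[highest_index][:wins]
    end
    # Move the selected row from the original table to the new table.
    sorted << original.delete_at(highest_index)
  end

  [sorted, steps]
end

# The four teams from the example, listed in an arbitrary order.
teams = [
  { team: "Pirates", wins: 4 },
  { team: "Robots",  wins: 5 },
  { team: "Ninjas",  wins: 6 },
  { team: "Aliens",  wins: 8 }
]

leaderboard, steps = selection_sort_with_steps(teams)
puts leaderboard.map { |row| row[:team] }.join(", ") # prints "Aliens, Ninjas, Robots, Pirates"
puts steps                                            # prints 10

# Step counts for other table sizes, using placeholder rows.
# Prints 1, 3, 6, 55, and 5050 steps, matching the table.
[1, 2, 3, 10, 100].each do |n|
  rows = Array.new(n) { |i| { team: "Team #{i}", wins: i } }
  puts "n = #{n}: #{selection_sort_with_steps(rows).last} steps"
end
```

Working on a copy (`rows.dup`) leaves the caller's array unchanged. That is a choice made for this sketch, not something selection sort requires.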

Said another way, when we add one row to a table that had three rows, we end up with four rows, so we have to take all the steps we would have taken when the table had only three rows *plus* `4` *additional* steps. Likewise, if we add yet another row, we have five rows and add `5` *additional* steps, even more than the `4` we added previously.

In computer science lingo, this fact about selection sort is described as "using `O(n^2)` time" (`O(n^2)` is pronounced "Oh of N-squared," and this kind of growth is also called "quadratic"). But *why* is it `O(n^2)`? To understand that, let's take a closer look at those triangles showing the total number of steps selection sort takes to do its work.

Recall that the total number of steps selection sort needs to completely order our four-team leaderboard looks something like this:

```
o o o o  <-- 4 steps/items to start, remove one
o o o    <-- + 3 steps/items remaining, remove one
o o      <-- + 2 steps/items remaining, remove one
o        <-- + 1 step remaining, remove it
---------
10 total steps
```

As an expression, that's `4 + 3 + 2 + 1 = 10`. That's fine specifically for `4` rows (when `n` is `4`), but how can we find the total number of steps for *any* number of rows (for *any* value of `n`)?
To do that, we need to notice the following pattern in the triangles above (if you haven't spotted it yet, that's okay; you're about to):

```
o o o o  <--   4  <-------------------------------+  first number (4)
o o o    <-- + 3  <--+ second number (3) plus     |  plus
o o      <-- + 2  <--+ 2nd-to-last number (2) = 5 |  last number (1)
o        <-- + 1  <-------------------------------+  ALSO equals 5
```

Here is the same pattern, drawn another way:

```
         4 on this side
         | | | |
         v v v v
   4 --> o o o o <-- 4
  on --> o o o   <-- on
this --> o o     <-- this
side --> o       <-- side, too!
```

This pattern holds no matter how big our triangles get. Adding the first and last numbers together always yields the same result as adding the second number and the second-to-last number, the third number and the third-to-last number, and so on. The fact that this pattern holds true for any value of `n` is the key. Knowing this, we can write our procedure out without the triangles:

```
  (4 + 3 + 2 + 1)  <-- Our original procedure.
+ (1 + 2 + 3 + 4)  <-- A copy of our original procedure, reversed.
-----------------
   5 + 5 + 5 + 5   <-- Each column added together.
```

Again, the important thing is that the sum of each pair is always the same! It's `5`, our original number (`4`) plus one. (That means we can replace `5` with `n + 1` later on.) Now we have four copies of `5` all added together: `5 + 5 + 5 + 5` or, written more concisely, `4 * 5`, which totals `20`. But remember that we added together *two* copies of our original numbers, so the actual total number of steps is *half* that, or `4 * 5 / 2`, which totals `10`.
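
The pairing trick can also be checked mechanically. The Ruby sketch below is only an illustration, and its method names are made up. `column_sums` adds the countdown `n, n - 1, ..., 1` to its reversed copy column by column, and `total_steps_by_pairing` halves the grand total because two copies were added. `total_steps_by_adding` does the same job the long way for comparison.

```ruby
# Hypothetical helper: adds the countdown n, n - 1, ..., 1 to its reversed
# copy 1, 2, ..., n, column by column. Every column sums to n + 1.
def column_sums(n)
  countdown = n.downto(1).to_a # e.g. [4, 3, 2, 1]
  reversed = countdown.reverse # e.g. [1, 2, 3, 4]
  countdown.zip(reversed).map { |a, b| a + b }
end

# Total steps via the pairing trick: n columns that each sum to n + 1,
# halved because we added two copies of the original numbers.
def total_steps_by_pairing(n)
  column_sums(n).sum / 2
end

# Total steps the long way: n + (n - 1) + ... + 1.
def total_steps_by_adding(n)
  n.downto(1).sum
end

p column_sums(4)            # prints [5, 5, 5, 5]
p total_steps_by_pairing(4) # prints 10
p total_steps_by_adding(4)  # prints 10
```

The halving never leaves a remainder, because one of `n` and `n + 1` is always even.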

So now we know that the total number of steps selection sort needs to finish working on a dataset of `n` items is `n * (n + 1) / 2`; we've replaced `4` with `n` and `5` with `n + 1` to generalize the expression. Once again, we can visualize this more clearly by drawing it out with triangles. We start with a row of one step:

```
o  <-- + 1 step
```

At the other end, there is a row of `n` steps:

```
o o o o  <-- "n" steps
```

Finally, we know that we will need to pass over the data as many times as there were items in the dataset to begin with, which in our case was `n`:

```
o         <-- + 1 step  \
...                      > "n" times
o o o o   <-- "n" steps /
```

When computer scientists talk about "Big-O," they're talking about an approximation that only matters for very large numbers. Big-O isn't very informative for small numbers like 10. So let's look at `n * (n + 1) / 2` and pretend that `n` is a much bigger number, such as ten thousand.

What is half of a really big number (i.e., "a really big number divided by two")? *Still* a really big number. To make the point, let's look at 10,000. That's a ten with *three* zeros after it. What's half of 10,000? It's 5,000, which is *also* a number with three zeros after it. The two numbers are within the same order of magnitude (they differ by less than a factor of ten), so as far as Big-O notation is concerned, they're (basically) the same. A program that needs to take 10,000 steps to complete won't feel dramatically different to a human than a program that needs to take 5,000 steps.

This is all to say that (for really big values of `n`), `n * (n + 1) / 2` is basically the same as `n * (n + 1)`, without the `/ 2` at the end. The `/ 2` just isn't important: Big-O notation deliberately ignores constant factors like this one, because it describes how the number of steps *grows* as `n` grows, and halving every total doesn't change the shape of that growth. By the same logic, a really big number `+ 1` is *also* a really big number. So, when dealing with Big-O notation, we can safely drop the `+ 1` as well.
For all we care, `10,000 + 1` might as well be `10,000`.

So again, `n * (n + 1)` might as well be `n * (n)`, or just `n * n`. In mathematical notation, `n * n` is written n². So, for very large values of `n`, `n * (n + 1) / 2` is *basically* the same as n². That's all that O(n²) means: "something that grows at *basically* the same rate as n² whenever `n` is a really big number."

Keep in mind that when talking about how fast an algorithm works (how many steps it takes to complete), its rate of growth matters much more than whatever number `n` happens to be, because the point of an algorithm is that it can be applied to any value of `n`. The problem is that a slow algorithm (like selection sort) is only really useful for small values of `n` (that is, for small amounts of data). You *can* use selection sort to order a list of 10,000 items. However, if it takes your computer 1 second to complete each "step," then you'll be waiting 50,005,000 seconds, or more than a year and a half, for the algorithm to complete. Even if each step took only 1 *milli*second (one thousandth of a second), you'd still be waiting almost 14 *hours* for it to finish. That's a long time to wait for your computer to do something.

That's why selection sort is used only when there is an acceptably small number of items to order, and even then, only if there isn't much memory space to do it in.

## Footnotes

### Footnote 1

Since selection sort makes changes to the list it was given while sorting it, computer programmers call it an ["in-place" algorithm](https://en.wikipedia.org/wiki/In-place_algorithm).

## Further reading

1. [Selection sort - Wikipedia](https://en.wikipedia.org/wiki/Selection_sort)
1. [Big O notation - Wikipedia](https://en.wikipedia.org/wiki/Big_O_notation)
1. [Triangular number - Wikipedia](https://en.wikipedia.org/wiki/Triangular_number)
1. [Visualization of the triangular number formula](https://web.archive.org/web/20170110221917/http://i1115.photobucket.com/albums/k544/akinuri/nth%20triangle%20number-01.jpg)
1. [Quadratic time - Wikipedia](https://en.wikipedia.org/wiki/Quadratic_time)
1. [Order of magnitude - Wikipedia](https://en.wikipedia.org/wiki/Order_of_magnitude)

--------------------------------------------------------------------------------