# learn-data-structures
Learn about data structures and algorithms, and the effects they can have on code performance

## *Why*?

Data structures and algorithms play a massive role in any and every programming
language. With a better understanding of both you can pick the ones that best
suit your needs and massively speed up your applications.

**Better hardware is not a solution**

Better hardware can only do so much. No matter how powerful your hardware is,
it is not going to make up for an inefficient algorithm.

The right choices in data structure and algorithm could be the difference
between something being processed in seconds or hours.

## *What*?

### Data Structures

> Data is a broad term that refers to all types of information down to the most
basic numbers and strings.

[Data structures](https://en.wikipedia.org/wiki/Data_structure) refer to how
data is organised.

```elixir
hello = "hello" # data
world = "world" # data

[hello, world] # data structure
```

### Algorithms

Simply put, an [algorithm](https://en.wikipedia.org/wiki/Algorithm) is a set of
steps for solving a problem. You would use an algorithm for making a sandwich,
for example:

```
1 - Grab 2 slices of bread
2 - Butter each slice
3 - Put filling on top of one of the slices
4 - Put other slice on top of filling 🥪
```

In computing, an algorithm is the same thing: just a set of steps for a computer
program to accomplish a task.

There are often a number of ways you could accomplish a task, so learning to
pick the right algorithm is key.

## *How*?

Most examples of code in this README will be in Elixir.
If you would like to learn a little more about Elixir, head over to
[learn-elixir](https://github.com/dwyl/learn-elixir).

## Data Structures

As we mentioned earlier, data structures refer to how data is organised. This
organisation is not just for organisation's sake, however. The data structures
you use could massively affect the speed at which your application runs. It could
even be the difference between whether your app runs at all or errors because it
cannot handle the load.

Learning about the different types of data structures and the pros/cons of each
will allow you to pick the one that best suits the needs of your application.

### Operations

Operations are how we interact with data structures. Some of the common
operations are as follows:

- Insert: adding a value to a data structure
- Delete: removing a value from a data structure
- Read: looking something up from a specific spot in a data structure
- Search: looking for a value in a data structure

Different operations run at different speeds. However, speed is not measured in
time as you might expect. It is measured in the number of steps that the
operation takes to complete.

The reason for this is that the time an operation takes can vary from
machine to machine. It may take 5 seconds to search for a value on an older
computer compared to 2 seconds on a newer one, despite the fact that they are
using the same operation.

If the operation is measured in the number of steps that it takes, though, this
is a constant value that will not vary no matter the machine. If an operation
takes 2 steps on 'machine a', it will take 2 steps on 'machine b'. And on the
same machine, an operation that takes 2 steps will always be faster than an
operation that takes 200.
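As a rough illustration, the four operations listed above map onto functions in Elixir's standard library. One caveat worth knowing: Elixir lists are linked lists rather than arrays, so reading or inserting by index has to walk the list and is not a one-step operation. Elixir tuples keep their elements together in memory, so `elem/2` is the closer match to the one-step array read this README describes. The variable names here are just for the example.

```elixir
# The example array used throughout this README, as an Elixir list
array = ["one", "two", "three", "four", "five"]

# Read: get the value at index 2
Enum.at(array, 2)
# => "three"

# Search: is "six" anywhere in the list?
Enum.member?(array, "six")
# => false

# Insert: put "ten" at index 2, pushing later values along
List.insert_at(array, 2, "ten")
# => ["one", "two", "ten", "three", "four", "five"]

# Delete: remove whatever is at index 2
List.delete_at(array, 2)
# => ["one", "two", "four", "five"]

# Data is immutable in Elixir: each call above returns a new list
# and `array` itself is unchanged.

# A tuple keeps its elements together in memory, so elem/2 reads by index directly
tuple = List.to_tuple(array)
elem(tuple, 2)
# => "three"
```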

Now let's take a look at these operations being applied to an array (the first
data structure we are going to look at).

### Array

An [array](https://en.wikipedia.org/wiki/Array_data_structure) is just a list of
data elements.

```
array = ["one", "two", "three", "four", "five"]
```

#### Reading
> The following examples are not 'exactly' how the computer stores arrays in
memory. The computer actually stores all the elements of an array together in
memory, with each cell of the computer's memory holding one element's value. The
computer records the memory address where the array starts and uses it to
jump straight to that spot in memory. How the computer stores the information is
not the key point here, so we will not go into much detail.

**Reading from an array takes one step**. Each cell in an array is given an index.
The index always begins at 0 and increases by 1 with each element. The
computer is able to 'jump' to any index in an array and get the data from
inside.

So, using the array above, say we want to get the element at index 2.
The computer knows that the first index is index 0 and that index 2 is exactly 2
cells over from index 0. With this the computer can jump right to index 2 and
grab the data there (`"three"`).

#### Searching
If we wanted to check whether a value exists in this array, we would use the
search operation. Say we wanted to check if the value `"six"` was in the array.
We can just glance at the array and clearly see that it is not, but the computer
does not have this skill. The computer needs to access every element
individually and check if the value it gets back is `"six"`. If it finds a
If it finds a 132 | match it will stop and return `true` otherwise it will continue until there are no more 133 | elements, at which point it will return `false`. 134 | 135 | Let's count the number of steps that searching our array for the value `"six"` 136 | would take. The computer starts by checking index 0 and sees that the value it 137 | contains is `"one"` so moves on to the next index. At index 1 it checks the 138 | value and sees that it is `"two"`. This is not what we are looking for so the 139 | computer again moves on.... 140 | (the computer repeats these steps until it gets to the end of the array). 141 | 142 | In total **the computer had to check 5 elements before it could be sure that the 143 | value was not there so this operation took 5 steps to complete**. If the array 144 | had a million values then the operation **could potentially take a million steps** to 145 | complete. However, if the element we were searching for was the first element in 146 | the array, then the operation would only take 1 step. **The number of steps is 147 | dependent on where or if the element is in the array.** 148 | 149 | This type of search is known as a **linear search**. There are other, more complex 150 | types of search as well but this is just a basic search that will work with all 151 | array types. 152 | 153 | #### Inserting and deleting 154 | Inserting and deleting from an array work fairly similarly to one another so 155 | I'll cover both here. 156 | 157 | Say we want to insert a value into our array. Similar to how the number of 158 | steps in searching is dependent on where the value is in the array, inserting 159 | depends on where you want to insert the element into the array. If you want to 160 | insert an element onto the end of an array then this is considered the 'best 161 | case scenario' as it only take the computer one step. 162 | 163 | The number of steps increase when you want to insert an element anywhere else in 164 | the array. 
Say, for example, you want to insert the value `"ten"` into our array near the
beginning, at index 2. But index 2 already has a value: `"three"`. As we do not
want to replace any of the values in our array, only add to it, we first need to
shift the values from index 2 onwards over to make space.

This means that `"five"` would move to index 5, `"four"` would move to index 4 and
`"three"` would move to index 3. Each one of these is an individual step, and the
shifts have to start from the end of the array so that no value is overwritten
before it has been moved. Now index 2 is free and we can insert the new value
`"ten"`.

Our array looks like this now...
```
["one", "two", "ten", "three", "four", "five"]
```

This insert would have taken 4 steps in total.
```
step 1 - shifting value "five" from index 4 to index 5
step 2 - shifting value "four" from index 3 to index 4
step 3 - shifting value "three" from index 2 to index 3
step 4 - inserting value "ten" in index 2
```

The worst case for inserting into an array would be inserting at the start of
it, as all elements would have to be shifted before the insert could take place.
This would mean that in our array of 5 elements the number of steps to take
would be 6. Put another way, if `N` is the number of elements, the number of
steps in the worst case for inserting is `N + 1`.

Deleting is very similar to the above. The best case is to delete an element
from the end of the array, and this will only take 1 step.

If we want to delete the value at index 2 from our array (the value `"ten"`)
then the following steps would be taken...
```
step 1 - deleting value "ten" in index 2
step 2 - shifting value "three" from index 3 to index 2
step 3 - shifting value "four" from index 4 to index 3
step 4 - shifting value "five" from index 5 to index 4
```

The worst case for deleting, just like inserting, is to delete the first
element from the array. That takes 1 step to delete it plus `N - 1` steps to
shift every other element down, so `N` steps in total.

### Set (array-based set)

An array-based set is very similar to an array but with one key difference: **it
never allows duplicate values to be inserted.**

For example, say we have the following set...
```
set = [1,2,3]
```

If we tried to add `1` to the set, the computer would not let it happen as the
set already contains the value.

Sets come in handy when you need to make sure that you have no duplicate
data.

The operations that we used on our array all work in the same way on our set,
with the exception of insert.

When we call insert on our array-based set it works similarly to our example
above but, before it can insert, it first needs to check every cell to make sure
the value we want to insert is not already in the set.

Let's take our currently defined set and try to insert the value `4` onto the
end.

```
step 1 - check the value at index 0
step 2 - check the value at index 1
step 3 - check the value at index 2
step 4 - insert value 4 onto the end of set
```

In the above example, we inserted onto the end of our set and it took 4 steps.
Like inserting onto the end of an array, this is the best case scenario. Unlike
the array, this took 4 steps compared to 1.

This means that if we had a set containing 1,000,000 items and we wanted to
insert onto the end of that set, it would take 1,000,000 steps checking the
values in the indexes and then 1 step inserting.
The number of steps is `N + 1`.

Remember, this is the best case scenario. The worst case scenario is if we want
to insert at the start of our set. If we do this, it has to first check
every index, then shift every element over by one (like inserting into the array
did).

Let's insert `4` at the start of our set this time...

```
step 1 - check the value at index 0
step 2 - check the value at index 1
step 3 - check the value at index 2
step 4 - shift the value 3 to index 3
step 5 - shift the value 2 to index 2
step 6 - shift the value 1 to index 1
step 7 - insert value 4 into index 0
```

This took 7 steps to complete. The way we could express this 'worst case
scenario' is `2N + 1`.

As you can see, this is almost exactly double the number of steps needed to
insert into an array. That doesn't mean that sets are a bad data structure,
however. If you need to make sure there is no duplicate data then this could be
the right fit for you. If not, you may be better off with an array.

### Ordered Array

An ordered array is again very similar to an array. The difference here, as the
name suggests, is that this array has to be ordered.

Let's take the following array...
```
[1,2,4,5]
```
and try to insert the value `3`.

Inserting works in a similar way to the set insert. First the computer would
have to go to index 0 and compare the value inside against the one we want to
insert. If the value we want to insert is greater than what is inside index 0,
then we move on. It repeats the steps until it comes across a value that is
greater than the value we want to insert. At this point it shifts that value and
every value after it up an index, and inserts the new value in the space left
behind.

In our case `3` would be inserted into index 2.
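The step counts worked out by hand for arrays, array-based sets and ordered arrays can be reproduced with a small sketch. The module name `StepCount` and its functions are hypothetical, made up for this example: each function does the real work with Elixir's list functions and separately counts steps using this README's rules (one step per check or comparison, one per shift, one per write). The counts model an array; they do not describe how Elixir's linked lists actually do the work. The README does not give a count for the ordered-array insert, so the 6 steps below simply follow from those same rules.

```elixir
defmodule StepCount do
  # Hypothetical helper module: each function returns the updated list plus the
  # number of steps the operation takes in this README's array model.

  # Array insert: shift every element from `index` onwards one place to the
  # right (one step each), then write the new value (one more step).
  def array_insert(list, index, value) do
    shifts = length(list) - index
    {List.insert_at(list, index, value), shifts + 1}
  end

  # Array delete: remove the value (one step), then shift every element after
  # it one place to the left (one step each).
  def array_delete(list, index) do
    shifts = length(list) - index - 1
    {List.delete_at(list, index), 1 + shifts}
  end

  # Array-based set insert: check every cell for a duplicate first, then insert
  # exactly like an array. Duplicates are refused; checking stops at the match.
  def set_insert(set, index, value) do
    case Enum.find_index(set, &(&1 == value)) do
      nil ->
        checks = length(set)
        {new_set, insert_steps} = array_insert(set, index, value)
        {:ok, new_set, checks + insert_steps}

      match_index ->
        {:error, :duplicate, match_index + 1}
    end
  end

  # Ordered array insert: compare from index 0 until a larger value turns up,
  # then shift and insert at that index like an array.
  def ordered_insert(list, value) do
    index = Enum.find_index(list, &(&1 > value)) || length(list)
    # Every element before `index` was compared, plus the larger one we stopped at (if any).
    comparisons = min(index + 1, length(list))
    {new_list, insert_steps} = array_insert(list, index, value)
    {new_list, comparisons + insert_steps}
  end
end

StepCount.array_insert(["one", "two", "three", "four", "five"], 2, "ten")
# => {["one", "two", "ten", "three", "four", "five"], 4}

StepCount.set_insert([1, 2, 3], 0, 4)
# => {:ok, [4, 1, 2, 3], 7}

StepCount.ordered_insert([1, 2, 4, 5], 3)
# => {[1, 2, 3, 4, 5], 6}
```

Returning the step count alongside the new list keeps each function pure, which fits Elixir's immutable data: nothing is changed in place, so the caller always gets a fresh list back.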

Search also works in a similar way to previous examples but with a key
difference. With an ordered array, a search can stop early if we know a value
could not possibly be contained in the array.

For example, take the following array:
```
[1, 10, 37, 85, 96]
```

Now we tell the computer to search this for the value `26`. As we mentioned, the
computer will need to check each cell, one at a time, in order, so it sounds
pretty similar to the previous searches. The difference here is that when the
computer comes across a cell that has a larger value than the one we are
searching for, it can immediately stop looking, as it knows the value cannot
exist past that point.

```
step 1 - check the value at index 0
step 2 - check the value at index 1
step 3 - check the value at index 2
```

Only 3 steps needed. Once the computer gets to `index 2` and sees the value `37`,
it knows that is larger than the `26` we are looking for and can stop searching
at that point.

Of course this won't always be the case. If you were looking for 96, or anything
greater, the search would still have to check every cell.

However, this assumes that our search always starts at the beginning of the array
and works its way through one element at a time. It doesn't have to: a linear
search is just one of many algorithms that can be used for searching arrays.

## Algorithms

An algorithm, as we mentioned earlier, is a set of steps for solving a problem.

There are many different algorithms, and there are often multiple algorithms
that can be used to achieve the same end result. How they work will be
different, however.

Imagine you are travelling home and your travel options are a combination of
public transport and walking, or riding a bike.
Both will have the same result
of getting you home, but one may be faster, safer, more convenient etc. For
example, if your journey home was 300 miles long and there was a super fast
train that took you within a 2-minute walk of your front door, this may be a
better option than riding your bike home (unless you like that kind of thing).
But this doesn't mean that public transport is always the way to go. Let's say
now your journey home is only 2 miles and there is no direct transport option.
In this situation it may be better for you to ride your bike.

The point of the above is to show that although both have the same result,
depending on the situation, each 'algorithm' has its purpose.

Picking the one that suits your needs is important, as it can greatly affect the
performance of an application.

We have already seen how to perform a linear search on an array, so let's check
out a new type of search: binary search.

Binary search works by checking the middle element of a list to see whether it
contains the value we are after. If it doesn't, the value in that cell tells it
whether to continue with the first or the second half of the array. This means
that after just one step, binary search has already found that it no longer needs
to search half of the elements in the list. (Because of the way it works, binary
search can only be used on ordered arrays.)

Let's look at an example array of 1 to 9.
```
[1, 2, 3, 4, 5, 6, 7, 8, 9]
```

And search for the value `7`.

If we were to use linear search, the computer would check each cell, starting at
`index 0` and working its way up the indexes until it finds our value. We can see
that this would take `7` steps.

Let's see how binary search affects this.

First the computer jumps to the middle of the array and checks the value there.
That value is `5`. As we are searching for the number `7`, and as this is an
ordered array, the computer knows that everything 'to the left' of index 4 is
also lower in value than 7. This means that it can remove index 4 and everything
to its left from the list of possible indexes left to check.

Now we are left with the values 6 to 9 to check. The computer will again pick the
middle index and check to see if it contains our number. In our case there is no
exact middle: it could be index 6 or 7 (values 7 or 8). We'll say in this case
the computer picks the lower of the two middle choices and goes with index 6.

Binary search just found our number in 2 steps!!!!!

```
step 1 - check the value at index 4
         value is 5, so the computer discards it and everything to its left
         left with indexes 5, 6, 7, 8
step 2 - check the value at index 6
         value returned is 7
```

Let's do another example where we have to look for `1,000,000` in an array of
numbers ranging from 1 to 1,000,000. Remember, this is a worst case scenario.

As mentioned in the searching section above, the linear search would take
all `1,000,000` steps to complete this. Let's compare this to the binary search.

Binary search steps:
```
step 1 - check the value at 500,000
         value is less than we are looking for, so discard it and everything
         less than it. (Just cleared HALF A MILLION STEPS IN ONE GO!!!!)
step 2 - check the value at 750,000
         value is less than we are looking for, so discard it and everything
         less than it.
step 3 - check the value at 875,000 - too low, discard all to left
step 4 - check the value at 937,500 - too low, discard all to left
step 5 - check the value at 968,750 - you get the idea...
step 6 - check the value at 984,375
step 7 - check the value at 992,188
step 8 - check the value at 996,094
step 9 - check the value at 998,047
step 10 - check the value at 999,024
step 11 - check the value at 999,512
step 12 - check the value at 999,756 - getting close
step 13 - check the value at 999,878
step 14 - check the value at 999,939
step 15 - check the value at 999,970
step 16 - check the value at 999,985
step 17 - check the value at 999,993
step 18 - check the value at 999,997 - nearly there
step 19 - check the value at 999,999
step 20 - check the value at 1,000,000 - WE MADE IT
```

This might seem like a lot of text/steps, but just think about what this search
algorithm just managed to do. It took a worst case scenario of searching an
array of `1,000,000` values for its `1,000,000`th value and found it in 20
steps.

Each step halves the number of values left to check, and 2^20 (1,048,576) is the
first power of two larger than a million, so 20 steps is the most binary search
will ever need here. Put another way, the maximum number of steps for searching
an ordered array of a million values with binary search is `20`. Quite the
improvement over a million, I'm sure you'll agree.

Check out [this YouTube video](https://www.youtube.com/watch?v=EXtkCmRXfMo) to
learn more about binary and linear search.
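To check the search step counts used in this README (5 steps to rule out `"six"`, 3 steps for the early-exit search for `26`, 7 linear steps versus 2 binary steps for `7`, and 20 binary steps for `1,000,000`), here is a minimal Elixir sketch. The `Search` module and its function names are made up for this example; each function returns `{found?, steps}`, counting one step per element checked. Like the walkthrough above, the binary search picks the lower of the two middle indexes when there is no exact middle.

```elixir
defmodule Search do
  # Hypothetical module: every function returns {found?, steps},
  # counting one step per element checked.

  # Linear search: check each element in order until a match or the end.
  def linear(list, target) do
    Enum.reduce_while(list, {false, 0}, fn value, {_found, steps} ->
      if value == target do
        {:halt, {true, steps + 1}}
      else
        {:cont, {false, steps + 1}}
      end
    end)
  end

  # Linear search on an ordered list: also stop as soon as a value is
  # larger than the target, because the target cannot appear after it.
  def ordered_linear(sorted_list, target) do
    Enum.reduce_while(sorted_list, {false, 0}, fn value, {_found, steps} ->
      cond do
        value == target -> {:halt, {true, steps + 1}}
        value > target -> {:halt, {false, steps + 1}}
        true -> {:cont, {false, steps + 1}}
      end
    end)
  end

  # Binary search on an ordered list. The list is copied into a tuple first
  # because elem/2 reads any index directly, while walking a list does not.
  def binary(sorted_list, target) do
    tuple = List.to_tuple(sorted_list)
    do_binary(tuple, target, 0, tuple_size(tuple) - 1, 0)
  end

  # Nothing left to check: the target is not there.
  defp do_binary(_tuple, _target, low, high, steps) when low > high, do: {false, steps}

  defp do_binary(tuple, target, low, high, steps) do
    # div/2 rounds down, so with no exact middle we take the lower of the two.
    mid = div(low + high, 2)
    value = elem(tuple, mid)

    cond do
      value == target -> {true, steps + 1}
      # Too low: discard mid and everything to its left.
      value < target -> do_binary(tuple, target, mid + 1, high, steps + 1)
      # Too high: discard mid and everything to its right.
      true -> do_binary(tuple, target, low, mid - 1, steps + 1)
    end
  end
end

one_to_nine = Enum.to_list(1..9)
Search.linear(one_to_nine, 7)
# => {true, 7}
Search.binary(one_to_nine, 7)
# => {true, 2}

million = Enum.to_list(1..1_000_000)
Search.binary(million, 1_000_000)
# => {true, 20}
```

Copying the list into a tuple with `List.to_tuple/1` walks the whole list once, so in real code you would keep sorted data in a structure with fast access by index rather than converting it on every search.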