├── .gitignore
├── README.md
└── cool-python-tips.ipynb

/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # python-is-cool
2 | 
3 | A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and become less lazy.
4 | 
5 | This uses `python >= 3.6`.
6 | 
7 | GitHub has trouble rendering Jupyter notebooks, so I copied the content here. I still keep the notebook in case you want to clone and run it on your machine, but you can also click the Binder badge below and run it in your browser.
8 | 
9 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/chiphuyen/python-is-cool/master?urlpath=lab/tree/cool-python-tips.ipynb)
10 | 
11 | ## 1. Lambda, map, filter, reduce
12 | The `lambda` keyword is used to create inline functions. The functions `square_fn` and `square_ld` below are identical.
13 | 
14 | ```python
15 | def square_fn(x):
16 |     return x * x
17 | 
18 | square_ld = lambda x: x * x
19 | 
20 | for i in range(10):
21 |     assert square_fn(i) == square_ld(i)
22 | ```
23 | 
24 | Its quick declaration makes `lambda` functions ideal for use in callbacks, and when functions are to be passed as arguments to other functions. They are especially useful when used in conjunction with functions like `map`, `filter`, and `reduce`.
25 | 
26 | `map(fn, iterable)` applies `fn` to all elements of the `iterable` (e.g. list, set, dictionary, tuple, string) and returns a map object.
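As a quick illustration (my own sketch, not from the original notebook), a map object is lazy -- it computes nothing until you iterate over it -- and `map` accepts any iterable, including a string:

```python
# map accepts any iterable, including a string.
upper = map(str.upper, 'cool')
print(upper)        # a lazy map object, e.g. <map object at 0x...>
print(list(upper))  # ['C', 'O', 'O', 'L']
```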
27 | 
28 | ```python
29 | nums = [1/3, 333/7, 2323/2230, 40/34, 2/3]
30 | nums_squared = [num * num for num in nums]
31 | print(nums_squared)
32 | 
33 | ==> [0.1111111111111111, 2263.0408163265306, 1.0851472983570953, 1.384083044982699, 0.4444444444444444]
34 | ```
35 | 
36 | This is the same as using `map` with a callback function.
37 | 
38 | ```python
39 | nums_squared_1 = map(square_fn, nums)
40 | nums_squared_2 = map(lambda x: x * x, nums)
41 | print(list(nums_squared_1))
42 | 
43 | ==> [0.1111111111111111, 2263.0408163265306, 1.0851472983570953, 1.384083044982699, 0.4444444444444444]
44 | ```
45 | 
46 | You can also use `map` with more than one iterable. For example, if you want to calculate the mean squared error of a simple linear function `f(x) = ax + b` with the true label `labels`, these two methods are equivalent:
47 | 
48 | ```python
49 | a, b = 3, -0.5
50 | xs = [2, 3, 4, 5]
51 | labels = [6.4, 8.9, 10.9, 15.3]
52 | 
53 | # Method 1: using a loop
54 | errors = []
55 | for i, x in enumerate(xs):
56 |     errors.append((a * x + b - labels[i]) ** 2)
57 | result1 = sum(errors) ** 0.5 / len(xs)
58 | 
59 | # Method 2: using map
60 | diffs = map(lambda x, y: (a * x + b - y) ** 2, xs, labels)
61 | result2 = sum(diffs) ** 0.5 / len(xs)
62 | 
63 | print(result1, result2)
64 | 
65 | ==> 0.35089172119045514 0.35089172119045514
66 | ```
67 | 
68 | Note that objects returned by `map` and `filter` are iterators, which means that their values aren't stored but generated as needed. After you've called `sum(diffs)`, `diffs` becomes empty. If you want to keep all elements in `diffs`, convert it to a list using `list(diffs)`.
69 | 
70 | `filter(fn, iterable)` works the same way as `map`, except that `fn` returns a boolean value and `filter` returns only the elements of the `iterable` for which `fn` returns `True`.
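The one-pass behavior noted above -- `diffs` being exhausted after `sum(diffs)` -- can be shown in isolation. This small sketch is my own addition, not from the original notebook:

```python
squares = map(lambda x: x * x, [1, 2, 3])
print(list(squares))  # [1, 4, 9]
print(list(squares))  # [] -- the iterator is already exhausted
```

`filter` objects are exhausted in exactly the same way.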
71 | 
72 | ```python
73 | bad_preds = filter(lambda x: x > 0.5, errors)
74 | print(list(bad_preds))
75 | 
76 | ==> [0.8100000000000006, 0.6400000000000011]
77 | ```
78 | 
79 | `reduce(fn, iterable, initializer)` is used when we want to iteratively apply an operator to all elements in a list (the `initializer` is optional). For example, if we want to calculate the product of all elements in a list:
80 | 
81 | ```python
82 | product = 1
83 | for num in nums:
84 |     product *= num
85 | print(product)
86 | 
87 | ==> 12.95564683272412
88 | ```
89 | 
90 | This is equivalent to:
91 | ```python
92 | from functools import reduce
93 | product = reduce(lambda x, y: x * y, nums)
94 | print(product)
95 | 
96 | ==> 12.95564683272412
97 | ```
98 | 
99 | ### Note on the performance of lambda functions
100 | 
101 | Lambda functions are meant for one-time use. Each time the expression `lambda x: dosomething(x)` is evaluated, a new function object has to be created, which hurts performance if the expression is evaluated many times (e.g. inside a loop).
102 | 
103 | When you assign a name to the lambda function as in `fn = lambda x: dosomething(x)`, its performance is slightly slower than the same function defined using `def`, but the difference is negligible. See [here](https://stackoverflow.com/questions/26540885/lambda-is-slower-than-function-call-in-python-why).
104 | 
105 | Even though I find lambdas cool, I personally recommend using named functions when you can for the sake of clarity.
106 | 
107 | ## 2. List manipulation
108 | Python lists are super cool.
109 | 
110 | ### 2.1 Unpacking
111 | We can unpack a list element by element like this:
112 | ```python
113 | elems = [1, 2, 3, 4]
114 | a, b, c, d = elems
115 | print(a, b, c, d)
116 | 
117 | ==> 1 2 3 4
118 | ```
119 | 
120 | We can also unpack a list like this:
121 | 
122 | ```python
123 | a, *new_elems, d = elems
124 | print(a)
125 | print(new_elems)
126 | print(d)
127 | 
128 | ==> 1
129 | [2, 3]
130 | 4
131 | ```
132 | 
133 | ### 2.2 Slicing
134 | We know that we can reverse a list using `[::-1]`.
135 | 
136 | ```python
137 | elems = list(range(10))
138 | print(elems)
139 | 
140 | ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
141 | 
142 | print(elems[::-1])
143 | 
144 | ==> [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
145 | ```
146 | The syntax `[x:y:z]` means "take every `z`th element of the list from index `x` up to (but not including) index `y`". When `z` is negative, it indicates going backwards. When `x` isn't specified, it defaults to the first element of the list in the direction you are traversing the list. When `y` isn't specified, it defaults to the end of the list in that direction. So if we want to take every second element of a list, we use `[::2]`.
147 | 
148 | ```python
149 | evens = elems[::2]
150 | print(evens)
151 | 
152 | reversed_evens = elems[-2::-2]
153 | print(reversed_evens)
154 | 
155 | ==> [0, 2, 4, 6, 8]
156 | [8, 6, 4, 2, 0]
157 | ```
158 | 
159 | We can also use slicing to delete all the even numbers in the list.
160 | 
161 | ```python
162 | del elems[::2]
163 | print(elems)
164 | 
165 | ==> [1, 3, 5, 7, 9]
166 | ```
167 | 
168 | ### 2.3 Insertion
169 | We can change the value of an element in a list to another value.
170 | 
171 | ```python
172 | elems = list(range(10))
173 | elems[1] = 10
174 | print(elems)
175 | 
176 | ==> [0, 10, 2, 3, 4, 5, 6, 7, 8, 9]
177 | ```
178 | 
179 | If we want to replace the element at an index with multiple elements, e.g.
replace the value `1` with 3 values `20, 30, 40`:
180 | 
181 | ```python
182 | elems = list(range(10))
183 | elems[1:2] = [20, 30, 40]
184 | print(elems)
185 | 
186 | ==> [0, 20, 30, 40, 2, 3, 4, 5, 6, 7, 8, 9]
187 | ```
188 | 
189 | If we want to insert 3 values `0.2, 0.3, 0.5` between the element at index 0 and the element at index 1:
190 | 
191 | ```python
192 | elems = list(range(10))
193 | elems[1:1] = [0.2, 0.3, 0.5]
194 | print(elems)
195 | 
196 | ==> [0, 0.2, 0.3, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9]
197 | ```
198 | 
199 | ### 2.4 Flattening
200 | We can flatten a list of lists using `sum`.
201 | 
202 | ```python
203 | list_of_lists = [[1], [2, 3], [4, 5, 6]]
204 | sum(list_of_lists, [])
205 | 
206 | ==> [1, 2, 3, 4, 5, 6]
207 | ```
208 | 
209 | If we have nested lists, we can recursively flatten them. That's another beauty of lambda functions -- we can use one on the same line it's created.
210 | 
211 | ```python
212 | nested_lists = [[1, 2], [[3, 4], [5, 6], [[7, 8], [9, 10], [[11, [12, 13]]]]]]
213 | flatten = lambda x: [y for l in x for y in flatten(l)] if type(x) is list else [x]
214 | flatten(nested_lists)
215 | ==> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
216 | # This line of code is from
217 | # https://github.com/sahands/python-by-example/blob/master/python-by-example.rst#flattening-lists
218 | ```
219 | 
220 | ### 2.5 List vs generator
221 | To illustrate the difference between a list and a generator, let's look at an example of creating n-grams out of a list of tokens.
222 | 
223 | One way to create n-grams is to use a sliding window.
224 | 
225 | ```python
226 | tokens = ['i', 'want', 'to', 'go', 'to', 'school']
227 | 
228 | def ngrams(tokens, n):
229 |     length = len(tokens)
230 |     grams = []
231 |     for i in range(length - n + 1):
232 |         grams.append(tokens[i:i+n])
233 |     return grams
234 | 
235 | print(ngrams(tokens, 3))
236 | 
237 | ==> [['i', 'want', 'to'],
238 | ['want', 'to', 'go'],
239 | ['to', 'go', 'to'],
240 | ['go', 'to', 'school']]
241 | ```
242 | 
243 | In the above example, we have to store all the n-grams at the same time. If the text has m tokens, then the memory requirement is `O(nm)`, which can be problematic when m is large.
244 | 
245 | Instead of using a list to store all n-grams, we can use a generator that generates the next n-gram when it's asked for. This is known as lazy evaluation. We can make the function `ngrams` return a generator using the keyword `yield`. Then the memory requirement is `O(m+n)`.
246 | 
247 | ```python
248 | def ngrams(tokens, n):
249 |     length = len(tokens)
250 |     for i in range(length - n + 1):
251 |         yield tokens[i:i+n]
252 | 
253 | ngrams_generator = ngrams(tokens, 3)
254 | print(ngrams_generator)
255 | 
256 | ==> <generator object ngrams at 0x...>
257 | 
258 | for ngram in ngrams_generator:
259 |     print(ngram)
260 | 
261 | ==> ['i', 'want', 'to']
262 | ['want', 'to', 'go']
263 | ['to', 'go', 'to']
264 | ['go', 'to', 'school']
265 | ```
266 | 
267 | Another way to generate n-grams is to use slices to create lists: `[0, 1, ..., -n]`, `[1, 2, ..., -n+1]`, ..., `[n-1, n, ..., -1]`, and then `zip` them together.
268 | 
269 | ```python
270 | def ngrams(tokens, n):
271 |     length = len(tokens)
272 |     slices = (tokens[i:length-n+i+1] for i in range(n))
273 |     return zip(*slices)
274 | 
275 | ngrams_generator = ngrams(tokens, 3)
276 | print(ngrams_generator)
277 | 
278 | ==> <zip object at 0x...>  # zip objects are lazy iterators
279 | 
280 | for ngram in ngrams_generator:
281 |     print(ngram)
282 | 
283 | ==> ('i', 'want', 'to')
284 | ('want', 'to', 'go')
285 | ('to', 'go', 'to')
286 | ('go', 'to', 'school')
287 | ```
288 | 
289 | Note that to create the slices, we use `(tokens[...] for i in range(n))` instead of `[tokens[...] for i in range(n)]`. `[...]` is the normal list comprehension and returns a list, while `(...)` is a generator expression and returns a generator.
290 | 
291 | ## 3. Classes and magic methods
292 | In Python, magic methods are prefixed and suffixed with a double underscore `__`, also known as dunder. The most well-known magic method is probably `__init__`.
293 | 
294 | ```python
295 | class Node:
296 |     """ A struct to denote the node of a binary tree.
297 |     It contains a value and pointers to left and right children.
298 |     """
299 |     def __init__(self, value, left=None, right=None):
300 |         self.value = value
301 |         self.left = left
302 |         self.right = right
303 | ```
304 | 
305 | When we try to print out a Node object, however, the result isn't very interpretable.
306 | 
307 | ```python
308 | root = Node(5)
309 | print(root) # <__main__.Node object at 0x1069c4518>
310 | ```
311 | 
312 | Ideally, when a user prints out a node, we want to print out the node's value and the values of its children, if it has any. To do so, we use the magic method `__repr__`, which must return a printable object, like a string.
313 | 
314 | ```python
315 | class Node:
316 |     """ A struct to denote the node of a binary tree.
317 |     It contains a value and pointers to left and right children.
318 |     """
319 |     def __init__(self, value, left=None, right=None):
320 |         self.value = value
321 |         self.left = left
322 |         self.right = right
323 | 
324 |     def __repr__(self):
325 |         strings = [f'value: {self.value}']
326 |         strings.append(f'left: {self.left.value}' if self.left else 'left: None')
327 |         strings.append(f'right: {self.right.value}' if self.right else 'right: None')
328 |         return ', '.join(strings)
329 | 
330 | left = Node(4)
331 | root = Node(5, left)
332 | print(root) # value: 5, left: 4, right: None
333 | ```
334 | 
335 | We'd also like to compare two nodes by comparing their values. To do so, we overload the operator `==` with `__eq__`, `<` with `__lt__`, and `>=` with `__ge__`.
336 | 
337 | ```python
338 | class Node:
339 |     """ A struct to denote the node of a binary tree.
340 |     It contains a value and pointers to left and right children.
341 |     """
342 |     def __init__(self, value, left=None, right=None):
343 |         self.value = value
344 |         self.left = left
345 |         self.right = right
346 | 
347 |     def __eq__(self, other):
348 |         return self.value == other.value
349 | 
350 |     def __lt__(self, other):
351 |         return self.value < other.value
352 | 
353 |     def __ge__(self, other):
354 |         return self.value >= other.value
355 | 
356 | 
357 | left = Node(4)
358 | root = Node(5, left)
359 | print(left == root) # False
360 | print(left < root) # True
361 | print(left >= root) # False
362 | ```
363 | 
364 | For a comprehensive list of supported magic methods, see [here](https://www.tutorialsteacher.com/python/magic-methods-in-python) or the official Python documentation [here](https://docs.python.org/3/reference/datamodel.html#special-method-names) (slightly harder to read).
365 | 
366 | Some of the methods that I highly recommend:
367 | 
368 | - `__len__`: to overload the `len()` function.
369 | - `__str__`: to overload the `str()` function.
370 | - `__iter__`: to make your objects iterable, e.g. usable in a `for` loop. It also lets you call `next()` on the iterator it returns.
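To illustrate the last point, here is a sketch (my own addition, not from the original notebook) that gives `Node` an `__iter__` performing an in-order traversal:

```python
class Node:
    """ A binary tree node whose values can be iterated in-order. """
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        # Written as a generator: yield the left subtree,
        # then this node's value, then the right subtree.
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right

root = Node(5, Node(4), Node(6))
print(list(root))        # [4, 5, 6]
print(next(iter(root)))  # 4
```

Because `__iter__` is a generator function here, Python supplies the `__next__` machinery for us: `iter(root)` returns a fresh generator, and `next()` advances it.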
371 | 
372 | For classes like Node where we know for sure all the attributes they can support (in the case of Node: `value`, `left`, and `right`), we might want to use `__slots__` to declare those attributes, for both a performance boost and memory savings. For a comprehensive discussion of the pros and cons of `__slots__`, see this [absolutely amazing answer by Aaron Hall on StackOverflow](https://stackoverflow.com/a/28059785/5029595).
373 | 
374 | ```python
375 | class Node:
376 |     """ A struct to denote the node of a binary tree.
377 |     It contains a value and pointers to left and right children.
378 |     """
379 |     __slots__ = ('value', 'left', 'right')
380 |     def __init__(self, value, left=None, right=None):
381 |         self.value = value
382 |         self.left = left
383 |         self.right = right
384 | ```
385 | ## 4. Local namespace, object's attributes
386 | The `locals()` function returns a dictionary containing the variables defined in the local namespace.
387 | 
388 | ```python
389 | class Model1:
390 |     def __init__(self, hidden_size=100, num_layers=3, learning_rate=3e-4):
391 |         print(locals())
392 |         self.hidden_size = hidden_size
393 |         self.num_layers = num_layers
394 |         self.learning_rate = learning_rate
395 | 
396 | model1 = Model1()
397 | 
398 | ==> {'learning_rate': 0.0003, 'num_layers': 3, 'hidden_size': 100, 'self': <__main__.Model1 object at 0x1069b1470>}
399 | ```
400 | 
401 | All attributes of an object are stored in its `__dict__`.
402 | ```python
403 | print(model1.__dict__)
404 | 
405 | ==> {'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}
406 | ```
407 | 
408 | Note that manually assigning each of the arguments to an attribute can be quite tiring when the list of arguments is large. To avoid this, we can directly assign the dictionary of arguments to the object's `__dict__`.
409 | 
410 | ```python
411 | class Model2:
412 |     def __init__(self, hidden_size=100, num_layers=3, learning_rate=3e-4):
413 |         params = locals()
414 |         del params['self']
415 |         self.__dict__ = params
416 | 
417 | model2 = Model2()
418 | print(model2.__dict__)
419 | 
420 | ==> {'learning_rate': 0.0003, 'num_layers': 3, 'hidden_size': 100}
421 | ```
422 | 
423 | This can be especially convenient when the object is instantiated using the catch-all `**kwargs`, though the use of `**kwargs` should be kept to a minimum.
424 | 
425 | ```python
426 | class Model3:
427 |     def __init__(self, **kwargs):
428 |         self.__dict__ = kwargs
429 | 
430 | model3 = Model3(hidden_size=100, num_layers=3, learning_rate=3e-4)
431 | print(model3.__dict__)
432 | 
433 | ==> {'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}
434 | ```
435 | 
436 | ## 5. Wild import
437 | Often, you run into a wildcard import `*` that looks something like this:
438 | 
439 | `file.py`
440 | ```python
441 | from parts import *
442 | ```
443 | 
444 | This is irresponsible because it imports everything in the module, even the imports of that module. For example, if `parts.py` looks like this:
445 | 
446 | `parts.py`
447 | ```python
448 | import numpy
449 | import tensorflow
450 | 
451 | class Encoder:
452 |     ...
453 | 
454 | class Decoder:
455 |     ...
456 | 
457 | class Loss:
458 |     ...
459 | 
460 | def helper(*args, **kwargs):
461 |     ...
462 | 
463 | def utils(*args, **kwargs):
464 |     ...
465 | ```
466 | 
467 | Since `parts.py` doesn't have `__all__` specified, `file.py` will import Encoder, Decoder, Loss, utils, and helper, together with numpy and tensorflow.
468 | 
469 | If we intend that only Encoder, Decoder, and Loss are ever to be imported and used in another module, we should specify that in `parts.py` using the `__all__` variable.
470 | 
471 | `parts.py`
472 | ```python
473 | __all__ = ['Encoder', 'Decoder', 'Loss']
474 | import numpy
475 | import tensorflow
476 | 
477 | class Encoder:
478 |     ...
479 | ```
480 | Now, if someone irresponsibly does a wildcard import from `parts`, they can only get Encoder, Decoder, and Loss. Personally, I also find `__all__` helpful as it gives me an overview of the module.
481 | 
482 | ## 6. Decorator to time your functions
483 | It's often useful to know how long it takes a function to run, e.g. when you need to compare the performance of two algorithms that do the same thing. One naive way is to call `time.time()` at the beginning and end of each function and print out the difference.
484 | 
485 | For example: compare two algorithms to calculate the n-th Fibonacci number, one of which uses memoization and one of which doesn't.
486 | 
487 | ```python
488 | def fib_helper(n):
489 |     if n < 2:
490 |         return n
491 |     return fib_helper(n - 1) + fib_helper(n - 2)
492 | 
493 | def fib(n):
494 |     """ fib is a wrapper function so that later we can change its behavior
495 |     at the top level without affecting the behavior at every recursion step.
496 |     """
497 |     return fib_helper(n)
498 | 
499 | def fib_m_helper(n, computed):
500 |     if n in computed:
501 |         return computed[n]
502 |     computed[n] = fib_m_helper(n - 1, computed) + fib_m_helper(n - 2, computed)
503 |     return computed[n]
504 | 
505 | def fib_m(n):
506 |     return fib_m_helper(n, {0: 0, 1: 1})
507 | ```
508 | 
509 | Let's make sure that `fib` and `fib_m` are functionally equivalent.
510 | 
511 | ```python
512 | for n in range(20):
513 |     assert fib(n) == fib_m(n)
514 | ```
515 | 
516 | ```python
517 | import time
518 | 
519 | start = time.time()
520 | fib(30)
521 | print(f'Without memoization, it takes {time.time() - start:.7f} seconds.')
522 | 
523 | ==> Without memoization, it takes 0.2675690 seconds.
524 | 
525 | start = time.time()
526 | fib_m(30)
527 | print(f'With memoization, it takes {time.time() - start:.7f} seconds.')
528 | 
529 | ==> With memoization, it takes 0.0000713 seconds.
530 | ```
531 | 
532 | If you want to time multiple functions, it can be a drag having to write the same code over and over again.
It'd be nice to have a way to apply the same change to any function -- in this case, calling `time.time()` at the beginning and the end of the function and printing out the time difference.
533 | 
534 | This is exactly what decorators do. They allow programmers to change the behavior of a function or class. Here's an example of a `timeit` decorator.
535 | 
536 | ```python
537 | def timeit(fn):
538 |     # *args and **kwargs are to support positional and named arguments of fn
539 |     def get_time(*args, **kwargs):
540 |         start = time.time()
541 |         output = fn(*args, **kwargs)
542 |         print(f"Time taken in {fn.__name__}: {time.time() - start:.7f}")
543 |         return output  # make sure that the decorator returns the output of fn
544 |     return get_time
545 | ```
546 | 
547 | Add the decorator `@timeit` to your functions.
548 | 
549 | ```python
550 | @timeit
551 | def fib(n):
552 |     return fib_helper(n)
553 | 
554 | @timeit
555 | def fib_m(n):
556 |     return fib_m_helper(n, {0: 0, 1: 1})
557 | 
558 | fib(30)
559 | fib_m(30)
560 | 
561 | ==> Time taken in fib: 0.2787242
562 | ==> Time taken in fib_m: 0.0000138
563 | ```
564 | 
565 | ## 7. Caching with @functools.lru_cache
566 | Memoization is a form of caching: we cache the previously calculated Fibonacci numbers so that we don't have to calculate them again.
567 | 
568 | Caching is such an important technique that Python provides a built-in decorator to give your function this capability. If you want `fib_helper` to reuse the previously calculated Fibonacci numbers, you can just add the decorator `lru_cache` from `functools`. `lru` stands for "least recently used". For more information on caching, see [here](https://docs.python.org/3/library/functools.html).
569 | 570 | ```python 571 | import functools 572 | 573 | @functools.lru_cache() 574 | def fib_helper(n): 575 | if n < 2: 576 | return n 577 | return fib_helper(n - 1) + fib_helper(n - 2) 578 | 579 | @timeit 580 | def fib(n): 581 | """ fib is a wrapper function so that later we can change its behavior 582 | at the top level without affecting the behavior at every recursion step. 583 | """ 584 | return fib_helper(n) 585 | 586 | fib(50) 587 | fib_m(50) 588 | 589 | ==> Time taken in fib: 0.0000412 590 | ==> Time taken in fib_m: 0.0000281 591 | ``` 592 | -------------------------------------------------------------------------------- /cool-python-tips.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# A gentle guide to the Python features that I didn't know existed or was too afraid to use." 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# 1. Lambda, map, filter, reduce\n", 15 | "\n", 16 | "The lambda keyword is used to create inline functions. `square_fn` and `square_ld` below are identical:" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "def square_fn(x):\n", 26 | " return x * x\n", 27 | "\n", 28 | "square_ld = lambda x: x * x\n", 29 | "\n", 30 | "for i in range(10):\n", 31 | " assert square_fn(i) == square_ld(i)" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "Its quick declaration makes `lambda` functions ideal for use in callbacks, and when functions are to be passed as arguments to other functions. They are especially useful when used in conjunction with functions like `map`, `filter`, and `reduce`.\n", 39 | "\n", 40 | "`map(fn, iterable)` applies the `fn` to all elements of the `iterable` (e.g. 
list, set, dictionary, tuple, string) and returns a map object." 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "name": "stdout", 50 | "output_type": "stream", 51 | "text": [ 52 | "[0.1111111111111111, 2263.0408163265306, 1.0851472983570953, 1.384083044982699, 0.4444444444444444]\n" 53 | ] 54 | } 55 | ], 56 | "source": [ 57 | "nums = [1/3, 333/7, 2323/2230, 40/34, 2/3]\n", 58 | "nums_squared = [num * num for num in nums]\n", 59 | "print(nums_squared)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "This is the same as calling using `map` with a callback function." 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 3, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "[0.1111111111111111, 2263.0408163265306, 1.0851472983570953, 1.384083044982699, 0.4444444444444444]\n" 79 | ] 80 | } 81 | ], 82 | "source": [ 83 | "nums_squared_1 = map(square_fn, nums)\n", 84 | "nums_squared_2 = map(lambda x: x * x, nums)\n", 85 | "print(list(nums_squared_1))" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "You can also use `map` with more than one iterable. 
For example, if you want to calculate the mean squared error of a simple linear function `f(x) = ax + b` with the true label `labels`, these two methods are equivalent:" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 4, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "name": "stdout", 102 | "output_type": "stream", 103 | "text": [ 104 | "0.35089172119045514 0.35089172119045514\n" 105 | ] 106 | } 107 | ], 108 | "source": [ 109 | "a, b = 3, -0.5\n", 110 | "xs = [2, 3, 4, 5]\n", 111 | "labels = [6.4, 8.9, 10.9, 15.3]\n", 112 | "\n", 113 | "# Method 1: using a loop\n", 114 | "errors = []\n", 115 | "for i, x in enumerate(xs):\n", 116 | " errors.append((a * x + b - labels[i]) ** 2)\n", 117 | "result1 = sum(errors) ** 0.5 / len(xs)\n", 118 | "\n", 119 | "# Method 2: using map\n", 120 | "diffs = map(lambda x, y: (a * x + b - y) ** 2, xs, labels)\n", 121 | "result2 = sum(diffs) ** 0.5 / len(xs)\n", 122 | "\n", 123 | "print(result1, result2)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "Note that objects returned by `map` and `filter` are iterators, which means that their values aren't stored but generated as needed. After you've called `sum(diffs)`, `diffs` becomes empty. If you want to keep all elements in `diffs`, convert it to a list using `list(diffs)`." 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "`filter(fn, iterable)` works the same way as `map`, except that `fn` returns a boolean value and `filter` returns all the elements of the `iterable` for which the `fn` returns True." 
138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 5, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "[0.8100000000000006, 0.6400000000000011]\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "bad_preds = filter(lambda x: x > 0.5, errors)\n", 155 | "print(list(bad_preds))" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "`reduce(fn, iterable, initializer)` is used when we want to iteratively apply an operator to all elements in a list. For example, if we want to calculate the product of all elements in a list:" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 6, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "12.95564683272412\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "product = 1\n", 180 | "for num in nums:\n", 181 | " product *= num\n", 182 | "print(product)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "This is equivalent to:" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 7, 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "name": "stdout", 199 | "output_type": "stream", 200 | "text": [ 201 | "12.95564683272412\n" 202 | ] 203 | } 204 | ], 205 | "source": [ 206 | "from functools import reduce\n", 207 | "product = reduce(lambda x, y: x * y, nums)\n", 208 | "print(product)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "### Note on the performance of lambda functions\n", 216 | "\n", 217 | "Lambda functions are meant for one time use. Each time `lambda x: dosomething(x)` is called, the function has to be created, which hurts the performance if you call `lambda x: dosomething(x)` multiple times (e.g. 
when you pass it inside `reduce`).\n", 218 | "\n", 219 | "When you assign a name to the lambda function as in `fn = lambda x: dosomething(x)`, its performance is slightly slower than the same function defined using `def`, but the difference is negligible. See [here](https://stackoverflow.com/questions/26540885/lambda-is-slower-than-function-call-in-python-why).\n", 220 | "\n", 221 | "Even though I find lambdas cool, I personally recommend using named functions when you can for the sake of clarity." 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "# 2. List manipulation\n", 229 | "Python lists are super cool.\n", 230 | "\n", 231 | "## 2.1 Unpacking\n", 232 | "We can unpack a list by each element like this:" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 8, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "name": "stdout", 242 | "output_type": "stream", 243 | "text": [ 244 | "1 2 3 4\n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "elems = [1, 2, 3, 4]\n", 250 | "a, b, c, d = elems\n", 251 | "print(a, b, c, d)" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "We can also unpack a list like this:" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 9, 264 | "metadata": {}, 265 | "outputs": [ 266 | { 267 | "name": "stdout", 268 | "output_type": "stream", 269 | "text": [ 270 | "1\n", 271 | "[2, 3]\n", 272 | "4\n" 273 | ] 274 | } 275 | ], 276 | "source": [ 277 | "a, *new_elems, d = elems\n", 278 | "print(a)\n", 279 | "print(new_elems)\n", 280 | "print(d)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "## 2.2 Slicing\n", 288 | "We know that we can reverse a list using `[::-1]`." 
289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 10, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "name": "stdout", 298 | "output_type": "stream", 299 | "text": [ 300 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", 301 | "[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]\n" 302 | ] 303 | } 304 | ], 305 | "source": [ 306 | "elems = list(range(10))\n", 307 | "print(elems)\n", 308 | "print(elems[::-1])" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "The syntax `[x:y:z]` means \"take every `z`th element of a list from index `x` to index `y`\". When `z` is negative, it indicates going backwards. When `x` isn't specified, it defaults to the first element of the list in the direction you are traversing the list. When `y` isn't specified, it defaults to the last element of the list. So if we want to take every 2th element of a list, we use `[::2]`." 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 11, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "[0, 2, 4, 6, 8]\n", 328 | "[8, 6, 4, 2, 0]\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "evens = elems[::2]\n", 334 | "print(evens)\n", 335 | "\n", 336 | "reversed_evens = elems[-2::-2]\n", 337 | "print(reversed_evens)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "We can also use slicing to delete all the even numbers in the list." 
345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": 12, 350 | "metadata": {}, 351 | "outputs": [ 352 | { 353 | "name": "stdout", 354 | "output_type": "stream", 355 | "text": [ 356 | "[1, 3, 5, 7, 9]\n" 357 | ] 358 | } 359 | ], 360 | "source": [ 361 | "del elems[::2]\n", 362 | "print(elems)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "## 2.3 Insertion\n", 370 | "We can change the value of an element in a list to another value." 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 13, 376 | "metadata": {}, 377 | "outputs": [ 378 | { 379 | "name": "stdout", 380 | "output_type": "stream", 381 | "text": [ 382 | "[0, 10, 2, 3, 4, 5, 6, 7, 8, 9]\n" 383 | ] 384 | } 385 | ], 386 | "source": [ 387 | "elems = list(range(10))\n", 388 | "elems[1] = 10\n", 389 | "print(elems)" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "If we want to replace the element at an index with multiple elements, e.g. 
replace the value `1` with 3 values `20, 30, 40`:" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 14, 402 | "metadata": {}, 403 | "outputs": [ 404 | { 405 | "name": "stdout", 406 | "output_type": "stream", 407 | "text": [ 408 | "[0, 20, 30, 40, 2, 3, 4, 5, 6, 7, 8, 9]\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "elems = list(range(10))\n", 414 | "elems[1:2] = [20, 30, 40]\n", 415 | "print(elems)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "If we want to insert 3 values `0.2, 0.3, 0.5` between the element at index 0 and the element at index 1:" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 15, 428 | "metadata": {}, 429 | "outputs": [ 430 | { 431 | "name": "stdout", 432 | "output_type": "stream", 433 | "text": [ 434 | "[0, 0.2, 0.3, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" 435 | ] 436 | } 437 | ], 438 | "source": [ 439 | "elems = list(range(10))\n", 440 | "elems[1:1] = [0.2, 0.3, 0.5]\n", 441 | "print(elems)" 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "## 2.4 Flattening\n", 449 | "We can flatten a list of lists using `sum`." 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 16, 455 | "metadata": {}, 456 | "outputs": [ 457 | { 458 | "data": { 459 | "text/plain": [ 460 | "[1, 2, 3, 4, 5, 6]" 461 | ] 462 | }, 463 | "execution_count": 16, 464 | "metadata": {}, 465 | "output_type": "execute_result" 466 | } 467 | ], 468 | "source": [ 469 | "list_of_lists = [[1], [2, 3], [4, 5, 6]]\n", 470 | "sum(list_of_lists, [])" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "If we have nested lists, we can flatten them recursively. That's another beauty of lambda functions -- we can use one on the same line it's created."
478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 17, 483 | "metadata": {}, 484 | "outputs": [ 485 | { 486 | "data": { 487 | "text/plain": [ 488 | "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]" 489 | ] 490 | }, 491 | "execution_count": 17, 492 | "metadata": {}, 493 | "output_type": "execute_result" 494 | } 495 | ], 496 | "source": [ 497 | "nested_lists = [[1, 2], [[3, 4], [5, 6], [[7, 8], [9, 10], [[11, [12, 13]]]]]]\n", 498 | "flatten = lambda x: [y for l in x for y in flatten(l)] if type(x) is list else [x]\n", 499 | "flatten(nested_lists)\n", 500 | "\n", 501 | "# This amazing line of code is from\n", 502 | "# https://github.com/sahands/python-by-example/blob/master/python-by-example.rst#flattening-lists" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "## 2.5 List vs generator\n", 510 | "To illustrate the difference between a list and a generator, let's look at an example of creating n-grams out of a list of tokens.\n", 511 | "\n", 512 | "One way to create n-grams is to use a sliding window." 
513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 18, 518 | "metadata": {}, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "text/plain": [ 523 | "[['i', 'want', 'to'],\n", 524 | " ['want', 'to', 'go'],\n", 525 | " ['to', 'go', 'to'],\n", 526 | " ['go', 'to', 'school']]" 527 | ] 528 | }, 529 | "execution_count": 18, 530 | "metadata": {}, 531 | "output_type": "execute_result" 532 | } 533 | ], 534 | "source": [ 535 | "tokens = ['i', 'want', 'to', 'go', 'to', 'school']\n", 536 | "\n", 537 | "def ngrams(tokens, n):\n", 538 | " length = len(tokens)\n", 539 | " grams = []\n", 540 | " for i in range(length - n + 1):\n", 541 | " grams.append(tokens[i:i+n])\n", 542 | " return grams\n", 543 | "\n", 544 | "ngrams(tokens, 3)" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "In the above example, we have to store all the n-grams at the same time. If the text has m tokens, then the memory requirement is `O(nm)`, which can be problematic when m is large.\n", 552 | "\n", 553 | "Instead of using a list to store all n-grams, we can use a generator that generates the next n-gram when it's asked for. This is known as lazy evaluation. We can make the function `ngrams` return a generator using the keyword `yield`. Then the memory requirement is `O(m+n)`."
554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 19, 559 | "metadata": {}, 560 | "outputs": [ 561 | { 562 | "name": "stdout", 563 | "output_type": "stream", 564 | "text": [ 565 | "\n", 566 | "['i', 'want', 'to']\n", 567 | "['want', 'to', 'go']\n", 568 | "['to', 'go', 'to']\n", 569 | "['go', 'to', 'school']\n" 570 | ] 571 | } 572 | ], 573 | "source": [ 574 | "def ngrams(tokens, n):\n", 575 | " length = len(tokens)\n", 576 | " for i in range(length - n + 1):\n", 577 | " yield tokens[i:i+n]\n", 578 | "\n", 579 | "ngrams_generator = ngrams(tokens, 3)\n", 580 | "print(ngrams_generator)\n", 581 | "for ngram in ngrams_generator:\n", 582 | " print(ngram)" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "Another way to generate n-grams is to use slices to create lists: `[0, 1, ..., -n]`, `[1, 2, ..., -n+1]`, ..., `[n-1, n, ..., -1]`, and then `zip` them together." 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": 20, 595 | "metadata": {}, 596 | "outputs": [ 597 | { 598 | "name": "stdout", 599 | "output_type": "stream", 600 | "text": [ 601 | "\n", 602 | "('i', 'want', 'to')\n", 603 | "('want', 'to', 'go')\n", 604 | "('to', 'go', 'to')\n", 605 | "('go', 'to', 'school')\n" 606 | ] 607 | } 608 | ], 609 | "source": [ 610 | "def ngrams(tokens, n):\n", 611 | " length = len(tokens)\n", 612 | " slices = (tokens[i:length-n+i+1] for i in range(n))\n", 613 | " return zip(*slices)\n", 614 | "\n", 615 | "ngrams_generator = ngrams(tokens, 3)\n", 616 | "print(ngrams_generator) # zip objects are lazy iterators, like generators\n", 617 | "for ngram in ngrams_generator:\n", 618 | " print(ngram)" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": {}, 624 | "source": [ 625 | "Note that to create slices, we use `(tokens[...] for i in range(n))` instead of `[tokens[...] for i in range(n)]`. `[...]` is a normal list comprehension, which returns a list, while `(...)` is a generator expression, which returns a generator."
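As a quick aside, the list-versus-generator distinction is easy to see on a tiny example (a minimal sketch; the variable names below are illustrative, not from this notebook):

```python
# A list comprehension materializes all its elements immediately;
# a generator expression produces them lazily, one at a time.
squares_list = [x * x for x in range(5)]
squares_gen = (x * x for x in range(5))

print(squares_list)       # [0, 1, 4, 9, 16] -- all values already in memory
print(next(squares_gen))  # 0 -- computed only when requested
print(list(squares_gen))  # [1, 4, 9, 16] -- draining the rest exhausts it
print(list(squares_gen))  # [] -- a generator can only be consumed once
```

The last line is the same gotcha noted earlier for `map` and `filter` objects: once consumed, the values are gone unless you saved them in a list.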
626 | ] 627 | }, 628 | { 629 | "cell_type": "markdown", 630 | "metadata": {}, 631 | "source": [ 632 | "# 3. Classes and magic methods\n", 633 | "In Python, magic methods are prefixed and suffixed with the double underscore `__`, also known as dunder. The most well-known magic method is probably `__init__`." 634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": 21, 639 | "metadata": {}, 640 | "outputs": [], 641 | "source": [ 642 | "class Node:\n", 643 | " \"\"\" A struct to denote the node of a binary tree.\n", 644 | " It contains a value and pointers to left and right children.\n", 645 | " \"\"\"\n", 646 | " def __init__(self, value, left=None, right=None):\n", 647 | " self.value = value\n", 648 | " self.left = left\n", 649 | " self.right = right" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": {}, 655 | "source": [ 656 | "When we try to print out a Node object, however, it's not very interpretable." 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 22, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "<__main__.Node object at 0x10e1a8890>\n" 669 | ] 670 | } 671 | ], 672 | "source": [ 673 | "root = Node(5)\n", 674 | "print(root)" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "Ideally, when a user prints out a node, we want to print out the node's value and the values of its children if it has any. To do so, we use the magic method `__repr__`, which must return a printable object, like a string."
682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": 23, 687 | "metadata": {}, 688 | "outputs": [ 689 | { 690 | "name": "stdout", 691 | "output_type": "stream", 692 | "text": [ 693 | "value: 5, left: 4, right: None\n" 694 | ] 695 | } 696 | ], 697 | "source": [ 698 | "class Node:\n", 699 | " \"\"\" A struct to denote the node of a binary tree.\n", 700 | " It contains a value and pointers to left and right children.\n", 701 | " \"\"\"\n", 702 | " def __init__(self, value, left=None, right=None):\n", 703 | " self.value = value\n", 704 | " self.left = left\n", 705 | " self.right = right\n", 706 | " \n", 707 | " def __repr__(self): \n", 708 | " strings = [f'value: {self.value}']\n", 709 | " strings.append(f'left: {self.left.value}' if self.left else 'left: None')\n", 710 | " strings.append(f'right: {self.right.value}' if self.right else 'right: None')\n", 711 | " return ', '.join(strings)\n", 712 | "\n", 713 | "left = Node(4)\n", 714 | "root = Node(5, left)\n", 715 | "print(root)\n" 716 | ] 717 | }, 718 | { 719 | "cell_type": "markdown", 720 | "metadata": {}, 721 | "source": [ 722 | "We'd also like to compare two nodes by comparing their values. To do so, we overload the operator `==` with `__eq__`, `<` with `__lt__`, and `>=` with `__ge__`." 
723 | ] 724 | }, 725 | { 726 | "cell_type": "code", 727 | "execution_count": 24, 728 | "metadata": {}, 729 | "outputs": [ 730 | { 731 | "name": "stdout", 732 | "output_type": "stream", 733 | "text": [ 734 | "False\n", 735 | "True\n", 736 | "False\n" 737 | ] 738 | } 739 | ], 740 | "source": [ 741 | "class Node:\n", 742 | " \"\"\" A struct to denote the node of a binary tree.\n", 743 | " It contains a value and pointers to left and right children.\n", 744 | " \"\"\"\n", 745 | " def __init__(self, value, left=None, right=None):\n", 746 | " self.value = value\n", 747 | " self.left = left\n", 748 | " self.right = right\n", 749 | " \n", 750 | " def __eq__(self, other):\n", 751 | " return self.value == other.value\n", 752 | " \n", 753 | " def __lt__(self, other):\n", 754 | " return self.value < other.value\n", 755 | " \n", 756 | " def __ge__(self, other):\n", 757 | " return self.value >= other.value\n", 758 | "\n", 759 | "\n", 760 | "left = Node(4)\n", 761 | "root = Node(5, left)\n", 762 | "print(left == root)\n", 763 | "print(left < root)\n", 764 | "print(left >= root)" 765 | ] 766 | }, 767 | { 768 | "cell_type": "markdown", 769 | "metadata": {}, 770 | "source": [ 771 | "For a comprehensive list of supported magic methods, see [this overview](https://www.tutorialsteacher.com/python/magic-methods-in-python) or the official Python documentation [here](https://docs.python.org/3/reference/datamodel.html#special-method-names) (slightly harder to read).\n", 772 | "\n", 773 | "Some of the methods that I highly recommend:\n", 774 | "\n", 775 | "- `__len__`: to overload the `len()` function.\n", 776 | "- `__str__`: to overload the `str()` function.\n", 777 | "- `__iter__`: to make your objects iterable. This also lets you call `next()` on the iterator returned by `iter(obj)`."
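To illustrate those three, here's a minimal sketch; the `Sentence` class is a hypothetical example, not part of this notebook:

```python
class Sentence:
    # Hypothetical toy class showing __len__, __str__, and __iter__.
    def __init__(self, text):
        self.words = text.split()

    def __len__(self):   # called by len(sentence)
        return len(self.words)

    def __str__(self):   # called by str(sentence) and print(sentence)
        return ' '.join(self.words)

    def __iter__(self):  # makes the object iterable (for-loops, list(), iter())
        return iter(self.words)

s = Sentence('i want to go to school')
print(len(s))   # 6
print(s)        # i want to go to school
print(list(s))  # ['i', 'want', 'to', 'go', 'to', 'school']
```

Because `__iter__` returns a fresh iterator on every call, the object can be iterated over more than once.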
778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "For classes like Node where we know for sure all the attributes they can support (in the case of Node, they are `value`, `left`, and `right`), we might want to use `__slots__` to denote those values for both a performance boost and memory savings. For a comprehensive understanding of the pros and cons of `__slots__`, see this [absolutely amazing answer by Aaron Hall on StackOverflow](https://stackoverflow.com/a/28059785/5029595)." 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "execution_count": 25, 790 | "metadata": {}, 791 | "outputs": [], 792 | "source": [ 793 | "class Node:\n", 794 | " \"\"\" A struct to denote the node of a binary tree.\n", 795 | " It contains a value and pointers to left and right children.\n", 796 | " \"\"\"\n", 797 | " __slots__ = ('value', 'left', 'right')\n", 798 | " def __init__(self, value, left=None, right=None):\n", 799 | " self.value = value\n", 800 | " self.left = left\n", 801 | " self.right = right" 802 | ] 803 | }, 804 | { 805 | "cell_type": "markdown", 806 | "metadata": {}, 807 | "source": [ 808 | "# 4. Local namespace, object's attributes\n", 809 | "\n", 810 | "The `locals()` function returns a dictionary containing the variables defined in the local namespace."
811 | ] 812 | }, 813 | { 814 | "cell_type": "code", 815 | "execution_count": 26, 816 | "metadata": {}, 817 | "outputs": [ 818 | { 819 | "name": "stdout", 820 | "output_type": "stream", 821 | "text": [ 822 | "{'self': <__main__.Model1 object at 0x10f210590>, 'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}\n" 823 | ] 824 | } 825 | ], 826 | "source": [ 827 | "class Model1:\n", 828 | " def __init__(self, hidden_size=100, num_layers=3, learning_rate=3e-4):\n", 829 | " print(locals())\n", 830 | " self.hidden_size = hidden_size\n", 831 | " self.num_layers = num_layers\n", 832 | " self.learning_rate = learning_rate\n", 833 | "\n", 834 | "model1 = Model1()" 835 | ] 836 | }, 837 | { 838 | "cell_type": "markdown", 839 | "metadata": {}, 840 | "source": [ 841 | "All attributes of an object are stored in its `__dict__`." 842 | ] 843 | }, 844 | { 845 | "cell_type": "code", 846 | "execution_count": 27, 847 | "metadata": {}, 848 | "outputs": [ 849 | { 850 | "data": { 851 | "text/plain": [ 852 | "{'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}" 853 | ] 854 | }, 855 | "execution_count": 27, 856 | "metadata": {}, 857 | "output_type": "execute_result" 858 | } 859 | ], 860 | "source": [ 861 | "model1.__dict__" 862 | ] 863 | }, 864 | { 865 | "cell_type": "markdown", 866 | "metadata": {}, 867 | "source": [ 868 | "Note that manually assigning each of the arguments to an attribute can be quite tedious when the list of arguments is long. To avoid this, we can directly assign the list of arguments to the object's `__dict__`."
869 | ] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "execution_count": 28, 874 | "metadata": {}, 875 | "outputs": [ 876 | { 877 | "data": { 878 | "text/plain": [ 879 | "{'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}" 880 | ] 881 | }, 882 | "execution_count": 28, 883 | "metadata": {}, 884 | "output_type": "execute_result" 885 | } 886 | ], 887 | "source": [ 888 | "class Model2:\n", 889 | " def __init__(self, hidden_size=100, num_layers=3, learning_rate=3e-4):\n", 890 | " params = locals()\n", 891 | " del params['self']\n", 892 | " self.__dict__ = params\n", 893 | "\n", 894 | "model2 = Model2()\n", 895 | "model2.__dict__" 896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | "metadata": {}, 901 | "source": [ 902 | "This can be especially convenient when the object is initialized using the catch-all `**kwargs`." 903 | ] 904 | }, 905 | { 906 | "cell_type": "code", 907 | "execution_count": 29, 908 | "metadata": {}, 909 | "outputs": [ 910 | { 911 | "data": { 912 | "text/plain": [ 913 | "{'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}" 914 | ] 915 | }, 916 | "execution_count": 29, 917 | "metadata": {}, 918 | "output_type": "execute_result" 919 | } 920 | ], 921 | "source": [ 922 | "class Model3:\n", 923 | " def __init__(self, **kwargs):\n", 924 | " self.__dict__ = kwargs\n", 925 | "\n", 926 | "model3 = Model3(hidden_size=100, num_layers=3, learning_rate=3e-4)\n", 927 | "model3.__dict__" 928 | ] 929 | }, 930 | { 931 | "cell_type": "markdown", 932 | "metadata": {}, 933 | "source": [ 934 | "# 5. Wildcard import\n", 935 | "Often, you run into this wildcard import `*` that looks something like this:\n", 936 | "\n", 937 | "`file.py`\n", 938 | " \n", 939 | " from parts import *\n", 940 | "\n", 941 | "This is irresponsible because it will import everything in the module, even the imports of that module. 
For example, if `parts.py` looks like this:\n", 942 | "\n", 943 | "`parts.py`\n", 944 | "\n", 945 | " import numpy\n", 946 | " import tensorflow\n", 947 | " \n", 948 | " class Encoder:\n", 949 | " ...\n", 950 | " \n", 951 | " class Decoder:\n", 952 | " ...\n", 953 | " \n", 954 | " class Loss:\n", 955 | " ...\n", 956 | " \n", 957 | " def helper(*args, **kwargs):\n", 958 | " ...\n", 959 | " \n", 960 | " def utils(*args, **kwargs):\n", 961 | " ...\n", 962 | "\n", 963 | "Since `parts.py` doesn't have `__all__` specified, `file.py` will import `Encoder`, `Decoder`, `Loss`, `utils`, and `helper`, together with `numpy` and `tensorflow`.\n", 964 | "\n", 965 | "If we intend that only `Encoder`, `Decoder`, and `Loss` are ever to be imported and used in another module, we should specify that in `parts.py` using the `__all__` variable.\n", 966 | "\n", 967 | "`parts.py`\n", 968 | " \n", 969 | " __all__ = ['Encoder', 'Decoder', 'Loss']\n", 970 | " import numpy\n", 971 | " import tensorflow\n", 972 | " \n", 973 | " class Encoder:\n", 974 | " ...\n", 975 | "\n", 976 | "Now, if some user irresponsibly does a wildcard import with `parts`, they can only import `Encoder`, `Decoder`, and `Loss`. Personally, I also find `__all__` helpful as it gives me an overview of the module." 977 | ] 978 | }, 979 | { 980 | "cell_type": "markdown", 981 | "metadata": {}, 982 | "source": [ 983 | "# 6. Decorator to time your functions\n", 984 | "\n", 985 | "It's often useful to know how long it takes a function to run, e.g. when you need to compare the performance of two algorithms that do the same thing. One naive way is to call `time.time()` at the beginning and end of each function and print out the difference.\n", 986 | "\n", 987 | "For example, let's compare two algorithms to calculate the n-th Fibonacci number: one uses memoization and one doesn't." 
988 | ] 989 | }, 990 | { 991 | "cell_type": "code", 992 | "execution_count": 30, 993 | "metadata": {}, 994 | "outputs": [], 995 | "source": [ 996 | "def fib_helper(n):\n", 997 | " if n < 2:\n", 998 | " return n\n", 999 | " return fib_helper(n - 1) + fib_helper(n - 2)\n", 1000 | "\n", 1001 | "def fib(n):\n", 1002 | " \"\"\" fib is a wrapper function so that later we can change its behavior\n", 1003 | " at the top level without affecting the behavior at every recursion step.\n", 1004 | " \"\"\"\n", 1005 | " return fib_helper(n)\n", 1006 | "\n", 1007 | "def fib_m_helper(n, computed):\n", 1008 | " if n in computed:\n", 1009 | " return computed[n]\n", 1010 | " computed[n] = fib_m_helper(n - 1, computed) + fib_m_helper(n - 2, computed)\n", 1011 | " return computed[n]\n", 1012 | "\n", 1013 | "def fib_m(n):\n", 1014 | " return fib_m_helper(n, {0: 0, 1: 1})" 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "markdown", 1019 | "metadata": {}, 1020 | "source": [ 1021 | "Let's make sure that `fib` and `fib_m` are functionally equivalent." 
1022 | ] 1023 | }, 1024 | { 1025 | "cell_type": "code", 1026 | "execution_count": 31, 1027 | "metadata": {}, 1028 | "outputs": [], 1029 | "source": [ 1030 | "for n in range(20):\n", 1031 | " assert fib(n) == fib_m(n)" 1032 | ] 1033 | }, 1034 | { 1035 | "cell_type": "code", 1036 | "execution_count": 32, 1037 | "metadata": {}, 1038 | "outputs": [ 1039 | { 1040 | "name": "stdout", 1041 | "output_type": "stream", 1042 | "text": [ 1043 | "Without memoization, it takes 0.267569 seconds.\n", 1044 | "With memoization, it takes 0.0000713 seconds.\n" 1045 | ] 1046 | } 1047 | ], 1048 | "source": [ 1049 | "import time\n", 1050 | "\n", 1051 | "start = time.time()\n", 1052 | "fib(30)\n", 1053 | "print(f'Without memoization, it takes {time.time() - start:.7f} seconds.')\n", 1054 | "\n", 1055 | "start = time.time()\n", 1056 | "fib_m(30)\n", 1057 | "print(f'With memoization, it takes {time.time() - start:.7f} seconds.')\n" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "markdown", 1062 | "metadata": {}, 1063 | "source": [ 1064 | "If you want to time multiple functions, it can be a drag having to write the same code over and over again. It'd be nice to have a way to apply the same change to any function: in this case, calling `time.time()` at the beginning and the end of each function and printing out the time difference.\n", 1065 | "\n", 1066 | "This is exactly what decorators do. They allow programmers to change the behavior of a function or class. Here's an example of creating a decorator `timeit`." 
1067 | ] 1068 | }, 1069 | { 1070 | "cell_type": "code", 1071 | "execution_count": 33, 1072 | "metadata": {}, 1073 | "outputs": [], 1074 | "source": [ 1075 | "def timeit(fn): \n", 1076 | " # *args and **kwargs are to support positional and named arguments of fn\n", 1077 | " def get_time(*args, **kwargs): \n", 1078 | " start = time.time() \n", 1079 | " output = fn(*args, **kwargs)\n", 1080 | " print(f\"Time taken in {fn.__name__}: {time.time() - start:.7f}\")\n", 1081 | " return output # make sure that the decorator returns the output of fn\n", 1082 | " return get_time " 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "markdown", 1087 | "metadata": {}, 1088 | "source": [ 1089 | "Add the decorator `@timeit` to your functions." 1090 | ] 1091 | }, 1092 | { 1093 | "cell_type": "code", 1094 | "execution_count": 34, 1095 | "metadata": {}, 1096 | "outputs": [], 1097 | "source": [ 1098 | "@timeit\n", 1099 | "def fib(n):\n", 1100 | " return fib_helper(n)\n", 1101 | "\n", 1102 | "@timeit\n", 1103 | "def fib_m(n):\n", 1104 | " return fib_m_helper(n, {0: 0, 1: 1})" 1105 | ] 1106 | }, 1107 | { 1108 | "cell_type": "code", 1109 | "execution_count": 35, 1110 | "metadata": {}, 1111 | "outputs": [ 1112 | { 1113 | "name": "stdout", 1114 | "output_type": "stream", 1115 | "text": [ 1116 | "Time taken in fib: 0.2787242\n", 1117 | "Time taken in fib_m: 0.0000138\n" 1118 | ] 1119 | }, 1120 | { 1121 | "data": { 1122 | "text/plain": [ 1123 | "832040" 1124 | ] 1125 | }, 1126 | "execution_count": 35, 1127 | "metadata": {}, 1128 | "output_type": "execute_result" 1129 | } 1130 | ], 1131 | "source": [ 1132 | "fib(30)\n", 1133 | "fib_m(30)" 1134 | ] 1135 | }, 1136 | { 1137 | "cell_type": "markdown", 1138 | "metadata": {}, 1139 | "source": [ 1140 | "# 7. 
Caching with @functools.lru_cache\n", 1141 | "Memoization is a form of caching: we cache the previously calculated Fibonacci numbers so that we don't have to calculate them again.\n", 1142 | "\n", 1143 | "Caching is such an important technique that Python provides a built-in decorator to give your function the caching capability. If you want `fib_helper` to reuse the previously calculated Fibonacci numbers, you can just add the decorator `lru_cache` from `functools`. `lru` stands for \"least recently used\". For more information on caching, see [here](https://docs.python.org/3/library/functools.html)." 1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": 38, 1149 | "metadata": {}, 1150 | "outputs": [], 1151 | "source": [ 1152 | "import functools\n", 1153 | "\n", 1154 | "@functools.lru_cache()\n", 1155 | "def fib_helper(n):\n", 1156 | " if n < 2:\n", 1157 | " return n\n", 1158 | " return fib_helper(n - 1) + fib_helper(n - 2)\n", 1159 | "\n", 1160 | "@timeit\n", 1161 | "def fib(n):\n", 1162 | " \"\"\" fib is a wrapper function so that later we can change its behavior\n", 1163 | " at the top level without affecting the behavior at every recursion step.\n", 1164 | " \"\"\"\n", 1165 | " return fib_helper(n)" 1166 | ] 1167 | }, 1168 | { 1169 | "cell_type": "code", 1170 | "execution_count": 39, 1171 | "metadata": {}, 1172 | "outputs": [ 1173 | { 1174 | "name": "stdout", 1175 | "output_type": "stream", 1176 | "text": [ 1177 | "Time taken in fib: 0.0000412\n", 1178 | "Time taken in fib_m: 0.0000281\n" 1179 | ] 1180 | }, 1181 | { 1182 | "data": { 1183 | "text/plain": [ 1184 | "12586269025" 1185 | ] 1186 | }, 1187 | "execution_count": 39, 1188 | "metadata": {}, 1189 | "output_type": "execute_result" 1190 | } 1191 | ], 1192 | "source": [ 1193 | "fib(50)\n", 1194 | "fib_m(50)" 1195 | ] 1196 | } 1197 | ], 1198 | "metadata": { 1199 | "kernelspec": { 1200 | "display_name": "Python 3", 1201 | "language": "python", 1202 | "name": "python3" 1203 | }, 1204 | 
"language_info": { 1205 | "codemirror_mode": { 1206 | "name": "ipython", 1207 | "version": 3 1208 | }, 1209 | "file_extension": ".py", 1210 | "mimetype": "text/x-python", 1211 | "name": "python", 1212 | "nbconvert_exporter": "python", 1213 | "pygments_lexer": "ipython3", 1214 | "version": "3.7.4" 1215 | } 1216 | }, 1217 | "nbformat": 4, 1218 | "nbformat_minor": 2 1219 | } 1220 | --------------------------------------------------------------------------------