├── .github └── ISSUE_TEMPLATE │ └── question-request.md ├── .gitignore ├── LICENSE ├── README.md ├── T10W13 └── README.md ├── T1W3 ├── README.md ├── alice_knn.py ├── assets │ └── knn_bst.jpg ├── bob_knn.py └── q1.py ├── T2W4 ├── README.md ├── data │ ├── iris.csv │ └── iris.names └── dtc.py ├── T3W5 ├── Intro_to_Support_Vector_Machines.ipynb ├── README.md ├── data │ ├── iris.csv │ └── iris.names └── linreg.py ├── T4aW6 ├── README.md └── Tutorial_4_FAQ.pdf ├── T4bW7 └── README.md ├── T5W8 └── README.md ├── T6W9 └── README.md ├── T7W10 └── README.md ├── T8W11 └── README.md ├── T9W12 └── README.md └── misc └── CS3244_Midterm_Cheatsheet.pdf /.github/ISSUE_TEMPLATE/question-request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Question Request 3 | about: Suggest questions you want answers to! 4 | title: Question Suggestions! 5 | labels: question 6 | assignees: rish-16 7 | 8 | --- 9 | 10 | **Drop your question(s) here.** 11 | Are Neural Networks better than Decision Trees? 12 | 13 | **What topic(s) is this question from?** 14 | Neural Networks, Decision Trees 15 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Rishabh Anand 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CS3244-Tutorial-Material 2 | All supplementary material used by me while TA-ing **[CS3244: Machine Learning](https://nusmods.com/modules/CS3244/machine-learning)** at NUS School of Computing. 3 | 4 | ## What is this? 5 | I teach **TG-06**, the tutorial that takes place every **Monday, 1200-1300** in AY21/22 Semester 1. It is *fully online* this semester. 6 | 7 | > Unless the syllabus has drastically changed, I believe the material covered here is relevant for future AYs as well (eg: AY22/23++). The module might be deprecated soon so do take note! In future iterations of SOC's Intro to ML module, I still feel the material herein is good enough for preparation purposes. 8 | 9 | This repository contains code, figures, and miscelleaneous items that aid me in teaching my class. The main source of reference *should* be the lecture notes and tutorial questions created by the CS3244 Professors and Teaching Staff. 10 | 11 | > Official tutorial solutions will be released at the end of every week. 
12 | 13 | ## Contents 14 | 15 | Here's a list of what I've covered / I'll be covering in my tutorials: 16 | 17 | - **[T1W3](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T1W3):** k-Nearest Neighbours 18 | - **[T2W4](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T2W4):** Decision Trees 19 | - **[T3W5](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T3W5):** Linear Models 20 | - **[T4aW6](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4aW6):** Bias-Variance Tradeoff 21 | - **[T4bW7](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4bW7):** Regularisation & Validation 22 | - **[T5W8](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T5W8):** Evaluation Metrics 23 | - **[T6W9](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T6W9):** Visualisation & Dimensionality Reduction (Approach TA Pranavan) 24 | - **[T7W10](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T7W9):** Perceptrons and Neural Networks 25 | - **[T8W11](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T8W11):** CNNs and RNNs 26 | - **[T9W12](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T9W12):** Explainable AI 27 | - **[T10W13](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T10W13):** Unsupervised Learning 28 | 29 | > The link to the slides for all my tutorials can be found in the `README.md` in each week's respective folder. 30 | 31 | ## Exam Resources 32 | I've prepared some extra resources that might aid you in your exam preparation. You can find the files here: 33 | 34 | - [**Midterm Cheatsheet:**](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/misc/CS3244_Midterm_Cheatsheet.pdf) Lectures `1a: Intro & Class Org.` to `6: Bias Variance Tradeoff` 35 | 36 | ## Contributions 37 | If there are any issues or suggestions, feel free to raise an Issue or PR. All meaningful contributions welcome! 38 | 39 | ## License 40 | [MIT](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/LICENSE) 41 | -------------------------------------------------------------------------------- /T10W13/README.md: -------------------------------------------------------------------------------- 1 | # T10W13: Unsupervised Learning 2 | 3 | In T10W13, we cover Unsupervised Learning. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1zfhCQrQMMMdCdSbLmCy9x9brN4VoTyLkHfDu6QPQwNM/edit?usp=sharing). -------------------------------------------------------------------------------- /T1W3/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 1 Week 3: k-Nearest Neighbours 2 | 3 | In T1W3, I cover the k-Nearest Neighbours (k-NN) algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | ## Contents 6 | This repo contains the code used to answer Questions 2, 3, 4. 7 | 8 | ### Question 2a 9 | Here's the ranking table used to classify the new point `(1, 1)` using 3-NN: 10 | ``` 11 | Rank Point Distance Label 12 | 1 (0, 1) 1.000 1 13 | 2 (1, 0) 1.000 1 14 | 3 (1, 2) 1.000 1 15 | 4 (0, 2) 1.414 0 16 | 5 (2, 2) 1.414 0 17 | 6 (-1, 1) 2.000 0 18 | 7 (1, -1) 2.000 0 19 | 8 (2, 3) 2.236 1 20 | 21 | Rank Point Distance Label 22 | 1 (0, 1) 1.000 1 23 | 2 (1, 0) 1.000 1 24 | 3 (1, 2) 1.000 1 25 | 26 | The new point (1, 1) belongs to class 1 using 3-NN. 
27 | ``` 28 | 29 | Here's the ranking table used to classify the new point `(1, 1)` using 7-NN: 30 | ``` 31 | Rank Point Distance Label 32 | 1 (0, 1) 1.000 1 33 | 2 (1, 0) 1.000 1 34 | 3 (1, 2) 1.000 1 35 | 4 (0, 2) 1.414 0 36 | 5 (2, 2) 1.414 0 37 | 6 (-1, 1) 2.000 0 38 | 7 (1, -1) 2.000 0 39 | 8 (2, 3) 2.236 1 40 | 41 | Rank Point Distance Label 42 | 1 (0, 1) 1.000 1 43 | 2 (1, 0) 1.000 1 44 | 3 (1, 2) 1.000 1 45 | 4 (0, 2) 1.414 0 46 | 5 (2, 2) 1.414 0 47 | 6 (-1, 1) 2.000 0 48 | 7 (1, -1) 2.000 0 49 | 50 | The new point (1, 1) belongs to class 0 using 7-NN. 51 | ``` 52 | 53 | ### Question 2b 54 | Larger values of `k` will lead to smoother decision boundaries. This leads to lower chances of overfitting (Covered in `T3W5`). So, the order is: 55 | 56 | ``` 57 | k_l < k_c < k_r 58 | ``` 59 | 60 | ### Question 2c 61 | Time taken to run inference on test dataset for vanilla `k-NN` is indepedent of `k`. Altogether, we'll still be taking `m * t` time given a dataset of `m` samples. 62 | 63 | --- 64 | 65 | ### Question 3a 66 | Both algorithms are correct. Alice's algorithm runs in `O(n(d+k))` while that of Bob runs in `O(ndk)`. Alice's algorithm is much faster. 67 | 68 | For implementations, check out `alice_knn.py` and `bob_knn.py`. 69 | 70 | ### Question 3b 71 | Maintain a Balanced BST (Min/Max Heap) with `k` nodes where the BST tracks the top `k` smallest distances. This would reduce the running time to `O(n(d + logk))`. Here's how you can do it: 72 | 73 | 1. Calculate distances between all the `n` points and the new observation. This takes `O(nd)` 74 | 75 | 2. Add the first `k` distances into a Balanced BST 76 | 77 | 3. Look at the `n-k` unadded distances and iterate through them 78 | 79 | 4. If current distance is > BST root, ignore and move to the next one. This takes `O(1)` 80 | 81 | 5. If current distance is < BST root, remove root, insert current distance, and move on. This takes `O(n * logk)` for all `n` samples 82 | 83 | 6. By the end, you'll have the correct answers occupying all the `k` nodes in the BST. This takes `O(nd + nlogk)` in total 84 | 85 | > Essentially, we are taking the first `k` distances that may be incorrect and replacing them one by one with the correct `k` distances. 86 | 87 |
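To make the steps above concrete, here is a minimal runnable sketch of the `O(n(d + log k))` idea. It uses Python's `heapq` with negated distances as a stand-in for the Balanced BST / max-heap described above; the `knn_with_heap` name and the tuple-based toy dataset are my own choices for illustration, not part of the official solution.

```python
import heapq
import math

def knn_with_heap(points, new, k):
    # points: list of (x, y, label) tuples; new: (x, y) query point.
    # heapq is a min-heap, so we push (-distance, index): the heap root is then
    # the LARGEST of the k smallest distances seen so far (the "BST root" above).
    heap = []

    for i, (x, y, _) in enumerate(points):
        d = math.sqrt((x - new[0])**2 + (y - new[1])**2)  # O(d) per point
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))                 # fill the first k slots
        elif -heap[0][0] > d:                             # root is worse than d
            heapq.heapreplace(heap, (-d, i))              # pop root, push d: O(log k)

    # sort the k survivors from smallest to largest distance
    return [points[i] for _, i in sorted(heap, reverse=True)]

if __name__ == "__main__":
    data = [(-1, 1, 0), (0, 1, 1), (0, 2, 0), (1, -1, 0),
            (1, 0, 1), (1, 2, 1), (2, 2, 0), (2, 3, 1)]
    print(knn_with_heap(data, (1, 1), k=3))  # the three class-1 neighbours at distance 1
```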
88 | ![Balanced BST holding the k smallest distances](./assets/knn_bst.jpg) 89 |
90 | 91 | ``` 92 | if d_(k+1) < d_root: 93 | d_root = d_(k+1) 94 | ``` 95 | 96 | --- 97 | 98 | ### Question 4 99 | No. The difference between the two ranges give `0.4` and `10º Celcius`. This means the `Temperature` variable will dominate the k-NN when calculating the Euclidean Distance, minimising the impact or effect of the `Humidity` variable. 100 | 101 | We can minimise the effect of this disproportion by **normalising** or **standardising** the inputs to a suitable range that won't affect the distance metric immensely. This will be covered in future classes. 102 | -------------------------------------------------------------------------------- /T1W3/alice_knn.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | def __str__(self): 20 | return "({}, {}) | {}".format(self.x, self.y, self.label) 21 | 22 | def __repr__(self): 23 | return "({}, {}) | {}".format(self.x, self.y, self.label) 24 | 25 | points = [ 26 | Point(-1, 1, 0), 27 | Point(0, 1, 1), 28 | Point(0, 2, 0), 29 | Point(1, -1, 0), 30 | Point(1, 0, 1), 31 | Point(1, 2, 1), 32 | Point(2, 2, 0), 33 | Point(2, 3, 1) 34 | ] 35 | 36 | new = Point(1, 1) 37 | k = 3 38 | 39 | S = [0 for _ in range(len(points))] 40 | D = list(map(lambda pt : pt.euclidean_distance(new), points)) # O(nd) 41 | answers = [] 42 | 43 | # O(nk) 44 | for i in range(len(points)): 45 | for j in range(k): 46 | i_min = min(range(len(D)), key=D.__getitem__) 47 | min_D = min(D) 48 | 49 | if S[i_min] == 0: 50 | S[i_min] = 1 51 | D[i_min] = float('inf') # past smallest will not be picked again 52 | answers.append(i_min) 53 | 54 | for i in answers[:k]: 55 | print (points[i]) -------------------------------------------------------------------------------- /T1W3/assets/knn_bst.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T1W3/assets/knn_bst.jpg -------------------------------------------------------------------------------- /T1W3/bob_knn.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | def __str__(self): 20 | return "({}, {}) | {}".format(self.x, self.y, self.label) 21 | 22 | def __repr__(self): 23 | return "({}, {}) | {}".format(self.x, self.y, self.label) 24 | 25 | points = [ 26 | Point(-1, 1, 0), 27 | Point(0, 1, 1), 28 | Point(0, 2, 0), 29 | Point(1, -1, 0), 30 | Point(1, 0, 1), 31 | Point(1, 2, 1), 32 | Point(2, 2, 0), 33 | Point(2, 3, 1) 34 | ] 35 | 36 | new = Point(1, 1) 37 | k = 3 38 | 39 | S = [0 for _ in range(len(points))] 40 | 41 | # O(k(nd)) 42 | for i in range(k): # O(k) 43 | filtered_pts = [points[i] for i in range(len(S)) if S[i] == 0] # O(n) 44 | D = list(map(lambda pt : 
pt.euclidean_distance(new), filtered_pts)) # O(nd) 45 | i_min = min(range(len(D)), key=D.__getitem__) # O(n) 46 | S[i_min] = 1 47 | 48 | for i in (range(len(S))): 49 | if S[i] == 1: 50 | print (points[i], points[i].euclidean_distance(new)) -------------------------------------------------------------------------------- /T1W3/q1.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | points = [ 20 | Point(-1, 1, 0), 21 | Point(0, 1, 1), 22 | Point(0, 2, 0), 23 | Point(1, -1, 0), 24 | Point(1, 0, 1), 25 | Point(1, 2, 1), 26 | Point(2, 2, 0), 27 | Point(2, 3, 1) 28 | ] 29 | 30 | new = Point(1, 1) 31 | k = 7 32 | 33 | # k-NN algorithm 34 | distances = list(map( 35 | lambda pt : [pt.euclidean_distance(new), pt.x, pt.y, pt.label], 36 | points 37 | )) 38 | 39 | sorted_distances = sorted(distances, key=lambda pt : pt[0]) 40 | 41 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label")) 42 | 43 | for i, rec in enumerate(sorted_distances, 0): 44 | print ('{:<10}{:<10}{:<15.3f}{:<5}'.format( 45 | i+1, 46 | "(" + str(rec[1]) + ", " + str(rec[2])+ ")", 47 | rec[0], 48 | rec[3] 49 | )) 50 | 51 | print () 52 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label")) 53 | 54 | for rec in sorted_distances[:k]: 55 | print ('{:<10}{:<10}{:<15.3f}{:<5}'.format( 56 | i+1, 57 | "(" + str(rec[1]) + ", " + str(rec[2])+ ")", 58 | rec[0], 59 | rec[3] 60 | )) 61 | 62 | print ("\nThe new point (1, 1) belongs to class 1.") -------------------------------------------------------------------------------- /T2W4/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 2 Week 4: Decision Trees 2 | 3 | In T2W4, I cover the Decision Tree algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | I have included the implementation of a Decision Tree classifier in pure Python using `numpy` and `pandas` in `dtc.py`. You can call it in a similar fashion to that in `sklearn` using the `DecisionTreeClassifier` object. I train it on the popular *Iris Type Classification Dataset* found in `data/iris.csv`. 6 | 7 | ## Contents 8 | This repo contains the code used to answer Questions 1, 2, 3, and 4. 9 | 10 | ### Question 1a 11 | 12 | ``` 13 | S_1 14 | / \ 15 | 0/ \1 16 | F=0 S_2 17 | / \ 18 | 0/ \1 19 | F=1 S_3 20 | / \ 21 | 0/ \1 22 | F=1 F=0 23 | ``` 24 | 25 | ### Question 1b 26 | Function `F` can be represented as `AND(S_2, S_3)`. 
We can build as tree that's of depth 2: 27 | 28 | ``` 29 | S_2 30 | / \ 31 | 0/ \1 32 | S_3 S_3 33 | / \ / \ 34 | 0/ \1 0/ \1 35 | F=0 F=1 F=1 F=1 36 | ``` 37 | 38 | If your memory of the `AND` gate is fuzzy, here's a tabular summary: 39 | 40 | | **A** | **B** | **AND** | 41 | |-------|-------|---------| 42 | | 0 | 0 | 0 | 43 | | 0 | 1 | 0 | 44 | | 1 | 0 | 0 | 45 | | 1 | 1 | 1 | 46 | 47 | ### Question 1c 48 | 49 | If your memory of the `XOR` gate is fuzzy, here's a tabular summary: 50 | 51 | | **A** | **B** | **XOR** | 52 | |-------|-------|---------| 53 | | 0 | 0 | 0 | 54 | | 0 | 1 | 1 | 55 | | 1 | 0 | 1 | 56 | | 1 | 1 | 0 | 57 | 58 | To implement this `XOR` gate, we'd need `2^d` leaf nodes and `2^d - 1` internal nodes. It grows exponentially with `d`. Using a Decision Tree is not scalable. Pruning is not possible because we need to consider every single input (ie. a feature) – we can't just ignore any of them. 59 | 60 | You can, however, implement `AND` and `OR` gates using a DT since pruning is possible. We dont need to consider all our inputs. For example, for an `AND` gate, if any one of our inputs is `0`, the result is `0` regardless of the other inputs. Likewise, if any feature in an `OR` gate is `1`, the result is `1` regardless of the other inputs. 61 | 62 | --- 63 | 64 | ### Question 2a 65 | The features are as follows: 66 | 67 | - `Income` 68 | - `Credit History` 69 | - `Debt` 70 | 71 | The label is `Decision`. 72 | 73 | At each level, the main question we will be asking is, 74 | 75 | 76 | > Which feature to choose such that splitting via that gives us the "greatest purity" ie. the most even split between samples. 77 | 78 | The tree would look like so: 79 | 80 | ``` 81 | CrHi? 82 | / | \ 83 | Bad/ Good| \Unknown 84 | / | \ 85 | Rej App Income? 86 | / | \ 87 | / | \ 88 | 0-5K/ 5-10K| \10K+ 89 | Debt App App 90 | / \ 91 | Low/ \High 92 | App Rej 93 | ``` 94 | 95 | *Note: Refer to the slides for more on Information Gain and Entropy. We covered Claude Shannon's Information Theory in this class!* 96 | 97 | ### Question 2b 98 | 99 | Tree 1: 100 | ``` 101 | CrHi? 102 | / | \ 103 | Good/ Bad| \Unknown 104 | App Rej Income? 105 | / | \ 106 | 0-5K/ 5-10K| \10K+ 107 | App App App 108 | ``` 109 | 110 | Tree 2: 111 | ``` 112 | CrHi? 113 | / | \ 114 | Good/ Bad| \Unknown 115 | App Rej Debt? 116 | / \ 117 | Low/ \High 118 | App App 119 | ``` 120 | 121 | Tree 3: 122 | ``` 123 | Income? 124 | 125 | / | \ 126 | / | \ 127 | 0-5K/ 5-10K| \10K+ 128 | Debt? Debt? Debt? 129 | / \ / \ / \ 130 | Low/ \High Low/ \High Low/ \High 131 | App Rej App App App App 132 | ``` 133 | 134 | ### Question 2c 135 | > Of course, you must convert (encode) these strings like `GOOD`, `BAD`, `HIGH`, `LOW` to numeric values. So `GOOD = 1` and `BAD = 0`, for example. Same goes for the labels. ML Models ***DO NOT*** work with raw strings, only numbers. 136 | 137 | `DT($4K, GOOD CH, HIGH debt) = Approve` 138 | 139 | > Hint: Just follow path down to the leaf in your DT Classifier from Question 2a. 140 | 141 | If we use our 3 DTs, the results will be the following: 142 | 143 | ``` 144 | Tree 1: Approve 145 | Tree 2: Approve 146 | Tree 3: Reject 147 | ``` 148 | 149 | If we use uniform voting (every tree gets equal say ie. majority voting), we `Approve` the application since 2/3 classifiers agree. 150 | 151 | --- 152 | 153 | ### Question 3a 154 | Debt depends on Income. 
Person A with income of $5K and a debt of $4K, and Person B with income $15K and a debt of $4K, results in Person A being in `HIGH` debt while Person B is in `LOW` debt. Debt is categorical and Income is a quantifiable, continuous variable. This makes the explainability ambiguous. 155 | 156 | ### Question 3b 157 | Empirically, Decision Trees are bad performers on datasets with missing values. To calculate metrics like Information Gain and Entropy, it is nice to have all the information in front of us. Missing data makes these measures unreliable, making the DT classifier inaccurate. Replacing missing values with alternatives (`mean`, `max`, `min`, `mode`, etc.) could easily skew your data (think of it as "poisoning your dataset"). Also, dropping the affected rows makes the dataset smaller and non-representative of those specific cases (your model won't know what to do in those cases anymore). 158 | 159 | ### Question 3c 160 | Decision Trees do not consider temporal (time-related) features. You are introducing heavy class imbalance into your dataset by appending rows of data with a `REJECT` decision. Your model might overfit on this new biased dataset. Always try to maintain a good balance of `positive` and `negative` cases in your dataset to allow for better generalisation. 161 | 162 | > More on class imbalance and overfitting in future weeks! 163 | 164 | --- 165 | 166 | ### Question 4 167 | Refer to the working covered in class. You can find the official working in the [slides](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). -------------------------------------------------------------------------------- /T2W4/data/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length, sepal_width, petal_length, petal_width, class 2 | 5.1,3.5,1.4,0.2,Iris-setosa 3 | 4.9,3.0,1.4,0.2,Iris-setosa 4 | 4.7,3.2,1.3,0.2,Iris-setosa 5 | 4.6,3.1,1.5,0.2,Iris-setosa 6 | 5.0,3.6,1.4,0.2,Iris-setosa 7 | 5.4,3.9,1.7,0.4,Iris-setosa 8 | 4.6,3.4,1.4,0.3,Iris-setosa 9 | 5.0,3.4,1.5,0.2,Iris-setosa 10 | 4.4,2.9,1.4,0.2,Iris-setosa 11 | 4.9,3.1,1.5,0.1,Iris-setosa 12 | 5.4,3.7,1.5,0.2,Iris-setosa 13 | 4.8,3.4,1.6,0.2,Iris-setosa 14 | 4.8,3.0,1.4,0.1,Iris-setosa 15 | 4.3,3.0,1.1,0.1,Iris-setosa 16 | 5.8,4.0,1.2,0.2,Iris-setosa 17 | 5.7,4.4,1.5,0.4,Iris-setosa 18 | 5.4,3.9,1.3,0.4,Iris-setosa 19 | 5.1,3.5,1.4,0.3,Iris-setosa 20 | 5.7,3.8,1.7,0.3,Iris-setosa 21 | 5.1,3.8,1.5,0.3,Iris-setosa 22 | 5.4,3.4,1.7,0.2,Iris-setosa 23 | 5.1,3.7,1.5,0.4,Iris-setosa 24 | 4.6,3.6,1.0,0.2,Iris-setosa 25 | 5.1,3.3,1.7,0.5,Iris-setosa 26 | 4.8,3.4,1.9,0.2,Iris-setosa 27 | 5.0,3.0,1.6,0.2,Iris-setosa 28 | 5.0,3.4,1.6,0.4,Iris-setosa 29 | 5.2,3.5,1.5,0.2,Iris-setosa 30 | 5.2,3.4,1.4,0.2,Iris-setosa 31 | 4.7,3.2,1.6,0.2,Iris-setosa 32 | 4.8,3.1,1.6,0.2,Iris-setosa 33 | 5.4,3.4,1.5,0.4,Iris-setosa 34 | 5.2,4.1,1.5,0.1,Iris-setosa 35 | 5.5,4.2,1.4,0.2,Iris-setosa 36 | 4.9,3.1,1.5,0.1,Iris-setosa 37 | 5.0,3.2,1.2,0.2,Iris-setosa 38 | 5.5,3.5,1.3,0.2,Iris-setosa 39 | 4.9,3.1,1.5,0.1,Iris-setosa 40 | 4.4,3.0,1.3,0.2,Iris-setosa 41 | 5.1,3.4,1.5,0.2,Iris-setosa 42 | 5.0,3.5,1.3,0.3,Iris-setosa 43 | 4.5,2.3,1.3,0.3,Iris-setosa 44 | 4.4,3.2,1.3,0.2,Iris-setosa 45 | 5.0,3.5,1.6,0.6,Iris-setosa 46 | 5.1,3.8,1.9,0.4,Iris-setosa 47 | 4.8,3.0,1.4,0.3,Iris-setosa 48 | 5.1,3.8,1.6,0.2,Iris-setosa 49 | 4.6,3.2,1.4,0.2,Iris-setosa 50 | 5.3,3.7,1.5,0.2,Iris-setosa 51 | 5.0,3.3,1.4,0.2,Iris-setosa 52 | 7.0,3.2,4.7,1.4,Iris-versicolor 53 | 6.4,3.2,4.5,1.5,Iris-versicolor 54 | 
6.9,3.1,4.9,1.5,Iris-versicolor 55 | 5.5,2.3,4.0,1.3,Iris-versicolor 56 | 6.5,2.8,4.6,1.5,Iris-versicolor 57 | 5.7,2.8,4.5,1.3,Iris-versicolor 58 | 6.3,3.3,4.7,1.6,Iris-versicolor 59 | 4.9,2.4,3.3,1.0,Iris-versicolor 60 | 6.6,2.9,4.6,1.3,Iris-versicolor 61 | 5.2,2.7,3.9,1.4,Iris-versicolor 62 | 5.0,2.0,3.5,1.0,Iris-versicolor 63 | 5.9,3.0,4.2,1.5,Iris-versicolor 64 | 6.0,2.2,4.0,1.0,Iris-versicolor 65 | 6.1,2.9,4.7,1.4,Iris-versicolor 66 | 5.6,2.9,3.6,1.3,Iris-versicolor 67 | 6.7,3.1,4.4,1.4,Iris-versicolor 68 | 5.6,3.0,4.5,1.5,Iris-versicolor 69 | 5.8,2.7,4.1,1.0,Iris-versicolor 70 | 6.2,2.2,4.5,1.5,Iris-versicolor 71 | 5.6,2.5,3.9,1.1,Iris-versicolor 72 | 5.9,3.2,4.8,1.8,Iris-versicolor 73 | 6.1,2.8,4.0,1.3,Iris-versicolor 74 | 6.3,2.5,4.9,1.5,Iris-versicolor 75 | 6.1,2.8,4.7,1.2,Iris-versicolor 76 | 6.4,2.9,4.3,1.3,Iris-versicolor 77 | 6.6,3.0,4.4,1.4,Iris-versicolor 78 | 6.8,2.8,4.8,1.4,Iris-versicolor 79 | 6.7,3.0,5.0,1.7,Iris-versicolor 80 | 6.0,2.9,4.5,1.5,Iris-versicolor 81 | 5.7,2.6,3.5,1.0,Iris-versicolor 82 | 5.5,2.4,3.8,1.1,Iris-versicolor 83 | 5.5,2.4,3.7,1.0,Iris-versicolor 84 | 5.8,2.7,3.9,1.2,Iris-versicolor 85 | 6.0,2.7,5.1,1.6,Iris-versicolor 86 | 5.4,3.0,4.5,1.5,Iris-versicolor 87 | 6.0,3.4,4.5,1.6,Iris-versicolor 88 | 6.7,3.1,4.7,1.5,Iris-versicolor 89 | 6.3,2.3,4.4,1.3,Iris-versicolor 90 | 5.6,3.0,4.1,1.3,Iris-versicolor 91 | 5.5,2.5,4.0,1.3,Iris-versicolor 92 | 5.5,2.6,4.4,1.2,Iris-versicolor 93 | 6.1,3.0,4.6,1.4,Iris-versicolor 94 | 5.8,2.6,4.0,1.2,Iris-versicolor 95 | 5.0,2.3,3.3,1.0,Iris-versicolor 96 | 5.6,2.7,4.2,1.3,Iris-versicolor 97 | 5.7,3.0,4.2,1.2,Iris-versicolor 98 | 5.7,2.9,4.2,1.3,Iris-versicolor 99 | 6.2,2.9,4.3,1.3,Iris-versicolor 100 | 5.1,2.5,3.0,1.1,Iris-versicolor 101 | 5.7,2.8,4.1,1.3,Iris-versicolor 102 | 6.3,3.3,6.0,2.5,Iris-virginica 103 | 5.8,2.7,5.1,1.9,Iris-virginica 104 | 7.1,3.0,5.9,2.1,Iris-virginica 105 | 6.3,2.9,5.6,1.8,Iris-virginica 106 | 6.5,3.0,5.8,2.2,Iris-virginica 107 | 7.6,3.0,6.6,2.1,Iris-virginica 108 | 4.9,2.5,4.5,1.7,Iris-virginica 109 | 7.3,2.9,6.3,1.8,Iris-virginica 110 | 6.7,2.5,5.8,1.8,Iris-virginica 111 | 7.2,3.6,6.1,2.5,Iris-virginica 112 | 6.5,3.2,5.1,2.0,Iris-virginica 113 | 6.4,2.7,5.3,1.9,Iris-virginica 114 | 6.8,3.0,5.5,2.1,Iris-virginica 115 | 5.7,2.5,5.0,2.0,Iris-virginica 116 | 5.8,2.8,5.1,2.4,Iris-virginica 117 | 6.4,3.2,5.3,2.3,Iris-virginica 118 | 6.5,3.0,5.5,1.8,Iris-virginica 119 | 7.7,3.8,6.7,2.2,Iris-virginica 120 | 7.7,2.6,6.9,2.3,Iris-virginica 121 | 6.0,2.2,5.0,1.5,Iris-virginica 122 | 6.9,3.2,5.7,2.3,Iris-virginica 123 | 5.6,2.8,4.9,2.0,Iris-virginica 124 | 7.7,2.8,6.7,2.0,Iris-virginica 125 | 6.3,2.7,4.9,1.8,Iris-virginica 126 | 6.7,3.3,5.7,2.1,Iris-virginica 127 | 7.2,3.2,6.0,1.8,Iris-virginica 128 | 6.2,2.8,4.8,1.8,Iris-virginica 129 | 6.1,3.0,4.9,1.8,Iris-virginica 130 | 6.4,2.8,5.6,2.1,Iris-virginica 131 | 7.2,3.0,5.8,1.6,Iris-virginica 132 | 7.4,2.8,6.1,1.9,Iris-virginica 133 | 7.9,3.8,6.4,2.0,Iris-virginica 134 | 6.4,2.8,5.6,2.2,Iris-virginica 135 | 6.3,2.8,5.1,1.5,Iris-virginica 136 | 6.1,2.6,5.6,1.4,Iris-virginica 137 | 7.7,3.0,6.1,2.3,Iris-virginica 138 | 6.3,3.4,5.6,2.4,Iris-virginica 139 | 6.4,3.1,5.5,1.8,Iris-virginica 140 | 6.0,3.0,4.8,1.8,Iris-virginica 141 | 6.9,3.1,5.4,2.1,Iris-virginica 142 | 6.7,3.1,5.6,2.4,Iris-virginica 143 | 6.9,3.1,5.1,2.3,Iris-virginica 144 | 5.8,2.7,5.1,1.9,Iris-virginica 145 | 6.8,3.2,5.9,2.3,Iris-virginica 146 | 6.7,3.3,5.7,2.5,Iris-virginica 147 | 6.7,3.0,5.2,2.3,Iris-virginica 148 | 6.3,2.5,5.0,1.9,Iris-virginica 149 | 6.5,3.0,5.2,2.0,Iris-virginica 150 | 
6.2,3.4,5.4,2.3,Iris-virginica 151 | 5.9,3.0,5.1,1.8,Iris-virginica -------------------------------------------------------------------------------- /T2W4/data/iris.names: -------------------------------------------------------------------------------- 1 | 1. Title: Iris Plants Database 2 | Updated Sept 21 by C.Blake - Added discrepency information 3 | 4 | 2. Sources: 5 | (a) Creator: R.A. Fisher 6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) 7 | (c) Date: July, 1988 8 | 9 | 3. Past Usage: 10 | - Publications: too many to mention!!! Here are a few. 11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems" 12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions 13 | to Mathematical Statistics" (John Wiley, NY, 1950). 14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. 15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. 16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System 17 | Structure and Classification Rule for Recognition in Partially Exposed 18 | Environments". IEEE Transactions on Pattern Analysis and Machine 19 | Intelligence, Vol. PAMI-2, No. 1, 67-71. 20 | -- Results: 21 | -- very low misclassification rates (0% for the setosa class) 22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE 23 | Transactions on Information Theory, May 1972, 431-433. 24 | -- Results: 25 | -- very low misclassification rates again 26 | 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II 27 | conceptual clustering system finds 3 classes in the data. 28 | 29 | 4. Relevant Information: 30 | --- This is perhaps the best known database to be found in the pattern 31 | recognition literature. Fisher's paper is a classic in the field 32 | and is referenced frequently to this day. (See Duda & Hart, for 33 | example.) The data set contains 3 classes of 50 instances each, 34 | where each class refers to a type of iris plant. One class is 35 | linearly separable from the other 2; the latter are NOT linearly 36 | separable from each other. 37 | --- Predicted attribute: class of iris plant. 38 | --- This is an exceedingly simple domain. 39 | --- This data differs from the data presented in Fishers article 40 | (identified by Steve Chadwick, spchadwick@espeedaz.net ) 41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" 42 | where the error is in the fourth feature. 43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" 44 | where the errors are in the second and third features. 45 | 46 | 5. Number of Instances: 150 (50 in each of three classes) 47 | 48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class 49 | 50 | 7. Attribute Information: 51 | 1. sepal length in cm 52 | 2. sepal width in cm 53 | 3. petal length in cm 54 | 4. petal width in cm 55 | 5. class: 56 | -- Iris Setosa 57 | -- Iris Versicolour 58 | -- Iris Virginica 59 | 60 | 8. Missing Attribute Values: None 61 | 62 | Summary Statistics: 63 | Min Max Mean SD Class Correlation 64 | sepal length: 4.3 7.9 5.84 0.83 0.7826 65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194 66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!) 67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!) 68 | 69 | 9. Class Distribution: 33.3% for each of 3 classes. 
70 | -------------------------------------------------------------------------------- /T2W4/dtc.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | 4 | ''' 5 | Decision Trees are greedy algorithms 6 | that maximise the current Information Gain 7 | without backtracking or going back up to the root. 8 | 9 | Future splits are based on the current splits: 10 | split(t+1) = f(split(t)) 11 | 12 | At every level, the impurity of the dataset 13 | decreases. The entropy (randomness) decreases 14 | with the level. 15 | ''' 16 | 17 | class DTNode(): 18 | def __init__(self, feat_idx=None, bounds=None, left=None, right=None, info_gain=None, value=None): 19 | self.feat_idx = feat_idx 20 | self.bounds = bounds 21 | self.left = left 22 | self.right = right 23 | self.info_gain = info_gain 24 | self.value = value 25 | 26 | class DecisionTreeClassifier(): 27 | def __init__(self, depth=2, min_split=2): 28 | self.root = None 29 | self.depth = depth 30 | self.min_split = min_split 31 | 32 | def build_tree(self, dataset, cur_depth=0): 33 | x, y = dataset[:, :-1], dataset[:, -1] 34 | n, n_dim = x.shape 35 | 36 | # recursively build the subtrees 37 | if n >= self.min_split and cur_depth <= self.depth: 38 | best_split = self.get_best_split(dataset, n, n_dim) 39 | 40 | if best_split['info_gain'] > 0: 41 | left_tree = self.build_tree(best_split['left'], cur_depth+1) 42 | right_tree = self.build_tree(best_split['right'], cur_depth+1) 43 | 44 | return DTNode(best_split['feat_idx'], best_split['bounds'], left_tree, right_tree, best_split['info_gain']) 45 | 46 | y = list(y) 47 | value = max(y, key=y.count) # class label = majority count at leaves 48 | 49 | return DTNode(value=value) 50 | 51 | def get_best_split(self, dataset, n, n_dim): 52 | best_split = {} 53 | max_info_gain = -float('inf') 54 | 55 | for idx in range(n_dim): 56 | feat_val = dataset[:, idx] 57 | possible_boundss = np.unique(feat_val) 58 | 59 | for thresh in possible_boundss: 60 | # data_left, data_right = self.split(dataset, idx, thresh) 61 | data_left = np.array([row for row in dataset if row[idx] <= thresh]) 62 | data_right = np.array([row for row in dataset if row[idx] > thresh]) 63 | 64 | if len(data_left) > 0 and len(data_right) > 0: 65 | y, left_y, right_y = dataset[:, -1], data_left[:, -1], data_right[:, -1] 66 | cur_info_gain = self.get_info_gain(y, left_y, right_y) 67 | 68 | if cur_info_gain > max_info_gain: 69 | best_split['feat_idx'] = idx 70 | best_split['bounds'] = thresh 71 | best_split['left'] = data_left 72 | best_split['right'] = data_right 73 | best_split['info_gain'] = cur_info_gain 74 | max_info_gain = cur_info_gain 75 | 76 | return best_split 77 | 78 | def get_info_gain(self, parent, left, right): 79 | weight_left = len(left) / len(parent) 80 | weight_right = len(right) / len(parent) 81 | 82 | info_gain = self.get_entropy(parent) - (weight_left * self.get_entropy(left) + weight_right * self.get_entropy(right)) 83 | 84 | return info_gain 85 | 86 | def get_entropy(self, y): 87 | labels = np.unique(y) 88 | entropy = 0 89 | for cls in labels: 90 | p_cls = len(y[y == cls]) / len(y) 91 | entropy += -p_cls * np.log2(p_cls) 92 | 93 | return entropy 94 | 95 | def fit(self, x, y): 96 | dataset = np.concatenate((x, y), axis=1) 97 | self.root = self.build_tree(dataset) 98 | 99 | def make_pred(self, x, root): 100 | if root.value != None: 101 | return root.value 102 | 103 | feat_val = x[root.feat_idx] 104 | 105 | if feat_val <= root.bounds: 106 | return 
self.make_pred(x, root.left) 107 | else: 108 | return self.make_pred(x, root.right) 109 | 110 | def predict(self, x): 111 | return [self.make_pred(i, self.root) for i in x] 112 | 113 | cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'] 114 | df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols) 115 | 116 | # replace class strings with integer indices 117 | df['class'] = df['class'].str.replace('Iris-setosa', '0') 118 | df['class'] = df['class'].str.replace('Iris-versicolor', '1') 119 | df['class'] = df['class'].str.replace('Iris-virginica', '2') 120 | df['class'] = df['class'].map(lambda x : int(x)) 121 | 122 | X = df.iloc[:, :-1].values 123 | Y = df.iloc[:, -1].values.reshape(-1, 1) 124 | X = np.array(X) 125 | Y = np.array(Y) 126 | 127 | clf = DecisionTreeClassifier() 128 | clf.fit(X, Y) # split this into training and testing datasets 129 | 130 | def print_tree(root=None, indent=" "): 131 | if root.value != None: 132 | print (root.value) 133 | else: 134 | print ("x_" + str(root.feat_idx), '<=', root.bounds, ":", format(root.info_gain, '0.4f')) 135 | print (indent + "left: ", end="") 136 | print_tree(root.left, indent + indent) 137 | print (indent + "right: ", end="") 138 | print_tree(root.right, indent + indent) 139 | 140 | print_tree(clf.root) -------------------------------------------------------------------------------- /T3W5/Intro_to_Support_Vector_Machines.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Support Vector Machines – An In-depth Tutorial", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "code", 21 | "metadata": { 22 | "id": "05sNYyltOOMD" 23 | }, 24 | "source": [ 25 | "import numpy as np\n", 26 | "import pandas as pd\n", 27 | "from sklearn import datasets" 28 | ], 29 | "execution_count": 5, 30 | "outputs": [] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "metadata": { 35 | "id": "eoCyq5f7OVwZ" 36 | }, 37 | "source": [ 38 | "class SupperVectorClassifier:\n", 39 | " def __init__(self):\n", 40 | " self.w = None\n", 41 | " self.iterations = 1000\n", 42 | "\n", 43 | " def hinge_loss(self, YHAT, Y):\n", 44 | " '''\n", 45 | " Hinge Loss from lecture. No changes made.\n", 46 | " '''\n", 47 | "\n", 48 | " distances = 1 - (Y * YHAT)\n", 49 | " distances[distances < 0] = 0 # everywhere it's the correct prediction, give a loss of 0\n", 50 | " return np.sum(distances) / len(YHAT) # average loss\n", 51 | "\n", 52 | " def gradient_descent(self, X, Y, loss):\n", 53 | " '''\n", 54 | " Vanilla gradient descent. 
\n", 55 | " \n", 56 | " You can switch this to SGD as well to improve performance.\n", 57 | " '''\n", 58 | "\n", 59 | " grads = {}\n", 60 | " loss = 1 - (Y * np.dot(X, self.w))\n", 61 | " dw = np.zeros(len(self.w))\n", 62 | " \n", 63 | " for ind, d in enumerate(loss):\n", 64 | " if max(0, d) == 0:\n", 65 | " di = self.w\n", 66 | " else:\n", 67 | " di = self.w - (Y[ind] * X[ind])\n", 68 | " dw += di\n", 69 | " \n", 70 | " dw = dw / len(Y) # get the average gradient\n", 71 | " grads['dw'] = dw\n", 72 | "\n", 73 | " return grads\n", 74 | "\n", 75 | " def update(self, grads, alpha):\n", 76 | " '''\n", 77 | " Performs the actual update step in gradient descent.\n", 78 | "\n", 79 | " grads : gradient of loss wrt weights\n", 80 | " alpha : learning rate\n", 81 | " '''\n", 82 | " self.w = self.w - alpha * grads['dw']\n", 83 | "\n", 84 | " def fit(self, X, Y, alpha=1e-2):\n", 85 | " '''\n", 86 | " Fits the model on the given dataset.\n", 87 | "\n", 88 | " X: data samples\n", 89 | " Y: binary labels (1 or 0)\n", 90 | " alpha: step size / learning rate\n", 91 | " '''\n", 92 | "\n", 93 | " # reset the parameters for every call to fit\n", 94 | " self.w = np.random.rand(X[0].shape[-1]) # get the number of features per sample\n", 95 | "\n", 96 | " # perform the N iterations of learning\n", 97 | " for i in range(self.iterations):\n", 98 | " # forward pass\n", 99 | " YHAT = np.dot(X, self.w)\n", 100 | " loss = self.hinge_loss(YHAT, Y)\n", 101 | "\n", 102 | " if i % 20 == 0:\n", 103 | " print (\"Iteration: {} | Loss: {}\".format(i, loss))\n", 104 | "\n", 105 | " # backward pass\n", 106 | " grads = self.gradient_descent(X, Y, loss) # calculate gradient wrt parameters\n", 107 | " self.update(grads, alpha) # optimise the parameters\n", 108 | " \n", 109 | " def predict(self, X):\n", 110 | " # simply compute forward pass\n", 111 | " return np.dot(X, self.w)\n", 112 | "\n", 113 | " def evaluate(self, X_test, Y_test):\n", 114 | " '''\n", 115 | " Returns the accuracy of the model.\n", 116 | " '''\n", 117 | " pred = self.predict(X_test)\n", 118 | "\n", 119 | " # anything negative gets label -1, anything positive gets label 1\n", 120 | " pred[pred < 0] = -1 \n", 121 | " pred[pred >= 0] = 1\n", 122 | " correct = 0\n", 123 | "\n", 124 | " for i in range(len(Y_test)):\n", 125 | " if pred[i] == Y_test[i]:\n", 126 | " correct += 1\n", 127 | "\n", 128 | " return correct / len(Y_test) # get final accuracy based on number of correct samples" 129 | ], 130 | "execution_count": 46, 131 | "outputs": [] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "metadata": { 136 | "id": "vKilb4EqQq6K" 137 | }, 138 | "source": [ 139 | "from sklearn.model_selection import train_test_split\n", 140 | "\n", 141 | "X, Y = datasets.load_breast_cancer(return_X_y=True)\n", 142 | "Y[Y == 0] = -1 # switch labels from [0, 1] to [-1, 1]\n", 143 | "\n", 144 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y)" 145 | ], 146 | "execution_count": 50, 147 | "outputs": [] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "metadata": { 152 | "colab": { 153 | "base_uri": "https://localhost:8080/" 154 | }, 155 | "id": "YywW366QQ6L8", 156 | "outputId": "720d4170-9a26-4c51-959f-7ab81b044bdf" 157 | }, 158 | "source": [ 159 | "model = SupperVectorClassifier()\n", 160 | "model.fit(X_train, Y_train)" 161 | ], 162 | "execution_count": 52, 163 | "outputs": [ 164 | { 165 | "output_type": "stream", 166 | "name": "stdout", 167 | "text": [ 168 | "Iteration: 0 | Loss: 416.2350522151881\n", 169 | "Iteration: 20 | Loss: 634.7471121710325\n", 170 | "Iteration: 40 | 
Loss: 1377.959569044146\n", 171 | "Iteration: 60 | Loss: 337.2871462020824\n", 172 | "Iteration: 80 | Loss: 174.50232848883326\n", 173 | "Iteration: 100 | Loss: 144.48051916551537\n", 174 | "Iteration: 120 | Loss: 144.97272350980242\n", 175 | "Iteration: 140 | Loss: 157.49100221912383\n", 176 | "Iteration: 160 | Loss: 188.68354119350255\n", 177 | "Iteration: 180 | Loss: 198.83279680794266\n", 178 | "Iteration: 200 | Loss: 201.37397024816923\n", 179 | "Iteration: 220 | Loss: 209.18699964584258\n", 180 | "Iteration: 240 | Loss: 211.24550460677744\n", 181 | "Iteration: 260 | Loss: 224.75921771134932\n", 182 | "Iteration: 280 | Loss: 207.41067721227247\n", 183 | "Iteration: 300 | Loss: 202.9874547965538\n", 184 | "Iteration: 320 | Loss: 233.5225806996785\n", 185 | "Iteration: 340 | Loss: 209.41894810505434\n", 186 | "Iteration: 360 | Loss: 227.86193173406267\n", 187 | "Iteration: 380 | Loss: 220.0275230523279\n", 188 | "Iteration: 400 | Loss: 209.93813706106957\n", 189 | "Iteration: 420 | Loss: 224.69843019249313\n", 190 | "Iteration: 440 | Loss: 207.14690567298172\n", 191 | "Iteration: 460 | Loss: 222.13044724748374\n", 192 | "Iteration: 480 | Loss: 206.75958921219885\n", 193 | "Iteration: 500 | Loss: 224.10527985394174\n", 194 | "Iteration: 520 | Loss: 212.29554595340238\n", 195 | "Iteration: 540 | Loss: 222.9296852741817\n", 196 | "Iteration: 560 | Loss: 211.17164394816842\n", 197 | "Iteration: 580 | Loss: 221.71447794927437\n", 198 | "Iteration: 600 | Loss: 219.52250985351424\n", 199 | "Iteration: 620 | Loss: 212.20466048180376\n", 200 | "Iteration: 640 | Loss: 222.5975165742218\n", 201 | "Iteration: 660 | Loss: 207.1530791563107\n", 202 | "Iteration: 680 | Loss: 224.41752826545314\n", 203 | "Iteration: 700 | Loss: 212.5541881669893\n", 204 | "Iteration: 720 | Loss: 207.96840920838758\n", 205 | "Iteration: 740 | Loss: 202.79349339026604\n", 206 | "Iteration: 760 | Loss: 224.18299606836823\n", 207 | "Iteration: 780 | Loss: 223.15410816749386\n", 208 | "Iteration: 800 | Loss: 227.64698698681198\n", 209 | "Iteration: 820 | Loss: 213.35496804814935\n", 210 | "Iteration: 840 | Loss: 223.82125080471178\n", 211 | "Iteration: 860 | Loss: 211.88036740995827\n", 212 | "Iteration: 880 | Loss: 203.0702406461759\n", 213 | "Iteration: 900 | Loss: 224.43132254807512\n", 214 | "Iteration: 920 | Loss: 223.35580653357326\n", 215 | "Iteration: 940 | Loss: 227.815553049764\n", 216 | "Iteration: 960 | Loss: 219.9270821314675\n", 217 | "Iteration: 980 | Loss: 212.53249397223584\n" 218 | ] 219 | } 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "metadata": { 225 | "colab": { 226 | "base_uri": "https://localhost:8080/" 227 | }, 228 | "id": "l5deYnKJ_QC2", 229 | "outputId": "08d084e3-62e1-4170-e129-494051d5526e" 230 | }, 231 | "source": [ 232 | "acc = model.evaluate(X_test, Y_test)\n", 233 | "print (\"SVM is {:.3f}% accurate.\".format(acc * 100))" 234 | ], 235 | "execution_count": 53, 236 | "outputs": [ 237 | { 238 | "output_type": "stream", 239 | "name": "stdout", 240 | "text": [ 241 | "SVM is 83.916% accurate.\n" 242 | ] 243 | } 244 | ] 245 | } 246 | ] 247 | } -------------------------------------------------------------------------------- /T3W5/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 3 Week 5: Linear Models 2 | 3 | In T3W5, I cover the Linear Models. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 
4 | 5 | This repo contains Python implementations of `LinearRegressionClassifier`, `LogisticRegressionClassifier`, and `SupportVectorClassifier`. You can call them in a similar fashion to related models from `sklearn`. I train them on the popular *Iris Type Classification Dataset* found in `data/iris.csv`, as well as the *Breast Cancer Classification Dataset* from `sklearn.datasets`. 6 | 7 | > You can find the SVM implementation in `Intro_to_Support_Vector_Machines.ipynb`. It has some more in-depth comments inside. 8 | 9 | ## Contents 10 | This repo contains the code used to answer Questions 1, 2, and 5. 11 | 12 | ### Question 1 13 | You cannot use **Mean Squared Error**. 14 | 15 | MSE is mainly used in the case of regression problems, not classification tasks (which is when Logistic Regression is used). 16 | 17 | - `Accuracy` shows us how "good" our model is on unseen data 18 | - `AUC-ROC` shows us the model's ability to tell apart positive and negative instances 19 | - `Log Loss` is used as the cost function for Logistic Regression. The aim is to minimise this over training. 20 | 21 | --- 22 | 23 | ### Question 2 24 | The **Normal Equation** from the lecture: 25 | 26 | ```bash 27 | ø = (1/(X.T * X)) * X.T * Y 28 | = [4 -5.5 -7 7].T 29 | 30 | y_hat = 4 - 5.5x1 - 7x2 + 7x3 31 | ``` 32 | 33 | --- 34 | 35 | ### Question 3 36 | > Check the slides for annotated solutions for all the equations here. 37 | 38 | --- 39 | 40 | ### Question 4 41 | > Check the slides for diagrams and answers to these questions. 42 | 43 | --- 44 | 45 | ### Question 5a 46 | A symmetric matrix is one that's equal to its transpose. 47 | 48 | > Look at the slides for an annotated proof. 49 | 50 | ### Question 5b 51 | You have to use induction for this. Consider the base case and then move on to the inductive step. 52 | 53 | > Look at the slides for an annotated proof. 54 | 55 | ### Question 5c 56 | Similar to `5b`, we have to use induction. We use the concept of **Idempotency** again. 57 | 58 | > Look at the slides for an annotated proof. 59 | 60 | ### Question 5d 61 | We know that `trace(AB) = trace(BA)`. For symmetric and idempotent matrices, `rank(A) = trace(A)`. 62 | 63 | > Look at the slides for an annotated proof. 
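As a supplement to Question 5d, here is a sketch of the trace argument, assuming (this is an assumption on my part) that the matrix involved is the hat matrix `H = X(XᵀX)⁻¹Xᵀ` from the Normal Equation in Question 2, with `X` of shape `m × (d+1)` (bias column included) and full column rank. The official working is on the slides.

```latex
% Sketch only: assumes the matrix in Question 5d is the hat matrix from the Normal Equation.
H = X (X^\top X)^{-1} X^\top

H^\top = X \big((X^\top X)^{-1}\big)^\top X^\top = X (X^\top X)^{-1} X^\top = H
\quad \text{(symmetric, since } X^\top X \text{ is symmetric)}

H^2 = X (X^\top X)^{-1} (X^\top X) (X^\top X)^{-1} X^\top = H
\quad \text{(idempotent)}

\operatorname{rank}(H) = \operatorname{tr}(H)
  = \operatorname{tr}\!\big(X (X^\top X)^{-1} X^\top\big)
  = \operatorname{tr}\!\big((X^\top X)^{-1} X^\top X\big)
  = \operatorname{tr}(I_{d+1}) = d + 1
```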
-------------------------------------------------------------------------------- /T3W5/data/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length, sepal_width, petal_length, petal_width, class 2 | 5.1,3.5,1.4,0.2,Iris-setosa 3 | 4.9,3.0,1.4,0.2,Iris-setosa 4 | 4.7,3.2,1.3,0.2,Iris-setosa 5 | 4.6,3.1,1.5,0.2,Iris-setosa 6 | 5.0,3.6,1.4,0.2,Iris-setosa 7 | 5.4,3.9,1.7,0.4,Iris-setosa 8 | 4.6,3.4,1.4,0.3,Iris-setosa 9 | 5.0,3.4,1.5,0.2,Iris-setosa 10 | 4.4,2.9,1.4,0.2,Iris-setosa 11 | 4.9,3.1,1.5,0.1,Iris-setosa 12 | 5.4,3.7,1.5,0.2,Iris-setosa 13 | 4.8,3.4,1.6,0.2,Iris-setosa 14 | 4.8,3.0,1.4,0.1,Iris-setosa 15 | 4.3,3.0,1.1,0.1,Iris-setosa 16 | 5.8,4.0,1.2,0.2,Iris-setosa 17 | 5.7,4.4,1.5,0.4,Iris-setosa 18 | 5.4,3.9,1.3,0.4,Iris-setosa 19 | 5.1,3.5,1.4,0.3,Iris-setosa 20 | 5.7,3.8,1.7,0.3,Iris-setosa 21 | 5.1,3.8,1.5,0.3,Iris-setosa 22 | 5.4,3.4,1.7,0.2,Iris-setosa 23 | 5.1,3.7,1.5,0.4,Iris-setosa 24 | 4.6,3.6,1.0,0.2,Iris-setosa 25 | 5.1,3.3,1.7,0.5,Iris-setosa 26 | 4.8,3.4,1.9,0.2,Iris-setosa 27 | 5.0,3.0,1.6,0.2,Iris-setosa 28 | 5.0,3.4,1.6,0.4,Iris-setosa 29 | 5.2,3.5,1.5,0.2,Iris-setosa 30 | 5.2,3.4,1.4,0.2,Iris-setosa 31 | 4.7,3.2,1.6,0.2,Iris-setosa 32 | 4.8,3.1,1.6,0.2,Iris-setosa 33 | 5.4,3.4,1.5,0.4,Iris-setosa 34 | 5.2,4.1,1.5,0.1,Iris-setosa 35 | 5.5,4.2,1.4,0.2,Iris-setosa 36 | 4.9,3.1,1.5,0.1,Iris-setosa 37 | 5.0,3.2,1.2,0.2,Iris-setosa 38 | 5.5,3.5,1.3,0.2,Iris-setosa 39 | 4.9,3.1,1.5,0.1,Iris-setosa 40 | 4.4,3.0,1.3,0.2,Iris-setosa 41 | 5.1,3.4,1.5,0.2,Iris-setosa 42 | 5.0,3.5,1.3,0.3,Iris-setosa 43 | 4.5,2.3,1.3,0.3,Iris-setosa 44 | 4.4,3.2,1.3,0.2,Iris-setosa 45 | 5.0,3.5,1.6,0.6,Iris-setosa 46 | 5.1,3.8,1.9,0.4,Iris-setosa 47 | 4.8,3.0,1.4,0.3,Iris-setosa 48 | 5.1,3.8,1.6,0.2,Iris-setosa 49 | 4.6,3.2,1.4,0.2,Iris-setosa 50 | 5.3,3.7,1.5,0.2,Iris-setosa 51 | 5.0,3.3,1.4,0.2,Iris-setosa 52 | 7.0,3.2,4.7,1.4,Iris-versicolor 53 | 6.4,3.2,4.5,1.5,Iris-versicolor 54 | 6.9,3.1,4.9,1.5,Iris-versicolor 55 | 5.5,2.3,4.0,1.3,Iris-versicolor 56 | 6.5,2.8,4.6,1.5,Iris-versicolor 57 | 5.7,2.8,4.5,1.3,Iris-versicolor 58 | 6.3,3.3,4.7,1.6,Iris-versicolor 59 | 4.9,2.4,3.3,1.0,Iris-versicolor 60 | 6.6,2.9,4.6,1.3,Iris-versicolor 61 | 5.2,2.7,3.9,1.4,Iris-versicolor 62 | 5.0,2.0,3.5,1.0,Iris-versicolor 63 | 5.9,3.0,4.2,1.5,Iris-versicolor 64 | 6.0,2.2,4.0,1.0,Iris-versicolor 65 | 6.1,2.9,4.7,1.4,Iris-versicolor 66 | 5.6,2.9,3.6,1.3,Iris-versicolor 67 | 6.7,3.1,4.4,1.4,Iris-versicolor 68 | 5.6,3.0,4.5,1.5,Iris-versicolor 69 | 5.8,2.7,4.1,1.0,Iris-versicolor 70 | 6.2,2.2,4.5,1.5,Iris-versicolor 71 | 5.6,2.5,3.9,1.1,Iris-versicolor 72 | 5.9,3.2,4.8,1.8,Iris-versicolor 73 | 6.1,2.8,4.0,1.3,Iris-versicolor 74 | 6.3,2.5,4.9,1.5,Iris-versicolor 75 | 6.1,2.8,4.7,1.2,Iris-versicolor 76 | 6.4,2.9,4.3,1.3,Iris-versicolor 77 | 6.6,3.0,4.4,1.4,Iris-versicolor 78 | 6.8,2.8,4.8,1.4,Iris-versicolor 79 | 6.7,3.0,5.0,1.7,Iris-versicolor 80 | 6.0,2.9,4.5,1.5,Iris-versicolor 81 | 5.7,2.6,3.5,1.0,Iris-versicolor 82 | 5.5,2.4,3.8,1.1,Iris-versicolor 83 | 5.5,2.4,3.7,1.0,Iris-versicolor 84 | 5.8,2.7,3.9,1.2,Iris-versicolor 85 | 6.0,2.7,5.1,1.6,Iris-versicolor 86 | 5.4,3.0,4.5,1.5,Iris-versicolor 87 | 6.0,3.4,4.5,1.6,Iris-versicolor 88 | 6.7,3.1,4.7,1.5,Iris-versicolor 89 | 6.3,2.3,4.4,1.3,Iris-versicolor 90 | 5.6,3.0,4.1,1.3,Iris-versicolor 91 | 5.5,2.5,4.0,1.3,Iris-versicolor 92 | 5.5,2.6,4.4,1.2,Iris-versicolor 93 | 6.1,3.0,4.6,1.4,Iris-versicolor 94 | 5.8,2.6,4.0,1.2,Iris-versicolor 95 | 5.0,2.3,3.3,1.0,Iris-versicolor 96 | 5.6,2.7,4.2,1.3,Iris-versicolor 
97 | 5.7,3.0,4.2,1.2,Iris-versicolor 98 | 5.7,2.9,4.2,1.3,Iris-versicolor 99 | 6.2,2.9,4.3,1.3,Iris-versicolor 100 | 5.1,2.5,3.0,1.1,Iris-versicolor 101 | 5.7,2.8,4.1,1.3,Iris-versicolor 102 | 6.3,3.3,6.0,2.5,Iris-virginica 103 | 5.8,2.7,5.1,1.9,Iris-virginica 104 | 7.1,3.0,5.9,2.1,Iris-virginica 105 | 6.3,2.9,5.6,1.8,Iris-virginica 106 | 6.5,3.0,5.8,2.2,Iris-virginica 107 | 7.6,3.0,6.6,2.1,Iris-virginica 108 | 4.9,2.5,4.5,1.7,Iris-virginica 109 | 7.3,2.9,6.3,1.8,Iris-virginica 110 | 6.7,2.5,5.8,1.8,Iris-virginica 111 | 7.2,3.6,6.1,2.5,Iris-virginica 112 | 6.5,3.2,5.1,2.0,Iris-virginica 113 | 6.4,2.7,5.3,1.9,Iris-virginica 114 | 6.8,3.0,5.5,2.1,Iris-virginica 115 | 5.7,2.5,5.0,2.0,Iris-virginica 116 | 5.8,2.8,5.1,2.4,Iris-virginica 117 | 6.4,3.2,5.3,2.3,Iris-virginica 118 | 6.5,3.0,5.5,1.8,Iris-virginica 119 | 7.7,3.8,6.7,2.2,Iris-virginica 120 | 7.7,2.6,6.9,2.3,Iris-virginica 121 | 6.0,2.2,5.0,1.5,Iris-virginica 122 | 6.9,3.2,5.7,2.3,Iris-virginica 123 | 5.6,2.8,4.9,2.0,Iris-virginica 124 | 7.7,2.8,6.7,2.0,Iris-virginica 125 | 6.3,2.7,4.9,1.8,Iris-virginica 126 | 6.7,3.3,5.7,2.1,Iris-virginica 127 | 7.2,3.2,6.0,1.8,Iris-virginica 128 | 6.2,2.8,4.8,1.8,Iris-virginica 129 | 6.1,3.0,4.9,1.8,Iris-virginica 130 | 6.4,2.8,5.6,2.1,Iris-virginica 131 | 7.2,3.0,5.8,1.6,Iris-virginica 132 | 7.4,2.8,6.1,1.9,Iris-virginica 133 | 7.9,3.8,6.4,2.0,Iris-virginica 134 | 6.4,2.8,5.6,2.2,Iris-virginica 135 | 6.3,2.8,5.1,1.5,Iris-virginica 136 | 6.1,2.6,5.6,1.4,Iris-virginica 137 | 7.7,3.0,6.1,2.3,Iris-virginica 138 | 6.3,3.4,5.6,2.4,Iris-virginica 139 | 6.4,3.1,5.5,1.8,Iris-virginica 140 | 6.0,3.0,4.8,1.8,Iris-virginica 141 | 6.9,3.1,5.4,2.1,Iris-virginica 142 | 6.7,3.1,5.6,2.4,Iris-virginica 143 | 6.9,3.1,5.1,2.3,Iris-virginica 144 | 5.8,2.7,5.1,1.9,Iris-virginica 145 | 6.8,3.2,5.9,2.3,Iris-virginica 146 | 6.7,3.3,5.7,2.5,Iris-virginica 147 | 6.7,3.0,5.2,2.3,Iris-virginica 148 | 6.3,2.5,5.0,1.9,Iris-virginica 149 | 6.5,3.0,5.2,2.0,Iris-virginica 150 | 6.2,3.4,5.4,2.3,Iris-virginica 151 | 5.9,3.0,5.1,1.8,Iris-virginica -------------------------------------------------------------------------------- /T3W5/data/iris.names: -------------------------------------------------------------------------------- 1 | 1. Title: Iris Plants Database 2 | Updated Sept 21 by C.Blake - Added discrepency information 3 | 4 | 2. Sources: 5 | (a) Creator: R.A. Fisher 6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) 7 | (c) Date: July, 1988 8 | 9 | 3. Past Usage: 10 | - Publications: too many to mention!!! Here are a few. 11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems" 12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions 13 | to Mathematical Statistics" (John Wiley, NY, 1950). 14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. 15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. 16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System 17 | Structure and Classification Rule for Recognition in Partially Exposed 18 | Environments". IEEE Transactions on Pattern Analysis and Machine 19 | Intelligence, Vol. PAMI-2, No. 1, 67-71. 20 | -- Results: 21 | -- very low misclassification rates (0% for the setosa class) 22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE 23 | Transactions on Information Theory, May 1972, 431-433. 24 | -- Results: 25 | -- very low misclassification rates again 26 | 5. See also: 1988 MLC Proceedings, 54-64. 
Cheeseman et al's AUTOCLASS II 27 | conceptual clustering system finds 3 classes in the data. 28 | 29 | 4. Relevant Information: 30 | --- This is perhaps the best known database to be found in the pattern 31 | recognition literature. Fisher's paper is a classic in the field 32 | and is referenced frequently to this day. (See Duda & Hart, for 33 | example.) The data set contains 3 classes of 50 instances each, 34 | where each class refers to a type of iris plant. One class is 35 | linearly separable from the other 2; the latter are NOT linearly 36 | separable from each other. 37 | --- Predicted attribute: class of iris plant. 38 | --- This is an exceedingly simple domain. 39 | --- This data differs from the data presented in Fishers article 40 | (identified by Steve Chadwick, spchadwick@espeedaz.net ) 41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" 42 | where the error is in the fourth feature. 43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" 44 | where the errors are in the second and third features. 45 | 46 | 5. Number of Instances: 150 (50 in each of three classes) 47 | 48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class 49 | 50 | 7. Attribute Information: 51 | 1. sepal length in cm 52 | 2. sepal width in cm 53 | 3. petal length in cm 54 | 4. petal width in cm 55 | 5. class: 56 | -- Iris Setosa 57 | -- Iris Versicolour 58 | -- Iris Virginica 59 | 60 | 8. Missing Attribute Values: None 61 | 62 | Summary Statistics: 63 | Min Max Mean SD Class Correlation 64 | sepal length: 4.3 7.9 5.84 0.83 0.7826 65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194 66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!) 67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!) 68 | 69 | 9. Class Distribution: 33.3% for each of 3 classes. 70 | -------------------------------------------------------------------------------- /T3W5/linreg.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import matplotlib.pyplot as plt 4 | 5 | class LinearRegressionClassifier(object): 6 | def __init__(self): 7 | self.alpha = 1e-2 8 | self.iterations = 1000 9 | self.losses = [] 10 | self.weights = None 11 | self.bias = None 12 | 13 | def forward(self, x): 14 | return np.dot(x, self.weights) + self.bias 15 | 16 | def backward(self, x, y_hat, y): 17 | m, d = x.shape 18 | y_hat = y_hat.reshape([m]) 19 | y = y.reshape([m]) 20 | 21 | partial_w = (1 / x.shape[0]) * (2 * np.dot(x.T, (y_hat - y))) 22 | partial_b = (1 / x.shape[0]) * (2 * np.sum(y_hat - y)) 23 | 24 | return [partial_w, partial_b] 25 | 26 | def MSELoss(self, y_hat, y): 27 | return (1/y.shape[0]) * np.sum(np.square(y_hat - y)) 28 | 29 | def update(self, grad): 30 | self.weights = self.weights - (self.alpha * grad[0]) 31 | self.bias = self.bias - (self.alpha * grad[1]) 32 | 33 | def fit(self, x, y): 34 | self.weights = np.random.uniform(0, 1, x.shape[1]) 35 | self.bias = np.random.uniform(0, 1, 1) 36 | self.losses = [] 37 | 38 | for i in range(self.iterations): 39 | y_hat = self.forward(x) 40 | 41 | loss = self.MSELoss(y_hat, y) 42 | self.losses.append(loss) 43 | 44 | grad = self.backward(x, y_hat, y) 45 | 46 | self.update(grad) 47 | 48 | def predict(self, x): 49 | return x 50 | 51 | def plot(self): 52 | plt.plot(range(self.iterations), self.losses, color="red") 53 | plt.title("Loss on Iris Dataset for {} iterations".format(self.iterations)) 54 | plt.xlabel("Iteration") 55 | plt.ylabel("Loss") 56 | plt.show() 57 | 58 | cols = ['sepal_length', 'sepal_width', 
'petal_length', 'petal_width', 'class'] 59 | df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols) 60 | 61 | # replace class strings with integer indices 62 | df['class'] = df['class'].str.replace('Iris-setosa', '0') 63 | df['class'] = df['class'].str.replace('Iris-versicolor', '1') 64 | df['class'] = df['class'].str.replace('Iris-virginica', '2') 65 | df['class'] = df['class'].map(lambda x : int(x)) 66 | 67 | X = df.iloc[:, :-1].values 68 | Y = df.iloc[:, -1].values.reshape(-1, 1) 69 | X = np.array(X) 70 | Y = np.array(Y) 71 | 72 | linreg = LinearRegressionClassifier() 73 | linreg.fit(X, Y) 74 | linreg.plot() -------------------------------------------------------------------------------- /T4aW6/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 4 Week 6: Bias-Variance Tradeoff 2 | 3 | In T4W6, I cover the Bias-Variance Tradeoff. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | > This tutorial was pretty difficult. I've attached a `FAQ.pdf` file that seeks to clarify certain details on this week's topics. 6 | 7 | ## Contents 8 | This repo contains answers for Questions 1 to 4. 9 | 10 | ### Question 1a 11 | Number of data points: Yes 12 | Amount of Noise: No 13 | Complexity of Target: No 14 | 15 | ### Question 1b 16 | Deterministic noise will increase as it gets harder for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting. 17 | 18 | ### Question 1c 19 | Deterministic noise will decrease as it gets easier for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting. 20 | 21 | --- 22 | 23 | ### Question 2a 24 | Each blue point is the average training accuracy for an arbitrary value of `C`. It's the average of all the `10` accuracies for the 10-FCV. 25 | 26 | Each green point is the average validation accuracy for an arbitrary value of `C`. It's the average of all the `10` validation accuracies for the 10-FCV. 27 | 28 | ### Question 2b 29 | Each blue region represents the **variance** of the training accuracy for a value of `C`. It is calculated by getting the variance of all `10` accuracies for the 10-FCV. 30 | 31 | Similarly, the green region is the **variance** of the validation accuracy for a value of `C`. It's the variance of all `10` accuracies for the 10-FCV. 32 | 33 | ### Question 2c 34 | The best validation accuracy is reached when `C = 1`. 35 | 36 | > High training accuracy DOES NOT indicate high validation/testing accuracies. Always perform your train-test process to see if the model has generalised well to the unseen data before doing anything with the model (like deploying to production or using it IRL). 37 | 38 | --- 39 | 40 | ### Question 3a 41 | 42 | > The annotated proofs for this question can be found on the slides. 43 | 44 | ### Question 3b 45 | 1. Smaller `k` values, with everything else held constant, will increase the variance. 46 | 2. As `k` increases, bias increases. With more neighbours being considered, we include points further away from `x0` (closeness decreases) and the resulting predictions move away from `f(x0)` (see the simulation sketch at the end of this file). 47 | 48 | > Bias = Closeness to Truth 49 | 50 | --- 51 | 52 | ### Question 4 53 | 54 | > The annotated proofs for this question can be found on the slides.
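---

### Bonus: simulating the effect of `k` on bias and variance

To make Question 3b concrete, here is a small simulation sketch I put together (it is **not** part of the official solutions). It fits k-NN regression on many resampled training sets drawn from a noisy sine target and estimates the bias² and variance of the prediction at a fixed query point `x0`. The target function, noise level, dataset sizes, query point, and `k` values are all assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3244)

def f(x):
    # the "true" target function (an assumption made for this demo)
    return np.sin(2 * np.pi * x)

def bias_variance_at_x0(k, n_datasets=500, n_train=50, noise=0.2, x0=0.25):
    preds = []
    for _ in range(n_datasets):
        # draw a fresh training set each time
        X = rng.uniform(0, 1, size=(n_train, 1))
        y = f(X).ravel() + rng.normal(0, noise, size=n_train)
        knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
        preds.append(knn.predict([[x0]])[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x0)) ** 2   # squared distance of the average prediction from the truth
    variance = preds.var()                  # spread of the predictions across training sets
    return bias_sq, variance

for k in [1, 5, 15, 45]:
    b, v = bias_variance_at_x0(k)
    print(f"k = {k:2d} | bias^2 = {b:.4f} | variance = {v:.4f}")
```

The exact numbers depend on the seed and the noise level, but the trend should match the answer above: variance shrinks as `k` grows, while bias grows because far-away neighbours get pulled into the average.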
-------------------------------------------------------------------------------- /T4aW6/Tutorial_4_FAQ.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T4aW6/Tutorial_4_FAQ.pdf -------------------------------------------------------------------------------- /T4bW7/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 4b Week 7: Regularisation and Validation 2 | 3 | > Hope you had a productive Recess Week! Let's try getting that 'A' for midterms :D 4 | 5 | In T4bW7, I cover Regularisation and Validation. Find the tutorial slides [here](https://docs.google.com/presentation/d/1eE1In5ZS19YKgN3DN9VjNhBavHQoMaKB9NjZ-hreTG0/edit?usp=sharing). 6 | 7 | ## Contents 8 | This repo contains the worked answers for Questions 1, 2, and 3. 9 | 10 | --- 11 | 12 | ### Question 1a 13 | Training time for a dataset of `m` samples: `m^2 * log(m)` 14 | 15 | In **LOO-CV**, one fold is one sample. Each run trains on `m-1` samples and tests on the remaining `1` sample, and this is repeated `m` times so that every sample gets its chance of being the testing sample. This is for _one_ model. 16 | 17 | Number of models: `30`
18 | Number of training samples: `m-1`
19 | Number of testing samples: `1`
20 |
21 | Total time: `30 * m * (m-1)^2 * log(m-1)` 22 | 23 | ### Question 1b 24 | 25 | In **10-FCV**, each fold has `m/10` samples inside. There are 9 training folds and 1 testing fold. Each `m/10`-sized fold gets its chance of being the testing fold. This is for _one_ model. 26 | 27 | Number of models: `30`
28 | Number of training folds: `9`
29 | Number of testing folds: `1`
30 | Training time for the entire dataset of `m` samples: `m^2 * log(m)`
31 |
32 | Total time: `[30 * 10 * (9m/10)^2 * log(9m/10)] + [m^2 * log(m)]` 33 | 34 | --- 35 | 36 | ## How to read contour plots 37 | Before we get into Question 2, let's understand the figures given to us. 38 |
39 | The ellipses are contour lines that represent the altitudes of the function. Think of it as the graph surface coming out of the paper in 3 dimensions (like a volcano on paper). The lower the number next to a circle, the lower the altitude, and vice versa. 40 | 41 | 1. Find the minimum value of `Reg. Penalty + MSE term` 42 | 2. Return the corresponding values of `(Theta0, Theta1)` 43 | 44 | > It's OKAY to guess here! The values are rough _guesstimates_. Just eyeball it. 45 | 46 | ### Question 2a 47 | No regularisation means we only look at the MSE term. Find the values of `(Theta0, Theta1)` such that the value of the MSE term is minimum. This occurs at the circle at altitude `0.2` on either graph. The center of that circle corresponds to `Theta0 = ~0.9` and `Theta1 = 0.5`. It's alright if the value fluctuates `± 0.5` from the correct answer. 48 | 49 | ### Question 2b 50 | Look at graph 1. There are three possible sums to consider: 51 | 52 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 4.0 = 4.1` (NOPE) 53 | 2. Flexible MSE + Minimum Reg Penalty = `0.4 + 5.0 = 5.4` (NOPE) 54 | 3. Middle ground = `0.5 + 2.6 = 3.1` (CORRECT) 55 | 56 | > The minimum sum corresponds to the pair `(0.2, 0.25)` 57 | 58 | ### Question 2c 59 | Look at graph 2. There are three possible sums to consider: 60 | 61 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 9.0 = 9.1` (NOPE) 62 | 2. Middle ground = `2.5 + 2.2 = 4.7` (NOPE) 63 | 3. Flexible MSE + Minimum Reg Penalty = `0.0 + 4.4 = 4.4` (CORRECT) 64 | 65 | > The minimum sum corresponds to the pair `(0.0, 0.1)` 66 | 67 | --- 68 | 69 | ### Question 3a 70 | Time Series data is dependent on time. Breaking that natural temporal order makes your data worthless. The best approach is to split the dataset without disrupting that order. For example, you can keep the past few days' worth of temporal data points for training, and the future points for testing. 71 | 72 | > The value is in the time. Respect it. 73 | 74 | ### Question 3b 75 | Break your data into training, validation, and testing without switching the order of the samples or shuffling them. For example, suppose we have the following dataset with time going from `T1` to `T20`: 76 | 77 | ``` 78 | Dataset = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20] 79 | 80 | Training = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11] 81 | Validation = [T12, T13, T14, T15] 82 | Testing = [T16, T17, T18, T19, T20] 83 | ``` 84 | 85 | > Again, please respect the time component for temporal data. 86 | 87 | ### Question 3c 88 | Take adjacent pairs of data points for training and validation. 89 | 90 | ``` 91 | Dataset = [1, 2, 3, 4] 92 | 93 | Training = [1] | Validation = [2] 94 | Training = [2] | Validation = [3] 95 | Training = [3] | Validation = [4] 96 | ``` 97 | 98 | There are less-preferred alternatives: 99 | 100 | 1. `Training = [1] | Validation = [3]` -> decent model 101 | 2. `Training = [1, 2] | Validation = [3]` -> predicting too far into the future after limited training 102 | 3. `Training = [1, 2, 3] | Validation = [4]` -> better model but can't really compare to the training fold in **1.** 103 | 104 | > The key is to break the dataset into comparable folds for training and testing that result in models that are not too different from one another.
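---

### Bonus: a quick splitting sketch in code

To make the splits in Question 3 concrete, here is a small sketch (my own illustration, not part of the official solutions) that builds a chronological train/validation/test split plus the adjacent walk-forward pairs from Question 3c. The array sizes and split fractions are assumptions chosen to mirror the `T1`-`T20` example above.

```python
import numpy as np

def chronological_split(n, train_frac=0.55, val_frac=0.20):
    # contiguous, time-ordered blocks; the samples are never shuffled
    idx = np.arange(n)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]

def walk_forward_pairs(n, window=1):
    # adjacent (training, validation) index pairs, as in Question 3c
    for start in range(n - window):
        yield np.arange(start, start + window), np.arange(start + window, start + window + 1)

train, val, test = chronological_split(20)
print(train, val, test)   # first 11 indices for training, next 4 for validation, last 5 for testing

for tr, va in walk_forward_pairs(4):
    print("Training:", tr + 1, "| Validation:", va + 1)   # reproduces the [1]->[2], [2]->[3], [3]->[4] pairs
```

With `n = 20` and these fractions you recover exactly the `T1`-`T20` split shown above. If you prefer a library utility, `sklearn.model_selection.TimeSeriesSplit` gives a similar expanding-window scheme without shuffling.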
-------------------------------------------------------------------------------- /T5W8/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 5 Week 8: Evaluation Metrics 2 | 3 | In T5W8, I cover Evaluation Metrics. Find the tutorial slides [here](https://docs.google.com/presentation/d/19QigHWaB3GTnhyNfWkbXYpcfvUT2wPq1Zi2ft40GATM/edit?usp=sharing). 4 | 5 | > This chapter is very important to you as an ML practitioner. It gives us tools to analyse how our model is doing after training. These methods give us an indication of which direction to head in when stuck. 6 | 7 | ## Contents 8 | This repo contains answers for Questions 1 and 2. 9 | 10 | ### Question 1a 11 | | Sample | Prediction | Label | 12 | |--------|------------|---------| 13 | | x1 | 0 (NEG) | 0 (NEG) | 14 | | x2 | 0 (NEG) | 1 (POS) | 15 | | x3 | 0 (NEG) | 1 (POS) | 16 | | x4 | 0 (NEG) | 0 (NEG) | 17 | | x5 | 0 (NEG) | 0 (NEG) | 18 | | x6 | 1 (POS) | 1 (POS) | 19 | | x7 | 1 (POS) | 1 (POS) | 20 | | x8 | 1 (POS) | 0 (NEG) | 21 | | x9 | 1 (POS) | 1 (POS) | 22 | | x10 | 1 (POS) | 1 (POS) | 23 | 24 | | Submetric | (Pred, Actual) | Count | 25 | |-----------|----------------|-------| 26 | | TP | (POS, POS) | 4 | 27 | | FP | (POS, NEG) | 1 | 28 | | TN | (NEG, NEG) | 3 | 29 | | FN | (NEG, POS) | 2 | 30 | 31 | ``` 32 | Precision = TP / (TP + FP) = 4/5 = 0.8 33 | 34 | Recall = TP / (TP + FN) = 4/6 = 0.67 35 | 36 | F1 = 2/(1/P + 1/R) = 2/(1/0.67 + 1/0.8) = 0.73 37 | ``` 38 | 39 | ### Question 1b 40 | The brute-force method of calculating F1 scores using all model outputs as thresholds will take `O(m^2)`.
41 | 42 | 1. Sort all samples – `O(m logm)` 43 | 2. For the first threshold, find TP, FN, FP, TN and calculate the F1 Score – `O(m)` 44 | 3. The next threshold will take `O(1)` since we only need to update the 4 counts from step 2. 45 | 4. After the first computation, it'll take `O(m-1) ~ O(m)` for the remaining `m-1` thresholds 46 | 47 | Total optimised run time is `O(m logm)`. 48 | 49 | ### Question 1c 50 | Here, the number of thresholds is increased beyond the number of samples in the dataset. 51 | 52 | 1. Sort all samples – `O(m logm)` 53 | 2. Any thresholds that fall between the same two adjacent samples (in sorted order) will give the same F1 score 54 | 3. This means there can only be `(m+1)` possible F1 scores to consider 55 | 4. We can binary search for the best F1 score peak – `O(logm)` 56 | 57 | --- 58 | 59 | ### Question 2a – Micro 60 | | _Dog_ | POS_act | NEG_act | 61 | |--------------|---------|---------| 62 | | **POS_pred** | 10 | 3 | 63 | | **NEG_pred** | 6 | 26 | 64 | 65 | | _Cat_ | POS_act | NEG_act | 66 | |--------------|---------|---------| 67 | | **POS_pred** | 13 | 5 | 68 | | **NEG_pred** | 6 | 21 | 69 | 70 | | _Pig_ | POS_act | NEG_act | 71 | |--------------|---------|---------| 72 | | **POS_pred** | 7 | 7 | 73 | | **NEG_pred** | 3 | 28 | 74 | 75 | ### Question 2b 76 | | _Combined_ | POS_act | NEG_act | 77 | |--------------|---------|---------| 78 | | **POS_pred** | 30 | 15 | 79 | | **NEG_pred** | 15 | 75 | 80 | 81 | ``` 82 | Accuracy_micro = (TP + TN) / (TP + TN + FP + FN) = (30 + 75)/(30 + 75 + 15 + 15) = 0.778 83 | 84 | Precision_micro = TP / (TP + FP) = 30 / (30 + 15) = 0.667 85 | 86 | Recall_micro = TP / (TP + FN) = 30 / (30 + 15) = 0.667 87 | 88 | F1_micro = 2/(1/P + 1/R) = 2/(1/0.667 + 1/0.667) = 0.667 89 | ``` 90 | 91 | ### Question 2c – Macro 92 | | _Dog_ | POS_act | NEG_act | 93 | |--------------|---------|---------| 94 | | **POS_pred** | 10 | 3 | 95 | | **NEG_pred** | 6 | 26 | 96 | 97 | Precision_Dog = 10 / (10 + 3) = 0.769
98 | Recall_Dog = 10 / (10 + 6) = 0.625 99 | 100 | | _Cat_ | POS_act | NEG_act | 101 | |--------------|---------|---------| 102 | | **POS_pred** | 13 | 5 | 103 | | **NEG_pred** | 6 | 21 | 104 | 105 | Precision_Cat = 13 / (13 + 5) = 0.722
106 | Recall_Cat = 13 / (13 + 6) = 0.684 107 | 108 | | _Pig_ | POS_act | NEG_act | 109 | |--------------|---------|---------| 110 | | **POS_pred** | 7 | 7 | 111 | | **NEG_pred** | 3 | 28 | 112 | 113 | Precision_Pig = 7 / (7 + 7) = 0.5
114 | Recall_Pig = 7 / (7 + 3) = 0.7 115 | 116 | ``` 117 | Precision_macro = (P_Dog + P_Cat + P_Pig) / 3 = 0.664 118 | Recall_macro = (R_Dog + R_Cat + R_Pig) / 3 = 0.670 119 | ``` 120 | 121 | ### Question 2d 122 | | Class | TP | FP | 123 | |-------|-----|-----| 124 | | A | 9 | 1 | 125 | | B | 100 | 900 | 126 | | C | 9 | 1 | 127 | | D | 9 | 1 | 128 | 129 | ``` 130 | Precision_micro = (TP_A + TP_B + TP_C + TP_D) / [(TP_A + FP_A) + (TP_B + FP_B) + (TP_C + FP_C) + (TP_D + FP_D)] 131 | = (9 + 100 + 9 + 9) / (10 + 1000 + 10 + 10) 132 | = 0.123 133 | ``` 134 | 135 | ``` 136 | Precision_A = Precision_C = Precision_D = 9 / 10 = 0.9 137 | Precision_B = 100 / 1000 = 0.1 138 | 139 | Precision_macro = (P_A + P_B + P_C + P_D) / 4 140 | = (0.9 + 0.1 + 0.9 + 0.9) / 4 141 | = 0.7 142 | ``` 143 | 144 | We can see that `Precision_macro` >>> `Precision_micro`.
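If you want to double-check the micro vs. macro numbers, here is a small scikit-learn snippet (my own sanity check, not part of the official solutions). It reconstructs labels and predictions consistent with the TP/FP table above; the particular "wrong" true labels chosen for the false positives are arbitrary, since they do not affect Precision.

```python
from sklearn.metrics import precision_score

# rebuild predictions/labels that match the TP and FP counts in the table above
y_true, y_pred = [], []
for cls, tp, fp, filler in [("A", 9, 1, "B"), ("B", 100, 900, "A"), ("C", 9, 1, "A"), ("D", 9, 1, "A")]:
    y_pred += [cls] * (tp + fp)           # everything in this block was predicted as `cls`
    y_true += [cls] * tp + [filler] * fp  # `tp` of them are correct, `fp` of them are not

print(precision_score(y_true, y_pred, average="macro"))  # ~0.700
print(precision_score(y_true, y_pred, average="micro"))  # ~0.123
```

The gap between the two averages comes down to how each one weighs class B: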
145 | 146 | - The model has high Precision for classes A, C, and D, with low Precision for class B 147 | - `Precision_macro` takes the average of all individual Precision values, treating each class equally 148 | - It does not account for the heavy imbalance towards class B 149 | - `Precision_macro` is relatively higher as a result 150 | - `Precision_micro` doesn't treat classes equally; every individual prediction counts equally instead 151 | - The imbalances are factored into the calculation 152 | - Class B has low Precision and makes up the majority of the dataset 153 | - `Precision_micro` is relatively lower as a result 154 | -------------------------------------------------------------------------------- /T6W9/README.md: -------------------------------------------------------------------------------- 1 | # T6W9: Visualisation and Dimensionality Reduction 2 | 3 | > I was busy tending to personal commitments during this session. TA Pranavan covered for me. Please approach him with any questions on this topic. 4 | 5 | In T6W9, TA Pranavan covers visualisation and Dimensionality Reduction techniques like PCA, LDA, and SMOTE. Please approach him on Slack for the slides. -------------------------------------------------------------------------------- /T7W10/README.md: -------------------------------------------------------------------------------- 1 | # T7W10: Perceptrons and Neural Networks 2 | 3 | In T7W10, we cover Perceptrons, Multilayer Perceptrons, and Artificial Neural Networks (ANN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1-nG_AElHlAuWQz0EsGt3K9WSjZVQ1OBVne2tjfZCdR0/edit?usp=sharing). 4 | 5 | ## Data Handling Clinic: Session 1 6 | TA Pranavan and I host the first session of DHC, where we cover the basics and intermediate features of `numpy` and `pandas`. 7 | 8 | > The recording will *NOT* be made available in an effort to encourage live attendance and participation on the Zoom call. 9 | 10 | However, you can find the lesson material here: 11 | - [DHC Presentation Slides](https://tinyurl.com/3244-dhc-slides) 12 | - [DHC Mastercopy Colab Notebook](https://tinyurl.com/3244-dhc-mastercopy) 13 | - [DHC Student's Copy Colab Notebook](https://tinyurl.com/3244-dhc-stdnt) -------------------------------------------------------------------------------- /T8W11/README.md: -------------------------------------------------------------------------------- 1 | # T8W11: CNNs and RNNs 2 | 3 | In T8W11, we cover Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1GnExFaXQQlO7wnlnCExNzHbvegDlv6G0JSqgNZ0Jp-g/edit?usp=sharing). -------------------------------------------------------------------------------- /T9W12/README.md: -------------------------------------------------------------------------------- 1 | # T9W12: Explainable AI 2 | 3 | In T9W12, we cover Explainable AI. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1XRdBCLYpUGqMWIdfubslUbIkbvE8KFCrcKrfXBXDcA4/edit?usp=sharing).
-------------------------------------------------------------------------------- /misc/CS3244_Midterm_Cheatsheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/misc/CS3244_Midterm_Cheatsheet.pdf --------------------------------------------------------------------------------