├── .github
│   └── ISSUE_TEMPLATE
│       └── question-request.md
├── .gitignore
├── LICENSE
├── README.md
├── T10W13
│   └── README.md
├── T1W3
│   ├── README.md
│   ├── alice_knn.py
│   ├── assets
│   │   └── knn_bst.jpg
│   ├── bob_knn.py
│   └── q1.py
├── T2W4
│   ├── README.md
│   ├── data
│   │   ├── iris.csv
│   │   └── iris.names
│   └── dtc.py
├── T3W5
│   ├── Intro_to_Support_Vector_Machines.ipynb
│   ├── README.md
│   ├── data
│   │   ├── iris.csv
│   │   └── iris.names
│   └── linreg.py
├── T4aW6
│   ├── README.md
│   └── Tutorial_4_FAQ.pdf
├── T4bW7
│   └── README.md
├── T5W8
│   └── README.md
├── T6W9
│   └── README.md
├── T7W10
│   └── README.md
├── T8W11
│   └── README.md
├── T9W12
│   └── README.md
└── misc
    └── CS3244_Midterm_Cheatsheet.pdf
/.github/ISSUE_TEMPLATE/question-request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Question Request
3 | about: Suggest questions you want answers to!
4 | title: Question Suggestions!
5 | labels: question
6 | assignees: rish-16
7 |
8 | ---
9 |
10 | **Drop your question(s) here.**
11 | Are Neural Networks better than Decision Trees?
12 |
13 | **What topic(s) is this question from?**
14 | Neural Networks, Decision Trees
15 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Rishabh Anand
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CS3244-Tutorial-Material
2 | All supplementary material used by me while TA-ing **[CS3244: Machine Learning](https://nusmods.com/modules/CS3244/machine-learning)** at NUS School of Computing.
3 |
4 | ## What is this?
5 | I teach **TG-06**, the tutorial that takes place every **Monday, 1200-1300** in AY21/22 Semester 1. It is *fully online* this semester.
6 |
7 | > Unless the syllabus has drastically changed, I believe the material covered here is relevant for future AYs as well (e.g. AY22/23 and beyond). Do note that the module might be deprecated soon! Even so, I feel the material herein is good enough for preparing for future iterations of SoC's Intro to ML module.
8 |
9 | This repository contains code, figures, and miscellaneous items that aid me in teaching my class. The main source of reference *should* be the lecture notes and tutorial questions created by the CS3244 Professors and Teaching Staff.
10 |
11 | > Official tutorial solutions will be released at the end of every week.
12 |
13 | ## Contents
14 |
15 | Here's a list of what I've covered / I'll be covering in my tutorials:
16 |
17 | - **[T1W3](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T1W3):** k-Nearest Neighbours
18 | - **[T2W4](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T2W4):** Decision Trees
19 | - **[T3W5](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T3W5):** Linear Models
20 | - **[T4aW6](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4aW6):** Bias-Variance Tradeoff
21 | - **[T4bW7](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4bW7):** Regularisation & Validation
22 | - **[T5W8](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T5W8):** Evaluation Metrics
23 | - **[T6W9](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T6W9):** Visualisation & Dimensionality Reduction (Approach TA Pranavan)
24 | - **[T7W10](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T7W10):** Perceptrons and Neural Networks
25 | - **[T8W11](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T8W11):** CNNs and RNNs
26 | - **[T9W12](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T9W12):** Explainable AI
27 | - **[T10W13](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T10W13):** Unsupervised Learning
28 |
29 | > The link to the slides for all my tutorials can be found in the `README.md` in each week's respective folder.
30 |
31 | ## Exam Resources
32 | I've prepared some extra resources that might aid you in your exam preparation. You can find the files here:
33 |
34 | - [**Midterm Cheatsheet:**](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/misc/CS3244_Midterm_Cheatsheet.pdf) Lectures `1a: Intro & Class Org.` to `6: Bias Variance Tradeoff`
35 |
36 | ## Contributions
37 | If there are any issues or suggestions, feel free to raise an Issue or PR. All meaningful contributions welcome!
38 |
39 | ## License
40 | [MIT](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/LICENSE)
41 |
--------------------------------------------------------------------------------
/T10W13/README.md:
--------------------------------------------------------------------------------
1 | # T10W13: Unsupervised Learning
2 |
3 | In T10W13, we cover Unsupervised Learning. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1zfhCQrQMMMdCdSbLmCy9x9brN4VoTyLkHfDu6QPQwNM/edit?usp=sharing).
--------------------------------------------------------------------------------
/T1W3/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 1 Week 3: k-Nearest Neighbours
2 |
3 | In T1W3, I cover the k-Nearest Neighbours (k-NN) algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1).
4 |
5 | ## Contents
6 | This repo contains the code used to answer Questions 2, 3, 4.
7 |
8 | ### Question 2a
9 | Here's the ranking table used to classify the new point `(1, 1)` using 3-NN:
10 | ```
11 | Rank Point Distance Label
12 | 1 (0, 1) 1.000 1
13 | 2 (1, 0) 1.000 1
14 | 3 (1, 2) 1.000 1
15 | 4 (0, 2) 1.414 0
16 | 5 (2, 2) 1.414 0
17 | 6 (-1, 1) 2.000 0
18 | 7 (1, -1) 2.000 0
19 | 8 (2, 3) 2.236 1
20 |
21 | Rank Point Distance Label
22 | 1 (0, 1) 1.000 1
23 | 2 (1, 0) 1.000 1
24 | 3 (1, 2) 1.000 1
25 |
26 | The new point (1, 1) belongs to class 1 using 3-NN.
27 | ```
28 |
29 | Here's the ranking table used to classify the new point `(1, 1)` using 7-NN:
30 | ```
31 | Rank Point Distance Label
32 | 1 (0, 1) 1.000 1
33 | 2 (1, 0) 1.000 1
34 | 3 (1, 2) 1.000 1
35 | 4 (0, 2) 1.414 0
36 | 5 (2, 2) 1.414 0
37 | 6 (-1, 1) 2.000 0
38 | 7 (1, -1) 2.000 0
39 | 8 (2, 3) 2.236 1
40 |
41 | Rank Point Distance Label
42 | 1 (0, 1) 1.000 1
43 | 2 (1, 0) 1.000 1
44 | 3 (1, 2) 1.000 1
45 | 4 (0, 2) 1.414 0
46 | 5 (2, 2) 1.414 0
47 | 6 (-1, 1) 2.000 0
48 | 7 (1, -1) 2.000 0
49 |
50 | The new point (1, 1) belongs to class 0 using 7-NN.
51 | ```
52 |
53 | ### Question 2b
54 | Larger values of `k` lead to smoother decision boundaries, which lowers the chance of overfitting (covered in `T3W5`). So, the order is:
55 |
56 | ```
57 | k_l < k_c < k_r
58 | ```
59 |
60 | ### Question 2c
61 | The time taken to run inference on the test dataset with vanilla `k-NN` is independent of `k`. Altogether, we'll still take `m * t` time for a dataset of `m` samples.
62 |
63 | ---
64 |
65 | ### Question 3a
66 | Both algorithms are correct. Alice's algorithm runs in `O(n(d+k))` while that of Bob runs in `O(ndk)`. Alice's algorithm is much faster.
67 |
68 | For implementations, check out `alice_knn.py` and `bob_knn.py`.
69 |
70 | ### Question 3b
71 | Maintain a balanced BST (or, equivalently, a min/max heap) with `k` nodes that tracks the top `k` smallest distances. This reduces the running time to `O(n(d + logk))`. Here's how you can do it (a runnable sketch follows the pseudocode below):
72 |
73 | 1. Calculate distances between all the `n` points and the new observation. This takes `O(nd)`
74 |
75 | 2. Add the first `k` distances into a Balanced BST
76 |
77 | 3. Look at the `n-k` unadded distances and iterate through them
78 |
79 | 4. If the current distance is greater than the root (the largest of the `k` stored distances), ignore it and move to the next one. This comparison takes `O(1)`
80 | 
81 | 5. If the current distance is smaller than the root, remove the root, insert the current distance, and move on. Over all `n` samples, this takes `O(n * logk)`
82 |
83 | 6. By the end, you'll have the correct answers occupying all the `k` nodes in the BST. This takes `O(nd + nlogk)` in total
84 |
85 | > Essentially, we are taking the first `k` distances that may be incorrect and replacing them one by one with the correct `k` distances.
86 |
87 |
88 |
89 |
90 |
91 | ```
92 | if d_(k+1) < d_root:
93 | d_root = d_(k+1)
94 | ```
95 |
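To make this concrete, here's a minimal runnable sketch (my own illustration, not the official solution) that keeps the `k` smallest distances with Python's `heapq`. Since `heapq` is a min-heap, we store negated distances to simulate the max-heap/BST root described above:

```python
import heapq, math

def knn_heap(points, new, k):
    # points: list of (x, y, label) tuples; new: (x, y) query point
    dist = lambda p: math.sqrt((p[0] - new[0])**2 + (p[1] - new[1])**2)  # O(d) per point

    heap = []  # holds (-distance, point); the root is the largest of the k kept distances
    for p in points:
        d = dist(p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))        # fill the first k slots
        elif d < -heap[0][0]:                    # smaller than the current k-th smallest?
            heapq.heapreplace(heap, (-d, p))     # evict the largest kept distance, O(logk)
    return sorted((-nd, p) for nd, p in heap)    # k nearest, closest first

pts = [(-1, 1, 0), (0, 1, 1), (0, 2, 0), (1, -1, 0), (1, 0, 1), (1, 2, 1), (2, 2, 0), (2, 3, 1)]
print(knn_heap(pts, (1, 1), k=3))  # the 3 nearest neighbours of (1, 1)
```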
96 | ---
97 |
98 | ### Question 4
99 | No. The ranges of the two features are `0.4` (Humidity) and `10°C` (Temperature). This means the `Temperature` variable will dominate the Euclidean distance computed by k-NN, drowning out the effect of the `Humidity` variable.
100 |
101 | We can minimise the effect of this disproportion by **normalising** or **standardising** the inputs to a comparable range so that no single feature dominates the distance metric (see the sketch below). This will be covered in future classes.
102 |
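A minimal sketch of both options with `numpy`, on a made-up feature matrix (column 0 = Humidity, column 1 = Temperature in °C; the values are purely illustrative):

```python
import numpy as np

X = np.array([[0.3, 24.0],
              [0.7, 31.0],
              [0.5, 18.0]])  # hypothetical (Humidity, Temperature) samples

# min-max normalisation: squash every feature into [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# standardisation: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_std)
```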
--------------------------------------------------------------------------------
/T1W3/alice_knn.py:
--------------------------------------------------------------------------------
1 | import math
2 | from pprint import pprint
3 |
4 | class Point:
5 | def __init__(self, x, y, label=None):
6 | self.x = x
7 | self.y = y
8 | self.label = label
9 |
10 | def euclidean_distance(self, pt):
11 | '''
12 | Calculates Euclidean Distance
13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5
14 | '''
15 | return math.sqrt(
16 | (self.x - pt.x)**2 + (self.y - pt.y)**2
17 | )
18 |
19 | def __str__(self):
20 | return "({}, {}) | {}".format(self.x, self.y, self.label)
21 |
22 | def __repr__(self):
23 | return "({}, {}) | {}".format(self.x, self.y, self.label)
24 |
25 | points = [
26 | Point(-1, 1, 0),
27 | Point(0, 1, 1),
28 | Point(0, 2, 0),
29 | Point(1, -1, 0),
30 | Point(1, 0, 1),
31 | Point(1, 2, 1),
32 | Point(2, 2, 0),
33 | Point(2, 3, 1)
34 | ]
35 |
36 | new = Point(1, 1)
37 | k = 3
38 |
39 | S = [0 for _ in range(len(points))]                           # S[i] = 1 once point i has been selected
40 | D = list(map(lambda pt : pt.euclidean_distance(new), points)) # O(nd): all distances computed up-front
41 | answers = []
42 | 
43 | # O(nk): k passes, each scanning the n distances for the smallest unselected one
44 | for j in range(k):
45 |     i_min = min(range(len(D)), key=D.__getitem__) # index of the current smallest distance, O(n)
46 |     S[i_min] = 1
47 |     D[i_min] = float('inf') # past smallest will not be picked again
48 |     answers.append(i_min)
49 | 
50 | for i in answers:
51 |     print (points[i])
--------------------------------------------------------------------------------
/T1W3/assets/knn_bst.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T1W3/assets/knn_bst.jpg
--------------------------------------------------------------------------------
/T1W3/bob_knn.py:
--------------------------------------------------------------------------------
1 | import math
2 | from pprint import pprint
3 |
4 | class Point:
5 | def __init__(self, x, y, label=None):
6 | self.x = x
7 | self.y = y
8 | self.label = label
9 |
10 | def euclidean_distance(self, pt):
11 | '''
12 | Calculates Euclidean Distance
13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5
14 | '''
15 | return math.sqrt(
16 | (self.x - pt.x)**2 + (self.y - pt.y)**2
17 | )
18 |
19 | def __str__(self):
20 | return "({}, {}) | {}".format(self.x, self.y, self.label)
21 |
22 | def __repr__(self):
23 | return "({}, {}) | {}".format(self.x, self.y, self.label)
24 |
25 | points = [
26 | Point(-1, 1, 0),
27 | Point(0, 1, 1),
28 | Point(0, 2, 0),
29 | Point(1, -1, 0),
30 | Point(1, 0, 1),
31 | Point(1, 2, 1),
32 | Point(2, 2, 0),
33 | Point(2, 3, 1)
34 | ]
35 |
36 | new = Point(1, 1)
37 | k = 3
38 |
39 | S = [0 for _ in range(len(points))]
40 |
41 | # O(k * nd): recompute distances over the unselected points on every pass
42 | for _ in range(k): # O(k)
43 |     candidates = [i for i in range(len(points)) if S[i] == 0] # O(n), indices of unselected points
44 |     D = list(map(lambda i : points[i].euclidean_distance(new), candidates)) # O(nd)
45 |     i_min = candidates[min(range(len(D)), key=D.__getitem__)] # O(n), map back to the original index
46 |     S[i_min] = 1
47 |
48 | for i in (range(len(S))):
49 | if S[i] == 1:
50 | print (points[i], points[i].euclidean_distance(new))
--------------------------------------------------------------------------------
/T1W3/q1.py:
--------------------------------------------------------------------------------
1 | import math
2 | from pprint import pprint
3 |
4 | class Point:
5 | def __init__(self, x, y, label=None):
6 | self.x = x
7 | self.y = y
8 | self.label = label
9 |
10 | def euclidean_distance(self, pt):
11 | '''
12 | Calculates Euclidean Distance
13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5
14 | '''
15 | return math.sqrt(
16 | (self.x - pt.x)**2 + (self.y - pt.y)**2
17 | )
18 |
19 | points = [
20 | Point(-1, 1, 0),
21 | Point(0, 1, 1),
22 | Point(0, 2, 0),
23 | Point(1, -1, 0),
24 | Point(1, 0, 1),
25 | Point(1, 2, 1),
26 | Point(2, 2, 0),
27 | Point(2, 3, 1)
28 | ]
29 |
30 | new = Point(1, 1)
31 | k = 7
32 |
33 | # k-NN algorithm
34 | distances = list(map(
35 | lambda pt : [pt.euclidean_distance(new), pt.x, pt.y, pt.label],
36 | points
37 | ))
38 |
39 | sorted_distances = sorted(distances, key=lambda pt : pt[0])
40 |
41 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label"))
42 |
43 | for i, rec in enumerate(sorted_distances, 0):
44 | print ('{:<10}{:<10}{:<15.3f}{:<5}'.format(
45 | i+1,
46 | "(" + str(rec[1]) + ", " + str(rec[2])+ ")",
47 | rec[0],
48 | rec[3]
49 | ))
50 |
51 | print ()
52 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label"))
53 |
54 | for i, rec in enumerate(sorted_distances[:k], 1):
55 |     print ('{:<10}{:<10}{:<15.3f}{:<5}'.format(
56 |         i,
57 |         "(" + str(rec[1]) + ", " + str(rec[2])+ ")",
58 |         rec[0],
59 |         rec[3]
60 |     ))
61 | 
62 | # decide the class by majority vote among the k nearest neighbours
63 | labels = [rec[3] for rec in sorted_distances[:k]]
64 | majority = max(set(labels), key=labels.count)
65 | print ("\nThe new point (1, 1) belongs to class {} using {}-NN.".format(majority, k))
--------------------------------------------------------------------------------
/T2W4/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 2 Week 4: Decision Trees
2 |
3 | In T2W4, I cover the Decision Tree algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1).
4 |
5 | I have included the implementation of a Decision Tree classifier in pure Python using `numpy` and `pandas` in `dtc.py`. You can call it in a similar fashion to that in `sklearn` using the `DecisionTreeClassifier` object. I train it on the popular *Iris Type Classification Dataset* found in `data/iris.csv`.
6 |
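As a rough sketch of that calling pattern (assuming you run it from this `T2W4` folder so `./data/iris.csv` resolves; note that importing `dtc` also runs the demo code at the bottom of that file):

```python
import numpy as np
import pandas as pd
from dtc import DecisionTreeClassifier  # the pure-Python implementation in this folder

cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols)
df['class'] = df['class'].map({'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2})

X = df.iloc[:, :-1].values
Y = df.iloc[:, -1].values.reshape(-1, 1)

clf = DecisionTreeClassifier(depth=3)
clf.fit(X, Y)
preds = np.array(clf.predict(X))
print("Training accuracy:", np.mean(preds == Y.ravel()))
```
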
7 | ## Contents
8 | This repo contains the code used to answer Questions 1, 2, 3, and 4.
9 |
10 | ### Question 1a
11 |
12 | ```
13 | S_1
14 | / \
15 | 0/ \1
16 | F=0 S_2
17 | / \
18 | 0/ \1
19 | F=1 S_3
20 | / \
21 | 0/ \1
22 | F=1 F=0
23 | ```
24 |
25 | ### Question 1b
26 | Function `F` can be represented as `AND(S_2, S_3)`. We can build a tree that's of depth 2:
27 |
28 | ```
29 | S_2
30 | / \
31 | 0/ \1
32 | S_3 S_3
33 | / \ / \
34 | 0/ \1 0/ \1
35 | F=0 F=1 F=1 F=1
36 | ```
37 |
38 | If your memory of the `AND` gate is fuzzy, here's a tabular summary:
39 |
40 | | **A** | **B** | **AND** |
41 | |-------|-------|---------|
42 | | 0 | 0 | 0 |
43 | | 0 | 1 | 0 |
44 | | 1 | 0 | 0 |
45 | | 1 | 1 | 1 |
46 |
47 | ### Question 1c
48 |
49 | If your memory of the `XOR` gate is fuzzy, here's a tabular summary:
50 |
51 | | **A** | **B** | **XOR** |
52 | |-------|-------|---------|
53 | | 0 | 0 | 0 |
54 | | 0 | 1 | 1 |
55 | | 1 | 0 | 1 |
56 | | 1 | 1 | 0 |
57 |
58 | To implement this `XOR` gate, we'd need `2^d` leaf nodes and `2^d - 1` internal nodes, which grows exponentially with `d`. Using a Decision Tree here is not scalable. Pruning is not possible because we need to consider every single input (i.e. every feature) – we can't just ignore any of them.
59 |
60 | You can, however, implement `AND` and `OR` gates using a DT since pruning is possible. We don't need to consider all our inputs. For example, for an `AND` gate, if any one of our inputs is `0`, the result is `0` regardless of the other inputs. Likewise, if any feature in an `OR` gate is `1`, the result is `1` regardless of the other inputs.
61 |
62 | ---
63 |
64 | ### Question 2a
65 | The features are as follows:
66 |
67 | - `Income`
68 | - `Credit History`
69 | - `Debt`
70 |
71 | The label is `Decision`.
72 |
73 | At each level, the main question we will be asking is,
74 |
75 |
76 | > Which feature should we split on such that the resulting children are as "pure" as possible, i.e. each child mostly contains samples of a single label? This is the split with the greatest Information Gain.
77 |
78 | The tree would look like so:
79 |
80 | ```
81 | CrHi?
82 | / | \
83 | Bad/ Good| \Unknown
84 | / | \
85 | Rej App Income?
86 | / | \
87 | / | \
88 | 0-5K/ 5-10K| \10K+
89 | Debt App App
90 | / \
91 | Low/ \High
92 | App Rej
93 | ```
94 |
95 | *Note: Refer to the slides for more on Information Gain and Entropy. We covered Claude Shannon's Information Theory in this class!*
96 |
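For quick reference, here's a tiny sketch of the two quantities the greedy split maximises; it mirrors `get_entropy` and `get_info_gain` in `dtc.py`:

```python
import numpy as np

def entropy(y):
    # H(S) = -sum_c p_c * log2(p_c) over the class proportions in y
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(parent, left, right):
    # IG = H(parent) - weighted average entropy of the two children
    w_l, w_r = len(left) / len(parent), len(right) / len(parent)
    return entropy(parent) - (w_l * entropy(left) + w_r * entropy(right))
```
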
97 | ### Question 2b
98 |
99 | Tree 1:
100 | ```
101 | CrHi?
102 | / | \
103 | Good/ Bad| \Unknown
104 | App Rej Income?
105 | / | \
106 | 0-5K/ 5-10K| \10K+
107 | App App App
108 | ```
109 |
110 | Tree 2:
111 | ```
112 | CrHi?
113 | / | \
114 | Good/ Bad| \Unknown
115 | App Rej Debt?
116 | / \
117 | Low/ \High
118 | App App
119 | ```
120 |
121 | Tree 3:
122 | ```
123 | Income?
124 |
125 | / | \
126 | / | \
127 | 0-5K/ 5-10K| \10K+
128 | Debt? Debt? Debt?
129 | / \ / \ / \
130 | Low/ \High Low/ \High Low/ \High
131 | App Rej App App App App
132 | ```
133 |
134 | ### Question 2c
135 | > Of course, you must convert (encode) these strings like `GOOD`, `BAD`, `HIGH`, `LOW` to numeric values. So `GOOD = 1` and `BAD = 0`, for example. Same goes for the labels. ML Models ***DO NOT*** work with raw strings, only numbers.
136 |
137 | `DT($4K, GOOD CH, HIGH debt) = Approve`
138 |
139 | > Hint: Just follow path down to the leaf in your DT Classifier from Question 2a.
140 |
141 | If we use our 3 DTs, the results will be the following:
142 |
143 | ```
144 | Tree 1: Approve
145 | Tree 2: Approve
146 | Tree 3: Reject
147 | ```
148 |
149 | If we use uniform voting (every tree gets equal say ie. majority voting), we `Approve` the application since 2/3 classifiers agree.
150 |
151 | ---
152 |
153 | ### Question 3a
154 | Debt depends on Income. Consider Person A with an income of $5K and a debt of $4K, and Person B with an income of $15K and the same $4K debt: Person A would be considered to be in `HIGH` debt while Person B is in `LOW` debt. Debt here is categorical while Income is a quantifiable, continuous variable, which makes the explainability ambiguous.
155 |
156 | ### Question 3b
157 | Empirically, Decision Trees are bad performers on datasets with missing values. To calculate metrics like Information Gain and Entropy, it is nice to have all the information in front of us. Missing data makes these measures unreliable, making the DT classifier inaccurate. Replacing missing values with alternatives (`mean`, `max`, `min`, `mode`, etc.) could easily skew your data (think of it as "poisoning your dataset"). Also, dropping the affected rows makes the dataset smaller and non-representative of those specific cases (your model won't know what to do in those cases anymore).
158 |
159 | ### Question 3c
160 | Decision Trees do not consider temporal (time-related) features. You are introducing heavy class imbalance into your dataset by appending rows of data with a `REJECT` decision. Your model might overfit on this new biased dataset. Always try to maintain a good balance of `positive` and `negative` cases in your dataset to allow for better generalisation.
161 |
162 | > More on class imbalance and overfitting in future weeks!
163 |
164 | ---
165 |
166 | ### Question 4
167 | Refer to the working covered in class. You can find the official working in the [slides](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1).
--------------------------------------------------------------------------------
/T2W4/data/iris.csv:
--------------------------------------------------------------------------------
1 | sepal_length, sepal_width, petal_length, petal_width, class
2 | 5.1,3.5,1.4,0.2,Iris-setosa
3 | 4.9,3.0,1.4,0.2,Iris-setosa
4 | 4.7,3.2,1.3,0.2,Iris-setosa
5 | 4.6,3.1,1.5,0.2,Iris-setosa
6 | 5.0,3.6,1.4,0.2,Iris-setosa
7 | 5.4,3.9,1.7,0.4,Iris-setosa
8 | 4.6,3.4,1.4,0.3,Iris-setosa
9 | 5.0,3.4,1.5,0.2,Iris-setosa
10 | 4.4,2.9,1.4,0.2,Iris-setosa
11 | 4.9,3.1,1.5,0.1,Iris-setosa
12 | 5.4,3.7,1.5,0.2,Iris-setosa
13 | 4.8,3.4,1.6,0.2,Iris-setosa
14 | 4.8,3.0,1.4,0.1,Iris-setosa
15 | 4.3,3.0,1.1,0.1,Iris-setosa
16 | 5.8,4.0,1.2,0.2,Iris-setosa
17 | 5.7,4.4,1.5,0.4,Iris-setosa
18 | 5.4,3.9,1.3,0.4,Iris-setosa
19 | 5.1,3.5,1.4,0.3,Iris-setosa
20 | 5.7,3.8,1.7,0.3,Iris-setosa
21 | 5.1,3.8,1.5,0.3,Iris-setosa
22 | 5.4,3.4,1.7,0.2,Iris-setosa
23 | 5.1,3.7,1.5,0.4,Iris-setosa
24 | 4.6,3.6,1.0,0.2,Iris-setosa
25 | 5.1,3.3,1.7,0.5,Iris-setosa
26 | 4.8,3.4,1.9,0.2,Iris-setosa
27 | 5.0,3.0,1.6,0.2,Iris-setosa
28 | 5.0,3.4,1.6,0.4,Iris-setosa
29 | 5.2,3.5,1.5,0.2,Iris-setosa
30 | 5.2,3.4,1.4,0.2,Iris-setosa
31 | 4.7,3.2,1.6,0.2,Iris-setosa
32 | 4.8,3.1,1.6,0.2,Iris-setosa
33 | 5.4,3.4,1.5,0.4,Iris-setosa
34 | 5.2,4.1,1.5,0.1,Iris-setosa
35 | 5.5,4.2,1.4,0.2,Iris-setosa
36 | 4.9,3.1,1.5,0.1,Iris-setosa
37 | 5.0,3.2,1.2,0.2,Iris-setosa
38 | 5.5,3.5,1.3,0.2,Iris-setosa
39 | 4.9,3.1,1.5,0.1,Iris-setosa
40 | 4.4,3.0,1.3,0.2,Iris-setosa
41 | 5.1,3.4,1.5,0.2,Iris-setosa
42 | 5.0,3.5,1.3,0.3,Iris-setosa
43 | 4.5,2.3,1.3,0.3,Iris-setosa
44 | 4.4,3.2,1.3,0.2,Iris-setosa
45 | 5.0,3.5,1.6,0.6,Iris-setosa
46 | 5.1,3.8,1.9,0.4,Iris-setosa
47 | 4.8,3.0,1.4,0.3,Iris-setosa
48 | 5.1,3.8,1.6,0.2,Iris-setosa
49 | 4.6,3.2,1.4,0.2,Iris-setosa
50 | 5.3,3.7,1.5,0.2,Iris-setosa
51 | 5.0,3.3,1.4,0.2,Iris-setosa
52 | 7.0,3.2,4.7,1.4,Iris-versicolor
53 | 6.4,3.2,4.5,1.5,Iris-versicolor
54 | 6.9,3.1,4.9,1.5,Iris-versicolor
55 | 5.5,2.3,4.0,1.3,Iris-versicolor
56 | 6.5,2.8,4.6,1.5,Iris-versicolor
57 | 5.7,2.8,4.5,1.3,Iris-versicolor
58 | 6.3,3.3,4.7,1.6,Iris-versicolor
59 | 4.9,2.4,3.3,1.0,Iris-versicolor
60 | 6.6,2.9,4.6,1.3,Iris-versicolor
61 | 5.2,2.7,3.9,1.4,Iris-versicolor
62 | 5.0,2.0,3.5,1.0,Iris-versicolor
63 | 5.9,3.0,4.2,1.5,Iris-versicolor
64 | 6.0,2.2,4.0,1.0,Iris-versicolor
65 | 6.1,2.9,4.7,1.4,Iris-versicolor
66 | 5.6,2.9,3.6,1.3,Iris-versicolor
67 | 6.7,3.1,4.4,1.4,Iris-versicolor
68 | 5.6,3.0,4.5,1.5,Iris-versicolor
69 | 5.8,2.7,4.1,1.0,Iris-versicolor
70 | 6.2,2.2,4.5,1.5,Iris-versicolor
71 | 5.6,2.5,3.9,1.1,Iris-versicolor
72 | 5.9,3.2,4.8,1.8,Iris-versicolor
73 | 6.1,2.8,4.0,1.3,Iris-versicolor
74 | 6.3,2.5,4.9,1.5,Iris-versicolor
75 | 6.1,2.8,4.7,1.2,Iris-versicolor
76 | 6.4,2.9,4.3,1.3,Iris-versicolor
77 | 6.6,3.0,4.4,1.4,Iris-versicolor
78 | 6.8,2.8,4.8,1.4,Iris-versicolor
79 | 6.7,3.0,5.0,1.7,Iris-versicolor
80 | 6.0,2.9,4.5,1.5,Iris-versicolor
81 | 5.7,2.6,3.5,1.0,Iris-versicolor
82 | 5.5,2.4,3.8,1.1,Iris-versicolor
83 | 5.5,2.4,3.7,1.0,Iris-versicolor
84 | 5.8,2.7,3.9,1.2,Iris-versicolor
85 | 6.0,2.7,5.1,1.6,Iris-versicolor
86 | 5.4,3.0,4.5,1.5,Iris-versicolor
87 | 6.0,3.4,4.5,1.6,Iris-versicolor
88 | 6.7,3.1,4.7,1.5,Iris-versicolor
89 | 6.3,2.3,4.4,1.3,Iris-versicolor
90 | 5.6,3.0,4.1,1.3,Iris-versicolor
91 | 5.5,2.5,4.0,1.3,Iris-versicolor
92 | 5.5,2.6,4.4,1.2,Iris-versicolor
93 | 6.1,3.0,4.6,1.4,Iris-versicolor
94 | 5.8,2.6,4.0,1.2,Iris-versicolor
95 | 5.0,2.3,3.3,1.0,Iris-versicolor
96 | 5.6,2.7,4.2,1.3,Iris-versicolor
97 | 5.7,3.0,4.2,1.2,Iris-versicolor
98 | 5.7,2.9,4.2,1.3,Iris-versicolor
99 | 6.2,2.9,4.3,1.3,Iris-versicolor
100 | 5.1,2.5,3.0,1.1,Iris-versicolor
101 | 5.7,2.8,4.1,1.3,Iris-versicolor
102 | 6.3,3.3,6.0,2.5,Iris-virginica
103 | 5.8,2.7,5.1,1.9,Iris-virginica
104 | 7.1,3.0,5.9,2.1,Iris-virginica
105 | 6.3,2.9,5.6,1.8,Iris-virginica
106 | 6.5,3.0,5.8,2.2,Iris-virginica
107 | 7.6,3.0,6.6,2.1,Iris-virginica
108 | 4.9,2.5,4.5,1.7,Iris-virginica
109 | 7.3,2.9,6.3,1.8,Iris-virginica
110 | 6.7,2.5,5.8,1.8,Iris-virginica
111 | 7.2,3.6,6.1,2.5,Iris-virginica
112 | 6.5,3.2,5.1,2.0,Iris-virginica
113 | 6.4,2.7,5.3,1.9,Iris-virginica
114 | 6.8,3.0,5.5,2.1,Iris-virginica
115 | 5.7,2.5,5.0,2.0,Iris-virginica
116 | 5.8,2.8,5.1,2.4,Iris-virginica
117 | 6.4,3.2,5.3,2.3,Iris-virginica
118 | 6.5,3.0,5.5,1.8,Iris-virginica
119 | 7.7,3.8,6.7,2.2,Iris-virginica
120 | 7.7,2.6,6.9,2.3,Iris-virginica
121 | 6.0,2.2,5.0,1.5,Iris-virginica
122 | 6.9,3.2,5.7,2.3,Iris-virginica
123 | 5.6,2.8,4.9,2.0,Iris-virginica
124 | 7.7,2.8,6.7,2.0,Iris-virginica
125 | 6.3,2.7,4.9,1.8,Iris-virginica
126 | 6.7,3.3,5.7,2.1,Iris-virginica
127 | 7.2,3.2,6.0,1.8,Iris-virginica
128 | 6.2,2.8,4.8,1.8,Iris-virginica
129 | 6.1,3.0,4.9,1.8,Iris-virginica
130 | 6.4,2.8,5.6,2.1,Iris-virginica
131 | 7.2,3.0,5.8,1.6,Iris-virginica
132 | 7.4,2.8,6.1,1.9,Iris-virginica
133 | 7.9,3.8,6.4,2.0,Iris-virginica
134 | 6.4,2.8,5.6,2.2,Iris-virginica
135 | 6.3,2.8,5.1,1.5,Iris-virginica
136 | 6.1,2.6,5.6,1.4,Iris-virginica
137 | 7.7,3.0,6.1,2.3,Iris-virginica
138 | 6.3,3.4,5.6,2.4,Iris-virginica
139 | 6.4,3.1,5.5,1.8,Iris-virginica
140 | 6.0,3.0,4.8,1.8,Iris-virginica
141 | 6.9,3.1,5.4,2.1,Iris-virginica
142 | 6.7,3.1,5.6,2.4,Iris-virginica
143 | 6.9,3.1,5.1,2.3,Iris-virginica
144 | 5.8,2.7,5.1,1.9,Iris-virginica
145 | 6.8,3.2,5.9,2.3,Iris-virginica
146 | 6.7,3.3,5.7,2.5,Iris-virginica
147 | 6.7,3.0,5.2,2.3,Iris-virginica
148 | 6.3,2.5,5.0,1.9,Iris-virginica
149 | 6.5,3.0,5.2,2.0,Iris-virginica
150 | 6.2,3.4,5.4,2.3,Iris-virginica
151 | 5.9,3.0,5.1,1.8,Iris-virginica
--------------------------------------------------------------------------------
/T2W4/data/iris.names:
--------------------------------------------------------------------------------
1 | 1. Title: Iris Plants Database
2 | Updated Sept 21 by C.Blake - Added discrepency information
3 |
4 | 2. Sources:
5 | (a) Creator: R.A. Fisher
6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
7 | (c) Date: July, 1988
8 |
9 | 3. Past Usage:
10 | - Publications: too many to mention!!! Here are a few.
11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
13 | to Mathematical Statistics" (John Wiley, NY, 1950).
14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
17 | Structure and Classification Rule for Recognition in Partially Exposed
18 | Environments". IEEE Transactions on Pattern Analysis and Machine
19 | Intelligence, Vol. PAMI-2, No. 1, 67-71.
20 | -- Results:
21 | -- very low misclassification rates (0% for the setosa class)
22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE
23 | Transactions on Information Theory, May 1972, 431-433.
24 | -- Results:
25 | -- very low misclassification rates again
26 | 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
27 | conceptual clustering system finds 3 classes in the data.
28 |
29 | 4. Relevant Information:
30 | --- This is perhaps the best known database to be found in the pattern
31 | recognition literature. Fisher's paper is a classic in the field
32 | and is referenced frequently to this day. (See Duda & Hart, for
33 | example.) The data set contains 3 classes of 50 instances each,
34 | where each class refers to a type of iris plant. One class is
35 | linearly separable from the other 2; the latter are NOT linearly
36 | separable from each other.
37 | --- Predicted attribute: class of iris plant.
38 | --- This is an exceedingly simple domain.
39 | --- This data differs from the data presented in Fishers article
40 | (identified by Steve Chadwick, spchadwick@espeedaz.net )
41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
42 | where the error is in the fourth feature.
43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa"
44 | where the errors are in the second and third features.
45 |
46 | 5. Number of Instances: 150 (50 in each of three classes)
47 |
48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class
49 |
50 | 7. Attribute Information:
51 | 1. sepal length in cm
52 | 2. sepal width in cm
53 | 3. petal length in cm
54 | 4. petal width in cm
55 | 5. class:
56 | -- Iris Setosa
57 | -- Iris Versicolour
58 | -- Iris Virginica
59 |
60 | 8. Missing Attribute Values: None
61 |
62 | Summary Statistics:
63 | Min Max Mean SD Class Correlation
64 | sepal length: 4.3 7.9 5.84 0.83 0.7826
65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194
66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
68 |
69 | 9. Class Distribution: 33.3% for each of 3 classes.
70 |
--------------------------------------------------------------------------------
/T2W4/dtc.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 |
4 | '''
5 | Decision Trees are greedy algorithms
6 | that maximise the current Information Gain
7 | without backtracking or going back up to the root.
8 |
9 | Future splits are based on the current splits:
10 | split(t+1) = f(split(t))
11 |
12 | At every level, the impurity of the dataset
13 | decreases. The entropy (randomness) decreases
14 | with the level.
15 | '''
16 |
17 | class DTNode():
18 | def __init__(self, feat_idx=None, bounds=None, left=None, right=None, info_gain=None, value=None):
19 | self.feat_idx = feat_idx
20 | self.bounds = bounds
21 | self.left = left
22 | self.right = right
23 | self.info_gain = info_gain
24 | self.value = value
25 |
26 | class DecisionTreeClassifier():
27 | def __init__(self, depth=2, min_split=2):
28 | self.root = None
29 | self.depth = depth
30 | self.min_split = min_split
31 |
32 | def build_tree(self, dataset, cur_depth=0):
33 | x, y = dataset[:, :-1], dataset[:, -1]
34 | n, n_dim = x.shape
35 |
36 | # recursively build the subtrees
37 | if n >= self.min_split and cur_depth <= self.depth:
38 | best_split = self.get_best_split(dataset, n, n_dim)
39 |
40 | if best_split['info_gain'] > 0:
41 | left_tree = self.build_tree(best_split['left'], cur_depth+1)
42 | right_tree = self.build_tree(best_split['right'], cur_depth+1)
43 |
44 | return DTNode(best_split['feat_idx'], best_split['bounds'], left_tree, right_tree, best_split['info_gain'])
45 |
46 | y = list(y)
47 | value = max(y, key=y.count) # class label = majority count at leaves
48 |
49 | return DTNode(value=value)
50 |
51 | def get_best_split(self, dataset, n, n_dim):
52 | best_split = {}
53 | max_info_gain = -float('inf')
54 |
55 | for idx in range(n_dim):
56 | feat_val = dataset[:, idx]
57 |             possible_bounds = np.unique(feat_val)
58 | 
59 |             for thresh in possible_bounds:
60 |                 # split the dataset on this feature/threshold pair
61 |                 data_left = np.array([row for row in dataset if row[idx] <= thresh])
62 |                 data_right = np.array([row for row in dataset if row[idx] > thresh])
63 |
64 | if len(data_left) > 0 and len(data_right) > 0:
65 | y, left_y, right_y = dataset[:, -1], data_left[:, -1], data_right[:, -1]
66 | cur_info_gain = self.get_info_gain(y, left_y, right_y)
67 |
68 | if cur_info_gain > max_info_gain:
69 | best_split['feat_idx'] = idx
70 | best_split['bounds'] = thresh
71 | best_split['left'] = data_left
72 | best_split['right'] = data_right
73 | best_split['info_gain'] = cur_info_gain
74 | max_info_gain = cur_info_gain
75 |
76 | return best_split
77 |
78 | def get_info_gain(self, parent, left, right):
79 | weight_left = len(left) / len(parent)
80 | weight_right = len(right) / len(parent)
81 |
82 | info_gain = self.get_entropy(parent) - (weight_left * self.get_entropy(left) + weight_right * self.get_entropy(right))
83 |
84 | return info_gain
85 |
86 | def get_entropy(self, y):
87 | labels = np.unique(y)
88 | entropy = 0
89 | for cls in labels:
90 | p_cls = len(y[y == cls]) / len(y)
91 | entropy += -p_cls * np.log2(p_cls)
92 |
93 | return entropy
94 |
95 | def fit(self, x, y):
96 | dataset = np.concatenate((x, y), axis=1)
97 | self.root = self.build_tree(dataset)
98 |
99 | def make_pred(self, x, root):
100 | if root.value != None:
101 | return root.value
102 |
103 | feat_val = x[root.feat_idx]
104 |
105 | if feat_val <= root.bounds:
106 | return self.make_pred(x, root.left)
107 | else:
108 | return self.make_pred(x, root.right)
109 |
110 | def predict(self, x):
111 | return [self.make_pred(i, self.root) for i in x]
112 |
113 | cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
114 | df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols)
115 |
116 | # replace class strings with integer indices
117 | df['class'] = df['class'].str.replace('Iris-setosa', '0')
118 | df['class'] = df['class'].str.replace('Iris-versicolor', '1')
119 | df['class'] = df['class'].str.replace('Iris-virginica', '2')
120 | df['class'] = df['class'].map(lambda x : int(x))
121 |
122 | X = df.iloc[:, :-1].values
123 | Y = df.iloc[:, -1].values.reshape(-1, 1)
124 | X = np.array(X)
125 | Y = np.array(Y)
126 |
127 | clf = DecisionTreeClassifier()
128 | clf.fit(X, Y) # split this into training and testing datasets
129 |
130 | def print_tree(root=None, indent=" "):
131 | if root.value != None:
132 | print (root.value)
133 | else:
134 | print ("x_" + str(root.feat_idx), '<=', root.bounds, ":", format(root.info_gain, '0.4f'))
135 | print (indent + "left: ", end="")
136 | print_tree(root.left, indent + indent)
137 | print (indent + "right: ", end="")
138 | print_tree(root.right, indent + indent)
139 |
140 | print_tree(clf.root)
--------------------------------------------------------------------------------
/T3W5/Intro_to_Support_Vector_Machines.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "Support Vector Machines – An In-depth Tutorial",
7 | "provenance": [],
8 | "collapsed_sections": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "code",
21 | "metadata": {
22 | "id": "05sNYyltOOMD"
23 | },
24 | "source": [
25 | "import numpy as np\n",
26 | "import pandas as pd\n",
27 | "from sklearn import datasets"
28 | ],
29 | "execution_count": 5,
30 | "outputs": []
31 | },
32 | {
33 | "cell_type": "code",
34 | "metadata": {
35 | "id": "eoCyq5f7OVwZ"
36 | },
37 | "source": [
38 |       "class SupportVectorClassifier:\n",
39 | " def __init__(self):\n",
40 | " self.w = None\n",
41 | " self.iterations = 1000\n",
42 | "\n",
43 | " def hinge_loss(self, YHAT, Y):\n",
44 | " '''\n",
45 | " Hinge Loss from lecture. No changes made.\n",
46 | " '''\n",
47 | "\n",
48 | " distances = 1 - (Y * YHAT)\n",
49 | " distances[distances < 0] = 0 # everywhere it's the correct prediction, give a loss of 0\n",
50 | " return np.sum(distances) / len(YHAT) # average loss\n",
51 | "\n",
52 | " def gradient_descent(self, X, Y, loss):\n",
53 | " '''\n",
54 | " Vanilla gradient descent. \n",
55 | " \n",
56 | " You can switch this to SGD as well to improve performance.\n",
57 | " '''\n",
58 | "\n",
59 | " grads = {}\n",
60 | " loss = 1 - (Y * np.dot(X, self.w))\n",
61 | " dw = np.zeros(len(self.w))\n",
62 | " \n",
63 | " for ind, d in enumerate(loss):\n",
64 | " if max(0, d) == 0:\n",
65 | " di = self.w\n",
66 | " else:\n",
67 | " di = self.w - (Y[ind] * X[ind])\n",
68 | " dw += di\n",
69 | " \n",
70 | " dw = dw / len(Y) # get the average gradient\n",
71 | " grads['dw'] = dw\n",
72 | "\n",
73 | " return grads\n",
74 | "\n",
75 | " def update(self, grads, alpha):\n",
76 | " '''\n",
77 | " Performs the actual update step in gradient descent.\n",
78 | "\n",
79 | " grads : gradient of loss wrt weights\n",
80 | " alpha : learning rate\n",
81 | " '''\n",
82 | " self.w = self.w - alpha * grads['dw']\n",
83 | "\n",
84 | " def fit(self, X, Y, alpha=1e-2):\n",
85 | " '''\n",
86 | " Fits the model on the given dataset.\n",
87 | "\n",
88 | " X: data samples\n",
89 | " Y: binary labels (1 or 0)\n",
90 | " alpha: step size / learning rate\n",
91 | " '''\n",
92 | "\n",
93 | " # reset the parameters for every call to fit\n",
94 | " self.w = np.random.rand(X[0].shape[-1]) # get the number of features per sample\n",
95 | "\n",
96 | " # perform the N iterations of learning\n",
97 | " for i in range(self.iterations):\n",
98 | " # forward pass\n",
99 | " YHAT = np.dot(X, self.w)\n",
100 | " loss = self.hinge_loss(YHAT, Y)\n",
101 | "\n",
102 | " if i % 20 == 0:\n",
103 | " print (\"Iteration: {} | Loss: {}\".format(i, loss))\n",
104 | "\n",
105 | " # backward pass\n",
106 | " grads = self.gradient_descent(X, Y, loss) # calculate gradient wrt parameters\n",
107 | " self.update(grads, alpha) # optimise the parameters\n",
108 | " \n",
109 | " def predict(self, X):\n",
110 | " # simply compute forward pass\n",
111 | " return np.dot(X, self.w)\n",
112 | "\n",
113 | " def evaluate(self, X_test, Y_test):\n",
114 | " '''\n",
115 | " Returns the accuracy of the model.\n",
116 | " '''\n",
117 | " pred = self.predict(X_test)\n",
118 | "\n",
119 | " # anything negative gets label -1, anything positive gets label 1\n",
120 | " pred[pred < 0] = -1 \n",
121 | " pred[pred >= 0] = 1\n",
122 | " correct = 0\n",
123 | "\n",
124 | " for i in range(len(Y_test)):\n",
125 | " if pred[i] == Y_test[i]:\n",
126 | " correct += 1\n",
127 | "\n",
128 | " return correct / len(Y_test) # get final accuracy based on number of correct samples"
129 | ],
130 | "execution_count": 46,
131 | "outputs": []
132 | },
133 | {
134 | "cell_type": "code",
135 | "metadata": {
136 | "id": "vKilb4EqQq6K"
137 | },
138 | "source": [
139 | "from sklearn.model_selection import train_test_split\n",
140 | "\n",
141 | "X, Y = datasets.load_breast_cancer(return_X_y=True)\n",
142 | "Y[Y == 0] = -1 # switch labels from [0, 1] to [-1, 1]\n",
143 | "\n",
144 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y)"
145 | ],
146 | "execution_count": 50,
147 | "outputs": []
148 | },
149 | {
150 | "cell_type": "code",
151 | "metadata": {
152 | "colab": {
153 | "base_uri": "https://localhost:8080/"
154 | },
155 | "id": "YywW366QQ6L8",
156 | "outputId": "720d4170-9a26-4c51-959f-7ab81b044bdf"
157 | },
158 | "source": [
159 |       "model = SupportVectorClassifier()\n",
160 | "model.fit(X_train, Y_train)"
161 | ],
162 | "execution_count": 52,
163 | "outputs": [
164 | {
165 | "output_type": "stream",
166 | "name": "stdout",
167 | "text": [
168 | "Iteration: 0 | Loss: 416.2350522151881\n",
169 | "Iteration: 20 | Loss: 634.7471121710325\n",
170 | "Iteration: 40 | Loss: 1377.959569044146\n",
171 | "Iteration: 60 | Loss: 337.2871462020824\n",
172 | "Iteration: 80 | Loss: 174.50232848883326\n",
173 | "Iteration: 100 | Loss: 144.48051916551537\n",
174 | "Iteration: 120 | Loss: 144.97272350980242\n",
175 | "Iteration: 140 | Loss: 157.49100221912383\n",
176 | "Iteration: 160 | Loss: 188.68354119350255\n",
177 | "Iteration: 180 | Loss: 198.83279680794266\n",
178 | "Iteration: 200 | Loss: 201.37397024816923\n",
179 | "Iteration: 220 | Loss: 209.18699964584258\n",
180 | "Iteration: 240 | Loss: 211.24550460677744\n",
181 | "Iteration: 260 | Loss: 224.75921771134932\n",
182 | "Iteration: 280 | Loss: 207.41067721227247\n",
183 | "Iteration: 300 | Loss: 202.9874547965538\n",
184 | "Iteration: 320 | Loss: 233.5225806996785\n",
185 | "Iteration: 340 | Loss: 209.41894810505434\n",
186 | "Iteration: 360 | Loss: 227.86193173406267\n",
187 | "Iteration: 380 | Loss: 220.0275230523279\n",
188 | "Iteration: 400 | Loss: 209.93813706106957\n",
189 | "Iteration: 420 | Loss: 224.69843019249313\n",
190 | "Iteration: 440 | Loss: 207.14690567298172\n",
191 | "Iteration: 460 | Loss: 222.13044724748374\n",
192 | "Iteration: 480 | Loss: 206.75958921219885\n",
193 | "Iteration: 500 | Loss: 224.10527985394174\n",
194 | "Iteration: 520 | Loss: 212.29554595340238\n",
195 | "Iteration: 540 | Loss: 222.9296852741817\n",
196 | "Iteration: 560 | Loss: 211.17164394816842\n",
197 | "Iteration: 580 | Loss: 221.71447794927437\n",
198 | "Iteration: 600 | Loss: 219.52250985351424\n",
199 | "Iteration: 620 | Loss: 212.20466048180376\n",
200 | "Iteration: 640 | Loss: 222.5975165742218\n",
201 | "Iteration: 660 | Loss: 207.1530791563107\n",
202 | "Iteration: 680 | Loss: 224.41752826545314\n",
203 | "Iteration: 700 | Loss: 212.5541881669893\n",
204 | "Iteration: 720 | Loss: 207.96840920838758\n",
205 | "Iteration: 740 | Loss: 202.79349339026604\n",
206 | "Iteration: 760 | Loss: 224.18299606836823\n",
207 | "Iteration: 780 | Loss: 223.15410816749386\n",
208 | "Iteration: 800 | Loss: 227.64698698681198\n",
209 | "Iteration: 820 | Loss: 213.35496804814935\n",
210 | "Iteration: 840 | Loss: 223.82125080471178\n",
211 | "Iteration: 860 | Loss: 211.88036740995827\n",
212 | "Iteration: 880 | Loss: 203.0702406461759\n",
213 | "Iteration: 900 | Loss: 224.43132254807512\n",
214 | "Iteration: 920 | Loss: 223.35580653357326\n",
215 | "Iteration: 940 | Loss: 227.815553049764\n",
216 | "Iteration: 960 | Loss: 219.9270821314675\n",
217 | "Iteration: 980 | Loss: 212.53249397223584\n"
218 | ]
219 | }
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "metadata": {
225 | "colab": {
226 | "base_uri": "https://localhost:8080/"
227 | },
228 | "id": "l5deYnKJ_QC2",
229 | "outputId": "08d084e3-62e1-4170-e129-494051d5526e"
230 | },
231 | "source": [
232 | "acc = model.evaluate(X_test, Y_test)\n",
233 | "print (\"SVM is {:.3f}% accurate.\".format(acc * 100))"
234 | ],
235 | "execution_count": 53,
236 | "outputs": [
237 | {
238 | "output_type": "stream",
239 | "name": "stdout",
240 | "text": [
241 | "SVM is 83.916% accurate.\n"
242 | ]
243 | }
244 | ]
245 | }
246 | ]
247 | }
--------------------------------------------------------------------------------
/T3W5/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 3 Week 5: Linear Models
2 |
3 | In T3W5, I cover Linear Models. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1).
4 |
5 | This repo contains Python implementations of `LinearRegressionClassifier`, `LogisticRegressionClassifier`, and `SupportVectorClassifier`. You can call them in a similar fashion to related models from `sklearn`. I train them on the popular *Iris Type Classification Dataset* found in `data/iris.csv`, as well as the *Breast Cancer Classification Dataset* from `sklearn.datasets`.
6 |
7 | > You can find the SVM implementation in `Intro_to_Support_Vector_Machines.ipynb`. It has some more in-depth comments inside.
8 |
9 | ## Contents
10 | This repo contains the code used to answer Questions 1, 2, and 5.
11 |
12 | ### Question 1
13 | You cannot use **Mean Squared Error**.
14 |
15 | MSE is meant for regression problems, not classification tasks (which is what Logistic Regression is used for).
16 |
17 | - `Accuracy` shows us how "good" our model is on unseen data
18 | - `AUC-ROC` shows us the model's ability to tell apart positive and negative instances
19 | - `Log Loss` is used as the cost function for Logistic Regression. The aim is to minimise this over training (see the sketch below).
20 |
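A minimal sketch of that log loss (binary cross-entropy), with a small clip added for numerical safety:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    # -(1/N) * sum[ y*log(p) + (1-y)*log(1-p) ]
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```
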
21 | ---
22 |
23 | ### Question 2
24 | The **Normal Equation** from the lecture:
25 |
26 | ```
27 | θ = (Xᵀ X)⁻¹ Xᵀ Y
28 |   = [4  -5.5  -7  7]ᵀ
29 | 
30 | ŷ = 4 - 5.5·x1 - 7·x2 + 7·x3
31 | ```
32 |
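As a sanity check on the Normal Equation itself, here's a small `numpy` sketch on synthetic data generated around the stated coefficients (the data below is made up; the question's actual design matrix is in the tutorial sheet):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])   # bias column + 3 synthetic features
theta_true = np.array([4., -5.5, -7., 7.])
Y = X @ theta_true + rng.normal(scale=0.01, size=20)          # targets near the stated coefficients

theta = np.linalg.pinv(X.T @ X) @ X.T @ Y                     # θ = (XᵀX)⁻¹ Xᵀ Y
print(theta)                                                  # ≈ [ 4.  -5.5  -7.   7. ]
```
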
33 | ---
34 |
35 | ### Question 3
36 | > Check the slides for annotated solutions for all the equations here.
37 |
38 | ---
39 |
40 | ### Question 4
41 | > Check the slides for diagrams and answers to these questions.
42 |
43 | ---
44 |
45 | ### Question 5a
46 | A symmetric matrix is one that's equal to its transpose.
47 |
48 | > Look at the slides for an annotated proof.
49 |
50 | ### Question 5b
51 | You have to use induction for this. Consider the base case and then move on to the inductive step.
52 |
53 | > Look at the slides for an annotated proof.
54 |
55 | ### Question 5c
56 | Similar to `5b`, we have to use induction. We use the concept of **Idempotency** again.
57 |
58 | > Look at the slides for an annotated proof.
59 |
60 | ### Question 5d
61 | We know that `trace(AB) = trace(BA)`. For symmetric and idempotent matrices, `rank(A) = trace(A)`.
62 |
63 | > Look at the slides for an annotated proof.
--------------------------------------------------------------------------------
/T3W5/data/iris.csv:
--------------------------------------------------------------------------------
1 | sepal_length, sepal_width, petal_length, petal_width, class
2 | 5.1,3.5,1.4,0.2,Iris-setosa
3 | 4.9,3.0,1.4,0.2,Iris-setosa
4 | 4.7,3.2,1.3,0.2,Iris-setosa
5 | 4.6,3.1,1.5,0.2,Iris-setosa
6 | 5.0,3.6,1.4,0.2,Iris-setosa
7 | 5.4,3.9,1.7,0.4,Iris-setosa
8 | 4.6,3.4,1.4,0.3,Iris-setosa
9 | 5.0,3.4,1.5,0.2,Iris-setosa
10 | 4.4,2.9,1.4,0.2,Iris-setosa
11 | 4.9,3.1,1.5,0.1,Iris-setosa
12 | 5.4,3.7,1.5,0.2,Iris-setosa
13 | 4.8,3.4,1.6,0.2,Iris-setosa
14 | 4.8,3.0,1.4,0.1,Iris-setosa
15 | 4.3,3.0,1.1,0.1,Iris-setosa
16 | 5.8,4.0,1.2,0.2,Iris-setosa
17 | 5.7,4.4,1.5,0.4,Iris-setosa
18 | 5.4,3.9,1.3,0.4,Iris-setosa
19 | 5.1,3.5,1.4,0.3,Iris-setosa
20 | 5.7,3.8,1.7,0.3,Iris-setosa
21 | 5.1,3.8,1.5,0.3,Iris-setosa
22 | 5.4,3.4,1.7,0.2,Iris-setosa
23 | 5.1,3.7,1.5,0.4,Iris-setosa
24 | 4.6,3.6,1.0,0.2,Iris-setosa
25 | 5.1,3.3,1.7,0.5,Iris-setosa
26 | 4.8,3.4,1.9,0.2,Iris-setosa
27 | 5.0,3.0,1.6,0.2,Iris-setosa
28 | 5.0,3.4,1.6,0.4,Iris-setosa
29 | 5.2,3.5,1.5,0.2,Iris-setosa
30 | 5.2,3.4,1.4,0.2,Iris-setosa
31 | 4.7,3.2,1.6,0.2,Iris-setosa
32 | 4.8,3.1,1.6,0.2,Iris-setosa
33 | 5.4,3.4,1.5,0.4,Iris-setosa
34 | 5.2,4.1,1.5,0.1,Iris-setosa
35 | 5.5,4.2,1.4,0.2,Iris-setosa
36 | 4.9,3.1,1.5,0.1,Iris-setosa
37 | 5.0,3.2,1.2,0.2,Iris-setosa
38 | 5.5,3.5,1.3,0.2,Iris-setosa
39 | 4.9,3.1,1.5,0.1,Iris-setosa
40 | 4.4,3.0,1.3,0.2,Iris-setosa
41 | 5.1,3.4,1.5,0.2,Iris-setosa
42 | 5.0,3.5,1.3,0.3,Iris-setosa
43 | 4.5,2.3,1.3,0.3,Iris-setosa
44 | 4.4,3.2,1.3,0.2,Iris-setosa
45 | 5.0,3.5,1.6,0.6,Iris-setosa
46 | 5.1,3.8,1.9,0.4,Iris-setosa
47 | 4.8,3.0,1.4,0.3,Iris-setosa
48 | 5.1,3.8,1.6,0.2,Iris-setosa
49 | 4.6,3.2,1.4,0.2,Iris-setosa
50 | 5.3,3.7,1.5,0.2,Iris-setosa
51 | 5.0,3.3,1.4,0.2,Iris-setosa
52 | 7.0,3.2,4.7,1.4,Iris-versicolor
53 | 6.4,3.2,4.5,1.5,Iris-versicolor
54 | 6.9,3.1,4.9,1.5,Iris-versicolor
55 | 5.5,2.3,4.0,1.3,Iris-versicolor
56 | 6.5,2.8,4.6,1.5,Iris-versicolor
57 | 5.7,2.8,4.5,1.3,Iris-versicolor
58 | 6.3,3.3,4.7,1.6,Iris-versicolor
59 | 4.9,2.4,3.3,1.0,Iris-versicolor
60 | 6.6,2.9,4.6,1.3,Iris-versicolor
61 | 5.2,2.7,3.9,1.4,Iris-versicolor
62 | 5.0,2.0,3.5,1.0,Iris-versicolor
63 | 5.9,3.0,4.2,1.5,Iris-versicolor
64 | 6.0,2.2,4.0,1.0,Iris-versicolor
65 | 6.1,2.9,4.7,1.4,Iris-versicolor
66 | 5.6,2.9,3.6,1.3,Iris-versicolor
67 | 6.7,3.1,4.4,1.4,Iris-versicolor
68 | 5.6,3.0,4.5,1.5,Iris-versicolor
69 | 5.8,2.7,4.1,1.0,Iris-versicolor
70 | 6.2,2.2,4.5,1.5,Iris-versicolor
71 | 5.6,2.5,3.9,1.1,Iris-versicolor
72 | 5.9,3.2,4.8,1.8,Iris-versicolor
73 | 6.1,2.8,4.0,1.3,Iris-versicolor
74 | 6.3,2.5,4.9,1.5,Iris-versicolor
75 | 6.1,2.8,4.7,1.2,Iris-versicolor
76 | 6.4,2.9,4.3,1.3,Iris-versicolor
77 | 6.6,3.0,4.4,1.4,Iris-versicolor
78 | 6.8,2.8,4.8,1.4,Iris-versicolor
79 | 6.7,3.0,5.0,1.7,Iris-versicolor
80 | 6.0,2.9,4.5,1.5,Iris-versicolor
81 | 5.7,2.6,3.5,1.0,Iris-versicolor
82 | 5.5,2.4,3.8,1.1,Iris-versicolor
83 | 5.5,2.4,3.7,1.0,Iris-versicolor
84 | 5.8,2.7,3.9,1.2,Iris-versicolor
85 | 6.0,2.7,5.1,1.6,Iris-versicolor
86 | 5.4,3.0,4.5,1.5,Iris-versicolor
87 | 6.0,3.4,4.5,1.6,Iris-versicolor
88 | 6.7,3.1,4.7,1.5,Iris-versicolor
89 | 6.3,2.3,4.4,1.3,Iris-versicolor
90 | 5.6,3.0,4.1,1.3,Iris-versicolor
91 | 5.5,2.5,4.0,1.3,Iris-versicolor
92 | 5.5,2.6,4.4,1.2,Iris-versicolor
93 | 6.1,3.0,4.6,1.4,Iris-versicolor
94 | 5.8,2.6,4.0,1.2,Iris-versicolor
95 | 5.0,2.3,3.3,1.0,Iris-versicolor
96 | 5.6,2.7,4.2,1.3,Iris-versicolor
97 | 5.7,3.0,4.2,1.2,Iris-versicolor
98 | 5.7,2.9,4.2,1.3,Iris-versicolor
99 | 6.2,2.9,4.3,1.3,Iris-versicolor
100 | 5.1,2.5,3.0,1.1,Iris-versicolor
101 | 5.7,2.8,4.1,1.3,Iris-versicolor
102 | 6.3,3.3,6.0,2.5,Iris-virginica
103 | 5.8,2.7,5.1,1.9,Iris-virginica
104 | 7.1,3.0,5.9,2.1,Iris-virginica
105 | 6.3,2.9,5.6,1.8,Iris-virginica
106 | 6.5,3.0,5.8,2.2,Iris-virginica
107 | 7.6,3.0,6.6,2.1,Iris-virginica
108 | 4.9,2.5,4.5,1.7,Iris-virginica
109 | 7.3,2.9,6.3,1.8,Iris-virginica
110 | 6.7,2.5,5.8,1.8,Iris-virginica
111 | 7.2,3.6,6.1,2.5,Iris-virginica
112 | 6.5,3.2,5.1,2.0,Iris-virginica
113 | 6.4,2.7,5.3,1.9,Iris-virginica
114 | 6.8,3.0,5.5,2.1,Iris-virginica
115 | 5.7,2.5,5.0,2.0,Iris-virginica
116 | 5.8,2.8,5.1,2.4,Iris-virginica
117 | 6.4,3.2,5.3,2.3,Iris-virginica
118 | 6.5,3.0,5.5,1.8,Iris-virginica
119 | 7.7,3.8,6.7,2.2,Iris-virginica
120 | 7.7,2.6,6.9,2.3,Iris-virginica
121 | 6.0,2.2,5.0,1.5,Iris-virginica
122 | 6.9,3.2,5.7,2.3,Iris-virginica
123 | 5.6,2.8,4.9,2.0,Iris-virginica
124 | 7.7,2.8,6.7,2.0,Iris-virginica
125 | 6.3,2.7,4.9,1.8,Iris-virginica
126 | 6.7,3.3,5.7,2.1,Iris-virginica
127 | 7.2,3.2,6.0,1.8,Iris-virginica
128 | 6.2,2.8,4.8,1.8,Iris-virginica
129 | 6.1,3.0,4.9,1.8,Iris-virginica
130 | 6.4,2.8,5.6,2.1,Iris-virginica
131 | 7.2,3.0,5.8,1.6,Iris-virginica
132 | 7.4,2.8,6.1,1.9,Iris-virginica
133 | 7.9,3.8,6.4,2.0,Iris-virginica
134 | 6.4,2.8,5.6,2.2,Iris-virginica
135 | 6.3,2.8,5.1,1.5,Iris-virginica
136 | 6.1,2.6,5.6,1.4,Iris-virginica
137 | 7.7,3.0,6.1,2.3,Iris-virginica
138 | 6.3,3.4,5.6,2.4,Iris-virginica
139 | 6.4,3.1,5.5,1.8,Iris-virginica
140 | 6.0,3.0,4.8,1.8,Iris-virginica
141 | 6.9,3.1,5.4,2.1,Iris-virginica
142 | 6.7,3.1,5.6,2.4,Iris-virginica
143 | 6.9,3.1,5.1,2.3,Iris-virginica
144 | 5.8,2.7,5.1,1.9,Iris-virginica
145 | 6.8,3.2,5.9,2.3,Iris-virginica
146 | 6.7,3.3,5.7,2.5,Iris-virginica
147 | 6.7,3.0,5.2,2.3,Iris-virginica
148 | 6.3,2.5,5.0,1.9,Iris-virginica
149 | 6.5,3.0,5.2,2.0,Iris-virginica
150 | 6.2,3.4,5.4,2.3,Iris-virginica
151 | 5.9,3.0,5.1,1.8,Iris-virginica
--------------------------------------------------------------------------------
/T3W5/data/iris.names:
--------------------------------------------------------------------------------
1 | 1. Title: Iris Plants Database
2 | Updated Sept 21 by C.Blake - Added discrepency information
3 |
4 | 2. Sources:
5 | (a) Creator: R.A. Fisher
6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
7 | (c) Date: July, 1988
8 |
9 | 3. Past Usage:
10 | - Publications: too many to mention!!! Here are a few.
11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
13 | to Mathematical Statistics" (John Wiley, NY, 1950).
14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
17 | Structure and Classification Rule for Recognition in Partially Exposed
18 | Environments". IEEE Transactions on Pattern Analysis and Machine
19 | Intelligence, Vol. PAMI-2, No. 1, 67-71.
20 | -- Results:
21 | -- very low misclassification rates (0% for the setosa class)
22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE
23 | Transactions on Information Theory, May 1972, 431-433.
24 | -- Results:
25 | -- very low misclassification rates again
26 | 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
27 | conceptual clustering system finds 3 classes in the data.
28 |
29 | 4. Relevant Information:
30 | --- This is perhaps the best known database to be found in the pattern
31 | recognition literature. Fisher's paper is a classic in the field
32 | and is referenced frequently to this day. (See Duda & Hart, for
33 | example.) The data set contains 3 classes of 50 instances each,
34 | where each class refers to a type of iris plant. One class is
35 | linearly separable from the other 2; the latter are NOT linearly
36 | separable from each other.
37 | --- Predicted attribute: class of iris plant.
38 | --- This is an exceedingly simple domain.
39 | --- This data differs from the data presented in Fishers article
40 | (identified by Steve Chadwick, spchadwick@espeedaz.net )
41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
42 | where the error is in the fourth feature.
43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa"
44 | where the errors are in the second and third features.
45 |
46 | 5. Number of Instances: 150 (50 in each of three classes)
47 |
48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class
49 |
50 | 7. Attribute Information:
51 | 1. sepal length in cm
52 | 2. sepal width in cm
53 | 3. petal length in cm
54 | 4. petal width in cm
55 | 5. class:
56 | -- Iris Setosa
57 | -- Iris Versicolour
58 | -- Iris Virginica
59 |
60 | 8. Missing Attribute Values: None
61 |
62 | Summary Statistics:
63 | Min Max Mean SD Class Correlation
64 | sepal length: 4.3 7.9 5.84 0.83 0.7826
65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194
66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
68 |
69 | 9. Class Distribution: 33.3% for each of 3 classes.
70 |
--------------------------------------------------------------------------------
/T3W5/linreg.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 | import matplotlib.pyplot as plt
4 |
5 | class LinearRegressionClassifier(object):
6 | def __init__(self):
7 | self.alpha = 1e-2
8 | self.iterations = 1000
9 | self.losses = []
10 | self.weights = None
11 | self.bias = None
12 |
13 |     def forward(self, x):
14 |         return np.dot(x, self.weights) + self.bias  # linear model: y_hat = Xw + b
15 |
16 | def backward(self, x, y_hat, y):
17 | m, d = x.shape
18 | y_hat = y_hat.reshape([m])
19 | y = y.reshape([m])
20 |
21 |         partial_w = (1 / x.shape[0]) * (2 * np.dot(x.T, (y_hat - y)))  # dMSE/d(weights)
22 |         partial_b = (1 / x.shape[0]) * (2 * np.sum(y_hat - y))         # dMSE/d(bias)
23 |
24 | return [partial_w, partial_b]
25 |
26 |     def MSELoss(self, y_hat, y):
27 |         return (1 / y.shape[0]) * np.sum(np.square(y_hat.reshape(-1) - y.reshape(-1)))  # flatten so (m,) and (m,1) don't broadcast to (m,m)
28 |
29 | def update(self, grad):
30 | self.weights = self.weights - (self.alpha * grad[0])
31 | self.bias = self.bias - (self.alpha * grad[1])
32 |
33 | def fit(self, x, y):
34 | self.weights = np.random.uniform(0, 1, x.shape[1])
35 | self.bias = np.random.uniform(0, 1, 1)
36 | self.losses = []
37 |
38 | for i in range(self.iterations):
39 | y_hat = self.forward(x)
40 |
41 | loss = self.MSELoss(y_hat, y)
42 | self.losses.append(loss)
43 |
44 | grad = self.backward(x, y_hat, y)
45 |
46 | self.update(grad)
47 |
48 |     def predict(self, x):
49 |         return self.forward(x)  # return the model's predictions, not the raw inputs
50 |
51 | def plot(self):
52 | plt.plot(range(self.iterations), self.losses, color="red")
53 | plt.title("Loss on Iris Dataset for {} iterations".format(self.iterations))
54 | plt.xlabel("Iteration")
55 | plt.ylabel("Loss")
56 | plt.show()
57 |
58 | cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
59 | df = pd.read_csv('./data/iris.csv', header=None, names=cols)  # iris.csv has no header row, so don't skip any lines
60 |
61 | # replace class strings with integer indices
62 | df['class'] = df['class'].str.replace('Iris-setosa', '0')
63 | df['class'] = df['class'].str.replace('Iris-versicolor', '1')
64 | df['class'] = df['class'].str.replace('Iris-virginica', '2')
65 | df['class'] = df['class'].map(lambda x : int(x))
66 |
67 | X = df.iloc[:, :-1].values
68 | Y = df.iloc[:, -1].values.reshape(-1, 1)
69 | X = np.array(X)
70 | Y = np.array(Y)
71 |
72 | linreg = LinearRegressionClassifier()
73 | linreg.fit(X, Y)
74 | linreg.plot()
--------------------------------------------------------------------------------
/T4aW6/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 4 Week 6: Bias Variance Tradeoff
2 |
3 | In T4W6, I cover Bias-Variance Tradeoff. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1).
4 |
5 | > This tutorial was pretty difficult. I've attached a `Tutorial_4_FAQ.pdf` file that seeks to clarify certain details on this week's topics.
6 |
7 | ## Contents
8 | This repo contains answers for Questions 1 and 2.
9 |
10 | ### Question 1a
11 | - Number of data points: Yes
12 | - Amount of Noise: No
13 | - Complexity of Target: No
14 |
15 | ### Question 1b
16 | Deterministic noise will increase as it gets harder for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting.
17 |
18 | ### Question 1c
19 | Deterministic noise will decrease as it gets easier for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting.
20 |
21 | ---
22 |
23 | ### Question 2a
24 | Each blue point is the average training accuracy for a given value of `C`. It's the average of the `10` training accuracies from 10-fold cross-validation (10-FCV).
25 |
26 | Each green point is the average validation accuracy for a given value of `C`. It's the average of the `10` validation accuracies from the 10-FCV.
27 |
28 | ### Question 2b
29 | Each blue region represents the **variance** of the training accuracy for a given value of `C`. It is calculated as the variance of the `10` training accuracies from the 10-FCV.
30 |
31 | Similarly, the green region is the **variance** of the validation accuracy for a given value of `C`. It's the variance of the `10` validation accuracies from the 10-FCV.
32 |
33 | ### Question 2c
34 | The best validation accuracy is reached when `C = 1`.
35 |
36 | > High training accuracy DOES NOT indicate high validation/testing accuracies. Always perform your train-test process to see if the model has generalised well to the unseen data before doing anything with the model (like deploying to production or using it IRL).
37 |
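For concreteness, here is a minimal sketch of the procedure behind those blue/green points and regions, assuming a linear SVM on the iris data (the actual model and dataset in the question may differ): for each candidate `C`, run 10-FCV and record the mean and variance of the fold accuracies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

for C in [0.01, 0.1, 1, 10, 100]:
    # 10-fold CV for one candidate value of C
    scores = cross_validate(SVC(C=C, kernel="linear"), X, y, cv=10, return_train_score=True)
    tr, va = scores["train_score"], scores["test_score"]
    print(f"C={C:<6} train mean={tr.mean():.3f} var={tr.var():.5f} | "
          f"val mean={va.mean():.3f} var={va.var():.5f}")
```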
38 | ---
39 |
40 | ### Question 3a
41 |
42 | > The annotated proofs for this question can be found on the slides.
43 |
44 | ### Question 3b
45 | 1. Smaller `k` values, with everything else held constant, will increase the variance.
46 | 2. As `k` increases, bias increases: as the number of neighbours considered increases, we include points further away from `x0` (closeness decreases) and the resulting predictions drift away from `f(x0)` (see the sketch below).
47 |
48 | > Bias = Closeness to Truth
49 |
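To see both points numerically, here is a rough simulation of my own (not part of the tutorial): it estimates the squared bias and the variance of a k-NN regressor at a fixed query point `x0` for a few values of `k`, using an assumed target function `f(x) = sin(2πx)` and Gaussian noise.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)          # assumed target function
x0, trials, m, noise = 0.5, 200, 50, 0.3     # query point, #datasets, dataset size, noise sd

for k in [1, 5, 25]:
    preds = []
    for _ in range(trials):                  # draw many datasets to estimate bias/variance
        X = rng.uniform(0, 1, (m, 1))
        y = f(X).ravel() + rng.normal(0, noise, m)
        knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
        preds.append(knn.predict(np.array([[x0]]))[0])
    preds = np.array(preds)
    print(f"k={k:>2}  bias^2={(preds.mean() - f(x0))**2:.4f}  variance={preds.var():.4f}")
```

Small `k` should show the largest variance, and large `k` the largest bias, matching points 1 and 2.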
50 | ---
51 |
52 | ### Question 4
53 |
54 | > The annotated proofs for this question can be found on the slides.
--------------------------------------------------------------------------------
/T4aW6/Tutorial_4_FAQ.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T4aW6/Tutorial_4_FAQ.pdf
--------------------------------------------------------------------------------
/T4bW7/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 4b Week 7: Regularisation and Validation
2 |
3 | > Hope you had a productive Recess Week! Let's try getting that 'A' for midterms :D
4 |
5 | In T4bW7, I cover Regularisation and Validation. Find the tutorial slides [here](https://docs.google.com/presentation/d/1eE1In5ZS19YKgN3DN9VjNhBavHQoMaKB9NjZ-hreTG0/edit?usp=sharing).
6 |
7 | ## Contents
8 | This repo contains the code used to answer Questions 1, 2, and 3.
9 |
10 | ---
11 |
12 | ### Question 1a
13 | Training time for `m` samples: `m^2 * log(m)`
14 |
15 | In **LOO-CV**, one fold is one sample. Each of the `m` rounds trains on `m-1` samples and tests on the remaining `1` sample, so every sample gets its chance of being the testing sample. This is for _one_ model.
16 |
17 | Number of models: `30`
18 | Number of training samples: `m-1`
19 | Number of testing samples: `1`
20 |
21 | Total time: `30 * m * (m-1)^2 * log(m-1)`
22 |
23 | ### Question 1b
24 |
25 | In **10-FCV**, each fold has `m/10` samples inside. There are 9 training folds and 1 testing fold. Each `m/10`-sized fold gets its chance of being the testing fold. This is for _one_ model.
26 |
27 | Number of models: `30`
28 | Number of training folds: `9` (i.e. `9m/10` samples)
29 | Number of testing folds: `1` (i.e. `m/10` samples)
30 | Training time for the entire dataset of `m` samples: `m^2 * log(m)`
31 |
32 | Total time: `[30 * 10 * (9m/10)^2 * log(9m/10)] + [m^2 * log(m)]`, where the final term accounts for training on the full dataset of `m` samples.
33 |
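As a quick sanity check of the counting above (an illustration only, taking the training cost `T(n) = n^2 * log(n)` as given):

```python
import math

def train_cost(n):
    # assumed training cost T(n) = n^2 * log(n)
    return n ** 2 * math.log(n)

def loocv_cost(m, models=30):
    # each of the 30 models is trained m times on m-1 samples
    return models * m * train_cost(m - 1)

def tenfold_cost(m, models=30):
    # each model is trained 10 times on 9m/10 samples,
    # plus one training pass over the full dataset of m samples (the final term above)
    return models * 10 * train_cost(9 * m / 10) + train_cost(m)

m = 1000
print(f"LOO-CV total : {loocv_cost(m):.3e}")
print(f"10-FCV total : {tenfold_cost(m):.3e}")
```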
34 | ---
35 |
36 | ## How to read contour plots
37 | Before we get into Question 2, let's understand the figures given to us.
38 |
39 | The ellipses are contour lines that represent the altitudes of the function. Think of the surface as coming out of the paper in 3 dimensions (like a volcano on paper). The lower the number next to a contour, the lower the altitude, and vice versa.
40 |
41 | 1. Find the minimum value of `Reg. Penalty + MSE term`
42 | 2. Return the corresponding values of `(Theta0, Theta1)`
43 |
44 | > It's OKAY to guess here! The values are rough _guesstimates_. Just eyeball it.
45 |
46 | ### Question 2a
47 | No regularisation means we only look at the MSE term. Find the values of `(Theta0, Theta1)` such that the MSE term is at its minimum. This occurs at the circle at altitude `0.2` on either graph. The center of that circle corresponds to `Theta0 ≈ 0.9` and `Theta1 ≈ 0.5`. It's alright if your values fluctuate `± 0.5` from the correct answer.
48 |
49 | ### Question 2b
50 | Look at graph 1. There are a few possible sums to consider:
51 |
52 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 4.0 = 4.1` (NOPE)
53 | 2. Flexible MSE + Minimum Reg Penalty = `0.4 + 5.0 = 5.4` (NOPE)
54 | 3. Middle ground = `0.5 + 2.6 = 3.1` (CORRECT)
55 |
56 | > The minimum sum corresponds to the pair `(0.2, 0.25)`
57 |
58 | ### Question 2c
59 | Look at graph 2. There are a few possible sums to consider:
60 |
61 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 9.0 = 9.1` (NOPE)
62 | 2. Middle ground = `2.5 + 2.2 = 4.7` (NOPE)
63 | 3. Flexible MSE + Minimum Reg Penalty = `0.0 + 4.4 = 4.4` (CORRECT)
64 |
65 | > The minimum sum corresponds to the pair `(0.0, 0.1)`
66 |
67 | ---
68 |
69 | ### Question 3a
70 | Time series data is dependent on time. Breaking that natural order leaks future information into training and makes the evaluation meaningless. The best approach is to split the dataset chronologically, without abrupt breakages in between: for example, keep the past few days' worth of temporal data points for training, and the future points for testing.
71 |
72 | > The value is in the time. Respect it.
73 |
74 | ### Question 3b
75 | Break your data into training, validation, and testing sets without reordering or shuffling the samples. For example, suppose we have the following dataset with time going from `T1` to `T20`:
76 |
77 | ```
78 | Dataset = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20]
79 |
80 | Training = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11]
81 | Validation = [T12, T13, T14, T15]
82 | Testing = [T16, T17, T18, T19, T20]
83 | ```
84 |
85 | > Again, please respect the time component for temporal data.
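A minimal sketch of such a chronological split in plain Python (the 11/4/5 split sizes are simply the ones used above):

```python
series = [f"T{i}" for i in range(1, 21)]   # T1 ... T20 in time order

train = series[:11]      # T1  .. T11
val   = series[11:15]    # T12 .. T15
test  = series[15:]      # T16 .. T20

print("Training   =", train)
print("Validation =", val)
print("Testing    =", test)
```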
86 |
87 | ### Question 3c
88 | Take adjacent pairs of data points for training and validation.
89 |
90 | ```
91 | Dataset = [1, 2, 3, 4]
92 |
93 | Training = [1] | Validation = [2]
94 | Training = [2] | Validation = [3]
95 | Training = [3] | Validation = [4]
96 | ```
97 |
98 | There are less-preferred alternatives:
99 |
100 | 1. `Training = [1] | Validation = [3]` -> decent model
101 | 2. `Training = [1, 2] | Validation = [3]` -> predicting too far into the future after limited training
102 | 3. `Training = [1, 2, 3] | Validation = [4]` -> better model but can't really compare to training fold in **1.**
103 |
104 | > The key is to break the dataset into comparable folds for training and testing that result in models that are not too different from one another.
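One way to generate exactly these adjacent train/validation pairs is scikit-learn's `TimeSeriesSplit` with `max_train_size=1` (a sketch, assuming a reasonably recent scikit-learn version):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.array([1, 2, 3, 4])
tscv = TimeSeriesSplit(n_splits=3, max_train_size=1)   # keep only the most recent point for training

for train_idx, val_idx in tscv.split(data.reshape(-1, 1)):
    print(f"Training = {data[train_idx]} | Validation = {data[val_idx]}")
```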
--------------------------------------------------------------------------------
/T5W8/README.md:
--------------------------------------------------------------------------------
1 | # Tutorial 5 Week 8: Evaluation Metrics
2 |
3 | In T5W8, I cover Evaluation Metrics. Find the tutorial slides [here](https://docs.google.com/presentation/d/19QigHWaB3GTnhyNfWkbXYpcfvUT2wPq1Zi2ft40GATM/edit?usp=sharing).
4 |
5 | > This chapter is very important to you as an ML practitioner. It gives us tools to analyse how our model is doing after training. These methods give us an indication of which direction to head in when stuck.
6 |
7 | ## Contents
8 | This repo contains answers for Questions 1 and 2.
9 |
10 | ### Question 1a
11 | | Sample | Prediction | Label |
12 | |--------|------------|---------|
13 | | x1 | 0 (NEG) | 0 (NEG) |
14 | | x2 | 0 (NEG) | 1 (POS) |
15 | | x3 | 0 (NEG) | 1 (POS) |
16 | | x4 | 0 (NEG) | 0 (NEG) |
17 | | x5 | 0 (NEG) | 0 (NEG) |
18 | | x6 | 1 (POS) | 1 (POS) |
19 | | x7 | 1 (POS) | 1 (POS) |
20 | | x8 | 1 (POS) | 0 (NEG) |
21 | | x9 | 1 (POS) | 1 (POS) |
22 | | x10 | 1 (POS) | 1 (POS) |
23 |
24 | | Submetric | (Pred, Actual) | Score |
25 | |-----------|----------------|-------|
26 | | TP | (POS, POS) | 4 |
27 | | FP | (POS, NEG) | 1 |
28 | | TN | (NEG, NEG) | 3 |
29 | | FN | (NEG, POS) | 2 |
30 |
31 | ```
32 | Precision = TP / (TP + FP) = 4/5 = 0.8
33 |
34 | Recall = TP / (TP + FN) = 4/6 = 0.67
35 |
36 | F1 = 2/(1/P + 1/R) = 2/(1/0.67 + 1/0.8) = 0.73
37 | ```
38 |
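As a cross-check of the hand calculation above, the same numbers can be reproduced with `sklearn.metrics` (labels transcribed from the table, with `1 = POS` and `0 = NEG`):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # predictions for x1 .. x10
y_true = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1]   # labels for x1 .. x10

print(precision_score(y_true, y_pred))    # 0.8
print(recall_score(y_true, y_pred))       # 0.666...
print(f1_score(y_true, y_pred))           # 0.727...
```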
39 | ### Question 1b
40 | The brute-force method of calculating F1 scores using every model output as a threshold takes `O(m^2)`.
41 |
42 | 1. Sort all samples – `O(m log m)`
43 | 2. For the first threshold, find TP, FP, TN, FN and calculate the F1 score – `O(m)`
44 | 3. Each subsequent threshold takes `O(1)` since we can incrementally update the 4 counts from step 2
45 | 4. After the first computation, the remaining `m-1` thresholds take `O(m-1) ~ O(m)` in total
46 |
47 | Total optimised run time is `O(m log m)`, as sketched below.
48 |
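Here is a sketch of that `O(m log m)` sweep on made-up scores and labels: sort once, then update the four counts in `O(1)` as the threshold moves past each sample.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Return (threshold, F1) maximising F1 in O(m log m)."""
    order = np.argsort(-scores)                 # sort once: O(m log m)
    scores, labels = scores[order], labels[order]

    tp, fp = 0, 0
    fn = int(labels.sum())                      # threshold above everything: all predicted NEG
    best_f1, best_thr = 0.0, np.inf

    for s, y in zip(scores, labels):            # each step is O(1)
        if y == 1:                              # this sample flips from FN to TP
            tp, fn = tp + 1, fn - 1
        else:                                   # this sample flips from TN to FP
            fp += 1
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0   # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_f1, best_thr = f1, s
    return best_thr, best_f1

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])   # made-up model outputs
labels = np.array([1, 1, 0, 1, 0, 0])               # made-up ground truth
print(best_f1_threshold(scores, labels))             # (0.4, 0.857...)
```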
49 | ### Question 1c
50 | Here, the number of thresholds is increased beyond the number of samples in the dataset.
51 |
52 | 1. Sort all samples – `O(m log m)`
53 | 2. Any threshold that falls between the same two adjacent samples (in sorted order) gives the same F1 score
54 | 3. This means there are only `(m+1)` distinct F1 scores to consider
55 | 4. We can binary search for the best F1 score peak – `O(log m)`
56 |
57 | ---
58 |
59 | ### Question 2a – Micro
60 | | _Dog_ | POS_act | NEG_act |
61 | |--------------|---------|---------|
62 | | **POS_pred** | 10 | 3 |
63 | | **NEG_pred** | 6 | 26 |
64 |
65 | | _Cat_ | POS_act | NEG_act |
66 | |--------------|---------|---------|
67 | | **POS_pred** | 13 | 5 |
68 | | **NEG_pred** | 6 | 21 |
69 |
70 | | _Pig_ | POS_act | NEG_act |
71 | |--------------|---------|---------|
72 | | **POS_pred** | 7 | 7 |
73 | | **NEG_pred** | 3 | 28 |
74 |
75 | ### Question 2b
76 | | _Combined_ | POS_act | NEG_act |
77 | |--------------|---------|---------|
78 | | **POS_pred** | 30 | 15 |
79 | | **NEG_pred** | 15 | 75 |
80 |
81 | ```
82 | Accuracy_micro = (TP + TN) / (TP + TN + FP + FN) = (30 + 75)/(30 + 75 + 15 + 15) = 0.778
83 |
84 | Precision_micro = TP / (TP + FP) = 30 / (30 + 15) = 0.667
85 |
86 | Recall_micro = TP / (TP + FN) = 30 / (30 + 15) = 0.667
87 |
88 | F1_micro = 2/(1/P + 1/R) = 2/(1/0.667 + 1/0.667) = 0.667
89 | ```
90 |
91 | ### Question 2c – Macro
92 | | _Dog_ | POS_act | NEG_act |
93 | |--------------|---------|---------|
94 | | **POS_pred** | 10 | 3 |
95 | | **NEG_pred** | 6 | 26 |
96 |
97 | Precision_Dog = 10 / (10 + 3) = 0.769
98 | Recall_Dog = 10 / (10 + 6) = 0.625
99 |
100 | | _Cat_ | POS_act | NEG_act |
101 | |--------------|---------|---------|
102 | | **POS_pred** | 13 | 5 |
103 | | **NEG_pred** | 6 | 21 |
104 |
105 | Precision_Cat = 13 / (13 + 5) = 0.722
106 | Recall_Cat = 13 / (13 + 6) = 0.684
107 |
108 | | _Pig_ | POS_act | NEG_act |
109 | |--------------|---------|---------|
110 | | **POS_pred** | 7 | 7 |
111 | | **NEG_pred** | 3 | 28 |
112 |
113 | Precision_Pig = 7 / (7 + 7) = 0.5
114 | Recall_Pig = 7 / (7 + 3) = 0.7
115 |
116 | ```
117 | Precision_macro = (P_Dog + P_Cat + P_Pig) / 3 = 0.664
118 | Recall_macro = (R_Dog + R_Cat + R_Pig) / 3 = 0.670
119 | ```
120 |
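The micro and macro figures can be checked directly from the three tables (a small sketch, with `(TP, FP, FN)` per class read off the confusion matrices above):

```python
import numpy as np

# (TP, FP, FN) for each class, read off the tables above
counts = {"Dog": (10, 3, 6), "Cat": (13, 5, 6), "Pig": (7, 7, 3)}

tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
print("micro precision:", tp / (tp + fp))            # 30/45 = 0.667
print("micro recall   :", tp / (tp + fn))            # 30/45 = 0.667

precisions = [t / (t + f) for t, f, _ in counts.values()]
recalls    = [t / (t + f) for t, _, f in counts.values()]
print("macro precision:", np.mean(precisions))       # ~0.664
print("macro recall   :", np.mean(recalls))          # ~0.670
```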
121 | ### Question 2d
122 | | Class | TP | FP |
123 | |-------|-----|-----|
124 | | A | 9 | 1 |
125 | | B | 100 | 900 |
126 | | C | 9 | 1 |
127 | | D | 9 | 1 |
128 |
129 | ```
130 | Precision_micro = (TP_A + TP_B + TP_C + TP_D) / [(TP_A + FP_A) + (TP_B + FP_B) + (TP_C + FP_C) + (TP_D + FP_D)]
131 |                 = (9 + 100 + 9 + 9) / (10 + 1000 + 10 + 10)
132 |                 = 0.123
133 | ```
134 |
135 | ```
136 | Precision_A = Precision_C = Precision_D = 9 / 10 = 0.9
137 | Precision_B = 100 / 1000 = 0.1
138 |
139 | Precision_macro = (P_A + P_B + P_C + P_D) / 4
140 | = (0.9 + 0.1 + 0.9 + 0.9) / 4
141 | = 0.7
142 | ```
143 |
144 | We can see that `Precision_macro` >>> `Precision_micro`; a quick numerical check of both figures follows the list below.
145 |
146 | - The model has high Precision for classes A, C, and D, with low Precision for class B
147 | - `Precision_macro` takes the average of all individual Precision values, treating each class equally
148 | - It does not consider the heavy imbalance in class B
149 | - `Precision_macro` is relatively higher as a result
150 | - `Precision_micro` doesn't treat classes equally
151 | - The imbalances are factored into the calculation
152 | - Class B has low Precision and makes up the majority of the dataset
153 | - `Precision_micro` is relatively lower as a result
154 |
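A quick numerical check of the 4-class arithmetic above:

```python
# TP and FP per class, taken from the table in Question 2d
tp = {"A": 9, "B": 100, "C": 9, "D": 9}
fp = {"A": 1, "B": 900, "C": 1, "D": 1}

precision_micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
precision_macro = sum(tp[c] / (tp[c] + fp[c]) for c in tp) / len(tp)

print(precision_micro)   # ~0.123
print(precision_macro)   # 0.7
```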
--------------------------------------------------------------------------------
/T6W9/README.md:
--------------------------------------------------------------------------------
1 | # T6W9: Visualisation and Dimensionality Reduction
2 |
3 | > I was busy tending to personal commitments during this session. TA Pranavan covered for me. Please approach him with any questions on this topic.
4 |
5 | In T6W9, TA Pranavan covers visualisation and dimensionality reduction techniques like PCA and LDA, as well as SMOTE. Please approach him on Slack for the slides.
--------------------------------------------------------------------------------
/T7W10/README.md:
--------------------------------------------------------------------------------
1 | # T7W10: Perceptrons and Neural Networks
2 |
3 | In T7W10, we cover Perceptrons, Multilayer Perceptrons, and Artificial Neural Networks (ANN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1-nG_AElHlAuWQz0EsGt3K9WSjZVQ1OBVne2tjfZCdR0/edit?usp=sharing).
4 |
5 | ## Data Handling Clinic: Session 1
6 | TA Pranavan and I host the first session of the DHC, where we cover basic and intermediate features of `numpy` and `pandas`.
7 |
8 | > The recording will *NOT* be made available, in an effort to encourage live attendance and participation on the Zoom call.
9 |
10 | However, you can find the lesson material here:
11 | - [DHC Presentation Slides](https://tinyurl.com/3244-dhc-slides)
12 | - [DHC Mastercopy Colab Notebook](https://tinyurl.com/3244-dhc-mastercopy)
13 | - [DHC Student's Copy Colab Notebook](https://tinyurl.com/3244-dhc-stdnt)
--------------------------------------------------------------------------------
/T8W11/README.md:
--------------------------------------------------------------------------------
1 | # T8W11: CNNs and RNNs
2 |
3 | In T8W11, we cover Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1GnExFaXQQlO7wnlnCExNzHbvegDlv6G0JSqgNZ0Jp-g/edit?usp=sharing).
--------------------------------------------------------------------------------
/T9W12/README.md:
--------------------------------------------------------------------------------
1 | # T9W12: Explainable AI
2 |
3 | In T9W12, we cover Explainable AI. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1XRdBCLYpUGqMWIdfubslUbIkbvE8KFCrcKrfXBXDcA4/edit?usp=sharing).
--------------------------------------------------------------------------------
/misc/CS3244_Midterm_Cheatsheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/misc/CS3244_Midterm_Cheatsheet.pdf
--------------------------------------------------------------------------------