├── .github └── ISSUE_TEMPLATE │ └── question-request.md ├── .gitignore ├── LICENSE ├── README.md ├── T10W13 └── README.md ├── T1W3 ├── README.md ├── alice_knn.py ├── assets │ └── knn_bst.jpg ├── bob_knn.py └── q1.py ├── T2W4 ├── README.md ├── data │ ├── iris.csv │ └── iris.names └── dtc.py ├── T3W5 ├── Intro_to_Support_Vector_Machines.ipynb ├── README.md ├── data │ ├── iris.csv │ └── iris.names └── linreg.py ├── T4aW6 ├── README.md └── Tutorial_4_FAQ.pdf ├── T4bW7 └── README.md ├── T5W8 └── README.md ├── T6W9 └── README.md ├── T7W10 └── README.md ├── T8W11 └── README.md ├── T9W12 └── README.md └── misc └── CS3244_Midterm_Cheatsheet.pdf /.github/ISSUE_TEMPLATE/question-request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Question Request 3 | about: Suggest questions you want answers to! 4 | title: Question Suggestions! 5 | labels: question 6 | assignees: rish-16 7 | 8 | --- 9 | 10 | **Drop your question(s) here.** 11 | Are Neural Networks better than Decision Trees? 12 | 13 | **What topic(s) is this question from?** 14 | Neural Networks, Decision Trees 15 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Rishabh Anand 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CS3244-Tutorial-Material 2 | All supplementary material used by me while TA-ing **[CS3244: Machine Learning](https://nusmods.com/modules/CS3244/machine-learning)** at NUS School of Computing. 3 | 4 | ## What is this? 5 | I teach **TG-06**, the tutorial that takes place every **Monday, 1200-1300** in AY21/22 Semester 1. It is *fully online* this semester. 6 | 7 | > Unless the syllabus has drastically changed, I believe the material covered here is relevant for future AYs as well (eg: AY22/23++). The module might be deprecated soon so do take note! In future iterations of SOC's Intro to ML module, I still feel the material herein is good enough for preparation purposes. 8 | 9 | This repository contains code, figures, and miscelleaneous items that aid me in teaching my class. The main source of reference *should* be the lecture notes and tutorial questions created by the CS3244 Professors and Teaching Staff. 10 | 11 | > Official tutorial solutions will be released at the end of every week. 
12 | 13 | ## Contents 14 | 15 | Here's a list of what I've covered / I'll be covering in my tutorials: 16 | 17 | - **[T1W3](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T1W3):** k-Nearest Neighbours 18 | - **[T2W4](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T2W4):** Decision Trees 19 | - **[T3W5](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T3W5):** Linear Models 20 | - **[T4aW6](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4aW6):** Bias-Variance Tradeoff 21 | - **[T4bW7](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T4bW7):** Regularisation & Validation 22 | - **[T5W8](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T5W8):** Evaluation Metrics 23 | - **[T6W9](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T6W9):** Visualisation & Dimensionality Reduction (Approach TA Pranavan) 24 | - **[T7W10](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T7W9):** Perceptrons and Neural Networks 25 | - **[T8W11](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T8W11):** CNNs and RNNs 26 | - **[T9W12](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T9W12):** Explainable AI 27 | - **[T10W13](https://github.com/rish-16/CS3244-Tutorial-Material/tree/main/T10W13):** Unsupervised Learning 28 | 29 | > The link to the slides for all my tutorials can be found in the `README.md` in each week's respective folder. 30 | 31 | ## Exam Resources 32 | I've prepared some extra resources that might aid you in your exam preparation. You can find the files here: 33 | 34 | - [**Midterm Cheatsheet:**](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/misc/CS3244_Midterm_Cheatsheet.pdf) Lectures `1a: Intro & Class Org.` to `6: Bias Variance Tradeoff` 35 | 36 | ## Contributions 37 | If there are any issues or suggestions, feel free to raise an Issue or PR. All meaningful contributions welcome! 38 | 39 | ## License 40 | [MIT](https://github.com/rish-16/CS3244-Tutorial-Material/blob/main/LICENSE) 41 | -------------------------------------------------------------------------------- /T10W13/README.md: -------------------------------------------------------------------------------- 1 | # T10W13: Unsupervised Learning 2 | 3 | In T10W13, we cover Unsupervised Learning. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1zfhCQrQMMMdCdSbLmCy9x9brN4VoTyLkHfDu6QPQwNM/edit?usp=sharing). -------------------------------------------------------------------------------- /T1W3/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 1 Week 3: k-Nearest Neighbours 2 | 3 | In T1W3, I cover the k-Nearest Neighbours (k-NN) algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | ## Contents 6 | This repo contains the code used to answer Questions 2, 3, 4. 7 | 8 | ### Question 2a 9 | Here's the ranking table used to classify the new point `(1, 1)` using 3-NN: 10 | ``` 11 | Rank Point Distance Label 12 | 1 (0, 1) 1.000 1 13 | 2 (1, 0) 1.000 1 14 | 3 (1, 2) 1.000 1 15 | 4 (0, 2) 1.414 0 16 | 5 (2, 2) 1.414 0 17 | 6 (-1, 1) 2.000 0 18 | 7 (1, -1) 2.000 0 19 | 8 (2, 3) 2.236 1 20 | 21 | Rank Point Distance Label 22 | 1 (0, 1) 1.000 1 23 | 2 (1, 0) 1.000 1 24 | 3 (1, 2) 1.000 1 25 | 26 | The new point (1, 1) belongs to class 1 using 3-NN. 
27 | ``` 28 | 29 | Here's the ranking table used to classify the new point `(1, 1)` using 7-NN: 30 | ``` 31 | Rank Point Distance Label 32 | 1 (0, 1) 1.000 1 33 | 2 (1, 0) 1.000 1 34 | 3 (1, 2) 1.000 1 35 | 4 (0, 2) 1.414 0 36 | 5 (2, 2) 1.414 0 37 | 6 (-1, 1) 2.000 0 38 | 7 (1, -1) 2.000 0 39 | 8 (2, 3) 2.236 1 40 | 41 | Rank Point Distance Label 42 | 1 (0, 1) 1.000 1 43 | 2 (1, 0) 1.000 1 44 | 3 (1, 2) 1.000 1 45 | 4 (0, 2) 1.414 0 46 | 5 (2, 2) 1.414 0 47 | 6 (-1, 1) 2.000 0 48 | 7 (1, -1) 2.000 0 49 | 50 | The new point (1, 1) belongs to class 0 using 7-NN. 51 | ``` 52 | 53 | ### Question 2b 54 | Larger values of `k` will lead to smoother decision boundaries. This leads to lower chances of overfitting (Covered in `T3W5`). So, the order is: 55 | 56 | ``` 57 | k_l < k_c < k_r 58 | ``` 59 | 60 | ### Question 2c 61 | Time taken to run inference on test dataset for vanilla `k-NN` is indepedent of `k`. Altogether, we'll still be taking `m * t` time given a dataset of `m` samples. 62 | 63 | --- 64 | 65 | ### Question 3a 66 | Both algorithms are correct. Alice's algorithm runs in `O(n(d+k))` while that of Bob runs in `O(ndk)`. Alice's algorithm is much faster. 67 | 68 | For implementations, check out `alice_knn.py` and `bob_knn.py`. 69 | 70 | ### Question 3b 71 | Maintain a Balanced BST (Min/Max Heap) with `k` nodes where the BST tracks the top `k` smallest distances. This would reduce the running time to `O(n(d + logk))`. Here's how you can do it: 72 | 73 | 1. Calculate distances between all the `n` points and the new observation. This takes `O(nd)` 74 | 75 | 2. Add the first `k` distances into a Balanced BST 76 | 77 | 3. Look at the `n-k` unadded distances and iterate through them 78 | 79 | 4. If current distance is > BST root, ignore and move to the next one. This takes `O(1)` 80 | 81 | 5. If current distance is < BST root, remove root, insert current distance, and move on. This takes `O(n * logk)` for all `n` samples 82 | 83 | 6. By the end, you'll have the correct answers occupying all the `k` nodes in the BST. This takes `O(nd + nlogk)` in total 84 | 85 | > Essentially, we are taking the first `k` distances that may be incorrect and replacing them one by one with the correct `k` distances. 86 | 87 |
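To make the steps above concrete, here is a minimal runnable sketch of the `O(n(d + log k))` idea. It uses Python's `heapq` with negated distances as a stand-in for the Balanced BST / max-heap described above; the `knn_with_heap` name and the tuple-based toy dataset are my own choices for illustration, not part of the official solution.

```python
import heapq
import math

def knn_with_heap(points, new, k):
    # points: list of (x, y, label) tuples; new: (x, y) query point.
    # heapq is a min-heap, so we push (-distance, index): the heap root is then
    # the LARGEST of the k smallest distances seen so far (the "BST root" above).
    heap = []

    for i, (x, y, _) in enumerate(points):
        d = math.sqrt((x - new[0])**2 + (y - new[1])**2)  # O(d) per point
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))                 # fill the first k slots
        elif -heap[0][0] > d:                             # root is worse than d
            heapq.heapreplace(heap, (-d, i))              # pop root, push d: O(log k)

    # sort the k survivors from smallest to largest distance
    return [points[i] for _, i in sorted(heap, reverse=True)]

if __name__ == "__main__":
    data = [(-1, 1, 0), (0, 1, 1), (0, 2, 0), (1, -1, 0),
            (1, 0, 1), (1, 2, 1), (2, 2, 0), (2, 3, 1)]
    print(knn_with_heap(data, (1, 1), k=3))  # the three class-1 neighbours at distance 1
```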
88 | ![Balanced BST holding the k smallest distances](./assets/knn_bst.jpg) 89 |
90 | 91 | ``` 92 | if d_(k+1) < d_root: 93 | d_root = d_(k+1) 94 | ``` 95 | 96 | --- 97 | 98 | ### Question 4 99 | No. The difference between the two ranges give `0.4` and `10º Celcius`. This means the `Temperature` variable will dominate the k-NN when calculating the Euclidean Distance, minimising the impact or effect of the `Humidity` variable. 100 | 101 | We can minimise the effect of this disproportion by **normalising** or **standardising** the inputs to a suitable range that won't affect the distance metric immensely. This will be covered in future classes. 102 | -------------------------------------------------------------------------------- /T1W3/alice_knn.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | def __str__(self): 20 | return "({}, {}) | {}".format(self.x, self.y, self.label) 21 | 22 | def __repr__(self): 23 | return "({}, {}) | {}".format(self.x, self.y, self.label) 24 | 25 | points = [ 26 | Point(-1, 1, 0), 27 | Point(0, 1, 1), 28 | Point(0, 2, 0), 29 | Point(1, -1, 0), 30 | Point(1, 0, 1), 31 | Point(1, 2, 1), 32 | Point(2, 2, 0), 33 | Point(2, 3, 1) 34 | ] 35 | 36 | new = Point(1, 1) 37 | k = 3 38 | 39 | S = [0 for _ in range(len(points))] 40 | D = list(map(lambda pt : pt.euclidean_distance(new), points)) # O(nd) 41 | answers = [] 42 | 43 | # O(nk) 44 | for i in range(len(points)): 45 | for j in range(k): 46 | i_min = min(range(len(D)), key=D.__getitem__) 47 | min_D = min(D) 48 | 49 | if S[i_min] == 0: 50 | S[i_min] = 1 51 | D[i_min] = float('inf') # past smallest will not be picked again 52 | answers.append(i_min) 53 | 54 | for i in answers[:k]: 55 | print (points[i]) -------------------------------------------------------------------------------- /T1W3/assets/knn_bst.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T1W3/assets/knn_bst.jpg -------------------------------------------------------------------------------- /T1W3/bob_knn.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | def __str__(self): 20 | return "({}, {}) | {}".format(self.x, self.y, self.label) 21 | 22 | def __repr__(self): 23 | return "({}, {}) | {}".format(self.x, self.y, self.label) 24 | 25 | points = [ 26 | Point(-1, 1, 0), 27 | Point(0, 1, 1), 28 | Point(0, 2, 0), 29 | Point(1, -1, 0), 30 | Point(1, 0, 1), 31 | Point(1, 2, 1), 32 | Point(2, 2, 0), 33 | Point(2, 3, 1) 34 | ] 35 | 36 | new = Point(1, 1) 37 | k = 3 38 | 39 | S = [0 for _ in range(len(points))] 40 | 41 | # O(k(nd)) 42 | for i in range(k): # O(k) 43 | filtered_pts = [points[i] for i in range(len(S)) if S[i] == 0] # O(n) 44 | D = list(map(lambda pt : 
pt.euclidean_distance(new), filtered_pts)) # O(nd) 45 | i_min = min(range(len(D)), key=D.__getitem__) # O(n) 46 | S[i_min] = 1 47 | 48 | for i in (range(len(S))): 49 | if S[i] == 1: 50 | print (points[i], points[i].euclidean_distance(new)) -------------------------------------------------------------------------------- /T1W3/q1.py: -------------------------------------------------------------------------------- 1 | import math 2 | from pprint import pprint 3 | 4 | class Point: 5 | def __init__(self, x, y, label=None): 6 | self.x = x 7 | self.y = y 8 | self.label = label 9 | 10 | def euclidean_distance(self, pt): 11 | ''' 12 | Calculates Euclidean Distance 13 | [(x1 - x2)^2 + (y1 - y2)^2] ** 0.5 14 | ''' 15 | return math.sqrt( 16 | (self.x - pt.x)**2 + (self.y - pt.y)**2 17 | ) 18 | 19 | points = [ 20 | Point(-1, 1, 0), 21 | Point(0, 1, 1), 22 | Point(0, 2, 0), 23 | Point(1, -1, 0), 24 | Point(1, 0, 1), 25 | Point(1, 2, 1), 26 | Point(2, 2, 0), 27 | Point(2, 3, 1) 28 | ] 29 | 30 | new = Point(1, 1) 31 | k = 7 32 | 33 | # k-NN algorithm 34 | distances = list(map( 35 | lambda pt : [pt.euclidean_distance(new), pt.x, pt.y, pt.label], 36 | points 37 | )) 38 | 39 | sorted_distances = sorted(distances, key=lambda pt : pt[0]) 40 | 41 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label")) 42 | 43 | for i, rec in enumerate(sorted_distances, 0): 44 | print ('{:<10}{:<10}{:<15.3f}{:<5}'.format( 45 | i+1, 46 | "(" + str(rec[1]) + ", " + str(rec[2])+ ")", 47 | rec[0], 48 | rec[3] 49 | )) 50 | 51 | print () 52 | print ('{:<10}{:<10}{:<15}{:<5}'.format("Rank", "Point", "Distance", "Label")) 53 | 54 | for rec in sorted_distances[:k]: 55 | print ('{:<10}{:<10}{:<15.3f}{:<5}'.format( 56 | i+1, 57 | "(" + str(rec[1]) + ", " + str(rec[2])+ ")", 58 | rec[0], 59 | rec[3] 60 | )) 61 | 62 | print ("\nThe new point (1, 1) belongs to class 1.") -------------------------------------------------------------------------------- /T2W4/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 2 Week 4: Decision Trees 2 | 3 | In T2W4, I cover the Decision Tree algorithm. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | I have included the implementation of a Decision Tree classifier in pure Python using `numpy` and `pandas` in `dtc.py`. You can call it in a similar fashion to that in `sklearn` using the `DecisionTreeClassifier` object. I train it on the popular *Iris Type Classification Dataset* found in `data/iris.csv`. 6 | 7 | ## Contents 8 | This repo contains the code used to answer Questions 1, 2, 3, and 4. 9 | 10 | ### Question 1a 11 | 12 | ``` 13 | S_1 14 | / \ 15 | 0/ \1 16 | F=0 S_2 17 | / \ 18 | 0/ \1 19 | F=1 S_3 20 | / \ 21 | 0/ \1 22 | F=1 F=0 23 | ``` 24 | 25 | ### Question 1b 26 | Function `F` can be represented as `AND(S_2, S_3)`. 
We can build as tree that's of depth 2: 27 | 28 | ``` 29 | S_2 30 | / \ 31 | 0/ \1 32 | S_3 S_3 33 | / \ / \ 34 | 0/ \1 0/ \1 35 | F=0 F=1 F=1 F=1 36 | ``` 37 | 38 | If your memory of the `AND` gate is fuzzy, here's a tabular summary: 39 | 40 | | **A** | **B** | **AND** | 41 | |-------|-------|---------| 42 | | 0 | 0 | 0 | 43 | | 0 | 1 | 0 | 44 | | 1 | 0 | 0 | 45 | | 1 | 1 | 1 | 46 | 47 | ### Question 1c 48 | 49 | If your memory of the `XOR` gate is fuzzy, here's a tabular summary: 50 | 51 | | **A** | **B** | **XOR** | 52 | |-------|-------|---------| 53 | | 0 | 0 | 0 | 54 | | 0 | 1 | 1 | 55 | | 1 | 0 | 1 | 56 | | 1 | 1 | 0 | 57 | 58 | To implement this `XOR` gate, we'd need `2^d` leaf nodes and `2^d - 1` internal nodes. It grows exponentially with `d`. Using a Decision Tree is not scalable. Pruning is not possible because we need to consider every single input (ie. a feature) – we can't just ignore any of them. 59 | 60 | You can, however, implement `AND` and `OR` gates using a DT since pruning is possible. We dont need to consider all our inputs. For example, for an `AND` gate, if any one of our inputs is `0`, the result is `0` regardless of the other inputs. Likewise, if any feature in an `OR` gate is `1`, the result is `1` regardless of the other inputs. 61 | 62 | --- 63 | 64 | ### Question 2a 65 | The features are as follows: 66 | 67 | - `Income` 68 | - `Credit History` 69 | - `Debt` 70 | 71 | The label is `Decision`. 72 | 73 | At each level, the main question we will be asking is, 74 | 75 | 76 | > Which feature to choose such that splitting via that gives us the "greatest purity" ie. the most even split between samples. 77 | 78 | The tree would look like so: 79 | 80 | ``` 81 | CrHi? 82 | / | \ 83 | Bad/ Good| \Unknown 84 | / | \ 85 | Rej App Income? 86 | / | \ 87 | / | \ 88 | 0-5K/ 5-10K| \10K+ 89 | Debt App App 90 | / \ 91 | Low/ \High 92 | App Rej 93 | ``` 94 | 95 | *Note: Refer to the slides for more on Information Gain and Entropy. We covered Claude Shannon's Information Theory in this class!* 96 | 97 | ### Question 2b 98 | 99 | Tree 1: 100 | ``` 101 | CrHi? 102 | / | \ 103 | Good/ Bad| \Unknown 104 | App Rej Income? 105 | / | \ 106 | 0-5K/ 5-10K| \10K+ 107 | App App App 108 | ``` 109 | 110 | Tree 2: 111 | ``` 112 | CrHi? 113 | / | \ 114 | Good/ Bad| \Unknown 115 | App Rej Debt? 116 | / \ 117 | Low/ \High 118 | App App 119 | ``` 120 | 121 | Tree 3: 122 | ``` 123 | Income? 124 | 125 | / | \ 126 | / | \ 127 | 0-5K/ 5-10K| \10K+ 128 | Debt? Debt? Debt? 129 | / \ / \ / \ 130 | Low/ \High Low/ \High Low/ \High 131 | App Rej App App App App 132 | ``` 133 | 134 | ### Question 2c 135 | > Of course, you must convert (encode) these strings like `GOOD`, `BAD`, `HIGH`, `LOW` to numeric values. So `GOOD = 1` and `BAD = 0`, for example. Same goes for the labels. ML Models ***DO NOT*** work with raw strings, only numbers. 136 | 137 | `DT($4K, GOOD CH, HIGH debt) = Approve` 138 | 139 | > Hint: Just follow path down to the leaf in your DT Classifier from Question 2a. 140 | 141 | If we use our 3 DTs, the results will be the following: 142 | 143 | ``` 144 | Tree 1: Approve 145 | Tree 2: Approve 146 | Tree 3: Reject 147 | ``` 148 | 149 | If we use uniform voting (every tree gets equal say ie. majority voting), we `Approve` the application since 2/3 classifiers agree. 150 | 151 | --- 152 | 153 | ### Question 3a 154 | Debt depends on Income. 
Person A with income of $5K and a debt of $4K, and Person B with income $15K and a debt of $4K, results in Person A being in `HIGH` debt while Person B is in `LOW` debt. Debt is categorical and Income is a quantifiable, continuous variable. This makes the explainability ambiguous. 155 | 156 | ### Question 3b 157 | Empirically, Decision Trees are bad performers on datasets with missing values. To calculate metrics like Information Gain and Entropy, it is nice to have all the information in front of us. Missing data makes these measures unreliable, making the DT classifier inaccurate. Replacing missing values with alternatives (`mean`, `max`, `min`, `mode`, etc.) could easily skew your data (think of it as "poisoning your dataset"). Also, dropping the affected rows makes the dataset smaller and non-representative of those specific cases (your model won't know what to do in those cases anymore). 158 | 159 | ### Question 3c 160 | Decision Trees do not consider temporal (time-related) features. You are introducing heavy class imbalance into your dataset by appending rows of data with a `REJECT` decision. Your model might overfit on this new biased dataset. Always try to maintain a good balance of `positive` and `negative` cases in your dataset to allow for better generalisation. 161 | 162 | > More on class imbalance and overfitting in future weeks! 163 | 164 | --- 165 | 166 | ### Question 4 167 | Refer to the working covered in class. You can find the official working in the [slides](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). -------------------------------------------------------------------------------- /T2W4/data/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length, sepal_width, petal_length, petal_width, class 2 | 5.1,3.5,1.4,0.2,Iris-setosa 3 | 4.9,3.0,1.4,0.2,Iris-setosa 4 | 4.7,3.2,1.3,0.2,Iris-setosa 5 | 4.6,3.1,1.5,0.2,Iris-setosa 6 | 5.0,3.6,1.4,0.2,Iris-setosa 7 | 5.4,3.9,1.7,0.4,Iris-setosa 8 | 4.6,3.4,1.4,0.3,Iris-setosa 9 | 5.0,3.4,1.5,0.2,Iris-setosa 10 | 4.4,2.9,1.4,0.2,Iris-setosa 11 | 4.9,3.1,1.5,0.1,Iris-setosa 12 | 5.4,3.7,1.5,0.2,Iris-setosa 13 | 4.8,3.4,1.6,0.2,Iris-setosa 14 | 4.8,3.0,1.4,0.1,Iris-setosa 15 | 4.3,3.0,1.1,0.1,Iris-setosa 16 | 5.8,4.0,1.2,0.2,Iris-setosa 17 | 5.7,4.4,1.5,0.4,Iris-setosa 18 | 5.4,3.9,1.3,0.4,Iris-setosa 19 | 5.1,3.5,1.4,0.3,Iris-setosa 20 | 5.7,3.8,1.7,0.3,Iris-setosa 21 | 5.1,3.8,1.5,0.3,Iris-setosa 22 | 5.4,3.4,1.7,0.2,Iris-setosa 23 | 5.1,3.7,1.5,0.4,Iris-setosa 24 | 4.6,3.6,1.0,0.2,Iris-setosa 25 | 5.1,3.3,1.7,0.5,Iris-setosa 26 | 4.8,3.4,1.9,0.2,Iris-setosa 27 | 5.0,3.0,1.6,0.2,Iris-setosa 28 | 5.0,3.4,1.6,0.4,Iris-setosa 29 | 5.2,3.5,1.5,0.2,Iris-setosa 30 | 5.2,3.4,1.4,0.2,Iris-setosa 31 | 4.7,3.2,1.6,0.2,Iris-setosa 32 | 4.8,3.1,1.6,0.2,Iris-setosa 33 | 5.4,3.4,1.5,0.4,Iris-setosa 34 | 5.2,4.1,1.5,0.1,Iris-setosa 35 | 5.5,4.2,1.4,0.2,Iris-setosa 36 | 4.9,3.1,1.5,0.1,Iris-setosa 37 | 5.0,3.2,1.2,0.2,Iris-setosa 38 | 5.5,3.5,1.3,0.2,Iris-setosa 39 | 4.9,3.1,1.5,0.1,Iris-setosa 40 | 4.4,3.0,1.3,0.2,Iris-setosa 41 | 5.1,3.4,1.5,0.2,Iris-setosa 42 | 5.0,3.5,1.3,0.3,Iris-setosa 43 | 4.5,2.3,1.3,0.3,Iris-setosa 44 | 4.4,3.2,1.3,0.2,Iris-setosa 45 | 5.0,3.5,1.6,0.6,Iris-setosa 46 | 5.1,3.8,1.9,0.4,Iris-setosa 47 | 4.8,3.0,1.4,0.3,Iris-setosa 48 | 5.1,3.8,1.6,0.2,Iris-setosa 49 | 4.6,3.2,1.4,0.2,Iris-setosa 50 | 5.3,3.7,1.5,0.2,Iris-setosa 51 | 5.0,3.3,1.4,0.2,Iris-setosa 52 | 7.0,3.2,4.7,1.4,Iris-versicolor 53 | 6.4,3.2,4.5,1.5,Iris-versicolor 54 | 
6.9,3.1,4.9,1.5,Iris-versicolor 55 | 5.5,2.3,4.0,1.3,Iris-versicolor 56 | 6.5,2.8,4.6,1.5,Iris-versicolor 57 | 5.7,2.8,4.5,1.3,Iris-versicolor 58 | 6.3,3.3,4.7,1.6,Iris-versicolor 59 | 4.9,2.4,3.3,1.0,Iris-versicolor 60 | 6.6,2.9,4.6,1.3,Iris-versicolor 61 | 5.2,2.7,3.9,1.4,Iris-versicolor 62 | 5.0,2.0,3.5,1.0,Iris-versicolor 63 | 5.9,3.0,4.2,1.5,Iris-versicolor 64 | 6.0,2.2,4.0,1.0,Iris-versicolor 65 | 6.1,2.9,4.7,1.4,Iris-versicolor 66 | 5.6,2.9,3.6,1.3,Iris-versicolor 67 | 6.7,3.1,4.4,1.4,Iris-versicolor 68 | 5.6,3.0,4.5,1.5,Iris-versicolor 69 | 5.8,2.7,4.1,1.0,Iris-versicolor 70 | 6.2,2.2,4.5,1.5,Iris-versicolor 71 | 5.6,2.5,3.9,1.1,Iris-versicolor 72 | 5.9,3.2,4.8,1.8,Iris-versicolor 73 | 6.1,2.8,4.0,1.3,Iris-versicolor 74 | 6.3,2.5,4.9,1.5,Iris-versicolor 75 | 6.1,2.8,4.7,1.2,Iris-versicolor 76 | 6.4,2.9,4.3,1.3,Iris-versicolor 77 | 6.6,3.0,4.4,1.4,Iris-versicolor 78 | 6.8,2.8,4.8,1.4,Iris-versicolor 79 | 6.7,3.0,5.0,1.7,Iris-versicolor 80 | 6.0,2.9,4.5,1.5,Iris-versicolor 81 | 5.7,2.6,3.5,1.0,Iris-versicolor 82 | 5.5,2.4,3.8,1.1,Iris-versicolor 83 | 5.5,2.4,3.7,1.0,Iris-versicolor 84 | 5.8,2.7,3.9,1.2,Iris-versicolor 85 | 6.0,2.7,5.1,1.6,Iris-versicolor 86 | 5.4,3.0,4.5,1.5,Iris-versicolor 87 | 6.0,3.4,4.5,1.6,Iris-versicolor 88 | 6.7,3.1,4.7,1.5,Iris-versicolor 89 | 6.3,2.3,4.4,1.3,Iris-versicolor 90 | 5.6,3.0,4.1,1.3,Iris-versicolor 91 | 5.5,2.5,4.0,1.3,Iris-versicolor 92 | 5.5,2.6,4.4,1.2,Iris-versicolor 93 | 6.1,3.0,4.6,1.4,Iris-versicolor 94 | 5.8,2.6,4.0,1.2,Iris-versicolor 95 | 5.0,2.3,3.3,1.0,Iris-versicolor 96 | 5.6,2.7,4.2,1.3,Iris-versicolor 97 | 5.7,3.0,4.2,1.2,Iris-versicolor 98 | 5.7,2.9,4.2,1.3,Iris-versicolor 99 | 6.2,2.9,4.3,1.3,Iris-versicolor 100 | 5.1,2.5,3.0,1.1,Iris-versicolor 101 | 5.7,2.8,4.1,1.3,Iris-versicolor 102 | 6.3,3.3,6.0,2.5,Iris-virginica 103 | 5.8,2.7,5.1,1.9,Iris-virginica 104 | 7.1,3.0,5.9,2.1,Iris-virginica 105 | 6.3,2.9,5.6,1.8,Iris-virginica 106 | 6.5,3.0,5.8,2.2,Iris-virginica 107 | 7.6,3.0,6.6,2.1,Iris-virginica 108 | 4.9,2.5,4.5,1.7,Iris-virginica 109 | 7.3,2.9,6.3,1.8,Iris-virginica 110 | 6.7,2.5,5.8,1.8,Iris-virginica 111 | 7.2,3.6,6.1,2.5,Iris-virginica 112 | 6.5,3.2,5.1,2.0,Iris-virginica 113 | 6.4,2.7,5.3,1.9,Iris-virginica 114 | 6.8,3.0,5.5,2.1,Iris-virginica 115 | 5.7,2.5,5.0,2.0,Iris-virginica 116 | 5.8,2.8,5.1,2.4,Iris-virginica 117 | 6.4,3.2,5.3,2.3,Iris-virginica 118 | 6.5,3.0,5.5,1.8,Iris-virginica 119 | 7.7,3.8,6.7,2.2,Iris-virginica 120 | 7.7,2.6,6.9,2.3,Iris-virginica 121 | 6.0,2.2,5.0,1.5,Iris-virginica 122 | 6.9,3.2,5.7,2.3,Iris-virginica 123 | 5.6,2.8,4.9,2.0,Iris-virginica 124 | 7.7,2.8,6.7,2.0,Iris-virginica 125 | 6.3,2.7,4.9,1.8,Iris-virginica 126 | 6.7,3.3,5.7,2.1,Iris-virginica 127 | 7.2,3.2,6.0,1.8,Iris-virginica 128 | 6.2,2.8,4.8,1.8,Iris-virginica 129 | 6.1,3.0,4.9,1.8,Iris-virginica 130 | 6.4,2.8,5.6,2.1,Iris-virginica 131 | 7.2,3.0,5.8,1.6,Iris-virginica 132 | 7.4,2.8,6.1,1.9,Iris-virginica 133 | 7.9,3.8,6.4,2.0,Iris-virginica 134 | 6.4,2.8,5.6,2.2,Iris-virginica 135 | 6.3,2.8,5.1,1.5,Iris-virginica 136 | 6.1,2.6,5.6,1.4,Iris-virginica 137 | 7.7,3.0,6.1,2.3,Iris-virginica 138 | 6.3,3.4,5.6,2.4,Iris-virginica 139 | 6.4,3.1,5.5,1.8,Iris-virginica 140 | 6.0,3.0,4.8,1.8,Iris-virginica 141 | 6.9,3.1,5.4,2.1,Iris-virginica 142 | 6.7,3.1,5.6,2.4,Iris-virginica 143 | 6.9,3.1,5.1,2.3,Iris-virginica 144 | 5.8,2.7,5.1,1.9,Iris-virginica 145 | 6.8,3.2,5.9,2.3,Iris-virginica 146 | 6.7,3.3,5.7,2.5,Iris-virginica 147 | 6.7,3.0,5.2,2.3,Iris-virginica 148 | 6.3,2.5,5.0,1.9,Iris-virginica 149 | 6.5,3.0,5.2,2.0,Iris-virginica 150 | 
6.2,3.4,5.4,2.3,Iris-virginica 151 | 5.9,3.0,5.1,1.8,Iris-virginica -------------------------------------------------------------------------------- /T2W4/data/iris.names: -------------------------------------------------------------------------------- 1 | 1. Title: Iris Plants Database 2 | Updated Sept 21 by C.Blake - Added discrepency information 3 | 4 | 2. Sources: 5 | (a) Creator: R.A. Fisher 6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) 7 | (c) Date: July, 1988 8 | 9 | 3. Past Usage: 10 | - Publications: too many to mention!!! Here are a few. 11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems" 12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions 13 | to Mathematical Statistics" (John Wiley, NY, 1950). 14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. 15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. 16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System 17 | Structure and Classification Rule for Recognition in Partially Exposed 18 | Environments". IEEE Transactions on Pattern Analysis and Machine 19 | Intelligence, Vol. PAMI-2, No. 1, 67-71. 20 | -- Results: 21 | -- very low misclassification rates (0% for the setosa class) 22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE 23 | Transactions on Information Theory, May 1972, 431-433. 24 | -- Results: 25 | -- very low misclassification rates again 26 | 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II 27 | conceptual clustering system finds 3 classes in the data. 28 | 29 | 4. Relevant Information: 30 | --- This is perhaps the best known database to be found in the pattern 31 | recognition literature. Fisher's paper is a classic in the field 32 | and is referenced frequently to this day. (See Duda & Hart, for 33 | example.) The data set contains 3 classes of 50 instances each, 34 | where each class refers to a type of iris plant. One class is 35 | linearly separable from the other 2; the latter are NOT linearly 36 | separable from each other. 37 | --- Predicted attribute: class of iris plant. 38 | --- This is an exceedingly simple domain. 39 | --- This data differs from the data presented in Fishers article 40 | (identified by Steve Chadwick, spchadwick@espeedaz.net ) 41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" 42 | where the error is in the fourth feature. 43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" 44 | where the errors are in the second and third features. 45 | 46 | 5. Number of Instances: 150 (50 in each of three classes) 47 | 48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class 49 | 50 | 7. Attribute Information: 51 | 1. sepal length in cm 52 | 2. sepal width in cm 53 | 3. petal length in cm 54 | 4. petal width in cm 55 | 5. class: 56 | -- Iris Setosa 57 | -- Iris Versicolour 58 | -- Iris Virginica 59 | 60 | 8. Missing Attribute Values: None 61 | 62 | Summary Statistics: 63 | Min Max Mean SD Class Correlation 64 | sepal length: 4.3 7.9 5.84 0.83 0.7826 65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194 66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!) 67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!) 68 | 69 | 9. Class Distribution: 33.3% for each of 3 classes. 
70 | -------------------------------------------------------------------------------- /T2W4/dtc.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | 4 | ''' 5 | Decision Trees are greedy algorithms 6 | that maximise the current Information Gain 7 | without backtracking or going back up to the root. 8 | 9 | Future splits are based on the current splits: 10 | split(t+1) = f(split(t)) 11 | 12 | At every level, the impurity of the dataset 13 | decreases. The entropy (randomness) decreases 14 | with the level. 15 | ''' 16 | 17 | class DTNode(): 18 | def __init__(self, feat_idx=None, bounds=None, left=None, right=None, info_gain=None, value=None): 19 | self.feat_idx = feat_idx 20 | self.bounds = bounds 21 | self.left = left 22 | self.right = right 23 | self.info_gain = info_gain 24 | self.value = value 25 | 26 | class DecisionTreeClassifier(): 27 | def __init__(self, depth=2, min_split=2): 28 | self.root = None 29 | self.depth = depth 30 | self.min_split = min_split 31 | 32 | def build_tree(self, dataset, cur_depth=0): 33 | x, y = dataset[:, :-1], dataset[:, -1] 34 | n, n_dim = x.shape 35 | 36 | # recursively build the subtrees 37 | if n >= self.min_split and cur_depth <= self.depth: 38 | best_split = self.get_best_split(dataset, n, n_dim) 39 | 40 | if best_split['info_gain'] > 0: 41 | left_tree = self.build_tree(best_split['left'], cur_depth+1) 42 | right_tree = self.build_tree(best_split['right'], cur_depth+1) 43 | 44 | return DTNode(best_split['feat_idx'], best_split['bounds'], left_tree, right_tree, best_split['info_gain']) 45 | 46 | y = list(y) 47 | value = max(y, key=y.count) # class label = majority count at leaves 48 | 49 | return DTNode(value=value) 50 | 51 | def get_best_split(self, dataset, n, n_dim): 52 | best_split = {} 53 | max_info_gain = -float('inf') 54 | 55 | for idx in range(n_dim): 56 | feat_val = dataset[:, idx] 57 | possible_boundss = np.unique(feat_val) 58 | 59 | for thresh in possible_boundss: 60 | # data_left, data_right = self.split(dataset, idx, thresh) 61 | data_left = np.array([row for row in dataset if row[idx] <= thresh]) 62 | data_right = np.array([row for row in dataset if row[idx] > thresh]) 63 | 64 | if len(data_left) > 0 and len(data_right) > 0: 65 | y, left_y, right_y = dataset[:, -1], data_left[:, -1], data_right[:, -1] 66 | cur_info_gain = self.get_info_gain(y, left_y, right_y) 67 | 68 | if cur_info_gain > max_info_gain: 69 | best_split['feat_idx'] = idx 70 | best_split['bounds'] = thresh 71 | best_split['left'] = data_left 72 | best_split['right'] = data_right 73 | best_split['info_gain'] = cur_info_gain 74 | max_info_gain = cur_info_gain 75 | 76 | return best_split 77 | 78 | def get_info_gain(self, parent, left, right): 79 | weight_left = len(left) / len(parent) 80 | weight_right = len(right) / len(parent) 81 | 82 | info_gain = self.get_entropy(parent) - (weight_left * self.get_entropy(left) + weight_right * self.get_entropy(right)) 83 | 84 | return info_gain 85 | 86 | def get_entropy(self, y): 87 | labels = np.unique(y) 88 | entropy = 0 89 | for cls in labels: 90 | p_cls = len(y[y == cls]) / len(y) 91 | entropy += -p_cls * np.log2(p_cls) 92 | 93 | return entropy 94 | 95 | def fit(self, x, y): 96 | dataset = np.concatenate((x, y), axis=1) 97 | self.root = self.build_tree(dataset) 98 | 99 | def make_pred(self, x, root): 100 | if root.value != None: 101 | return root.value 102 | 103 | feat_val = x[root.feat_idx] 104 | 105 | if feat_val <= root.bounds: 106 | return 
self.make_pred(x, root.left) 107 | else: 108 | return self.make_pred(x, root.right) 109 | 110 | def predict(self, x): 111 | return [self.make_pred(i, self.root) for i in x] 112 | 113 | cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'] 114 | df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols) 115 | 116 | # replace class strings with integer indices 117 | df['class'] = df['class'].str.replace('Iris-setosa', '0') 118 | df['class'] = df['class'].str.replace('Iris-versicolor', '1') 119 | df['class'] = df['class'].str.replace('Iris-virginica', '2') 120 | df['class'] = df['class'].map(lambda x : int(x)) 121 | 122 | X = df.iloc[:, :-1].values 123 | Y = df.iloc[:, -1].values.reshape(-1, 1) 124 | X = np.array(X) 125 | Y = np.array(Y) 126 | 127 | clf = DecisionTreeClassifier() 128 | clf.fit(X, Y) # split this into training and testing datasets 129 | 130 | def print_tree(root=None, indent=" "): 131 | if root.value != None: 132 | print (root.value) 133 | else: 134 | print ("x_" + str(root.feat_idx), '<=', root.bounds, ":", format(root.info_gain, '0.4f')) 135 | print (indent + "left: ", end="") 136 | print_tree(root.left, indent + indent) 137 | print (indent + "right: ", end="") 138 | print_tree(root.right, indent + indent) 139 | 140 | print_tree(clf.root) -------------------------------------------------------------------------------- /T3W5/Intro_to_Support_Vector_Machines.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Support Vector Machines – An In-depth Tutorial", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "code", 21 | "metadata": { 22 | "id": "05sNYyltOOMD" 23 | }, 24 | "source": [ 25 | "import numpy as np\n", 26 | "import pandas as pd\n", 27 | "from sklearn import datasets" 28 | ], 29 | "execution_count": 5, 30 | "outputs": [] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "metadata": { 35 | "id": "eoCyq5f7OVwZ" 36 | }, 37 | "source": [ 38 | "class SupperVectorClassifier:\n", 39 | " def __init__(self):\n", 40 | " self.w = None\n", 41 | " self.iterations = 1000\n", 42 | "\n", 43 | " def hinge_loss(self, YHAT, Y):\n", 44 | " '''\n", 45 | " Hinge Loss from lecture. No changes made.\n", 46 | " '''\n", 47 | "\n", 48 | " distances = 1 - (Y * YHAT)\n", 49 | " distances[distances < 0] = 0 # everywhere it's the correct prediction, give a loss of 0\n", 50 | " return np.sum(distances) / len(YHAT) # average loss\n", 51 | "\n", 52 | " def gradient_descent(self, X, Y, loss):\n", 53 | " '''\n", 54 | " Vanilla gradient descent. 
\n", 55 | " \n", 56 | " You can switch this to SGD as well to improve performance.\n", 57 | " '''\n", 58 | "\n", 59 | " grads = {}\n", 60 | " loss = 1 - (Y * np.dot(X, self.w))\n", 61 | " dw = np.zeros(len(self.w))\n", 62 | " \n", 63 | " for ind, d in enumerate(loss):\n", 64 | " if max(0, d) == 0:\n", 65 | " di = self.w\n", 66 | " else:\n", 67 | " di = self.w - (Y[ind] * X[ind])\n", 68 | " dw += di\n", 69 | " \n", 70 | " dw = dw / len(Y) # get the average gradient\n", 71 | " grads['dw'] = dw\n", 72 | "\n", 73 | " return grads\n", 74 | "\n", 75 | " def update(self, grads, alpha):\n", 76 | " '''\n", 77 | " Performs the actual update step in gradient descent.\n", 78 | "\n", 79 | " grads : gradient of loss wrt weights\n", 80 | " alpha : learning rate\n", 81 | " '''\n", 82 | " self.w = self.w - alpha * grads['dw']\n", 83 | "\n", 84 | " def fit(self, X, Y, alpha=1e-2):\n", 85 | " '''\n", 86 | " Fits the model on the given dataset.\n", 87 | "\n", 88 | " X: data samples\n", 89 | " Y: binary labels (1 or 0)\n", 90 | " alpha: step size / learning rate\n", 91 | " '''\n", 92 | "\n", 93 | " # reset the parameters for every call to fit\n", 94 | " self.w = np.random.rand(X[0].shape[-1]) # get the number of features per sample\n", 95 | "\n", 96 | " # perform the N iterations of learning\n", 97 | " for i in range(self.iterations):\n", 98 | " # forward pass\n", 99 | " YHAT = np.dot(X, self.w)\n", 100 | " loss = self.hinge_loss(YHAT, Y)\n", 101 | "\n", 102 | " if i % 20 == 0:\n", 103 | " print (\"Iteration: {} | Loss: {}\".format(i, loss))\n", 104 | "\n", 105 | " # backward pass\n", 106 | " grads = self.gradient_descent(X, Y, loss) # calculate gradient wrt parameters\n", 107 | " self.update(grads, alpha) # optimise the parameters\n", 108 | " \n", 109 | " def predict(self, X):\n", 110 | " # simply compute forward pass\n", 111 | " return np.dot(X, self.w)\n", 112 | "\n", 113 | " def evaluate(self, X_test, Y_test):\n", 114 | " '''\n", 115 | " Returns the accuracy of the model.\n", 116 | " '''\n", 117 | " pred = self.predict(X_test)\n", 118 | "\n", 119 | " # anything negative gets label -1, anything positive gets label 1\n", 120 | " pred[pred < 0] = -1 \n", 121 | " pred[pred >= 0] = 1\n", 122 | " correct = 0\n", 123 | "\n", 124 | " for i in range(len(Y_test)):\n", 125 | " if pred[i] == Y_test[i]:\n", 126 | " correct += 1\n", 127 | "\n", 128 | " return correct / len(Y_test) # get final accuracy based on number of correct samples" 129 | ], 130 | "execution_count": 46, 131 | "outputs": [] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "metadata": { 136 | "id": "vKilb4EqQq6K" 137 | }, 138 | "source": [ 139 | "from sklearn.model_selection import train_test_split\n", 140 | "\n", 141 | "X, Y = datasets.load_breast_cancer(return_X_y=True)\n", 142 | "Y[Y == 0] = -1 # switch labels from [0, 1] to [-1, 1]\n", 143 | "\n", 144 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y)" 145 | ], 146 | "execution_count": 50, 147 | "outputs": [] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "metadata": { 152 | "colab": { 153 | "base_uri": "https://localhost:8080/" 154 | }, 155 | "id": "YywW366QQ6L8", 156 | "outputId": "720d4170-9a26-4c51-959f-7ab81b044bdf" 157 | }, 158 | "source": [ 159 | "model = SupperVectorClassifier()\n", 160 | "model.fit(X_train, Y_train)" 161 | ], 162 | "execution_count": 52, 163 | "outputs": [ 164 | { 165 | "output_type": "stream", 166 | "name": "stdout", 167 | "text": [ 168 | "Iteration: 0 | Loss: 416.2350522151881\n", 169 | "Iteration: 20 | Loss: 634.7471121710325\n", 170 | "Iteration: 40 | 
Loss: 1377.959569044146\n", 171 | "Iteration: 60 | Loss: 337.2871462020824\n", 172 | "Iteration: 80 | Loss: 174.50232848883326\n", 173 | "Iteration: 100 | Loss: 144.48051916551537\n", 174 | "Iteration: 120 | Loss: 144.97272350980242\n", 175 | "Iteration: 140 | Loss: 157.49100221912383\n", 176 | "Iteration: 160 | Loss: 188.68354119350255\n", 177 | "Iteration: 180 | Loss: 198.83279680794266\n", 178 | "Iteration: 200 | Loss: 201.37397024816923\n", 179 | "Iteration: 220 | Loss: 209.18699964584258\n", 180 | "Iteration: 240 | Loss: 211.24550460677744\n", 181 | "Iteration: 260 | Loss: 224.75921771134932\n", 182 | "Iteration: 280 | Loss: 207.41067721227247\n", 183 | "Iteration: 300 | Loss: 202.9874547965538\n", 184 | "Iteration: 320 | Loss: 233.5225806996785\n", 185 | "Iteration: 340 | Loss: 209.41894810505434\n", 186 | "Iteration: 360 | Loss: 227.86193173406267\n", 187 | "Iteration: 380 | Loss: 220.0275230523279\n", 188 | "Iteration: 400 | Loss: 209.93813706106957\n", 189 | "Iteration: 420 | Loss: 224.69843019249313\n", 190 | "Iteration: 440 | Loss: 207.14690567298172\n", 191 | "Iteration: 460 | Loss: 222.13044724748374\n", 192 | "Iteration: 480 | Loss: 206.75958921219885\n", 193 | "Iteration: 500 | Loss: 224.10527985394174\n", 194 | "Iteration: 520 | Loss: 212.29554595340238\n", 195 | "Iteration: 540 | Loss: 222.9296852741817\n", 196 | "Iteration: 560 | Loss: 211.17164394816842\n", 197 | "Iteration: 580 | Loss: 221.71447794927437\n", 198 | "Iteration: 600 | Loss: 219.52250985351424\n", 199 | "Iteration: 620 | Loss: 212.20466048180376\n", 200 | "Iteration: 640 | Loss: 222.5975165742218\n", 201 | "Iteration: 660 | Loss: 207.1530791563107\n", 202 | "Iteration: 680 | Loss: 224.41752826545314\n", 203 | "Iteration: 700 | Loss: 212.5541881669893\n", 204 | "Iteration: 720 | Loss: 207.96840920838758\n", 205 | "Iteration: 740 | Loss: 202.79349339026604\n", 206 | "Iteration: 760 | Loss: 224.18299606836823\n", 207 | "Iteration: 780 | Loss: 223.15410816749386\n", 208 | "Iteration: 800 | Loss: 227.64698698681198\n", 209 | "Iteration: 820 | Loss: 213.35496804814935\n", 210 | "Iteration: 840 | Loss: 223.82125080471178\n", 211 | "Iteration: 860 | Loss: 211.88036740995827\n", 212 | "Iteration: 880 | Loss: 203.0702406461759\n", 213 | "Iteration: 900 | Loss: 224.43132254807512\n", 214 | "Iteration: 920 | Loss: 223.35580653357326\n", 215 | "Iteration: 940 | Loss: 227.815553049764\n", 216 | "Iteration: 960 | Loss: 219.9270821314675\n", 217 | "Iteration: 980 | Loss: 212.53249397223584\n" 218 | ] 219 | } 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "metadata": { 225 | "colab": { 226 | "base_uri": "https://localhost:8080/" 227 | }, 228 | "id": "l5deYnKJ_QC2", 229 | "outputId": "08d084e3-62e1-4170-e129-494051d5526e" 230 | }, 231 | "source": [ 232 | "acc = model.evaluate(X_test, Y_test)\n", 233 | "print (\"SVM is {:.3f}% accurate.\".format(acc * 100))" 234 | ], 235 | "execution_count": 53, 236 | "outputs": [ 237 | { 238 | "output_type": "stream", 239 | "name": "stdout", 240 | "text": [ 241 | "SVM is 83.916% accurate.\n" 242 | ] 243 | } 244 | ] 245 | } 246 | ] 247 | } -------------------------------------------------------------------------------- /T3W5/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 3 Week 5: Linear Models 2 | 3 | In T3W5, I cover the Linear Models. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 
4 | 5 | This repo contains Python implementations of `LinearRegressionClassifier`, `LogisticRegressionClassifier`, and `SupportVectorClassifier`. You can call them in a similar fashion to related models from `sklearn`. I train them on the popular *Iris Type Classification Dataset* found in `data/iris.csv`, as well as the *Breast Cancer Classification Dataset* from `sklearn.datasets`. 6 | 7 | > You can find the SVM implementation in `Intro_to_Support_Vector_Machines.ipynb`. It has some more in-depth comments inside. 8 | 9 | ## Contents 10 | This repo contains the code used to answer Questions 1, 2, and 5. 11 | 12 | ### Question 1 13 | You cannot use **Mean Squared Error**. 14 | 15 | MSE is mainly used in the case of regression problems, not classification tasks (which is when Logistic Regression is used). 16 | 17 | - `Accuracy` shows us how "good" our model is on unseen data 18 | - `AUC-ROC` shows us the model's ability to tell apart positive and negative instances 19 | - `Log Loss` is used as the cost function for Logistic Regression. The aim is to minimise this over training. 20 | 21 | --- 22 | 23 | ### Question 2 24 | The **Normal Equation** from the lecture: 25 | 26 | ```bash 27 | ø = (1/(X.T * X)) * X.T * Y 28 | = [4 -5.5 -7 7].T 29 | 30 | y_hat = 4 - 5.5x1 - 7x2 + 7x3 31 | ``` 32 | 33 | --- 34 | 35 | ### Question 3 36 | > Check the slides for annotated solutions for all the equations here. 37 | 38 | --- 39 | 40 | ### Question 4 41 | > Check the slides for diagrams and answers to these questions. 42 | 43 | --- 44 | 45 | ### Question 5a 46 | A symmetric matrix is one that's equal to its transpose. 47 | 48 | > Look at the slides for an annotated proof. 49 | 50 | ### Question 5b 51 | You have to use induction for this. Consider the base case and then move on to the inductive step. 52 | 53 | > Look at the slides for an annotated proof. 54 | 55 | ### Question 5c 56 | Similar to `5b`, we have to use induction. We use the concept of **Idempotency** again. 57 | 58 | > Look at the slides for an annotated proof. 59 | 60 | ### Question 5d 61 | We know that `trace(AB) = trace(BA)`. For symmetric and idempotent matrices, `rank(A) = trace(A)`. 62 | 63 | > Look at the slides for an annotated proof. 
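As a supplement to Question 5d, here is a sketch of the trace argument, assuming (this is an assumption on my part) that the matrix involved is the hat matrix `H = X(XᵀX)⁻¹Xᵀ` from the Normal Equation in Question 2, with `X` of shape `m × (d+1)` (bias column included) and full column rank. The official working is on the slides.

```latex
% Sketch only: assumes the matrix in Question 5d is the hat matrix from the Normal Equation.
H = X (X^\top X)^{-1} X^\top

H^\top = X \big((X^\top X)^{-1}\big)^\top X^\top = X (X^\top X)^{-1} X^\top = H
\quad \text{(symmetric, since } X^\top X \text{ is symmetric)}

H^2 = X (X^\top X)^{-1} (X^\top X) (X^\top X)^{-1} X^\top = H
\quad \text{(idempotent)}

\operatorname{rank}(H) = \operatorname{tr}(H)
  = \operatorname{tr}\!\big(X (X^\top X)^{-1} X^\top\big)
  = \operatorname{tr}\!\big((X^\top X)^{-1} X^\top X\big)
  = \operatorname{tr}(I_{d+1}) = d + 1
```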
-------------------------------------------------------------------------------- /T3W5/data/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length, sepal_width, petal_length, petal_width, class 2 | 5.1,3.5,1.4,0.2,Iris-setosa 3 | 4.9,3.0,1.4,0.2,Iris-setosa 4 | 4.7,3.2,1.3,0.2,Iris-setosa 5 | 4.6,3.1,1.5,0.2,Iris-setosa 6 | 5.0,3.6,1.4,0.2,Iris-setosa 7 | 5.4,3.9,1.7,0.4,Iris-setosa 8 | 4.6,3.4,1.4,0.3,Iris-setosa 9 | 5.0,3.4,1.5,0.2,Iris-setosa 10 | 4.4,2.9,1.4,0.2,Iris-setosa 11 | 4.9,3.1,1.5,0.1,Iris-setosa 12 | 5.4,3.7,1.5,0.2,Iris-setosa 13 | 4.8,3.4,1.6,0.2,Iris-setosa 14 | 4.8,3.0,1.4,0.1,Iris-setosa 15 | 4.3,3.0,1.1,0.1,Iris-setosa 16 | 5.8,4.0,1.2,0.2,Iris-setosa 17 | 5.7,4.4,1.5,0.4,Iris-setosa 18 | 5.4,3.9,1.3,0.4,Iris-setosa 19 | 5.1,3.5,1.4,0.3,Iris-setosa 20 | 5.7,3.8,1.7,0.3,Iris-setosa 21 | 5.1,3.8,1.5,0.3,Iris-setosa 22 | 5.4,3.4,1.7,0.2,Iris-setosa 23 | 5.1,3.7,1.5,0.4,Iris-setosa 24 | 4.6,3.6,1.0,0.2,Iris-setosa 25 | 5.1,3.3,1.7,0.5,Iris-setosa 26 | 4.8,3.4,1.9,0.2,Iris-setosa 27 | 5.0,3.0,1.6,0.2,Iris-setosa 28 | 5.0,3.4,1.6,0.4,Iris-setosa 29 | 5.2,3.5,1.5,0.2,Iris-setosa 30 | 5.2,3.4,1.4,0.2,Iris-setosa 31 | 4.7,3.2,1.6,0.2,Iris-setosa 32 | 4.8,3.1,1.6,0.2,Iris-setosa 33 | 5.4,3.4,1.5,0.4,Iris-setosa 34 | 5.2,4.1,1.5,0.1,Iris-setosa 35 | 5.5,4.2,1.4,0.2,Iris-setosa 36 | 4.9,3.1,1.5,0.1,Iris-setosa 37 | 5.0,3.2,1.2,0.2,Iris-setosa 38 | 5.5,3.5,1.3,0.2,Iris-setosa 39 | 4.9,3.1,1.5,0.1,Iris-setosa 40 | 4.4,3.0,1.3,0.2,Iris-setosa 41 | 5.1,3.4,1.5,0.2,Iris-setosa 42 | 5.0,3.5,1.3,0.3,Iris-setosa 43 | 4.5,2.3,1.3,0.3,Iris-setosa 44 | 4.4,3.2,1.3,0.2,Iris-setosa 45 | 5.0,3.5,1.6,0.6,Iris-setosa 46 | 5.1,3.8,1.9,0.4,Iris-setosa 47 | 4.8,3.0,1.4,0.3,Iris-setosa 48 | 5.1,3.8,1.6,0.2,Iris-setosa 49 | 4.6,3.2,1.4,0.2,Iris-setosa 50 | 5.3,3.7,1.5,0.2,Iris-setosa 51 | 5.0,3.3,1.4,0.2,Iris-setosa 52 | 7.0,3.2,4.7,1.4,Iris-versicolor 53 | 6.4,3.2,4.5,1.5,Iris-versicolor 54 | 6.9,3.1,4.9,1.5,Iris-versicolor 55 | 5.5,2.3,4.0,1.3,Iris-versicolor 56 | 6.5,2.8,4.6,1.5,Iris-versicolor 57 | 5.7,2.8,4.5,1.3,Iris-versicolor 58 | 6.3,3.3,4.7,1.6,Iris-versicolor 59 | 4.9,2.4,3.3,1.0,Iris-versicolor 60 | 6.6,2.9,4.6,1.3,Iris-versicolor 61 | 5.2,2.7,3.9,1.4,Iris-versicolor 62 | 5.0,2.0,3.5,1.0,Iris-versicolor 63 | 5.9,3.0,4.2,1.5,Iris-versicolor 64 | 6.0,2.2,4.0,1.0,Iris-versicolor 65 | 6.1,2.9,4.7,1.4,Iris-versicolor 66 | 5.6,2.9,3.6,1.3,Iris-versicolor 67 | 6.7,3.1,4.4,1.4,Iris-versicolor 68 | 5.6,3.0,4.5,1.5,Iris-versicolor 69 | 5.8,2.7,4.1,1.0,Iris-versicolor 70 | 6.2,2.2,4.5,1.5,Iris-versicolor 71 | 5.6,2.5,3.9,1.1,Iris-versicolor 72 | 5.9,3.2,4.8,1.8,Iris-versicolor 73 | 6.1,2.8,4.0,1.3,Iris-versicolor 74 | 6.3,2.5,4.9,1.5,Iris-versicolor 75 | 6.1,2.8,4.7,1.2,Iris-versicolor 76 | 6.4,2.9,4.3,1.3,Iris-versicolor 77 | 6.6,3.0,4.4,1.4,Iris-versicolor 78 | 6.8,2.8,4.8,1.4,Iris-versicolor 79 | 6.7,3.0,5.0,1.7,Iris-versicolor 80 | 6.0,2.9,4.5,1.5,Iris-versicolor 81 | 5.7,2.6,3.5,1.0,Iris-versicolor 82 | 5.5,2.4,3.8,1.1,Iris-versicolor 83 | 5.5,2.4,3.7,1.0,Iris-versicolor 84 | 5.8,2.7,3.9,1.2,Iris-versicolor 85 | 6.0,2.7,5.1,1.6,Iris-versicolor 86 | 5.4,3.0,4.5,1.5,Iris-versicolor 87 | 6.0,3.4,4.5,1.6,Iris-versicolor 88 | 6.7,3.1,4.7,1.5,Iris-versicolor 89 | 6.3,2.3,4.4,1.3,Iris-versicolor 90 | 5.6,3.0,4.1,1.3,Iris-versicolor 91 | 5.5,2.5,4.0,1.3,Iris-versicolor 92 | 5.5,2.6,4.4,1.2,Iris-versicolor 93 | 6.1,3.0,4.6,1.4,Iris-versicolor 94 | 5.8,2.6,4.0,1.2,Iris-versicolor 95 | 5.0,2.3,3.3,1.0,Iris-versicolor 96 | 5.6,2.7,4.2,1.3,Iris-versicolor 
97 | 5.7,3.0,4.2,1.2,Iris-versicolor 98 | 5.7,2.9,4.2,1.3,Iris-versicolor 99 | 6.2,2.9,4.3,1.3,Iris-versicolor 100 | 5.1,2.5,3.0,1.1,Iris-versicolor 101 | 5.7,2.8,4.1,1.3,Iris-versicolor 102 | 6.3,3.3,6.0,2.5,Iris-virginica 103 | 5.8,2.7,5.1,1.9,Iris-virginica 104 | 7.1,3.0,5.9,2.1,Iris-virginica 105 | 6.3,2.9,5.6,1.8,Iris-virginica 106 | 6.5,3.0,5.8,2.2,Iris-virginica 107 | 7.6,3.0,6.6,2.1,Iris-virginica 108 | 4.9,2.5,4.5,1.7,Iris-virginica 109 | 7.3,2.9,6.3,1.8,Iris-virginica 110 | 6.7,2.5,5.8,1.8,Iris-virginica 111 | 7.2,3.6,6.1,2.5,Iris-virginica 112 | 6.5,3.2,5.1,2.0,Iris-virginica 113 | 6.4,2.7,5.3,1.9,Iris-virginica 114 | 6.8,3.0,5.5,2.1,Iris-virginica 115 | 5.7,2.5,5.0,2.0,Iris-virginica 116 | 5.8,2.8,5.1,2.4,Iris-virginica 117 | 6.4,3.2,5.3,2.3,Iris-virginica 118 | 6.5,3.0,5.5,1.8,Iris-virginica 119 | 7.7,3.8,6.7,2.2,Iris-virginica 120 | 7.7,2.6,6.9,2.3,Iris-virginica 121 | 6.0,2.2,5.0,1.5,Iris-virginica 122 | 6.9,3.2,5.7,2.3,Iris-virginica 123 | 5.6,2.8,4.9,2.0,Iris-virginica 124 | 7.7,2.8,6.7,2.0,Iris-virginica 125 | 6.3,2.7,4.9,1.8,Iris-virginica 126 | 6.7,3.3,5.7,2.1,Iris-virginica 127 | 7.2,3.2,6.0,1.8,Iris-virginica 128 | 6.2,2.8,4.8,1.8,Iris-virginica 129 | 6.1,3.0,4.9,1.8,Iris-virginica 130 | 6.4,2.8,5.6,2.1,Iris-virginica 131 | 7.2,3.0,5.8,1.6,Iris-virginica 132 | 7.4,2.8,6.1,1.9,Iris-virginica 133 | 7.9,3.8,6.4,2.0,Iris-virginica 134 | 6.4,2.8,5.6,2.2,Iris-virginica 135 | 6.3,2.8,5.1,1.5,Iris-virginica 136 | 6.1,2.6,5.6,1.4,Iris-virginica 137 | 7.7,3.0,6.1,2.3,Iris-virginica 138 | 6.3,3.4,5.6,2.4,Iris-virginica 139 | 6.4,3.1,5.5,1.8,Iris-virginica 140 | 6.0,3.0,4.8,1.8,Iris-virginica 141 | 6.9,3.1,5.4,2.1,Iris-virginica 142 | 6.7,3.1,5.6,2.4,Iris-virginica 143 | 6.9,3.1,5.1,2.3,Iris-virginica 144 | 5.8,2.7,5.1,1.9,Iris-virginica 145 | 6.8,3.2,5.9,2.3,Iris-virginica 146 | 6.7,3.3,5.7,2.5,Iris-virginica 147 | 6.7,3.0,5.2,2.3,Iris-virginica 148 | 6.3,2.5,5.0,1.9,Iris-virginica 149 | 6.5,3.0,5.2,2.0,Iris-virginica 150 | 6.2,3.4,5.4,2.3,Iris-virginica 151 | 5.9,3.0,5.1,1.8,Iris-virginica -------------------------------------------------------------------------------- /T3W5/data/iris.names: -------------------------------------------------------------------------------- 1 | 1. Title: Iris Plants Database 2 | Updated Sept 21 by C.Blake - Added discrepency information 3 | 4 | 2. Sources: 5 | (a) Creator: R.A. Fisher 6 | (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov) 7 | (c) Date: July, 1988 8 | 9 | 3. Past Usage: 10 | - Publications: too many to mention!!! Here are a few. 11 | 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems" 12 | Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions 13 | to Mathematical Statistics" (John Wiley, NY, 1950). 14 | 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. 15 | (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. 16 | 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System 17 | Structure and Classification Rule for Recognition in Partially Exposed 18 | Environments". IEEE Transactions on Pattern Analysis and Machine 19 | Intelligence, Vol. PAMI-2, No. 1, 67-71. 20 | -- Results: 21 | -- very low misclassification rates (0% for the setosa class) 22 | 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE 23 | Transactions on Information Theory, May 1972, 431-433. 24 | -- Results: 25 | -- very low misclassification rates again 26 | 5. See also: 1988 MLC Proceedings, 54-64. 
Cheeseman et al's AUTOCLASS II 27 | conceptual clustering system finds 3 classes in the data. 28 | 29 | 4. Relevant Information: 30 | --- This is perhaps the best known database to be found in the pattern 31 | recognition literature. Fisher's paper is a classic in the field 32 | and is referenced frequently to this day. (See Duda & Hart, for 33 | example.) The data set contains 3 classes of 50 instances each, 34 | where each class refers to a type of iris plant. One class is 35 | linearly separable from the other 2; the latter are NOT linearly 36 | separable from each other. 37 | --- Predicted attribute: class of iris plant. 38 | --- This is an exceedingly simple domain. 39 | --- This data differs from the data presented in Fishers article 40 | (identified by Steve Chadwick, spchadwick@espeedaz.net ) 41 | The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" 42 | where the error is in the fourth feature. 43 | The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" 44 | where the errors are in the second and third features. 45 | 46 | 5. Number of Instances: 150 (50 in each of three classes) 47 | 48 | 6. Number of Attributes: 4 numeric, predictive attributes and the class 49 | 50 | 7. Attribute Information: 51 | 1. sepal length in cm 52 | 2. sepal width in cm 53 | 3. petal length in cm 54 | 4. petal width in cm 55 | 5. class: 56 | -- Iris Setosa 57 | -- Iris Versicolour 58 | -- Iris Virginica 59 | 60 | 8. Missing Attribute Values: None 61 | 62 | Summary Statistics: 63 | Min Max Mean SD Class Correlation 64 | sepal length: 4.3 7.9 5.84 0.83 0.7826 65 | sepal width: 2.0 4.4 3.05 0.43 -0.4194 66 | petal length: 1.0 6.9 3.76 1.76 0.9490 (high!) 67 | petal width: 0.1 2.5 1.20 0.76 0.9565 (high!) 68 | 69 | 9. Class Distribution: 33.3% for each of 3 classes. 70 | -------------------------------------------------------------------------------- /T3W5/linreg.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import matplotlib.pyplot as plt 4 | 5 | class LinearRegressionClassifier(object): 6 | def __init__(self): 7 | self.alpha = 1e-2 8 | self.iterations = 1000 9 | self.losses = [] 10 | self.weights = None 11 | self.bias = None 12 | 13 | def forward(self, x): 14 | return np.dot(x, self.weights) + self.bias 15 | 16 | def backward(self, x, y_hat, y): 17 | m, d = x.shape 18 | y_hat = y_hat.reshape([m]) 19 | y = y.reshape([m]) 20 | 21 | partial_w = (1 / x.shape[0]) * (2 * np.dot(x.T, (y_hat - y))) 22 | partial_b = (1 / x.shape[0]) * (2 * np.sum(y_hat - y)) 23 | 24 | return [partial_w, partial_b] 25 | 26 | def MSELoss(self, y_hat, y): 27 | return (1/y.shape[0]) * np.sum(np.square(y_hat - y)) 28 | 29 | def update(self, grad): 30 | self.weights = self.weights - (self.alpha * grad[0]) 31 | self.bias = self.bias - (self.alpha * grad[1]) 32 | 33 | def fit(self, x, y): 34 | self.weights = np.random.uniform(0, 1, x.shape[1]) 35 | self.bias = np.random.uniform(0, 1, 1) 36 | self.losses = [] 37 | 38 | for i in range(self.iterations): 39 | y_hat = self.forward(x) 40 | 41 | loss = self.MSELoss(y_hat, y) 42 | self.losses.append(loss) 43 | 44 | grad = self.backward(x, y_hat, y) 45 | 46 | self.update(grad) 47 | 48 | def predict(self, x): 49 | return x 50 | 51 | def plot(self): 52 | plt.plot(range(self.iterations), self.losses, color="red") 53 | plt.title("Loss on Iris Dataset for {} iterations".format(self.iterations)) 54 | plt.xlabel("Iteration") 55 | plt.ylabel("Loss") 56 | plt.show() 57 | 58 | cols = ['sepal_length', 'sepal_width', 
'petal_length', 'petal_width', 'class'] 59 | df = pd.read_csv('./data/iris.csv', skiprows=1, header=0, names=cols) 60 | 61 | # replace class strings with integer indices 62 | df['class'] = df['class'].str.replace('Iris-setosa', '0') 63 | df['class'] = df['class'].str.replace('Iris-versicolor', '1') 64 | df['class'] = df['class'].str.replace('Iris-virginica', '2') 65 | df['class'] = df['class'].map(lambda x : int(x)) 66 | 67 | X = df.iloc[:, :-1].values 68 | Y = df.iloc[:, -1].values.reshape(-1, 1) 69 | X = np.array(X) 70 | Y = np.array(Y) 71 | 72 | linreg = LinearRegressionClassifier() 73 | linreg.fit(X, Y) 74 | linreg.plot() -------------------------------------------------------------------------------- /T4aW6/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 4 Week 6: Bias-Variance Tradeoff 2 | 3 | In T4W6, I cover the Bias-Variance Tradeoff. Find the tutorial slides [here](https://www.figma.com/file/jqyuC4tzHUu84NaXc28zPI/Tutorial-Slides?node-id=0%3A1). 4 | 5 | > This tutorial was pretty difficult. I've attached a `FAQ.pdf` file that seeks to clarify certain details on this week's topics. 6 | 7 | ## Contents 8 | This repo contains answers for Questions 1 to 4. 9 | 10 | ### Question 1a 11 | Number of data points: Yes 12 | Amount of Noise: No 13 | Complexity of Target: No 14 | 15 | ### Question 1b 16 | Deterministic noise will increase as it gets harder for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting. 17 | 18 | ### Question 1c 19 | Deterministic noise will decrease as it gets easier for `H` to model `f`. Stochastic noise remains the same as it is independent of `H` and `f`. There is a greater chance of overfitting. 20 | 21 | --- 22 | 23 | ### Question 2a 24 | Each blue point is the average training accuracy for an arbitrary value of `C`. It's the average of all the `10` accuracies for the 10-FCV. 25 | 26 | Each green point is the average validation accuracy for an arbitrary value of `C`. It's the average of all the `10` validation accuracies for the 10-FCV. 27 | 28 | ### Question 2b 29 | Each blue region represents the **variance** of the training accuracy for a value of `C`. It is calculated by getting the variance of all `10` accuracies for the 10-FCV. 30 | 31 | Similarly, the green region is the **variance** of the validation accuracy for a value of `C`. It's the variance of all `10` accuracies for the 10-FCV. 32 | 33 | ### Question 2c 34 | The best validation accuracy is reached when `C = 1`. 35 | 36 | > High training accuracy DOES NOT indicate high validation/testing accuracies. Always perform your train-test process to see if the model has generalised well to the unseen data before doing anything with the model (like deploying to production or using it IRL). 37 | 38 | --- 39 | 40 | ### Question 3a 41 | 42 | > The annotated proofs for this question can be found on the slides. 43 | 44 | ### Question 3b 45 | 1. Smaller `k` values, with everything else held constant, will increase the variance. 46 | 2. As `k` increases, bias increases. With more neighbours being considered, we include points further away from `x0` (closeness decreases) and the resulting predictions move away from `f(x0)` (see the simulation sketch at the end of this file). 47 | 48 | > Bias = Closeness to Truth 49 | 50 | --- 51 | 52 | ### Question 4 53 | 54 | > The annotated proofs for this question can be found on the slides.
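---

### Bonus: simulating the effect of `k` on bias and variance

To make Question 3b concrete, here is a small simulation sketch I put together (it is **not** part of the official solutions). It fits k-NN regression on many resampled training sets drawn from a noisy sine target and estimates the bias² and variance of the prediction at a fixed query point `x0`. The target function, noise level, dataset sizes, query point, and `k` values are all assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3244)

def f(x):
    # the "true" target function (an assumption made for this demo)
    return np.sin(2 * np.pi * x)

def bias_variance_at_x0(k, n_datasets=500, n_train=50, noise=0.2, x0=0.25):
    preds = []
    for _ in range(n_datasets):
        # draw a fresh training set each time
        X = rng.uniform(0, 1, size=(n_train, 1))
        y = f(X).ravel() + rng.normal(0, noise, size=n_train)
        knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
        preds.append(knn.predict([[x0]])[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x0)) ** 2   # squared distance of the average prediction from the truth
    variance = preds.var()                  # spread of the predictions across training sets
    return bias_sq, variance

for k in [1, 5, 15, 45]:
    b, v = bias_variance_at_x0(k)
    print(f"k = {k:2d} | bias^2 = {b:.4f} | variance = {v:.4f}")
```

The exact numbers depend on the seed and the noise level, but the trend should match the answer above: variance shrinks as `k` grows, while bias grows because far-away neighbours get pulled into the average.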
-------------------------------------------------------------------------------- /T4aW6/Tutorial_4_FAQ.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/T4aW6/Tutorial_4_FAQ.pdf -------------------------------------------------------------------------------- /T4bW7/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 4b Week 7: Regularisation and Validation 2 | 3 | > Hope you had a productive Recess Week! Let's try getting that 'A' for midterms :D 4 | 5 | In T4bW7, I cover Regularisation and Validation. Find the tutorial slides [here](https://docs.google.com/presentation/d/1eE1In5ZS19YKgN3DN9VjNhBavHQoMaKB9NjZ-hreTG0/edit?usp=sharing). 6 | 7 | ## Contents 8 | This repo contains the worked answers for Questions 1, 2, and 3. 9 | 10 | --- 11 | 12 | ### Question 1a 13 | Training time for a dataset of `m` samples: `m^2 * log(m)` 14 | 15 | In **LOO-CV**, one fold is one sample. Each run trains on `m-1` samples and tests on the remaining `1` sample, and this is repeated `m` times so that every sample gets its chance of being the testing sample. This is for _one_ model. 16 | 17 | Number of models: `30`
18 | Number of training samples: `m-1`
19 | Number of testing samples: `1`
20 |
21 | Total time: `30 * m * (m-1)^2 * log(m-1)` 22 | 23 | ### Question 1b 24 | 25 | In **10-FCV**, each fold has `m/10` samples inside. There are 9 training folds and 1 testing fold. Each `m/10`-sized fold gets its chance of being the testing fold. This is for _one_ model. 26 | 27 | Number of models: `30`
28 | Number of training folds: `9`
29 | Number of testing folds: `1`
30 | Training time for the entire dataset of `m` samples: `m^2 * log(m)`
31 |
32 | Total time: `[30 * 10 * (9m/10)^2 * log(9m/10)] + [m^2 * log(m)]` 33 | 34 | --- 35 | 36 | ## How to read contour plots 37 | Before we get into Question 2, let's understand the figures given to us. 38 |
39 | The ellipses are contour lines that represent the altitudes of the function. Think of it as the graph surface coming out of the paper in 3 dimensions (like a volcano on paper). The lower the number next to a circle, the lower the altitude, and vice versa. 40 | 41 | 1. Find the minimum value of `Reg. Penalty + MSE term` 42 | 2. Return the corresponding values of `(Theta0, Theta1)` 43 | 44 | > It's OKAY to guess here! The values are rough _guesstimates_. Just eyeball it. 45 | 46 | ### Question 2a 47 | No regularisation means we only look at the MSE term. Find the values of `(Theta0, Theta1)` such that the value of the MSE term is minimum. This occurs at the circle at altitude `0.2` on either graph. The center of that circle corresponds to `Theta0 = ~0.9` and `Theta1 = 0.5`. It's alright if the value fluctuates `± 0.5` from the correct answer. 48 | 49 | ### Question 2b 50 | Look at graph 1. There are three possible sums to consider: 51 | 52 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 4.0 = 4.1` (NOPE) 53 | 2. Flexible MSE + Minimum Reg Penalty = `0.4 + 5.0 = 5.4` (NOPE) 54 | 3. Middle ground = `0.5 + 2.6 = 3.1` (CORRECT) 55 | 56 | > The minimum sum corresponds to the pair `(0.2, 0.25)` 57 | 58 | ### Question 2c 59 | Look at graph 2. There are three possible sums to consider: 60 | 61 | 1. Minimum MSE + Flexible Reg Penalty = `0.1 + 9.0 = 9.1` (NOPE) 62 | 2. Middle ground = `2.5 + 2.2 = 4.7` (NOPE) 63 | 3. Flexible MSE + Minimum Reg Penalty = `0.0 + 4.4 = 4.4` (CORRECT) 64 | 65 | > The minimum sum corresponds to the pair `(0.0, 0.1)` 66 | 67 | --- 68 | 69 | ### Question 3a 70 | Time Series data is dependent on time. Breaking that natural temporal order makes your data worthless. The best approach is to split the dataset without disrupting that order. For example, you can keep the past few days' worth of temporal data points for training, and the future points for testing. 71 | 72 | > The value is in the time. Respect it. 73 | 74 | ### Question 3b 75 | Break your data into training, validation, and testing without switching the order of the samples or shuffling them. For example, suppose we have the following dataset with time going from `T1` to `T20`: 76 | 77 | ``` 78 | Dataset = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20] 79 | 80 | Training = [T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11] 81 | Validation = [T12, T13, T14, T15] 82 | Testing = [T16, T17, T18, T19, T20] 83 | ``` 84 | 85 | > Again, please respect the time component for temporal data. 86 | 87 | ### Question 3c 88 | Take adjacent pairs of data points for training and validation. 89 | 90 | ``` 91 | Dataset = [1, 2, 3, 4] 92 | 93 | Training = [1] | Validation = [2] 94 | Training = [2] | Validation = [3] 95 | Training = [3] | Validation = [4] 96 | ``` 97 | 98 | There are less-preferred alternatives: 99 | 100 | 1. `Training = [1] | Validation = [3]` -> decent model 101 | 2. `Training = [1, 2] | Validation = [3]` -> predicting too far into the future after limited training 102 | 3. `Training = [1, 2, 3] | Validation = [4]` -> better model but can't really compare to the training fold in **1.** 103 | 104 | > The key is to break the dataset into comparable folds for training and testing that result in models that are not too different from one another.
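---

### Bonus: a quick splitting sketch in code

To make the splits in Question 3 concrete, here is a small sketch (my own illustration, not part of the official solutions) that builds a chronological train/validation/test split plus the adjacent walk-forward pairs from Question 3c. The array sizes and split fractions are assumptions chosen to mirror the `T1`-`T20` example above.

```python
import numpy as np

def chronological_split(n, train_frac=0.55, val_frac=0.20):
    # contiguous, time-ordered blocks; the samples are never shuffled
    idx = np.arange(n)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]

def walk_forward_pairs(n, window=1):
    # adjacent (training, validation) index pairs, as in Question 3c
    for start in range(n - window):
        yield np.arange(start, start + window), np.arange(start + window, start + window + 1)

train, val, test = chronological_split(20)
print(train, val, test)   # first 11 indices for training, next 4 for validation, last 5 for testing

for tr, va in walk_forward_pairs(4):
    print("Training:", tr + 1, "| Validation:", va + 1)   # reproduces the [1]->[2], [2]->[3], [3]->[4] pairs
```

With `n = 20` and these fractions you recover exactly the `T1`-`T20` split shown above. If you prefer a library utility, `sklearn.model_selection.TimeSeriesSplit` gives a similar expanding-window scheme without shuffling.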
-------------------------------------------------------------------------------- /T5W8/README.md: -------------------------------------------------------------------------------- 1 | # Tutorial 5 Week 8: Evaluation Metrics 2 | 3 | In T5W8, I cover Evaluation Metrics. Find the tutorial slides [here](https://docs.google.com/presentation/d/19QigHWaB3GTnhyNfWkbXYpcfvUT2wPq1Zi2ft40GATM/edit?usp=sharing). 4 | 5 | > This chapter is very important to you as an ML practitioner. It gives us tools to analyse how our model is doing after training. These methods give us an indication of which direction to head in when stuck. 6 | 7 | ## Contents 8 | This repo contains answers for Questions 1 and 2. 9 | 10 | ### Question 1a 11 | | Sample | Prediction | Label | 12 | |--------|------------|---------| 13 | | x1 | 0 (NEG) | 0 (NEG) | 14 | | x2 | 0 (NEG) | 1 (POS) | 15 | | x3 | 0 (NEG) | 1 (POS) | 16 | | x4 | 0 (NEG) | 0 (NEG) | 17 | | x5 | 0 (NEG) | 0 (NEG) | 18 | | x6 | 1 (POS) | 1 (POS) | 19 | | x7 | 1 (POS) | 1 (POS) | 20 | | x8 | 1 (POS) | 0 (NEG) | 21 | | x9 | 1 (POS) | 1 (POS) | 22 | | x10 | 1 (POS) | 1 (POS) | 23 | 24 | | Submetric | (Pred, Actual) | Count | 25 | |-----------|----------------|-------| 26 | | TP | (POS, POS) | 4 | 27 | | FP | (POS, NEG) | 1 | 28 | | TN | (NEG, NEG) | 3 | 29 | | FN | (NEG, POS) | 2 | 30 | 31 | ``` 32 | Precision = TP / (TP + FP) = 4/5 = 0.8 33 | 34 | Recall = TP / (TP + FN) = 4/6 = 0.67 35 | 36 | F1 = 2/(1/P + 1/R) = 2/(1/0.67 + 1/0.8) = 0.73 37 | ``` 38 | 39 | ### Question 1b 40 | The brute-force method of calculating F1 scores using all model outputs as thresholds will take `O(m^2)`.
41 | 42 | 1. Sort all samples – `O(m logm)` 43 | 2. For the first threshold, find TP, FN, FP, TN and calculate the F1 Score – `O(m)` 44 | 3. The next threshold will take `O(1)` since we only need to update the 4 counts from step 2. 45 | 4. After the first computation, it'll take `O(m-1) ~ O(m)` for the remaining `m-1` thresholds 46 | 47 | Total optimised run time is `O(m logm)`. 48 | 49 | ### Question 1c 50 | Here, the number of thresholds is increased beyond the number of samples in the dataset. 51 | 52 | 1. Sort all samples – `O(m logm)` 53 | 2. Any thresholds that fall between the same two adjacent samples (in sorted order) will give the same F1 score 54 | 3. This means there can only be `(m+1)` possible F1 scores to consider 55 | 4. We can binary search for the best F1 score peak – `O(logm)` 56 | 57 | --- 58 | 59 | ### Question 2a – Micro 60 | | _Dog_ | POS_act | NEG_act | 61 | |--------------|---------|---------| 62 | | **POS_pred** | 10 | 3 | 63 | | **NEG_pred** | 6 | 26 | 64 | 65 | | _Cat_ | POS_act | NEG_act | 66 | |--------------|---------|---------| 67 | | **POS_pred** | 13 | 5 | 68 | | **NEG_pred** | 6 | 21 | 69 | 70 | | _Pig_ | POS_act | NEG_act | 71 | |--------------|---------|---------| 72 | | **POS_pred** | 7 | 7 | 73 | | **NEG_pred** | 3 | 28 | 74 | 75 | ### Question 2b 76 | | _Combined_ | POS_act | NEG_act | 77 | |--------------|---------|---------| 78 | | **POS_pred** | 30 | 15 | 79 | | **NEG_pred** | 15 | 75 | 80 | 81 | ``` 82 | Accuracy_micro = (TP + TN) / (TP + TN + FP + FN) = (30 + 75)/(30 + 75 + 15 + 15) = 0.778 83 | 84 | Precision_micro = TP / (TP + FP) = 30 / (30 + 15) = 0.667 85 | 86 | Recall_micro = TP / (TP + FN) = 30 / (30 + 15) = 0.667 87 | 88 | F1_micro = 2/(1/P + 1/R) = 2/(1/0.667 + 1/0.667) = 0.667 89 | ``` 90 | 91 | ### Question 2c – Macro 92 | | _Dog_ | POS_act | NEG_act | 93 | |--------------|---------|---------| 94 | | **POS_pred** | 10 | 3 | 95 | | **NEG_pred** | 6 | 26 | 96 | 97 | Precision_Dog = 10 / (10 + 3) = 0.769
98 | Recall_Dog = 10 / (10 + 6) = 0.625 99 | 100 | | _Cat_ | POS_act | NEG_act | 101 | |--------------|---------|---------| 102 | | **POS_pred** | 13 | 5 | 103 | | **NEG_pred** | 6 | 21 | 104 | 105 | Precision_Cat = 13 / (13 + 5) = 0.722
106 | Recall_Cat = 13 / (13 + 6) = 0.684 107 | 108 | | _Pig_ | POS_act | NEG_act | 109 | |--------------|---------|---------| 110 | | **POS_pred** | 7 | 7 | 111 | | **NEG_pred** | 3 | 28 | 112 | 113 | Precision_Pig = 7 / (7 + 7) = 0.5
114 | Recall_Pig = 7 / (7 + 3) = 0.7 115 | 116 | ``` 117 | Precision_macro = (P_Dog + P_Cat + P_Pig) / 3 = 0.664 118 | Recall_macro = (R_Dog + R_Cat + R_Pig) / 3 = 0.670 119 | ``` 120 | 121 | ### Question 2d 122 | | Class | TP | FP | 123 | |-------|-----|-----| 124 | | A | 9 | 1 | 125 | | B | 100 | 900 | 126 | | C | 9 | 1 | 127 | | D | 9 | 1 | 128 | 129 | ``` 130 | Precision_micro = (TP_A + TP_B + TP_C + TP_D) / [(TP_A + FP_A) + (TP_B + FP_B) + (TP_C + FP_C) + (TP_D + FP_D)] 131 | = (9 + 100 + 9 + 9) / (10 + 1000 + 10 + 10) 132 | = 0.123 133 | ``` 134 | 135 | ``` 136 | Precision_A = Precision_C = Precision_D = 9 / 10 = 0.9 137 | Precision_B = 100 / 1000 = 0.1 138 | 139 | Precision_macro = (P_A + P_B + P_C + P_D) / 4 140 | = (0.9 + 0.1 + 0.9 + 0.9) / 4 141 | = 0.7 142 | ``` 143 | 144 | We can see that `Precision_macro` >>> `Precision_micro`.
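If you want to double-check the micro vs. macro numbers, here is a small scikit-learn snippet (my own sanity check, not part of the official solutions). It reconstructs labels and predictions consistent with the TP/FP table above; the particular "wrong" true labels chosen for the false positives are arbitrary, since they do not affect Precision.

```python
from sklearn.metrics import precision_score

# rebuild predictions/labels that match the TP and FP counts in the table above
y_true, y_pred = [], []
for cls, tp, fp, filler in [("A", 9, 1, "B"), ("B", 100, 900, "A"), ("C", 9, 1, "A"), ("D", 9, 1, "A")]:
    y_pred += [cls] * (tp + fp)           # everything in this block was predicted as `cls`
    y_true += [cls] * tp + [filler] * fp  # `tp` of them are correct, `fp` of them are not

print(precision_score(y_true, y_pred, average="macro"))  # ~0.700
print(precision_score(y_true, y_pred, average="micro"))  # ~0.123
```

The gap between the two averages comes down to how each one weighs class B: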
145 | 146 | - The model has high Precision for classes A, C, and D, with low Precision for class B 147 | - `Precision_macro` takes the average of all individual Precision values, treating each class equally 148 | - It does not account for the heavy imbalance towards class B 149 | - `Precision_macro` is relatively higher as a result 150 | - `Precision_micro` doesn't treat classes equally; every individual prediction counts equally instead 151 | - The imbalances are factored into the calculation 152 | - Class B has low Precision and makes up the majority of the dataset 153 | - `Precision_micro` is relatively lower as a result 154 | -------------------------------------------------------------------------------- /T6W9/README.md: -------------------------------------------------------------------------------- 1 | # T6W9: Visualisation and Dimensionality Reduction 2 | 3 | > I was busy tending to personal commitments during this session. TA Pranavan covered for me. Please approach him with any questions on this topic. 4 | 5 | In T6W9, TA Pranavan covers visualisation and Dimensionality Reduction techniques like PCA, LDA, and SMOTE. Please approach him on Slack for the slides. -------------------------------------------------------------------------------- /T7W10/README.md: -------------------------------------------------------------------------------- 1 | # T7W10: Perceptrons and Neural Networks 2 | 3 | In T7W10, we cover Perceptrons, Multilayer Perceptrons, and Artificial Neural Networks (ANN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1-nG_AElHlAuWQz0EsGt3K9WSjZVQ1OBVne2tjfZCdR0/edit?usp=sharing). 4 | 5 | ## Data Handling Clinic: Session 1 6 | TA Pranavan and I host the first session of DHC, where we cover the basics and intermediate features of `numpy` and `pandas`. 7 | 8 | > The recording will *NOT* be made available in an effort to encourage live attendance and participation on the Zoom call. 9 | 10 | However, you can find the lesson material here: 11 | - [DHC Presentation Slides](https://tinyurl.com/3244-dhc-slides) 12 | - [DHC Mastercopy Colab Notebook](https://tinyurl.com/3244-dhc-mastercopy) 13 | - [DHC Student's Copy Colab Notebook](https://tinyurl.com/3244-dhc-stdnt) -------------------------------------------------------------------------------- /T8W11/README.md: -------------------------------------------------------------------------------- 1 | # T8W11: CNNs and RNNs 2 | 3 | In T8W11, we cover Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1GnExFaXQQlO7wnlnCExNzHbvegDlv6G0JSqgNZ0Jp-g/edit?usp=sharing). -------------------------------------------------------------------------------- /T9W12/README.md: -------------------------------------------------------------------------------- 1 | # T9W12: Explainable AI 2 | 3 | In T9W12, we cover Explainable AI. The content recap and tutorial solutions can be found on the slides [here](https://docs.google.com/presentation/d/1XRdBCLYpUGqMWIdfubslUbIkbvE8KFCrcKrfXBXDcA4/edit?usp=sharing).
-------------------------------------------------------------------------------- /misc/CS3244_Midterm_Cheatsheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rish-16/CS3244-Tutorial-Material/7681d6ad96354c3d672e5ef526dfd16f03eb5d0c/misc/CS3244_Midterm_Cheatsheet.pdf --------------------------------------------------------------------------------