├── .gitignore
├── bash
│   ├── Readme.md
│   ├── hw1.py
│   ├── hw1.sh
│   └── testing.csv
├── github
│   ├── .gitignore
│   └── Readme.md
├── miniconda
│   ├── .gitignore
│   └── Readme.md
└── pytorch
    ├── mnist_pytorch.ipynb
    ├── processed
    │   ├── test.pt
    │   └── training.pt
    ├── raw
    │   ├── t10k-images-idx3-ubyte
    │   ├── t10k-labels-idx1-ubyte
    │   ├── train-images-idx3-ubyte
    │   └── train-labels-idx1-ubyte
    └── readme.md

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/bash/Readme.md:
--------------------------------------------------------------------------------
#### The TAs will run the following command to test your script

`bash hw1.sh testing.csv output.csv`
--------------------------------------------------------------------------------
/bash/hw1.py:
--------------------------------------------------------------------------------
import sys

# paths are passed in from hw1.sh as command-line arguments
testing_data_path = sys.argv[1]
output_file_path = sys.argv[2]

# read each "x, y" row and collect the sums
with open(testing_data_path, 'r') as f:
    ans = []
    for line in f:
        x, y = [int(i) for i in line.split(',')]
        ans.append(x + y)

# write one sum per line
with open(output_file_path, 'w') as f:
    for a in ans:
        f.write(f'{a}\n')
--------------------------------------------------------------------------------
/bash/hw1.sh:
--------------------------------------------------------------------------------
testing_data=$1
output_file=$2

# quote the arguments so paths containing spaces are passed through intact
python hw1.py "$testing_data" "$output_file"
--------------------------------------------------------------------------------
/bash/testing.csv:
--------------------------------------------------------------------------------
1, 2
5, 7
4, 9
5, 7
--------------------------------------------------------------------------------
/github/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/github/Readme.md:
--------------------------------------------------------------------------------
# Tutorial for how to use git in the ML class


#### 1. How to get your own repository

After creating our own repository, we want a local copy of it, set up so that any changes made locally can also be propagated to the remote one.

So naturally, we need to "clone" the remote repository to our local machine. To do that, type

`git clone https://github.com/YOUR_USERNAME/ML2019SPRING`

in your command line.

A new folder named `ML2019SPRING` will be created in your working directory. This is your local in-class repository and should be kept in sync with the remote one at all times.

#### 2. Set local repository user credentials

We need to set up the email address and username of our current repository.

Type `git config user.email [YOUR EMAIL]` to set the email of the repo.

Type `git config user.name [YOUR USERNAME]` to set the username of the repo.

You can type `git config user.name` (or `git config user.email`) to check that the username/email has been set up correctly.

Alternatively, you can add the `--global` flag to set the email/username for all the git repos on your computer.

NOTE: the email and username are not the same as your GitHub email/username, and need not (but can) be.
NOTE2: if you don't want to enter your GitHub username/password every time you perform an operation, refer to this discussion:

https://stackoverflow.com/questions/6565357/git-push-requires-username-and-password

#### 3. How to add files to commit

To perform a commit, one first needs to "add" the desired files to the local index.

Type `git add [LIST OF FILES]`, or `git add -A` to add all files to the index.

#### 4. How to commit a change

Any changes made to the local repository need to be committed before being "pushed", i.e. uploaded, to the remote one.

To view the changes made to the repository, type `git status`.

If the updated files are correct, we should then proceed to commit these changes.

Type `git commit -m "YOUR COMMIT MESSAGE"` to commit the update.

#### 5. How to push to remote

Now that we've committed our changes, it's time for us to push the updated files to the remote repository.

Type `git push` to push your files to remote.


That's it. You've learned how to update and push to your own repository.
--------------------------------------------------------------------------------
/miniconda/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/miniconda/Readme.md:
--------------------------------------------------------------------------------
# Tutorial on how to install Miniconda


#### Miniconda is a lightweight Python package manager for easy management of modules between projects.

To install Miniconda, go to https://docs.conda.io/en/latest/miniconda.html and download the installer corresponding to your platform.

Note that you should download either the Linux or the OSX version, and choose Python 3.7.

After the download has completed, run the installer by typing

`sh Miniconda3-latest-MacOSX-x86_64.sh` (for Mac), or

`sh Miniconda3-latest-Linux-x86_64.sh` (for Linux).

Then follow the instructions on screen to finish the installation.


#### Introduction to using Miniconda

To create a new environment, type

`conda create --name ENV_NAME python=3.6`

To activate the environment, type

`source activate ENV_NAME`

While inside an environment, you can type `which python` to check that the Python binary has changed to the Miniconda one. Likewise, `which pip` confirms that pip now points to the Miniconda version, so you can still install packages using pip.

To deactivate the environment, type

`source deactivate`


#### TA's environment

For each HW, we will release a Miniconda environment for you to test your code in.

If your code works in that environment, it should work on our machines.

To install the environment, type

`conda env create -f TA_ENV_FILE`
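
You can also double-check, from inside Python, that you are running the interpreter of the activated environment. A quick sanity check (a small sketch, assuming Python 3):

```python
import sys

print(sys.executable)  # should point inside the env, e.g. .../miniconda3/envs/ENV_NAME/bin/python
print(sys.version)     # the Python version of the active environment
```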
--------------------------------------------------------------------------------
/pytorch/mnist_pytorch.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PyTorch MNIST tutorial\n",
    "\n",
    "Let's train a model on the MNIST handwritten digit dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchvision import datasets, transforms\n",
    "from torch.utils.data import DataLoader\n",
    "from torch.optim import Adam\n",
    "import torch.nn as nn\n",
    "import torch\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we need to prepare our training and testing data. Training (and testing) data are sometimes not well formatted, so you might need to do some preprocessing. Here we simply download a dataset that is already well formatted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchvision import datasets, transforms\n",
    "\n",
    "# Transform a PIL image to a Tensor.\n",
    "# This includes normalizing pixel values to the range [0, 1].\n",
    "transform = transforms.Compose([transforms.ToTensor()])\n",
    "\n",
    "# MNIST dataset\n",
    "train_dataset = datasets.MNIST(root=\"./\", train=True, transform=transform, download=True)\n",
    "test_dataset = datasets.MNIST(root='./', train=False, transform=transform, download=True)"
   ]
  },
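  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(The transform above keeps raw pixel values in [0, 1]. If you prefer zero-mean inputs, you could additionally standardize them with `transforms.Normalize`; the mean/std below are the conventionally quoted MNIST statistics, not values computed in this notebook.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: standardize with the commonly used MNIST statistics\n",
    "# (mean 0.1307, std 0.3081). These constants are conventional values,\n",
    "# not computed in this notebook.\n",
    "normalize_transform = transforms.Compose([\n",
    "    transforms.ToTensor(),\n",
    "    transforms.Normalize((0.1307,), (0.3081,))\n",
    "])"
   ]
  },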
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the first image (id = 0)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 28, 28])\n",
      "tensor(5)\n"
     ]
    }
   ],
   "source": [
    "sample_img, sample_label = train_dataset[0]\n",
    "print(sample_img.size())\n",
    "print(sample_label)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualize this sample image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.image.AxesImage at 0x...>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "(base64-encoded PNG of the rendered digit, a handwritten 5, omitted)",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.imshow(sample_img.squeeze(0).numpy() * 255, cmap='gray')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we need a dataloader through which we can iteratively access our data.\n",
    "\n",
    "`DataLoader` handles the data in each iteration during training.\n",
    "\n",
    "You can set parameters such as `batch_size` and `shuffle`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the dataset into dataloader\n",
    "train_loader = DataLoader(dataset=train_dataset,\n",
    "                          batch_size=16,\n",
    "                          shuffle=True)\n",
    "test_loader = DataLoader(dataset=test_dataset,\n",
    "                         batch_size=16,\n",
    "                         shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, you can still modify how it assembles batches by replacing the `collate_fn` parameter (see the example below).\n",
    "\n",
    "For more information: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html"
   ]
  },
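  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A custom `collate_fn` receives a list of `(img, label)` samples drawn from the dataset and merges them into a batch. The sketch below (an illustration, not part of the original tutorial) simply reproduces the default stacking behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def my_collate(batch):\n",
    "    # batch is a list of (img, label) samples from the dataset\n",
    "    imgs = torch.stack([sample[0] for sample in batch], dim=0)\n",
    "    # int() handles both plain-int and 0-dim tensor labels\n",
    "    labels = torch.tensor([int(sample[1]) for sample in batch])\n",
    "    return imgs, labels\n",
    "\n",
    "custom_loader = DataLoader(dataset=train_dataset,\n",
    "                           batch_size=16,\n",
    "                           shuffle=True,\n",
    "                           collate_fn=my_collate)"
   ]
  },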
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build Model\n",
    "\n",
    "Now that the dataset is ready, we can build our model.\n",
    "\n",
    "This is a simple fully-connected model.\n",
    "\n",
    "It is better to modularize your models once the structure gets more complicated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define our model\n",
    "class Model(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Model, self).__init__()\n",
    "        self.fc = nn.Sequential(\n",
    "            nn.Linear(784, 128),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Linear(128, 64),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Linear(64, 10),\n",
    "        )\n",
    "        # NOTE: nn.CrossEntropyLoss (used below) already applies\n",
    "        # log-softmax internally, so feeding it softmax outputs is\n",
    "        # redundant; returning the raw logits would be more standard.\n",
    "        self.output = nn.Softmax(dim=1)\n",
    "    \n",
    "    def forward(self, x):\n",
    "        # You can wire up your model connections however you like\n",
    "        out = self.fc(x.view(-1, 28*28))\n",
    "        out = self.output(out)\n",
    "        return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Start Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CUDA configuration: set the device to cuda if you have a GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = torch.device('cuda')\n",
    "# device = torch.device('cpu') for CPU-only machines"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up our model, optimizer, and loss function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Model()\n",
    "model.to(device)\n",
    "optimizer = Adam(model.parameters(), lr=0.0001)\n",
    "loss_fn = nn.CrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start training!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch: 1, Loss: 1.6988, Acc: 0.8141\n",
      "Epoch: 2, Loss: 1.5633, Acc: 0.9122\n",
      "Epoch: 3, Loss: 1.5482, Acc: 0.9226\n",
      "Epoch: 4, Loss: 1.5393, Acc: 0.9292\n",
      "Epoch: 5, Loss: 1.5324, Acc: 0.9353\n",
      "Epoch: 6, Loss: 1.5269, Acc: 0.9400\n",
      "Epoch: 7, Loss: 1.5222, Acc: 0.9440\n",
      "Epoch: 8, Loss: 1.5179, Acc: 0.9484\n",
      "Epoch: 9, Loss: 1.5142, Acc: 0.9516\n",
      "Epoch: 10, Loss: 1.5109, Acc: 0.9545\n"
     ]
    }
   ],
   "source": [
    "# start training\n",
    "model.train()\n",
    "\n",
    "for epoch in range(10):\n",
    "    train_loss = []\n",
    "    train_acc = []\n",
    "    for _, (img, target) in enumerate(train_loader):\n",
    "        img_cuda = img.to(device)\n",
    "        target_cuda = target.to(device)\n",
    "        \n",
    "        # You can also use\n",
    "        # img_cuda = img.cuda()\n",
    "        # target_cuda = target.cuda()\n",
    "        \n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        output = model(img_cuda)\n",
    "        loss = loss_fn(output, target_cuda)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        \n",
    "        predict = torch.max(output, 1)[1]\n",
    "        # move the comparison back to the CPU before converting to NumPy\n",
    "        acc = np.mean((target_cuda == predict).cpu().numpy())\n",
    "        \n",
    "        train_acc.append(acc)\n",
    "        train_loss.append(loss.item())\n",
    "    \n",
    "    print(\"Epoch: {}, Loss: {:.4f}, Acc: {:.4f}\".format(epoch + 1, np.mean(train_loss), np.mean(train_acc)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Done! That is all for training."
   ]
  }
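,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notebook builds `test_loader` above but never uses it; a hedged sketch of an evaluation loop on the test set (not part of the original tutorial) could look like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.eval()\n",
    "correct = 0\n",
    "with torch.no_grad():  # no gradients needed for evaluation\n",
    "    for img, target in test_loader:\n",
    "        img, target = img.to(device), target.to(device)\n",
    "        output = model(img)\n",
    "        predict = torch.max(output, 1)[1]\n",
    "        correct += (predict == target).sum().item()\n",
    "print(\"Test accuracy: {:.4f}\".format(correct / len(test_dataset)))"
   ]
  }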
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
--------------------------------------------------------------------------------
/pytorch/processed/test.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/processed/test.pt
--------------------------------------------------------------------------------
/pytorch/processed/training.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/processed/training.pt
--------------------------------------------------------------------------------
/pytorch/raw/t10k-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/raw/t10k-images-idx3-ubyte
--------------------------------------------------------------------------------
/pytorch/raw/t10k-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/raw/t10k-labels-idx1-ubyte
--------------------------------------------------------------------------------
/pytorch/raw/train-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/raw/train-images-idx3-ubyte
--------------------------------------------------------------------------------
/pytorch/raw/train-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hyes92121/ml-tutorial/4b1699816f00cf26f34ecd381a24fcc163db964e/pytorch/raw/train-labels-idx1-ubyte
--------------------------------------------------------------------------------
/pytorch/readme.md:
--------------------------------------------------------------------------------
# PyTorch tutorial

**TA tutorial, Machine Learning (2019 Spring)**

## Contents
* Package Requirements
* NumPy Array Manipulation
* PyTorch
* Start building a model

## Package Requirements
**Note: This is a tutorial for the `PyTorch==1.0.1` version.**
* PyTorch == 1.0.1
* NumPy >= 1.14
* SciPy == 1.2.1
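
You can confirm the installed versions from inside Python (a quick check; the pins above are what this tutorial targets):

```python
import numpy as np
import scipy
import torch

print(torch.__version__)  # expect 1.0.1
print(np.__version__)     # expect >= 1.14
print(scipy.__version__)  # expect 1.2.1
```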

## NumPy Array Manipulation
Some useful functions that you may use for managing your training data. We **must** carefully check that our data dimensions are logically correct.

* `np.concatenate((arr_1, arr_2, ...), axis=0)`

Note that the arrays in the sequence must all have the same shape, except along the dimension corresponding to the given axis.

```python
# concatenate two arrays
a1 = np.array([[1, 2], [3, 4], [5, 6]])  # shape: (3, 2)
a2 = np.array([[3, 4], [5, 6], [7, 8]])  # shape: (3, 2)

# along axis = 0
a3 = np.concatenate((a1, a2), axis=0)  # shape: (6, 2)

# along axis = 1
a4 = np.concatenate((a1, a2), axis=1)  # shape: (3, 4)
```

* `np.transpose(arr, axes)`

Mostly we use it to align the dimensions of our data.
```python
# transpose a 2D array
a5 = np.array([[1, 2], [3, 4], [5, 6]])  # shape: (3, 2)
np.transpose(a5)  # shape: (2, 3)
```

We can also permute multiple axes of the array.

```python
a6 = np.array([[[1, 2], [3, 4], [5, 6]]])  # shape: (1, 3, 2)
np.transpose(a6, axes=(2, 1, 0))  # shape: (2, 3, 1)
```

## PyTorch

### Tensor Manipulation

A `torch.tensor` is conceptually identical to a NumPy array, but with GPU support and additional attributes that allow PyTorch operations.

* Create a tensor

```python
b1 = torch.tensor([[[1, 2, 3], [4, 5, 6]]])
```

* Some frequently used functions you can use
```python
b1.size()          # check the size of the tensor
# torch.Size([1, 2, 3])
b1.view((1, 3, 2)) # similar to reshape in NumPy (same underlying data, different interpretation)
# tensor([[[1, 2],
#          [3, 4],
#          [5, 6]]])
b1.squeeze()       # removes all the dimensions of size 1
# tensor([[1, 2, 3],
#         [4, 5, 6]])
b1.unsqueeze(0)    # inserts a new dimension of size one at the given position
# tensor([[[[1, 2, 3],
#           [4, 5, 6]]]])
```

* Other manipulation functions are similar to those of NumPy; we omit them here for brevity. For more information, please check the PyTorch documentation: https://pytorch.org/docs/stable/tensors.html
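
Since a tensor mirrors a NumPy array so closely, converting between the two is cheap. A small sketch (note that both conversions share the underlying memory rather than copying):

```python
import numpy as np
import torch

arr = np.array([[1., 2.], [3., 4.]])
t = torch.from_numpy(arr)  # tensor sharing memory with arr
back = t.numpy()           # back to a NumPy array, still shared

t[0, 0] = 9.0
print(arr[0, 0])           # 9.0 -- the change is visible through arr as well
```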

### Tensor Attributes

- Some important attributes of `torch.tensor`

- ```python
  b1.grad          # gradient of the tensor
  b1.grad_fn       # the gradient function of the tensor
  b1.is_leaf       # whether the tensor is a leaf node of the graph
  b1.requires_grad # if set to True, starts tracking all operations performed on the tensor
  ```

### Autograd

**torch.autograd** is a package that implements automatic differentiation; calling `backward()` on a scalar output fills in the gradients of the leaf tensors that require them.

For example:
* Create a tensor and set `requires_grad=True` to track the computations made with it.

```python
x1 = torch.tensor([[1., 2.],
                   [3., 4.]], requires_grad=True)
# x1.grad          None
# x1.grad_fn       None
# x1.is_leaf       True
# x1.requires_grad True

x2 = torch.tensor([[1., 2.],
                   [3., 4.]], requires_grad=True)
# x2.grad          None
# x2.grad_fn       None
# x2.is_leaf       True
# x2.requires_grad True
```

This also enables the tensors to take part in gradient computations later on.

Note: only tensors with a floating-point dtype can require gradients.

* Do a simple operation

```python
z = (0.5 * x1 + x2).sum()
# z.grad          None
# z.grad_fn       <SumBackward0 object at 0x...>
# z.is_leaf       False
# z.requires_grad True
```

(These attributes belong to the new tensor `z`; `x1` and `x2` themselves remain leaves, while `z` was produced by an operation and therefore has a `grad_fn`.)

Note: If we view `x1` as

![equation](https://latex.codecogs.com/svg.latex?%5Clarge%20X_1%3D%20%5Cleft%5B%20%7B%5Cbegin%7Barray%7D%7Bcc%7D%20x_1%20%26%20x_2%20%5C%5C%20x_3%20%26%20x_4%20%5C%5C%20%5Cend%7Barray%7D%20%7D%20%5Cright%5D)

and view `x2` as

![equation](https://latex.codecogs.com/svg.latex?%5Clarge%20X_2%3D%20%5Cleft%5B%20%7B%5Cbegin%7Barray%7D%7Bcc%7D%20x_5%20%26%20x_6%20%5C%5C%20x_7%20%26%20x_8%20%5C%5C%20%5Cend%7Barray%7D%20%7D%20%5Cright%5D)

then `z` is equivalent to ![equation](https://latex.codecogs.com/svg.latex?%5Clarge%20z%3D%5Cfrac%7B1%7D%7B2%7D%28x_1+x_2+x_3+x_4%29+%28x_5+x_6+x_7+x_8%29)

* Call the `backward()` function to compute gradients automatically

```python
z.backward()  # this is identical to calling z.backward(torch.tensor(1.))
```

`z.backward()` computes the derivatives of `z` with respect to the inputs (tensors whose `is_leaf` and `requires_grad` are both `True`).

For example, if we want to know the derivative of `z` with respect to `x_1`, it is:

![equation](https://latex.codecogs.com/svg.latex?%5Clarge%20%5Cfrac%7B%5Cpartial%20z%7D%7B%5Cpartial%20x_1%7D%3D0.5)

* Check the gradients using `.grad`

```python
x1.grad
x2.grad
```

The output will look like this:

```python
tensor([[0.5000, 0.5000],  # x1.grad
        [0.5000, 0.5000]])
tensor([[1., 1.],          # x2.grad
        [1., 1.]])
```

A more in-depth explanation of autograd can be found in this awesome YouTube video: [Link](https://youtu.be/MswxJw-8PvE)

## Start building a model

### Dataset class

PyTorch provides a convenient way of interacting with datasets through `torch.utils.data.Dataset`, an abstract class representing a dataset. When a dataset is large, the RAM on our machine may not be large enough to fit all the data at once. Instead, we load only the portion of the data we currently need, and release it when we are done with it.

A simple dataset is created as follows:

```python
import csv
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, label_path):
        """
        let's assume the csv is as follows:
        ================================
        image_path    label
        imgs/001.png  1
        imgs/002.png  0
        imgs/003.png  2
        imgs/004.png  1
        .
        .
        .
        ================================
        And we define a function parse_csv() that parses the csv into a list of tuples
        [('imgs/001.png', 1), ('imgs/002.png', 0), ...]
        """
        self.labels = parse_csv(label_path)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img_path, label = self.labels[idx]

        # imread: a function that reads an image from a path

        img = imread(img_path)

        # some operations/transformations

        return torch.tensor(img), torch.tensor(label)

```
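
The docstring above references a `parse_csv()` helper that the tutorial leaves undefined. One possible sketch, assuming a comma-separated file with a header row (matching the `import csv` at the top; adjust the delimiter to your actual labels file):

```python
def parse_csv(label_path):
    """Parse the label file into a list of (image_path, label) tuples."""
    labels = []
    with open(label_path, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for img_path, label in reader:
            labels.append((img_path, int(label)))
    return labels
```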

Note that `MyDataset` inherits from `Dataset`. If we look at the [source code](https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#Dataset), we can see that the default behavior of `__len__` and `__getitem__` is to raise a `NotImplementedError`, meaning that we should override them every time we create a custom dataset.

### Dataloader

We can iterate through a dataset with a plain `for` loop, but then we cannot shuffle, batch, or load the data in parallel. `torch.utils.data.DataLoader` is an iterator which provides all of those features. We can specify the batch size, whether to shuffle the data, and the number of worker processes that load the data.

```python
from torch.utils.data import DataLoader

dataset = MyDataset('labels.csv')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for batch_id, batch in enumerate(dataloader):
    imgs, labels = batch

    """
    do something for each batch
    ex:
    output = model(imgs)
    loss = cross_entropy(output, labels)
    """

```

### Model

PyTorch provides the `nn.Module` class for easy definition of a model. A simple DNN model can be defined as follows:

```python
import torch
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()  # call the parent __init__ function
        self.fc = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 10),
        )
        # note: if you train with nn.CrossEntropyLoss, it applies
        # log-softmax itself, so returning raw logits is more standard
        self.output = nn.Softmax(dim=1)

    def forward(self, x):
        # You can wire up your model connections however you like
        out = self.fc(x.view(-1, 28*28))
        out = self.output(out)
        return out
```

We let our model inherit from the `nn.Module` class. But why do we need to call `super` in the `__init__` function, whereas in the `Dataset` case we don't? If we look at the [source code](https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module) of `nn.Module`, we can see that its `__init__` sets up certain attributes needed for the model to work. In the case of `Dataset`, there is no `__init__` function, so `super` is not needed.

In addition, `forward` is not implemented by default, so we need to override it with our own forward-propagation function.

### Example

A full example of an MNIST classifier: [Link](https://github.com/hyes92121/ml-tutorial/blob/master/pytorch/mnist_pytorch.ipynb)

--------------------------------------------------------------------------------