├── README.md
├── _config.yml
└── index.md

/README.md:
--------------------------------------------------------------------------------
# 2019 Spring ML HW1 - Predict PM2.5 Hands-on Tutorial

[Hands-on tutorial slides](https://docs.google.com/presentation/u/2/d/1TkPQoOPyDY9IzzuaVsYq1E26D1NTmi_QA9S9c-rw9K8/edit#slide=id.g5047f99cc6_0_0)

Import package

```python
import sys
import numpy as np
import pandas as pd
import csv
```

Read in training set

```python
raw_data = np.genfromtxt(sys.argv[1], delimiter=',')  ## train.csv
data = raw_data[1:, 3:]  ## drop the header row and the first three columns
where_are_NaNs = np.isnan(data)  ## non-numeric cells are parsed as NaN
data[where_are_NaNs] = 0

month_to_data = {}  ## Dictionary (key: month, value: data)

for month in range(12):
    sample = np.empty(shape=(18, 480))  ## 18 features x (20 days * 24 hours)
    for day in range(20):
        for hour in range(24):
            sample[:, day * 24 + hour] = data[18 * (month * 20 + day): 18 * (month * 20 + day + 1), hour]
    month_to_data[month] = sample
```
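To make the layout concrete, here is a minimal sketch on synthetic data. It assumes, as the loop above does, that each day occupies a block of `n_features` consecutive rows with 24 hourly columns. The helper name `days_to_month` and the toy sizes are illustrative only and are not part of the assignment code.

```python
import numpy as np

def days_to_month(day_blocks, n_features=18):
    """Place consecutive (n_features x 24) day blocks side by side along the hour axis."""
    n_days = day_blocks.shape[0] // n_features
    blocks = [day_blocks[n_features * d: n_features * (d + 1), :] for d in range(n_days)]
    return np.hstack(blocks)

# Synthetic stand-in: 2 days, 3 features, 24 hours per day
toy = np.arange(2 * 3 * 24, dtype=float).reshape(2 * 3, 24)
month = days_to_month(toy, n_features=3)
print(month.shape)  # (3, 48)
```

Column 24 of `month` is hour 0 of the second day, so the hours of a month form one continuous time axis.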

Preprocess

```python
x = np.empty(shape=(12 * 471, 18 * 9), dtype=float)
y = np.empty(shape=(12 * 471, 1), dtype=float)

for month in range(12):
    for day in range(20):
        for hour in range(24):
            if day == 19 and hour > 14:  ## the last 9 hours of a month have no target hour
                continue
            x[month * 471 + day * 24 + hour, :] = month_to_data[month][:, day * 24 + hour: day * 24 + hour + 9].reshape(1, -1)
            y[month * 471 + day * 24 + hour, 0] = month_to_data[month][9, day * 24 + hour + 9]  ## row 9 is the prediction target
```
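The windowing step can be sketched in isolation. The helper below is hypothetical (`make_windows` is not in the assignment code) and runs on a tiny synthetic array: every 9 consecutive hours become one feature vector, and the target row at the 10th hour becomes the label.

```python
import numpy as np

def make_windows(series, target_row, window=9):
    """Build (features, target) pairs from a (n_features x n_hours) array."""
    n_features, n_hours = series.shape
    n_samples = n_hours - window
    x = np.empty((n_samples, n_features * window))
    y = np.empty((n_samples, 1))
    for t in range(n_samples):
        x[t, :] = series[:, t:t + window].reshape(-1)  # 9 hours of every feature
        y[t, 0] = series[target_row, t + window]       # target value at the 10th hour
    return x, y

# Synthetic stand-in: 2 features, 12 hours
toy = np.arange(2 * 12, dtype=float).reshape(2, 12)
x_toy, y_toy = make_windows(toy, target_row=1, window=9)
print(x_toy.shape, y_toy.shape)  # (3, 18) (3, 1)
```

With 480 hours per month the same arithmetic gives 480 - 9 = 471 samples, which is where the `471` above comes from.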

Normalization

```python
mean = np.mean(x, axis=0)
std = np.std(x, axis=0)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if not std[j] == 0:  ## skip constant columns to avoid dividing by zero
            x[i][j] = (x[i][j] - mean[j]) / std[j]
```
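The double loop can also be written with NumPy broadcasting. This is a sketch of an equivalent vectorized form, not the tutorial's own code. The name `standardize` is an assumption, and columns with zero standard deviation are left unchanged, as in the loop.

```python
import numpy as np

def standardize(x, mean, std):
    """Column-wise z-score; columns whose std is zero are returned unchanged."""
    safe_std = np.where(std == 0, 1.0, std)  # divide constant columns by 1
    shift = np.where(std == 0, 0.0, mean)    # and subtract 0 from them
    return (x - shift) / safe_std

toy = np.array([[1.0, 5.0],
                [3.0, 5.0]])
toy_mean = np.mean(toy, axis=0)
toy_std = np.std(toy, axis=0)
print(standardize(toy, toy_mean, toy_std))
```

The same function can be reused on the test features with the training `mean` and `std`.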

Training

```python
dim = x.shape[1] + 1
w = np.zeros(shape=(dim, 1))
x = np.concatenate((np.ones((x.shape[0], 1)), x), axis=1).astype(float)  ## prepend the bias column
learning_rate = np.array([[200]] * dim)
adagrad_sum = np.zeros(shape=(dim, 1))

for T in range(10000):
    if T % 500 == 0:
        print("T=", T)
        print("Loss:", np.power(np.sum(np.power(x.dot(w) - y, 2)) / x.shape[0], 0.5))  ## RMSE
    gradient = (-2) * np.transpose(x).dot(y - x.dot(w))
    adagrad_sum += gradient ** 2
    w = w - learning_rate * gradient / (np.sqrt(adagrad_sum) + 0.0005)

np.save('weight.npy', w)  ## save weight
```
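To see the Adagrad update work on a problem with a known answer, the sketch below wraps the same update rule in a hypothetical helper `adagrad_fit` and runs it on noise-free synthetic data. The learning rate, iteration count, and toy weights are assumptions chosen for the toy problem, not the values used for the assignment.

```python
import numpy as np

def adagrad_fit(x, y, learning_rate=1.0, n_iters=1000, eps=0.0005):
    """Minimise the summed squared error of x.dot(w) against y with Adagrad."""
    w = np.zeros((x.shape[1], 1))
    adagrad_sum = np.zeros((x.shape[1], 1))
    for _ in range(n_iters):
        gradient = -2 * x.T.dot(y - x.dot(w))
        adagrad_sum += gradient ** 2  # running sum of squared gradients
        w = w - learning_rate * gradient / (np.sqrt(adagrad_sum) + eps)
    return w

# Synthetic stand-in: bias column plus 2 random features, exact linear target
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
x_toy = np.concatenate((np.ones((200, 1)), features), axis=1)
w_true = np.array([[1.0], [2.0], [-3.0]])
y_toy = x_toy.dot(w_true)

w_fit = adagrad_fit(x_toy, y_toy)
rmse = np.sqrt(np.mean((x_toy.dot(w_fit) - y_toy) ** 2))
print("RMSE on toy data:", rmse)
```

Because the accumulated squared gradients only grow, the effective step size per weight shrinks over time, so a large initial learning rate is tolerated.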

Read in testing set

```python
w = np.load('weight.npy')  ## load weight
test_raw_data = np.genfromtxt(sys.argv[2], delimiter=',')  ## test.csv
test_data = test_raw_data[:, 2:]  ## drop the first two columns
where_are_NaNs = np.isnan(test_data)
test_data[where_are_NaNs] = 0
```

Predict

```python
test_x = np.empty(shape=(240, 18 * 9), dtype=float)

for i in range(240):
    test_x[i, :] = test_data[18 * i: 18 * (i + 1), :].reshape(1, -1)

for i in range(test_x.shape[0]):  ## Normalization with the training mean and std
    for j in range(test_x.shape[1]):
        if not std[j] == 0:
            test_x[i][j] = (test_x[i][j] - mean[j]) / std[j]

test_x = np.concatenate((np.ones(shape=(test_x.shape[0], 1)), test_x), axis=1).astype(float)
answer = test_x.dot(w)
```
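The first loop flattens each block of 18 rows into one feature vector. As a sketch, the same result can be obtained with a single `reshape`, assuming the rows of each id are consecutive. The helper name `stack_test_blocks` and the toy sizes are illustrative.

```python
import numpy as np

def stack_test_blocks(test_data, n_features=18):
    """Flatten each consecutive block of n_features rows into one feature vector."""
    n_ids = test_data.shape[0] // n_features
    return test_data.reshape(n_ids, -1)

# Synthetic stand-in: 4 ids, 3 features, 9 hours per id
toy = np.arange(4 * 3 * 9, dtype=float).reshape(4 * 3, 9)
test_x_toy = stack_test_blocks(toy, n_features=3)

# Compare against the explicit loop used above
looped = np.empty((4, 3 * 9))
for i in range(4):
    looped[i, :] = toy[3 * i: 3 * (i + 1), :].reshape(1, -1)
print(np.array_equal(test_x_toy, looped))  # True
```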

Write file

```python
with open(sys.argv[3], "w", newline="") as f:  ## the file is closed automatically
    writer = csv.writer(f)  ## named `writer` so the weight vector `w` is not overwritten
    title = ['id', 'value']
    writer.writerow(title)
    for i in range(240):
        content = ['id_' + str(i), answer[i][0]]
        writer.writerow(content)
```

--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
theme: jekyll-theme-cayman

--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
## Assignment 1 - Predicting PM 2.5

You can use the [editor on GitHub](https://github.com/ntumlta2019/hw1/edit/master/index.md) to maintain and preview the content for your website in Markdown files.

Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.

### Markdown

Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for

```markdown
Syntax highlighted code block

# Header 1
## Header 2
### Header 3

- Bulleted
- List

1. Numbered
2. List

**Bold** and _Italic_ and `Code` text

[Link](url) and ![Image](src)
```

For more details see [GitHub Flavored Markdown](https://guides.github.com/features/mastering-markdown/).

### Jekyll Themes

Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/ntumlta2019/hw1/settings). The name of this theme is saved in the Jekyll `_config.yml` configuration file.

### Support or Contact

Having trouble with Pages? Check out our [documentation](https://help.github.com/categories/github-pages-basics/) or [contact support](https://github.com/contact) and we’ll help you sort it out.

--------------------------------------------------------------------------------