├── README.md
├── _config.yml
└── index.md
/README.md:
--------------------------------------------------------------------------------
1 | # 2019 Spring ML HW1 - Predict PM2.5 Hands-On Tutorial
2 |
3 | [Hands-on tutorial slides](https://docs.google.com/presentation/u/2/d/1TkPQoOPyDY9IzzuaVsYq1E26D1NTmi_QA9S9c-rw9K8/edit#slide=id.g5047f99cc6_0_0)
4 |
5 | Import packages
6 |
7 | ```python
8 | import sys
9 | import numpy as np
10 | import pandas as pd
11 | import csv
12 | ```
13 |
14 | Read in training set
15 |
16 | ```python
17 | raw_data = np.genfromtxt(sys.argv[1], delimiter=',')  # train.csv
18 | data = raw_data[1:, 3:]  # drop the header row and the first three columns
19 | where_are_NaNs = np.isnan(data)
20 | data[where_are_NaNs] = 0  # non-numeric cells are parsed as NaN; set them to 0
21 |
22 | month_to_data = {}  # dictionary (key: month, value: data)
23 |
24 | for month in range(12):
25 |     sample = np.empty(shape=(18, 480))  # 18 features x (20 days * 24 hours)
26 |     for day in range(20):
27 |         for hour in range(24):
28 |             sample[:, day * 24 + hour] = data[18 * (month * 20 + day) : 18 * (month * 20 + day + 1), hour]
29 |     month_to_data[month] = sample
30 | ```
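
The nested loops above lay each month out as one continuous timeline of shape (features, days * hours). The following is a minimal sketch of the same rearrangement on toy sizes, assuming the rows of a month are ordered day by day with all features of a day grouped together, as the indexing above implies. The helper name `month_block_to_timeline` is hypothetical and not part of the tutorial code.

```python
import numpy as np

def month_block_to_timeline(block, n_features, n_days, n_hours):
    """Turn a (n_days * n_features, n_hours) block into (n_features, n_days * n_hours)."""
    # Split the row axis into (day, feature), then move the feature axis to the front.
    cube = block.reshape(n_days, n_features, n_hours)
    return cube.transpose(1, 0, 2).reshape(n_features, n_days * n_hours)

# Toy sizes: 2 features, 3 days, 4 hours (the assignment uses 18, 20, 24).
n_features, n_days, n_hours = 2, 3, 4
block = np.arange(n_features * n_days * n_hours, dtype=float).reshape(n_days * n_features, n_hours)

# Reference result built with the same loop structure as the tutorial code.
expected = np.empty((n_features, n_days * n_hours))
for day in range(n_days):
    for hour in range(n_hours):
        expected[:, day * n_hours + hour] = block[n_features * day : n_features * (day + 1), hour]

timeline = month_block_to_timeline(block, n_features, n_days, n_hours)
```

Both versions produce the same array; the reshape form only avoids the Python-level loops.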
31 |
32 | Preprocess
33 |
34 | ```python
35 | x = np.empty(shape=(12 * 471, 18 * 9), dtype=float)
36 | y = np.empty(shape=(12 * 471, 1), dtype=float)
37 |
38 | for month in range(12):
39 |     for day in range(20):
40 |         for hour in range(24):
41 |             if day == 19 and hour > 14:  # no full 9-hour window plus target hour remains
42 |                 continue
43 |             x[month * 471 + day * 24 + hour, :] = month_to_data[month][:, day * 24 + hour : day * 24 + hour + 9].reshape(1, -1)
44 |             y[month * 471 + day * 24 + hour, 0] = month_to_data[month][9, day * 24 + hour + 9]  # row 9 at the 10th hour is the target
45 | ```
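
Each training sample is a window of 9 consecutive hours across all features, and its label is row 9 at the hour right after the window, which gives 480 - 9 = 471 samples per month. The sketch below shows the same windowing on a tiny series. The function name `make_windows` and the toy sizes are assumptions for illustration only.

```python
import numpy as np

def make_windows(series, window, target_row):
    """Build (samples, features * window) inputs and (samples, 1) targets."""
    n_features, n_steps = series.shape
    n_samples = n_steps - window  # every window needs one more step for its target
    x = np.empty((n_samples, n_features * window))
    y = np.empty((n_samples, 1))
    for start in range(n_samples):
        # Flatten the window feature by feature, as reshape(1, -1) does in the tutorial.
        x[start] = series[:, start:start + window].reshape(-1)
        y[start, 0] = series[target_row, start + window]
    return x, y

# Toy series: 2 features over 6 time steps, windows of 3 steps.
series = np.arange(2 * 6, dtype=float).reshape(2, 6)
x_toy, y_toy = make_windows(series, window=3, target_row=1)

# With the assignment's sizes this yields the 471 samples per month used above.
x_month, y_month = make_windows(np.zeros((18, 480)), window=9, target_row=9)
```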
46 |
47 | Normalization
48 |
49 | ```python
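50 | mean = np.mean(x, axis=0)  # per-column mean over all training samples
51 | std = np.std(x, axis=0)  # per-column standard deviation
52 | for i in range(x.shape[0]):
53 |     for j in range(x.shape[1]):
54 |         if not std[j] == 0:  # leave constant columns unchanged to avoid dividing by zero
55 |             x[i][j] = (x[i][j] - mean[j]) / std[j]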
56 | ```
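
The element-wise loop above can be slow on 12 * 471 rows by 162 columns. A vectorized version with the same behavior, including leaving zero-variance columns untouched, might look like the sketch below. The function name `standardize` and its optional `mean`/`std` arguments are assumptions, added so that statistics computed on the training set can be reused on the test set.

```python
import numpy as np

def standardize(x, mean=None, std=None):
    """Scale each column to zero mean and unit variance, skipping constant columns."""
    if mean is None:
        mean = x.mean(axis=0)
    if std is None:
        std = x.std(axis=0)
    # Replace zero deviations by 1 so the division is always defined,
    # then keep the original values in those columns.
    safe_std = np.where(std == 0, 1.0, std)
    scaled = np.where(std == 0, x, (x - mean) / safe_std)
    return scaled, mean, std

x_toy = np.array([[1.0, 5.0],
                  [3.0, 5.0]])
scaled, mean_toy, std_toy = standardize(x_toy)
```

Here the first column is scaled to -1 and 1, while the constant second column keeps its value of 5.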
57 |
58 | Training
59 |
60 | ```python
61 | dim = x.shape[1] + 1  # one extra weight for the bias term
62 | w = np.zeros(shape=(dim, 1))
63 | x = np.concatenate((np.ones((x.shape[0], 1)), x), axis=1).astype(float)  # prepend a column of ones for the bias
64 | learning_rate = np.array([[200]] * dim)
65 | adagrad_sum = np.zeros(shape=(dim, 1))  # running sum of squared gradients
66 |
67 | for T in range(10000):
68 |     if T % 500 == 0:
69 |         print("T=", T)
70 |         print("Loss:", np.power(np.sum(np.power(x.dot(w) - y, 2)) / x.shape[0], 0.5))  # RMSE
71 |     gradient = (-2) * np.transpose(x).dot(y - x.dot(w))
72 |     adagrad_sum += gradient ** 2
73 |     w = w - learning_rate * gradient / (np.sqrt(adagrad_sum) + 0.0005)
74 |
75 | np.save('weight.npy', w)  # save weight
76 | ```
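
The training loop above is linear regression fitted with Adagrad: each weight's step is divided by the square root of its accumulated squared gradients. The sketch below wraps the same update in a function and runs it on synthetic, noise-free data, so the recovered weights can be compared with the known ones. The name `adagrad_fit`, the default hyperparameters, and the toy data are assumptions chosen for this illustration, not values from the assignment.

```python
import numpy as np

def adagrad_fit(x, y, learning_rate=1.0, n_iters=5000, eps=1e-8):
    """Fit y ~ [1, x] . w by minimizing squared error with Adagrad."""
    x = np.concatenate((np.ones((x.shape[0], 1)), x), axis=1)  # bias column
    w = np.zeros((x.shape[1], 1))
    grad_sq_sum = np.zeros_like(w)
    for _ in range(n_iters):
        gradient = -2 * x.T.dot(y - x.dot(w))  # gradient of the summed squared error
        grad_sq_sum += gradient ** 2
        w -= learning_rate * gradient / (np.sqrt(grad_sq_sum) + eps)
    return w

# Synthetic data with known weights: y = 3 + 2 * x0 - 1 * x1.
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(200, 2))
y_toy = 3.0 + x_toy.dot(np.array([[2.0], [-1.0]]))
w_toy = adagrad_fit(x_toy, y_toy)
```

On this toy problem `w_toy` should approach the bias 3 and the weights 2 and -1; the real data is noisy, so the tutorial tracks the RMSE instead.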
77 |
78 | Read in testing set
79 |
80 | ```python
81 | w = np.load('weight.npy') ## load weight
82 | test_raw_data = np.genfromtxt(sys.argv[2], delimiter=',') ## test.csv
83 | test_data = test_raw_data[:, 2: ]
84 | where_are_NaNs = np.isnan(test_data)
85 | test_data[where_are_NaNs] = 0
86 | ```
87 |
88 | Predict
89 |
90 | ```python
91 | test_x = np.empty(shape=(240, 18 * 9), dtype=float)
92 |
93 | for i in range(240):
94 |     test_x[i, :] = test_data[18 * i : 18 * (i + 1), :].reshape(1, -1)  # 18 rows x 9 hours per test id
95 |
96 | for i in range(test_x.shape[0]):  # normalize with the training mean and std
97 |     for j in range(test_x.shape[1]):
98 |         if not std[j] == 0:
99 |             test_x[i][j] = (test_x[i][j] - mean[j]) / std[j]
100 |
101 | test_x = np.concatenate((np.ones(shape=(test_x.shape[0], 1)), test_x), axis=1).astype(float)  # bias column
102 | answer = test_x.dot(w)
103 | ```
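
Prediction repeats the training-time steps in order: flatten each block of feature rows into one sample, normalize with the training statistics, prepend the bias column, and multiply by the weights. The sketch below collects these steps in one function and checks it on toy numbers. The name `predict` and all toy values are assumptions for illustration.

```python
import numpy as np

def predict(test_data, w, mean, std, n_features):
    """Predict one value per block of n_features consecutive rows."""
    n_samples = test_data.shape[0] // n_features
    # Row-major flattening of each block matches test_data[18*i : 18*(i+1), :].reshape(1, -1).
    test_x = test_data.reshape(n_samples, -1).astype(float)
    # Reuse the training mean and std; columns with zero deviation stay unchanged.
    safe_std = np.where(std == 0, 1.0, std)
    test_x = np.where(std == 0, test_x, (test_x - mean) / safe_std)
    test_x = np.concatenate((np.ones((n_samples, 1)), test_x), axis=1)
    return test_x.dot(w)

# Toy setup: 2 features, 3 hours, 2 test samples.
test_data = np.arange(12, dtype=float).reshape(4, 3)
mean_toy = np.zeros(6)
std_toy = np.ones(6)
w_toy = np.zeros((7, 1))
w_toy[0, 0] = 1.0  # bias
w_toy[6, 0] = 1.0  # weight on the last value of each flattened sample
preds = predict(test_data, w_toy, mean_toy, std_toy, n_features=2)
```

With these toy weights each prediction is 1 plus the last value of the sample, that is 6 and 12.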
104 |
105 | Write file
106 |
107 | ```python
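108 | with open(sys.argv[3], "w", newline="") as f:  # the file is closed automatically
109 |     writer = csv.writer(f)  # separate name so the weight array w is not overwritten
110 |     title = ['id', 'value']
111 |     writer.writerow(title)
112 |     for i in range(240):
113 |         content = ['id_' + str(i), answer[i][0]]
114 |         writer.writerow(content)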
115 | ```
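
To check the output format without touching the file system, the same rows can be written to an in-memory buffer. This is a small sketch; the helper name `write_submission` and the sample predictions are hypothetical.

```python
import csv
import io

def write_submission(handle, predictions):
    """Write an 'id,value' header followed by one 'id_<i>,<value>' row per prediction."""
    writer = csv.writer(handle)
    writer.writerow(["id", "value"])
    for i, value in enumerate(predictions):
        writer.writerow(["id_" + str(i), value])

buffer = io.StringIO()
write_submission(buffer, [1.5, 2.0])
rows = buffer.getvalue().splitlines()
```

The same function works with a real file opened via `open(path, "w", newline="")`.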
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 | ## Assignment 1 - Predicting PM 2.5
2 |
3 | You can use the [editor on GitHub](https://github.com/ntumlta2019/hw1/edit/master/index.md) to maintain and preview the content for your website in Markdown files.
4 |
5 | Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.
6 |
7 | ### Markdown
8 |
9 | Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for
10 |
11 | ```markdown
12 | Syntax highlighted code block
13 |
14 | # Header 1
15 | ## Header 2
16 | ### Header 3
17 |
18 | - Bulleted
19 | - List
20 |
21 | 1. Numbered
22 | 2. List
23 |
24 | **Bold** and _Italic_ and `Code` text
25 |
26 | [Link](url) and 
27 | ```
28 |
29 | For more details see [GitHub Flavored Markdown](https://guides.github.com/features/mastering-markdown/).
30 |
31 | ### Jekyll Themes
32 |
33 | Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/ntumlta2019/hw1/settings). The name of this theme is saved in the Jekyll `_config.yml` configuration file.
34 |
35 | ### Support or Contact
36 |
37 | Having trouble with Pages? Check out our [documentation](https://help.github.com/categories/github-pages-basics/) or [contact support](https://github.com/contact) and we’ll help you sort it out.
38 |
--------------------------------------------------------------------------------