├── Feature_Engineering_for_Machine_Learning
│   ├── Feature Engineering for Machine Learning.mmap
│   ├── Feature_Engineering_for_Machine_Learning.html
│   ├── Feature_Engineering_for_Machine_Learning.jpeg
│   └── README.md
├── README.md
├── outlier_detection
│   ├── README.md
│   ├── outlier detection.ipynb
│   ├── outlier_detection.html
│   ├── pic
│   │   ├── DBSCAN1.png
│   │   ├── DBSCAN2.png
│   │   ├── DBSCAN3.png
│   │   ├── LOF1.png
│   │   ├── LOF2.png
│   │   ├── LOF3.png
│   │   ├── isolation_forest1.png
│   │   ├── isolation_forest2.png
│   │   ├── isolation_forest3.png
│   │   ├── oneclassSVM.png
│   │   ├── oneclassSVM2.png
│   │   └── oneclassSVM3.png
│   └── test.pkl
├── python_package_time_series
│   ├── image
│   │   └── image.png
│   └── 时间序列.md
├── zhou_zhihua_minmap
│   ├── 机器学习——周志华 (2).html
│   └── 机器学习——周志华.mmap
└── 算法图解.html
/Feature_Engineering_for_Machine_Learning/Feature Engineering for Machine Learning.mmap:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/Feature_Engineering_for_Machine_Learning/Feature Engineering for Machine Learning.mmap
--------------------------------------------------------------------------------
/Feature_Engineering_for_Machine_Learning/Feature_Engineering_for_Machine_Learning.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/Feature_Engineering_for_Machine_Learning/Feature_Engineering_for_Machine_Learning.jpeg
--------------------------------------------------------------------------------
/Feature_Engineering_for_Machine_Learning/README.md:
--------------------------------------------------------------------------------
1 | ## 《Feature Engineering for Machine Learning》
2 |
3 | Summary: this book surveys common feature-processing techniques and works well as an introduction to feature engineering.
4 |
5 | **Note:** the book's TF-IDF formula and its PCA proof may contain errors.
6 |
7 | [Mind map (JPG)](https://github.com/cjn-chen/machine_learn_reading_notes/tree/master/Feature_Engineering_for_Machine_Learning/Feature_Engineering_for_Machine_Learning.jpeg)
8 |
9 | [Mind map (HTML)](https://github.com/cjn-chen/machine_learn_reading_notes/tree/master/Feature_Engineering_for_Machine_Learning/Feature_Engineering_for_Machine_Learning.html) The HTML version gives a better reading experience.
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # machine_learn_reading_notes
2 | Reading notes and mind maps on machine learning.
3 |
4 | [TOC]
5 |
6 | ## Feature engineering
7 | - "Feature Engineering for Machine Learning": [mind map](https://github.com/cjn-chen/machine_learn_reading_notes/tree/master/Feature_Engineering_for_Machine_Learning)
8 | - Outlier removal: [code and demos](https://github.com/cjn-chen/machine_learn_reading_notes/tree/master/outlier_detection)
9 |
10 |
11 | ## Machine learning
12 | [Zhou Zhihua, "Machine Learning"](https://github.com/cjn-chen/machine_learn_reading_notes/tree/master/zhou_zhihua_minmap)
13 |
14 | ## Algorithms
15 | [Grokking Algorithms (算法图解)](https://github.com/cjn-chen/machine_learn_reading_notes)
16 |
17 | ## Tools
18 |
19 | [Commonly used Python packages for time series](https://github.com/cjn-chen/machine_learn_reading_notes/blob/master/python_package_time_series/%E6%97%B6%E9%97%B4%E5%BA%8F%E5%88%97.md)
20 |
21 |
--------------------------------------------------------------------------------
/outlier_detection/README.md:
--------------------------------------------------------------------------------
1 | # Outlier detection methods
2 |
3 | ## 1. Isolation forest
4 |
5 | ### 1.1 Sample test data
6 |
7 | File: [test.pkl](https://github.com/cjn-chen/machine_learn_reading_notes/blob/master/outlier_detection/test.pkl)
8 |
9 | | index | x | y |
10 | | ----- | -------- | -------- |
11 | | 0 | -2.24055 | -2.21173 |
12 | | 1 | -1.66227 | -1.79528 |
13 | | 2 | -1.65948 | -1.58545 |
14 | | 3 | -1.65629 | -1.59716 |
15 | | 4 | -1.64114 | -1.82453 |
16 |
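17 | ### 1.2 Isolation forest demo
18 |
19 | **How isolation forest works**
20 |
21 | An isolation forest is an ensemble of trees, each built by repeatedly splitting the data on a randomly chosen feature at a random threshold. Points that can be separated from the rest after only a few splits are treated as outliers.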
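The idea can be checked on a toy dataset before running the full demo. The following is a minimal sketch, assuming made-up data (`X_toy` is illustrative and not part of the original notes): one point is placed far from a tight cluster, and it should receive the lowest `score_samples` value because it is isolated after the fewest splits.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# 100 points clustered near the origin plus one obvious outlier (illustrative data)
X_toy = np.r_[0.3 * rng.randn(100, 2), [[6.0, 6.0]]]

forest = IsolationForest(n_estimators=100, random_state=0).fit(X_toy)
# score_samples: the lower the score, the easier the point is to isolate
scores = forest.score_samples(X_toy)
print("score of the far point:  ", scores[-1])
print("median score of the rest:", np.median(scores[:-1]))
```

The far point needs fewer random splits to end up alone in a leaf, which is what the lower score reflects.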
22 |
23 | ```python
24 | # Reference: https://blog.csdn.net/ye1215172385/article/details/79762317
25 | # Official example: https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py
26 | import numpy as np
27 | import matplotlib.pyplot as plt
28 | from sklearn.ensemble import IsolationForest
29 |
30 | rng = np.random.RandomState(42)
31 |
32 | # Build the training set
33 | n_samples = 200  # total number of samples
34 | outliers_fraction = 0.25  # proportion of outliers
35 | n_inliers = int((1. - outliers_fraction) * n_samples)
36 | n_outliers = int(outliers_fraction * n_samples)
37 |
38 | X = 0.3 * rng.randn(n_inliers // 2, 2)
39 | X_train = np.r_[X + 2, X - 2]  # inliers
40 | X_train = np.r_[X_train, rng.uniform(low=-6, high=6, size=(n_outliers, 2))]  # inliers plus outliers (drawn from rng so the run is reproducible)
41 |
42 | # Build and fit the model
43 | clf = IsolationForest(max_samples=n_samples, random_state=rng, contamination=outliers_fraction)
44 | clf.fit(X_train)
45 | # Compute the scores and set the threshold
46 | scores_pred = clf.decision_function(X_train)
47 | threshold = np.percentile(scores_pred, 100 * outliers_fraction)  # threshold derived from the outlier proportion of the training set; used for plotting
48 |
49 | # plot the line, the samples, and the nearest vectors to the plane
50 | xx, yy = np.meshgrid(np.linspace(-7, 7, 50), np.linspace(-7, 7, 50))
51 | Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
52 | Z = Z.reshape(xx.shape)
53 |
54 | plt.title("IsolationForest")
55 | # plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
56 | plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7), cmap=plt.cm.Blues_r)  # outlier region: values from the minimum up to the threshold
57 | a = plt.contour(xx, yy, Z, levels=[threshold], linewidths=2, colors='red')  # boundary between the outlier and inlier regions
58 | plt.contourf(xx, yy, Z, levels=[threshold, Z.max()], colors='palevioletred')  # inlier region: values from the threshold up to the maximum
59 |
60 | b = plt.scatter(X_train[:-n_outliers, 0], X_train[:-n_outliers, 1], c='white',
61 | s=20, edgecolor='k')
62 | c = plt.scatter(X_train[-n_outliers:, 0], X_train[-n_outliers:, 1], c='black',
63 | s=20, edgecolor='k')
64 | plt.axis('tight')
65 | plt.xlim((-7, 7))
66 | plt.ylim((-7, 7))
67 | plt.legend([a.legend_elements()[0][0], b, c],
68 | ['learned decision function', 'true inliers', 'true outliers'],
69 | loc="upper left")
70 | plt.show()
71 | ```
72 |
73 | 
74 |
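75 | ### 1.3 Modified version: replace X_train with your own data
76 |
77 | No standardization is applied here. You can standardize the data first (`from sklearn.preprocessing import StandardScaler`) and then remove outliers from the standardized data.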
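One way to chain the two steps is a scikit-learn pipeline. This is a sketch under stated assumptions: `X_raw` is synthetic stand-in data (not the contents of `test.pkl`), and the `contamination` value is only an example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
# Hypothetical stand-in for X_train_demo.values: two features on very different scales
X_raw = np.c_[rng.normal(0, 1, 300), rng.normal(0, 1000, 300)]

# Standardize first, then fit the isolation forest on the scaled features
pipe = make_pipeline(StandardScaler(),
                     IsolationForest(contamination=0.1, random_state=42))
labels = pipe.fit_predict(X_raw)  # 1 = inlier, -1 = outlier
X_clean = X_raw[labels == 1]      # filter the original, unscaled rows
print(X_raw.shape, "->", X_clean.shape)
```

Because the mask is applied to `X_raw`, the cleaned data stays in its original units. Isolation forest splits one feature at a time, so per-feature scaling usually changes its result very little; standardizing matters more for the distance-based methods later in these notes (DBSCAN, LOF, one-class SVM).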
78 |
79 | ```python
80 | import numpy as np
81 | import matplotlib.pyplot as plt
82 | from sklearn.ensemble import IsolationForest
83 | from scipy import stats
84 |
85 | rng = np.random.RandomState(42)
86 |
87 | # X_train_demo is assumed to be the DataFrame loaded from test.pkl (columns x and y)
88 | X_train = X_train_demo.values
89 | outliers_fraction = 0.1
90 | n_samples = 500
91 | # Build and fit the model
92 | clf = IsolationForest(max_samples=n_samples, random_state=rng, contamination=outliers_fraction)
93 | clf.fit(X_train)
94 | # Compute the scores and set the threshold
95 | scores_pred = clf.decision_function(X_train)
96 | threshold = stats.scoreatpercentile(scores_pred, 100 * outliers_fraction)  # threshold derived from the outlier proportion of the training set; used for plotting
97 |
98 | # plot the line, the samples, and the nearest vectors to the plane
99 | range_max_min0 = (X_train[:,0].max()-X_train[:,0].min())*0.2
100 | range_max_min1 = (X_train[:,1].max()-X_train[:,1].min())*0.2
101 | xx, yy = np.meshgrid(np.linspace(X_train[:,0].min()-range_max_min0, X_train[:,0].max()+range_max_min0, 500),
102 | np.linspace(X_train[:,1].min()-range_max_min1, X_train[:,1].max()+range_max_min1, 500))
103 | Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
104 | Z = Z.reshape(xx.shape)
105 |
106 | plt.title("IsolationForest")
107 | # plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
108 | plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7), cmap=plt.cm.Blues_r)  # outlier region: values from the minimum up to the threshold
109 | a = plt.contour(xx, yy, Z, levels=[threshold], linewidths=2, colors='red')  # boundary between the outlier and inlier regions
110 | plt.contourf(xx, yy, Z, levels=[threshold, Z.max()], colors='palevioletred')  # inlier region: values from the threshold up to the maximum
111 |
112 |
113 | is_in = clf.predict(X_train)>0
114 | b = plt.scatter(X_train[is_in, 0], X_train[is_in, 1], c='white',
115 | s=20, edgecolor='k')
116 | c = plt.scatter(X_train[~is_in, 0], X_train[~is_in, 1], c='black',
117 | s=20, edgecolor='k')
118 | plt.axis('tight')
119 | plt.xlim((X_train[:,0].min()-range_max_min0, X_train[:,0].max()+range_max_min0,))
120 | plt.ylim((X_train[:,1].min()-range_max_min1, X_train[:,1].max()+range_max_min1,))
121 | plt.legend([a.legend_elements()[0][0], b, c],
122 | ['learned decision function', 'inliers', 'outliers'],
123 | loc="upper left")
124 | plt.show()
125 | ```
126 |
127 | 
128 |
129 | ### 1.4 核心代码
130 |
131 | #### 1.4.1 示例样本
132 |
133 | ```python
134 | import numpy as np
135 | # Build the training set
136 | n_samples = 200  # total number of samples
137 | outliers_fraction = 0.25  # proportion of outliers
138 | n_inliers = int((1. - outliers_fraction) * n_samples)
139 | n_outliers = int(outliers_fraction * n_samples)
140 | rng = np.random.RandomState(42)
141 | X = 0.3 * rng.randn(n_inliers // 2, 2)
142 | X_train = np.r_[X + 2, X - 2]  # inliers
143 | X_train = np.r_[X_train, rng.uniform(low=-6, high=6, size=(n_outliers, 2))]  # inliers plus outliers
144 | ```
145 |
146 | #### 1.4.2 Core implementation
147 |
148 | **clf = IsolationForest(max_samples=0.8, contamination=0.25)**
149 |
150 | ```python
151 | from sklearn.ensemble import IsolationForest
152 | # fit the model
153 | # max_samples: number of samples drawn to build each tree. An int is used as the sample count;
154 | # a float in (0, 1] is used as that fraction of the training set.
155 | # contamination: expected proportion of outliers in the data
156 | clf = IsolationForest(max_samples=0.8, contamination=0.25)
157 | clf.fit(X_train)
158 | # y_pred_train = clf.predict(X_train)
159 | scores_pred = clf.decision_function(X_train)
160 | threshold = np.percentile(scores_pred, 100 * outliers_fraction)  # threshold derived from the outlier proportion of the training set; used for plotting
161 |
162 | # The two selections below should give the same result (apart from floating-point ties at the threshold)
163 | X_train_predict1 = X_train[clf.predict(X_train)==1]
164 | X_train_predict2 = X_train[scores_pred>=threshold,:]
165 |
166 | ```
167 |
168 | ```python
169 | # 1 marks an inlier, -1 marks an outlier
170 | clf.predict(X_train)
171 | array([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
172 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
173 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
174 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1,
175 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
176 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
177 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
178 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
179 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1,
180 | -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
181 | -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
182 | -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
183 | ```
184 |
185 | ## 2. DBSCAN
186 |
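187 | **How DBSCAN (Density-Based Spatial Clustering of Applications with Noise) works**
188 |
189 | Draw a neighborhood of fixed radius around every point and require a minimum number of points inside it. If the neighborhood holds at least that many points, the point is a core point and belongs to the same cluster as its neighbors. If it holds fewer but the point lies inside the neighborhood of a core point, it is a border point. Points that are neither are noise.
190 |
191 | Two parameters control this: `eps` is the radius of the neighborhood around each point, and `min_samples` is the number of samples a neighborhood must contain.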
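The three kinds of points can be read off a fitted `DBSCAN` model. The sketch below uses six hand-placed points (illustrative data, not from the original notes); note that scikit-learn counts the point itself when comparing against `min_samples`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative layout: a dense group, one point on its fringe, one isolated point
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.1, 0.1],
                [0.45, 0.0],   # within eps of a single core point only
                [5.0, 5.0]])   # far away from everything

db = DBSCAN(eps=0.3, min_samples=4).fit(pts)

core_mask = np.zeros(len(pts), dtype=bool)
core_mask[db.core_sample_indices_] = True       # core points
noise_mask = db.labels_ == -1                   # noise (outliers)
border_mask = ~core_mask & ~noise_mask          # clustered but not core

print("labels:", db.labels_)
print("core:  ", np.flatnonzero(core_mask))
print("border:", np.flatnonzero(border_mask))
print("noise: ", np.flatnonzero(noise_mask))
```

For outlier removal only the noise mask matters: keeping `pts[~noise_mask]` retains both core and border points.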
192 |
193 | 
194 |
195 | ### 2.1 DBSCAN demo
196 |
197 | ```python
198 | # Reference: https://blog.csdn.net/hb707934728/article/details/71515160
199 | #
200 | # Official example: https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
201 |
202 | import numpy as np
203 |
204 | import matplotlib.pyplot as plt
205 | import matplotlib.colors
206 |
207 | import sklearn.datasets as ds
208 | from sklearn.cluster import DBSCAN
209 | from sklearn.preprocessing import StandardScaler
210 |
211 |
212 | def expand(a, b):
213 |     d = (b - a) * 0.1
214 |     return a-d, b+d
215 |
216 |
217 | if __name__ == "__main__":
218 |     N = 1000
219 |     centers = [[1, 2], [-1, -1], [1, -1], [-1, 1]]
220 |     # make_blobs is commonly used to generate test data for clustering algorithms: it draws
221 |     # several groups of points according to the requested number of features, centers, spread, etc.
222 |     # Signature: sklearn.datasets.make_blobs(n_samples=100, n_features=2,
223 |     #     centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)
224 |     # Parameters:
225 |     # n_samples: total number of samples to generate.
226 |     #
227 |     # n_features: number of features per sample.
228 |     #
229 |     # centers: number of clusters (or, as here, the explicit cluster centers).
230 |     #
231 |     # cluster_std: standard deviation of each cluster; e.g. cluster_std=[1.0, 3.0] makes the second of two clusters more spread out.
232 |     data, y = ds.make_blobs(N, n_features=2, centers=centers, cluster_std=[0.5, 0.25, 0.7, 0.5], random_state=0)
233 |     data = StandardScaler().fit_transform(data)
234 |     # Parameter pairs to try: (eps, min_samples)
235 |     params = ((0.2, 5), (0.2, 10), (0.2, 15), (0.3, 5), (0.3, 10), (0.3, 15))
236 |
237 |     plt.figure(figsize=(12, 8), facecolor='w')
238 |     plt.suptitle(u'DBSCAN clustering', fontsize=20)
239 |
240 |     for i in range(6):
241 |         eps, min_samples = params[i]
242 |         # Parameter meaning:
243 |         # eps: radius of the circular neighborhood centered on a point P
244 |         # min_samples: minimum number of points required inside the neighborhood of P
245 |         # If the eps-neighborhood of P contains at least min_samples points, P is a core point
246 |         model = DBSCAN(eps=eps, min_samples=min_samples)
247 |         model.fit(data)
248 |         y_hat = model.labels_
249 |
250 |         core_indices = np.zeros_like(y_hat, dtype=bool)  # boolean mask with the same shape as y_hat
251 |         core_indices[model.core_sample_indices_] = True  # model.core_sample_indices_: positions of the core points in y_hat
252 |
253 |         # Count the clusters; label -1 marks unclustered (noise) samples
254 |         y_unique = np.unique(y_hat)
255 |         n_clusters = y_unique.size - (1 if -1 in y_hat else 0)
256 |         print(y_unique, 'number of clusters:', n_clusters)
257 |
258 |         plt.subplot(2, 3, i+1)  # 2 rows x 3 columns; draw in panel i+1
259 |         # plt.cm.Spectral: https://blog.csdn.net/robin_Xu_shuai/article/details/79178857
260 |         clrs = plt.cm.Spectral(np.linspace(0, 0.8, y_unique.size))  # one color per label
261 |         for k, clr in zip(y_unique, clrs):
262 |             cur = (y_hat == k)
263 |             if k == -1:
264 |                 # Draw the unclustered (noise) samples
265 |                 plt.scatter(data[cur, 0], data[cur, 1], s=20, c='k')
266 |                 continue
267 |             # Draw every point of the cluster
268 |             plt.scatter(data[cur, 0], data[cur, 1], s=30, c=[clr], edgecolors='k')
269 |             # Draw the core points with a larger marker
270 |             plt.scatter(data[cur & core_indices][:, 0], data[cur & core_indices][:, 1], s=60, c=[clr], marker='o', edgecolors='k')
271 |         x1_min, x2_min = np.min(data, axis=0)
272 |         x1_max, x2_max = np.max(data, axis=0)
273 |         x1_min, x1_max = expand(x1_min, x1_max)
274 |         x2_min, x2_max = expand(x2_min, x2_max)
275 |         plt.xlim((x1_min, x1_max))
276 |         plt.ylim((x2_min, x2_max))
277 |         plt.grid(True)
278 |         plt.title(r'$\epsilon$ = %.1f m = %d clustering num %d' % (eps, min_samples, n_clusters), fontsize=16)
279 |     plt.tight_layout()
280 |     plt.subplots_adjust(top=0.9)
281 |     plt.show()
282 | ```
283 |
284 | ```python
285 | [-1 0 1 2 3] number of clusters: 4
286 | [-1 0 1 2 3] number of clusters: 4
287 | [-1 0 1 2 3 4] number of clusters: 5
288 | [-1 0] number of clusters: 1
289 | [-1 0 1] number of clusters: 2
290 | [-1 0 1 2 3] number of clusters: 4
291 | ```
292 |
293 | 
294 |
295 | ### 2.2 Using a custom test sample
296 |
297 | ```python
298 | #
299 | # Reference: https://blog.csdn.net/hb707934728/article/details/71515160
300 | #
301 | # Official example: https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
302 |
303 | import numpy as np
304 |
305 | import matplotlib.pyplot as plt
306 | import matplotlib.colors
307 |
308 | import sklearn.datasets as ds
309 | from sklearn.cluster import DBSCAN
310 | from sklearn.preprocessing import StandardScaler
311 |
312 |
313 | def expand(a, b):
314 |     d = (b - a) * 0.1
315 |     return a-d, b+d
316 |
317 |
318 | if __name__ == "__main__":
319 |     N = 1000
320 |     data = X_train_demo.values  # X_train_demo: the DataFrame holding the test sample
321 |     # Parameter pairs to try: (eps, min_samples)
322 |     params = ((0.2, 5), (0.2, 10), (0.2, 15), (0.2, 20), (0.2, 25), (0.2, 30))
323 |
324 |     plt.figure(figsize=(12, 8), facecolor='w')
325 |     plt.suptitle(u'DBSCAN clustering', fontsize=20)
326 |
327 |     for i in range(6):
328 |         eps, min_samples = params[i]
329 |         # Parameter meaning:
330 |         # eps: radius of the circular neighborhood centered on a point P
331 |         # min_samples: minimum number of points required inside the neighborhood of P
332 |         # If the eps-neighborhood of P contains at least min_samples points, P is a core point
333 |         model = DBSCAN(eps=eps, min_samples=min_samples)
334 |         model.fit(data)
335 |         y_hat = model.labels_
336 |
337 |         core_indices = np.zeros_like(y_hat, dtype=bool)  # boolean mask with the same shape as y_hat
338 |         core_indices[model.core_sample_indices_] = True  # model.core_sample_indices_: positions of the core points in y_hat
339 |
340 |         # Count the clusters; label -1 marks unclustered (noise) samples
341 |         y_unique = np.unique(y_hat)
342 |         n_clusters = y_unique.size - (1 if -1 in y_hat else 0)
343 |         print(y_unique, 'number of clusters:', n_clusters)
344 |
345 |         plt.subplot(2, 3, i+1)  # 2 rows x 3 columns; draw in panel i+1
346 |         # plt.cm.Spectral: https://blog.csdn.net/robin_Xu_shuai/article/details/79178857
347 |         clrs = plt.cm.Spectral(np.linspace(0, 0.8, y_unique.size))  # one color per label
348 |         for k, clr in zip(y_unique, clrs):
349 |             cur = (y_hat == k)
350 |             if k == -1:
351 |                 # Draw the unclustered (noise) samples
352 |                 plt.scatter(data[cur, 0], data[cur, 1], s=20, c='k')
353 |                 continue
354 |             # Draw every point of the cluster
355 |             plt.scatter(data[cur, 0], data[cur, 1], s=30, c=[clr], edgecolors='k')
356 |             # Draw the core points with a larger marker
357 |             plt.scatter(data[cur & core_indices][:, 0], data[cur & core_indices][:, 1], s=60, c=[clr], marker='o', edgecolors='k')
358 |         x1_min, x2_min = np.min(data, axis=0)
359 |         x1_max, x2_max = np.max(data, axis=0)
360 |         x1_min, x1_max = expand(x1_min, x1_max)
361 |         x2_min, x2_max = expand(x2_min, x2_max)
362 |         plt.xlim((x1_min, x1_max))
363 |         plt.ylim((x2_min, x2_max))
364 |         plt.grid(True)
365 |         plt.title(r'$\epsilon$ = %.1f m = %d clustering num %d' % (eps, min_samples, n_clusters), fontsize=14)
366 |     plt.tight_layout()
367 |     plt.subplots_adjust(top=0.9)
368 |     plt.show()
369 | ```
370 |
371 | 
372 |
373 | **注意:**可以看到在测试样例的两端,相比与孤立森林,DBSCAN能够很好地对“尖端”处的样本的分类。
374 |
375 | ### 2.3 核心代码
376 |
377 | **model = DBSCAN(eps=eps, min_samples=min_samples) # 构造分类器**
378 |
379 | ```python
380 | from sklearn.cluster import DBSCAN
381 | from sklearn import metrics
382 | data = X_train_demo.values
383 | eps, min_samples = 0.2, 10
384 | # eps: neighborhood radius; min_samples: minimum number of points in a neighborhood
385 | model = DBSCAN(eps=eps, min_samples=min_samples)  # build the model
386 | model.fit(data)  # fit
387 | labels = model.labels_  # cluster labels; -1 means unclustered (noise)
388 | # Boolean mask of the core points
389 | core_indices = np.zeros_like(labels, dtype=bool)  # boolean mask with the same shape as labels
390 | core_indices[model.core_sample_indices_] = True  # model.core_sample_indices_: positions of the core points in labels
391 | core_point = data[core_indices]
392 | # Keep the non-outlier points
393 | normal_point = data[labels>=0]
394 | ```
395 |
396 | ```python
397 | # 绘制剔除了异常值后的图
398 | plt.scatter(normal_point[:,0],normal_point[:,1],edgecolors='k')
399 | plt.show()
400 | ```
401 |
402 | 
403 |
404 | ### 2.4 构造过滤函数
405 |
406 | 该函数先进行了标准化,方便使用固定的参数进行分析
407 |
408 | #### 2.4.1 过滤函数
409 |
410 | ```python
411 | def filter_data(data0, params):
412 |     import numpy as np
413 |     from sklearn.cluster import DBSCAN
414 |     from sklearn.preprocessing import StandardScaler
415 |     scaler = StandardScaler()
416 |     data = scaler.fit_transform(data0)
417 |
418 |     eps, min_samples = params
419 |     # eps: neighborhood radius; min_samples: minimum number of points in a neighborhood
420 |     model = DBSCAN(eps=eps, min_samples=min_samples)  # build the model
421 |     model.fit(data)  # fit
422 |     labels = model.labels_  # cluster labels; -1 means unclustered (noise)
423 |     # Boolean mask of the core points
424 |     core_indices = np.zeros_like(labels, dtype=bool)  # boolean mask with the same shape as labels
425 |     core_indices[model.core_sample_indices_] = True  # positions of the core points in labels
426 |     core_point = data[core_indices]  # standardized core points (kept for inspection, not returned)
427 |     # Keep the non-outlier rows of the original, unscaled data
428 |     normal_point = data0[labels>=0]
429 |     return normal_point
430 | ```
431 |
432 |
433 |
434 | #### 2.4.2 Evaluating the clustering result
435 |
436 | $$
437 | a(i)=\frac{1}{|C_{i}|-1}\sum_{j\in C_{i},\, j\neq i}d(i,j) \quad \text{(mean distance from sample } i \text{ to every other sample } j \text{ in its own cluster } C_{i}\text{)}
438 | $$
439 |
440 | $$
441 | b(i)=\min_{C_{k}\neq C_{i}}\frac{1}{|C_{k}|}\sum_{j\in C_{k}}d(i,j) \quad \text{(mean distance from sample } i \text{ to the samples of each other cluster, minimized over those clusters)}
442 | $$
443 |
444 | $$
445 | s(i)=\frac{b(i)-a(i)}{\max\{a(i),b(i)\}} \quad \text{if } |C_{i}|>1 \quad \text{(larger is better)}
446 | $$
447 |
448 | $$
449 | s(i)=0 \quad \text{if } |C_{i}|=1
450 | $$
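The definitions above translate almost line by line into NumPy. The following is a sketch, assuming a small hand-made dataset (`X_demo` and `labels_demo` are illustrative); the helper `silhouette_by_hand` is hypothetical and is compared against scikit-learn's `silhouette_samples`.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def silhouette_by_hand(X, labels):
    """Per-sample silhouette s(i), following the a(i), b(i), s(i) definitions."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # s(i) = 0 for a cluster with a single sample
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] = 0, so the sum skips i itself
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two well-separated groups of three points each (illustrative data)
X_demo = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
labels_demo = np.array([0, 0, 0, 1, 1, 1])

s_manual = silhouette_by_hand(X_demo, labels_demo)
s_sklearn = silhouette_samples(X_demo, labels_demo)
print(np.allclose(s_manual, s_sklearn))
```

When the labels come from DBSCAN, scikit-learn treats the noise label -1 as just another cluster, so it may be worth dropping the noise points before scoring.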
451 |
452 | ```python
453 | # Silhouette coefficient
454 | metrics.silhouette_score(data, labels, metric='euclidean')
455 | # [out] 0.13250260550638607
456 | ```
457 |
458 | ```python
459 | # Calinski-Harabasz index (the old name calinski_harabaz_score was removed from scikit-learn)
460 | metrics.calinski_harabasz_score(data, labels)
461 | # [out] 16.414158842632794
462 | ```
463 |
464 | ## 3. OneClassSVM
465 |
466 | ```python
467 | # reference:https://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py
468 |
469 | import numpy as np
470 | import matplotlib.pyplot as plt
471 | import matplotlib.font_manager
472 | from sklearn import svm
473 |
474 | xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
475 | # Generate train data
476 | X = 0.3 * np.random.randn(100, 2)
477 | X_train = np.r_[X + 2, X - 2]
478 | # Generate some regular novel observations
479 | X = 0.3 * np.random.randn(20, 2)
480 | X_test = np.r_[X + 2, X - 2]
481 | # Generate some abnormal novel observations
482 | X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
483 |
484 | # fit the model
485 | clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
486 | clf.fit(X_train)
487 | y_pred_train = clf.predict(X_train)
488 | y_pred_test = clf.predict(X_test)
489 | y_pred_outliers = clf.predict(X_outliers)
490 | n_error_train = y_pred_train[y_pred_train == -1].size
491 | n_error_test = y_pred_test[y_pred_test == -1].size
492 | n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
493 |
494 | # plot the line, the points, and the nearest vectors to the plane
495 | Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
496 | Z = Z.reshape(xx.shape)
497 |
498 | plt.title("Novelty Detection")
499 | plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
500 | a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
501 | plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
502 |
503 | s = 40
504 | b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=s, edgecolors='k')
505 | b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='blueviolet', s=s,
506 | edgecolors='k')
507 | c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=s,
508 | edgecolors='k')
509 | plt.axis('tight')
510 | plt.xlim((-5, 5))
511 | plt.ylim((-5, 5))
512 | plt.legend([a.legend_elements()[0][0], b1, b2, c],
513 | ["learned frontier", "training observations",
514 | "new regular observations", "new abnormal observations"],
515 | loc="upper left",
516 | prop=matplotlib.font_manager.FontProperties(size=11))
517 | plt.xlabel(
518 | "error train: %d/200 ; errors novel regular: %d/40 ; "
519 | "errors novel abnormal: %d/20"
520 | % (n_error_train, n_error_test, n_error_outliers))
521 | plt.show()
522 | ```
523 |
524 | 
525 |
526 | ### 3.2 核心代码
527 |
528 | ```python
529 | from sklearn import svm
530 | X_train = X_train_demo.values
531 | # Build the model
532 | clf = svm.OneClassSVM(nu=0.2, kernel="rbf", gamma=0.2)
533 | clf.fit(X_train)
534 | # Predict; the result is -1 (outlier) or 1 (inlier)
535 | labels = clf.predict(X_train)
536 | # Decision scores
537 | score = clf.decision_function(X_train)  # signed distance to the learned frontier
538 | # Keep the inliers
539 | X_train_normal = X_train[labels>0]
540 | ```
541 |
542 | #### Before removing outliers
543 |
544 | 
545 |
546 | #### After removing outliers
547 |
548 | ```python
549 | plt.scatter(X_train_normal[:,0],X_train_normal[:,1])
550 | plt.show()
551 | ```
552 |
553 | 
554 |
555 | ## 4. Local Outlier Factor(LOF)
556 |
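557 | LOF computes a score for each sample that reflects how anomalous it is. Roughly, the score is:
558 |
559 | **the average density around a sample's neighbors divided by the density around the sample itself. The further this ratio exceeds 1, the lower the density at the sample's location compared with its neighbors, and the more likely the sample is an outlier.**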
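The ratio can be inspected directly: after fitting, scikit-learn stores the negated LOF values in `negative_outlier_factor_`. A minimal sketch on made-up data (`X_lof` is illustrative, not the test sample from these notes):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
# A tight cluster plus one point far away from it (illustrative data)
X_lof = np.r_[0.3 * rng.randn(100, 2), [[4.0, 4.0]]]

lof = LocalOutlierFactor(n_neighbors=20)
labels_lof = lof.fit_predict(X_lof)           # 1 = inlier, -1 = outlier
lof_values = -lof.negative_outlier_factor_    # flip the sign to get the LOF ratio

print("LOF of the far point:       ", lof_values[-1])
print("median LOF of the cluster:  ", np.median(lof_values[:-1]))
```

Points inside the cluster have a density similar to their neighbors, so their ratio stays close to 1, while the far point sits in a much sparser region and gets a much larger value.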
560 |
561 | ```python
562 | #
563 | # Reference: https://blog.csdn.net/hb707934728/article/details/71515160
564 | #
565 | # Official example (DBSCAN): https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
566 |
567 | import numpy as np
568 |
569 | import matplotlib.pyplot as plt
570 | import matplotlib.colors
571 |
572 | from sklearn.neighbors import LocalOutlierFactor
573 |
574 | def expand(a, b):
575 |     d = (b - a) * 0.1
576 |     return a-d, b+d
577 |
578 |
579 | if __name__ == "__main__":
580 |     N = 1000
581 |     data = X_train_demo.values  # X_train_demo: the DataFrame holding the test sample
582 |     # Parameter pairs to try: (contamination, n_neighbors)
583 |     params = ((0.01, 5), (0.05, 10), (0.1, 15), (0.15, 20), (0.2, 25), (0.25, 30))
584 |
585 |     plt.figure(figsize=(12, 8), facecolor='w')
586 |     plt.suptitle(u'LOF outlier detection', fontsize=20)
587 |
588 |     for i in range(6):
589 |         outliers_fraction, min_samples = params[i]
590 |         # Parameter meaning:
591 |         # contamination: expected proportion of outliers in the data
592 |         # n_neighbors: number of neighbors used to estimate the local density
593 |         # fit_predict returns 1 for inliers and -1 for outliers
594 |
595 |         model = LocalOutlierFactor(n_neighbors=min_samples, contamination=outliers_fraction)
596 |         y_hat = model.fit_predict(data)
597 |
598 |         # Unique labels: 1 = inlier, -1 = outlier
599 |         y_unique = np.unique(y_hat)
600 |
601 |         # clrs = []
602 |         # for c in np.linspace(16711680, 255, y_unique.size):
603 |         #     clrs.append('#%06x' % c)
604 |         plt.subplot(2, 3, i+1)  # 2 rows x 3 columns; draw in panel i+1
605 |         # plt.cm.Spectral: https://blog.csdn.net/robin_Xu_shuai/article/details/79178857
606 |         clrs = plt.cm.Spectral(np.linspace(0, 0.8, y_unique.size))  # one color per label
607 |         for k, clr in zip(y_unique, clrs):
608 |             cur = (y_hat == k)
609 |             if k == -1:
610 |                 # Draw the outliers
611 |                 plt.scatter(data[cur, 0], data[cur, 1], s=20, c='k')
612 |                 continue
613 |             # Draw the inliers
614 |             plt.scatter(data[cur, 0], data[cur, 1], s=30, c=[clr], edgecolors='k')
615 |         x1_max, x2_max = np.max(data, axis=0)
616 |         x1_min, x2_min = np.min(data, axis=0)
617 |         x1_min, x1_max = expand(x1_min, x1_max)
618 |         x2_min, x2_max = expand(x2_min, x2_max)
619 |         plt.xlim((x1_min, x1_max))
620 |         plt.ylim((x2_min, x2_max))
621 |         plt.grid(True)
622 |         plt.title(u'outliers_fraction = %.2f min_samples = %d'%(outliers_fraction, min_samples), fontsize=12)
623 |     plt.tight_layout()
624 |     plt.subplots_adjust(top=0.9)
625 |     plt.show()
626 | ```
627 | 
628 |
629 | ### 4.1 核心代码
630 |
631 | ```python
632 | from sklearn.neighbors import LocalOutlierFactor
633 | X_train = X_train_demo.values
634 | # Build the model
635 | # use 25 neighbors per sample; expected outlier proportion 0.2
636 | clf = LocalOutlierFactor(n_neighbors=25, contamination=0.2)
637 | # Predict; the result is -1 (outlier) or 1 (inlier)
638 | labels = clf.fit_predict(X_train)
639 | # Keep the inliers
640 | X_train_normal = X_train[labels>0]
641 | ```
642 | #### Before removing outliers
643 | ```python
644 | plt.scatter(X_train[:,0],X_train[:,1])
645 | plt.show()
646 | ```
647 |
648 | 
649 |
650 | #### After removing outliers
651 |
652 | ```python
653 | plt.scatter(X_train_normal[:,0],X_train_normal[:,1])
654 | plt.show()
655 | ```
656 |
657 | 
658 |
659 |
660 |
661 | ## References
662 |
663 | [1] https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py Isolation forest official example
664 |
665 | [2] https://blog.csdn.net/ye1215172385/article/details/79762317 Isolation forest example
666 |
667 | [3] https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py DBSCAN official example
668 |
669 | [4] https://blog.csdn.net/hb707934728/article/details/71515160 DBSCAN example
670 |
671 | [5] https://en.wikipedia.org/wiki/Silhouette_(clustering) Silhouette coefficient
672 |
673 | [6] https://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py One-class SVM official example
674 |
675 | [7] https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py Local Outlier Factor official example
676 |
677 | [8] https://blog.csdn.net/hb707934728/article/details/71515160 LOF example
678 |
679 |
--------------------------------------------------------------------------------
/outlier_detection/pic/DBSCAN1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/DBSCAN1.png
--------------------------------------------------------------------------------
/outlier_detection/pic/DBSCAN2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/DBSCAN2.png
--------------------------------------------------------------------------------
/outlier_detection/pic/DBSCAN3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/DBSCAN3.png
--------------------------------------------------------------------------------
/outlier_detection/pic/LOF1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/LOF1.png
--------------------------------------------------------------------------------
/outlier_detection/pic/LOF2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/LOF2.png
--------------------------------------------------------------------------------
/outlier_detection/pic/LOF3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/LOF3.png
--------------------------------------------------------------------------------
/outlier_detection/pic/isolation_forest1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/isolation_forest1.png
--------------------------------------------------------------------------------
/outlier_detection/pic/isolation_forest2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/isolation_forest2.png
--------------------------------------------------------------------------------
/outlier_detection/pic/isolation_forest3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/isolation_forest3.png
--------------------------------------------------------------------------------
/outlier_detection/pic/oneclassSVM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/oneclassSVM.png
--------------------------------------------------------------------------------
/outlier_detection/pic/oneclassSVM2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/oneclassSVM2.png
--------------------------------------------------------------------------------
/outlier_detection/pic/oneclassSVM3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/pic/oneclassSVM3.png
--------------------------------------------------------------------------------
/outlier_detection/test.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/outlier_detection/test.pkl
--------------------------------------------------------------------------------
/python_package_time_series/image/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/python_package_time_series/image/image.png
--------------------------------------------------------------------------------
/python_package_time_series/时间序列.md:
--------------------------------------------------------------------------------
1 | # Time Series
2 |
3 | - **Reference links** (resource roundups)
4 |
5 | [https://zhuanlan.zhihu.com/p/396504683](https://zhuanlan.zhihu.com/p/396504683)
6 |
7 | [https://zhuanlan.zhihu.com/p/385094015](https://zhuanlan.zhihu.com/p/385094015)
8 |
9 | [https://analyticsindiamag.com/top-10-python-tools-for-time-series-analysis/](https://analyticsindiamag.com/top-10-python-tools-for-time-series-analysis/)
10 |
11 | [https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/](https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/)
12 |
13 | [https://github.com/MaxBenChrist/awesome_time_series_in_python](https://github.com/MaxBenChrist/awesome_time_series_in_python)
14 |
15 |
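16 | |package name|tags|brief introduction|
17 | |---|---|---|
18 | |[Arrow](#Arrow)|time formatting|Makes it easy to change how times are represented|
19 | |[Featuretools](#Featuretools)|automated feature engineering|Fixed feature templates; the generated features do not always carry business meaning|
20 | |[TSFRESH](#Featuretools)|automated feature engineering|Automatically extracts time series features|
21 | |[PyFlux](#PyFlux)|models, classical models|Classical time series models|
22 | |[TimeSynth](#TimeSynth)|simulated data|Generates synthetic time series|
23 | |[Sktime](#Sktime)|models, machine learning|Time series toolkit with a scikit-learn-style API; **models lean toward machine learning**|
24 | |[Darts](#Darts)|models, classical models, deep learning, machine learning|Covers everything from classical models to deep learning models|
25 | |[Orbit](#Orbit)|models, Bayesian|**Bayesian models**, from Uber|
26 | |[AutoTS](#AutoTS)|models, deep learning, AutoML|**Automatically** tests **many models** and returns forecasts, including deep learning|
27 | |[Prophet](#Prophet)|models, seasonality|Facebook's open-source time series library, suited to **seasonal** data|
28 | |[AtsPy](#AtsPy)|models, deep learning, classical models, AutoML|Automates multiple models, both classical and deep learning|
29 | |[kats](#kats)|models|Classical models plus Facebook's own models; few deep learning models so far|
30 | |[gluon-ts](#gluon-ts)|models, deep learning|**Amazon** package built on **MXNet** with many deep learning models|
31 | |[AutoGluon](#AutoGluon)|models, AutoML, deep learning|Amazon package built on MXNet; AutoML for text, image, and tabular data|
32 | |[GENDIS](#GENDIS)|models, shapelets|Builds distance features from shapelets for classification|
33 | |[Flow_Forecast](#Flow_Forecast)|models, deep learning|Mostly deep learning, built on PyTorch; covers the common SOTA models|
34 | |[pandas-ta](#pandas-ta)|quant, feature engineering|Computes **technical indicators**; built on TA-Lib|
35 | |[PyTorch_Forecasting](#PyTorch_Forecasting)|models, deep learning|PyTorch implementations of SOTA time series models|
36 | |[statsmodels](#statsmodels)|models, classical models|Built on SciPy; classical time series models with broad coverage|
37 | |[stumpy](#stumpy)|feature engineering|Builds time series features; can construct a feature vector for a given time series|
38 | |[TA-Lib](#TA-Lib)|quant, feature engineering|Computes **technical indicators**|
39 | |[ta](#ta)|quant, feature engineering|Computes **technical indicators**|
40 | |[tslearn](#tslearn)|models, machine learning|Machine learning models with a scikit-learn-style workflow|
41 | |[tsmoothie](#tsmoothie)|data preprocessing, models|Smooths time series and removes outliers|
42 | |[tsai](#tsai)|models, deep learning|High-level wrapper around fastai and PyTorch; deep learning time series models|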
43 |
44 | ## **tsai**
45 |
46 | ### Description
47 |
48 | Deep learning models for time series.
49 |
50 | > `tsai` is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series tasks like classification, regression, forecasting, imputation...
51 | > `tsai` is currently under active development by timeseriesAI.
52 |
53 | > ### Available models:
54 | >
55 | > Here's a list with some of the state-of-the-art models available in `tsai`:
56 | >
57 | > - [LSTM](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/RNN.py) (Hochreiter, 1997) ([paper](https://ieeexplore.ieee.org/abstract/document/6795963/))
58 | > - [GRU](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/RNN.py) (Cho, 2014) ([paper](https://arxiv.org/abs/1412.3555))
59 | > - [MLP](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/MLP.py) - Multilayer Perceptron (Wang, 2016) ([paper](https://arxiv.org/abs/1611.06455))
60 | > - [FCN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/FCN.py) - Fully Convolutional Network (Wang, 2016) ([paper](https://arxiv.org/abs/1611.06455))
61 | > - [ResNet](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/ResNet.py) - Residual Network (Wang, 2016) ([paper](https://arxiv.org/abs/1611.06455))
62 | > - [LSTM-FCN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/RNN_FCN.py) (Karim, 2017) ([paper](https://arxiv.org/abs/1709.05206))
63 | > - [GRU-FCN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/RNN_FCN.py) (Elsayed, 2018) ([paper](https://arxiv.org/abs/1812.07683))
64 | > - [mWDN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/mWDN.py) - Multilevel wavelet decomposition network (Wang, 2018) ([paper](https://dl.acm.org/doi/abs/10.1145/3219819.3220060))
65 | > - [TCN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/TCN.py) - Temporal Convolutional Network (Bai, 2018) ([paper](https://arxiv.org/abs/1803.01271))
66 | > - [MLSTM-FCN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/RNN_FCN.py) - Multivariate LSTM-FCN (Karim, 2019) ([paper](https://www.sciencedirect.com/science/article/abs/pii/S0893608019301200))
67 | > - [InceptionTime](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/InceptionTime.py) (Fawaz, 2019) ([paper](https://arxiv.org/abs/1909.04939))
68 | > - [Rocket](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/ROCKET.py) (Dempster, 2019) ([paper](https://arxiv.org/abs/1910.13051))
69 | > - [XceptionTime](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/XceptionTime.py) (Rahimian, 2019) ([paper](https://arxiv.org/abs/1911.03803))
70 | > - [TabModel](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/TabModel.py) - modified from fastai's [TabularModel](https://docs.fast.ai/tabular.model.html#TabularModel)
71 | > - [OmniScale](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/OmniScaleCNN.py) - Omni-Scale 1D-CNN (Tang, 2020) ([paper](https://arxiv.org/abs/2002.10061))
72 | > - [ResCNN](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/ResCNN.py) - 1D-ResCNN (Sun, 2020) ([paper](https://arxiv.org/abs/2010.02803))
73 | > - [TST](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/TST.py) - Time Series Transformer (Zerveas, 2020) ([paper](https://dl.acm.org/doi/abs/10.1145/3447548.3467401))
74 | > - [TabTransformer](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/TabTransformer.py) (Huang, 2020) ([paper](https://arxiv.org/pdf/2012.06678))
75 | > - [XCM](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/XCM.py) - Explainable Convolutional Neural Network (Fauvel, 2020) ([paper](https://arxiv.org/abs/2005.03645))
76 | > - [MiniRocket](https://github.com/timeseriesAI/tsai/blob/main/tsai/models/MINIROCKET.py) (Dempster, 2021) ([paper](https://arxiv.org/abs/2102.00457))
77 |
78 | ### installation
79 |
80 | ```Bash
81 | # You can install the latest stable version from pip using:
82 | pip install tsai
83 | 
84 | # Or you can install the cutting-edge version of this library from GitHub:
85 | pip install -Uqq git+https://github.com/timeseriesAI/tsai.git
86 | 
87 | 
88 | # Once the install is complete, restart your runtime and then run this in Python (not in the shell):
89 | # from tsai.all import *
90 | ```
91 |
92 | ### document
93 |
94 | https://github.com/timeseriesAI/tsai
95 |
96 | https://timeseriesai.github.io/tsai/
97 |
98 |
99 |
100 | ## **tsmoothie**
101 |
102 | ### Description
103 |
104 | Smooths time series and removes outliers.
105 |
106 | > A python library for time-series smoothing and outlier detection in a vectorized way.
107 |
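As a rough illustration of what "smoothing plus outlier detection" means, here is a minimal pure-Python sketch. It is not tsmoothie's API or implementation: the helper names `exponential_smooth` and `flag_outliers`, the `alpha` value, and the 2-sigma threshold are all assumptions chosen for this example.

```python
def exponential_smooth(series, alpha=0.3):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def flag_outliers(series, smoothed, n_sigma=2.0):
    """Flag points whose residual from the smoothed curve is more than
    n_sigma standard deviations away from the mean residual."""
    residuals = [x - s for x, s in zip(series, smoothed)]
    mean = sum(residuals) / len(residuals)
    std = (sum((r - mean) ** 2 for r in residuals) / len(residuals)) ** 0.5
    return [abs(r - mean) > n_sigma * std for r in residuals]


# Hypothetical data: a flat series with one spike at index 4.
raw = [10.0, 10.5, 9.8, 10.2, 25.0, 10.1, 9.9, 10.3]
smooth = exponential_smooth(raw, alpha=0.3)
outlier_idx = [i for i, is_out in enumerate(flag_outliers(raw, smooth)) if is_out]
print(outlier_idx)
```

The library itself offers several smoothers and vectorized interval computation; the sketch only shows the underlying idea of comparing raw values against a smoothed curve.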
108 |
109 | 
110 |
111 | Use cases for this preprocessing:
112 |
113 | * [Time Series Smoothing for better Clustering](https://towardsdatascience.com/time-series-smoothing-for-better-clustering-121b98f308e8)
114 | * [Time Series Smoothing for better Forecasting](https://towardsdatascience.com/time-series-smoothing-for-better-forecasting-7fbf10428b2)
115 | * [Real-Time Time Series Anomaly Detection](https://towardsdatascience.com/real-time-time-series-anomaly-detection-981cf1e1ca13)
116 | * [Extreme Event Time Series Preprocessing](https://towardsdatascience.com/extreme-event-time-series-preprocessing-90aa59d5630c)
117 | * [Time Series Bootstrap in the age of Deep Learning](https://towardsdatascience.com/time-series-bootstrap-in-the-age-of-deep-learning-b98aa2aa32c4)
118 |
119 | ### installation
120 |
121 | ```Bash
122 | pip install --upgrade tsmoothie
123 | ```
124 |
125 |
126 | ### document
127 |
128 | [https://github.com/cerlymarco/tsmoothie](https://github.com/cerlymarco/tsmoothie)
129 |
130 |
131 | ## **tslearn**
132 |
133 | ### Description
134 |
135 | Classical models with a scikit-learn-style workflow.
136 | 
137 | #### Features include:
138 |
139 | |data|processing|clustering|classification|regression|metrics|
140 | |---|---|---|---|---|---|
141 | |[UCR Datasets](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.datasets.html#module-tslearn.datasets)|[Scaling](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.preprocessing.html#module-tslearn.preprocessing)|[TimeSeriesKMeans](https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.TimeSeriesKMeans.html#tslearn.clustering.TimeSeriesKMeans)|[KNN Classifier](https://tslearn.readthedocs.io/en/stable/gen_modules/neighbors/tslearn.neighbors.KNeighborsTimeSeriesClassifier.html#tslearn.neighbors.KNeighborsTimeSeriesClassifier)|[KNN Regressor](https://tslearn.readthedocs.io/en/stable/gen_modules/neighbors/tslearn.neighbors.KNeighborsTimeSeriesRegressor.html#tslearn.neighbors.KNeighborsTimeSeriesRegressor)|[Dynamic Time Warping](https://tslearn.readthedocs.io/en/stable/gen_modules/metrics/tslearn.metrics.dtw.html#tslearn.metrics.dtw)|
142 | |[Generators](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.generators.html#module-tslearn.generators)|[Piecewise](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.piecewise.html#module-tslearn.piecewise)|[KShape](https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.KShape.html#tslearn.clustering.KShape)|[TimeSeriesSVC](https://tslearn.readthedocs.io/en/stable/gen_modules/svm/tslearn.svm.TimeSeriesSVC.html#tslearn.svm.TimeSeriesSVC)|[TimeSeriesSVR](https://tslearn.readthedocs.io/en/stable/gen_modules/svm/tslearn.svm.TimeSeriesSVR.html#tslearn.svm.TimeSeriesSVR)|[Global Alignment Kernel](https://tslearn.readthedocs.io/en/stable/gen_modules/metrics/tslearn.metrics.gak.html#tslearn.metrics.gak)|
143 | |Conversion([1](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.utils.html#module-tslearn.utils), [2](https://tslearn.readthedocs.io/en/stable/integration_other_software.html))| |[KernelKmeans](https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.KernelKMeans.html#tslearn.clustering.KernelKMeans)|[LearningShapelets](https://tslearn.readthedocs.io/en/stable/gen_modules/shapelets/tslearn.shapelets.LearningShapelets.html)|[MLP](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.neural_network.html#module-tslearn.neural_network)|[Barycenters](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.barycenters.html#module-tslearn.barycenters)|
144 | | | | |[Early Classification](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.early_classification.html#module-tslearn.early_classification)| |[Matrix Profile](https://tslearn.readthedocs.io/en/stable/gen_modules/tslearn.matrix_profile.html#module-tslearn.matrix_profile)|
145 |
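Dynamic Time Warping appears in the metrics column above. The following is a minimal pure-Python sketch of the textbook DTW recurrence, meant only to show what the metric computes. It is not tslearn's implementation; the function name `dtw_distance` and the choice of squared differences as the local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW: cheapest monotone alignment between two sequences,
    using squared differences as local cost and a final square root."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


bump = [0, 1, 2, 1, 0]
delayed_bump = [0, 0, 1, 2, 1, 0]   # same shape, starts one step later
shifted_up = [1, 2, 3, 2, 1]        # same shape, offset by +1

d_delayed = dtw_distance(bump, delayed_bump)
d_shifted = dtw_distance(bump, shifted_up)
print(d_delayed, d_shifted)
```

The delayed copy aligns perfectly, so its distance is zero, while the vertically shifted copy does not. This is why DTW is useful for series that differ in timing rather than in level.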
146 |
147 |
148 | ### installation
149 |
150 | There are different alternatives to install tslearn:
151 |
152 | * PyPI: `python -m pip install tslearn`
153 | * Conda: `conda install -c conda-forge tslearn`
154 | * Git: `python -m pip install https://github.com/tslearn-team/tslearn/archive/main.zip`
155 |
156 | ### document
157 |
158 | [https://github.com/tslearn-team/tslearn](https://github.com/tslearn-team/tslearn)
159 |
160 |
161 | ## **ta**
162 |
163 | ### Description
164 |
165 | Computes technical indicators.
166 | 
167 | - Supported technical indicators
168 |
169 | #### Volume
170 |
171 | - Money Flow Index (MFI)
172 |
173 | - Accumulation/Distribution Index (ADI)
174 |
175 | - On-Balance Volume (OBV)
176 |
177 | - Chaikin Money Flow (CMF)
178 |
179 | - Force Index (FI)
180 |
181 | - Ease of Movement (EoM, EMV)
182 |
183 | - Volume-price Trend (VPT)
184 |
185 | - Negative Volume Index (NVI)
186 |
187 | - Volume Weighted Average Price (VWAP)
188 |
189 | #### Volatility
190 |
191 | - Average True Range (ATR)
192 |
193 | - Bollinger Bands (BB)
194 |
195 | - Keltner Channel (KC)
196 |
197 | - Donchian Channel (DC)
198 |
199 | - Ulcer Index (UI)
200 |
201 | #### Trend
202 |
203 | - Simple Moving Average (SMA)
204 |
205 | - Exponential Moving Average (EMA)
206 |
207 | - Weighted Moving Average (WMA)
208 |
209 | - Moving Average Convergence Divergence (MACD)
210 |
211 | - Average Directional Movement Index (ADX)
212 |
213 | - Vortex Indicator (VI)
214 |
215 | - Trix (TRIX)
216 |
217 | - Mass Index (MI)
218 |
219 | - Commodity Channel Index (CCI)
220 |
221 | - Detrended Price Oscillator (DPO)
222 |
223 | - KST Oscillator (KST)
224 |
225 | - Ichimoku Kinkō Hyō (Ichimoku)
226 |
227 | - Parabolic Stop And Reverse (Parabolic SAR)
228 |
229 | - Schaff Trend Cycle (STC)
230 |
231 | #### Momentum
232 |
233 | - Relative Strength Index (RSI)
234 |
235 | - Stochastic RSI (SRSI)
236 |
237 | - True strength index (TSI)
238 |
239 | - Ultimate Oscillator (UO)
240 |
241 | - Stochastic Oscillator (SR)
242 |
243 | - Williams %R (WR)
244 |
245 | - Awesome Oscillator (AO)
246 |
247 | - Kaufman's Adaptive Moving Average (KAMA)
248 |
249 | - Rate of Change (ROC)
250 |
251 | - Percentage Price Oscillator (PPO)
252 |
253 | - Percentage Volume Oscillator (PVO)
254 |
255 | #### Others
256 |
257 | - Daily Return (DR)
258 |
259 | - Daily Log Return (DLR)
260 |
261 | - Cumulative Return (CR)
262 |
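To make two of the listed indicators concrete, here is a minimal pure-Python sketch of a Simple Moving Average and a daily return. These are the textbook definitions, not the `ta` package's code or API; the function names and the sample prices are hypothetical.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices; None until enough data is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def daily_returns(prices):
    """Fractional change versus the previous close; the first value is undefined."""
    return [None] + [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


# Hypothetical closing prices.
close = [100.0, 102.0, 101.0, 105.0, 110.0]
sma3 = simple_moving_average(close, window=3)
rets = daily_returns(close)
print(sma3)
print(rets)
```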
263 | ### installation
264 |
265 | ```Bash
266 | pip install --upgrade ta
267 | ```
268 |
269 |
270 | ### document
271 |
272 | [https://github.com/bukosabino/ta](https://github.com/bukosabino/ta)
273 |
274 | [https://technical-analysis-library-in-python.readthedocs.io/en/latest/](https://technical-analysis-library-in-python.readthedocs.io/en/latest/)
275 |
276 |
277 | ## **TA-Lib**
278 |
279 | ### Description
280 |
281 | Computes indicators for financial time series.
282 | 
283 | ```Python
284 | from talib import abstract
285 |
286 | # directly
287 | SMA = abstract.SMA
288 |
289 | # or by name
290 | SMA = abstract.Function('sma')
291 | ```
292 |
293 |
294 | ### installation
295 |
296 | For installation issues, see:
297 |
298 | [https://github.com/mrjbq7/ta-lib](https://github.com/mrjbq7/ta-lib)
299 |
300 | ```Bash
301 | # You can install from PyPI:
302 | 
303 | $ pip install TA-Lib
304 | # Or check out the sources and run `setup.py` yourself:
305 | 
306 | $ python setup.py install
307 | 
308 | 
309 | # It also appears possible to install via Conda Forge (https://anaconda.org/conda-forge/ta-lib):
310 | 
311 | $ conda install -c conda-forge ta-lib
312 | ```
313 |
314 |
315 | ### document
316 |
317 | [https://github.com/mrjbq7/ta-lib](https://github.com/mrjbq7/ta-lib)
318 |
319 |
320 | ## **stumpy**
321 |
322 | ### Description
323 |
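324 | Builds features for time series; it can construct a feature vector for a given time series.
325 | 
326 | Meaning of the feature vector (the matrix profile): [https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html)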
327 |
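The linked tutorial describes the matrix profile: for each subsequence, the distance to its nearest neighbour elsewhere in the series. Below is a naive pure-Python sketch of that idea, with quadratic cost and none of stumpy's optimizations. The function names and the exclusion-zone width `max(1, m // 4)` are assumptions for this example, not stumpy's API.

```python
import math


def znorm(window):
    """Z-normalize a window so that only its shape matters."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """For every length-m subsequence, the z-normalized Euclidean distance
    to its nearest non-overlapping-neighbour subsequence."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = max(1, m // 4)  # assumed width; skips trivial self-matches
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            best = min(best, dist)
        profile.append(best)
    return profile


# Hypothetical series: a repeating bump with one spike (the value 5).
series = [0, 1, 2, 1, 0, 1, 2, 1, 0, 5, 0, 1, 2, 1, 0]
profile = naive_matrix_profile(series, m=4)
print([round(p, 2) for p in profile])
```

Low values mark repeated patterns (motifs); high values mark subsequences unlike anything else in the series (candidate anomalies).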
328 | ### installation
329 |
330 | ```Bash
331 | python -m pip install stumpy
332 | ```
333 |
334 |
335 | ```Bash
336 | conda install -c conda-forge stumpy
337 | ```
338 |
339 |
340 | ### document
341 |
342 | [https://github.com/TDAmeritrade/stumpy](https://github.com/TDAmeritrade/stumpy)
343 |
344 | [https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html)
345 |
346 |
347 | ## **statsmodels**
348 | 
349 | ### Description
350 | 
351 | Built on SciPy; classical time series models with broad model coverage.
352 |
353 | > Contains a submodule for classical time series models and hypothesis tests
354 |
355 |
356 | ### installation
357 |
358 | [https://www.statsmodels.org/dev/install.html](https://www.statsmodels.org/dev/install.html)
359 |
360 | ```Bash
361 | pip install statsmodels
362 | ```
363 |
364 |
365 | ```Bash
366 | conda install -c conda-forge statsmodels
367 | ```
368 |
369 |
370 | ### document
371 |
372 | [https://www.statsmodels.org/devel/](https://www.statsmodels.org/devel/)
373 |
374 | [https://github.com/statsmodels/statsmodels#documentation](https://github.com/statsmodels/statsmodels#documentation)
375 |
376 |
377 | ## **PyTorch_Forecasting**
378 |
379 | ### Description
380 |
381 | PyTorch implementations of SOTA time series models.
382 |
383 | > *PyTorch Forecasting* is a PyTorch-based package for forecasting time series with state-of-the-art network architectures. It provides **a high-level API for training networks** on pandas data frames and leverages [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/) for scalable training on (multiple) GPUs, CPUs and for automatic logging.
384 |
385 |
386 | #### Available models:
387 |
388 | [https://github.com/jdb78/pytorch-forecasting#available-models](https://github.com/jdb78/pytorch-forecasting#available-models)
389 |
390 | |Name|Covariates|Multiple targets|Regression|Classification|Probabilistic|Uncertainty|Interactions between series|Flexible history length|Cold-start|Required computational resources (1-5, 5=most)|
391 | |---|---|---|---|---|---|---|---|---|---|---|
392 | |[RecurrentNetwork](https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.rnn.RecurrentNetwork.html#pytorch_forecasting.models.rnn.RecurrentNetwork)|x|x|x| | | | |x| |2|
393 | |[DecoderMLP](https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.mlp.DecoderMLP.html#pytorch_forecasting.models.mlp.DecoderMLP)|x|x|x|x| |x| |x|x|1|
394 | |[NBeats](https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.nbeats.NBeats.html#pytorch_forecasting.models.nbeats.NBeats)| | |x| | | | | | |1|
395 | |[DeepAR](https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.deepar.DeepAR.html#pytorch_forecasting.models.deepar.DeepAR)|x|x|x| |x|x| |x| |3|
396 | |[TemporalFusionTransformer](https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.temporal_fusion_transformer.TemporalFusionTransformer.html#pytorch_forecasting.models.temporal_fusion_transformer.TemporalFusionTransformer)|x|x|x|x| |x| |x|x|4|
397 |
398 |
399 |
400 | ### installation
401 |
402 | ```Bash
403 | pip install pytorch-forecasting
404 | # or (the version spec is quoted so the shell does not treat ">" as a redirect)
405 | conda install pytorch-forecasting "pytorch>=1.7" -c pytorch -c conda-forge
406 |
407 | ```
408 |
409 |
410 | ### document
411 |
412 | [https://github.com/jdb78/pytorch-forecasting](https://github.com/jdb78/pytorch-forecasting)
413 |
414 | [https://pytorch-forecasting.readthedocs.io/en/latest/](https://pytorch-forecasting.readthedocs.io/en/latest/)
415 |
416 |
417 |
418 | ## **pandas-ta**
419 |
420 | ### Description
421 |
422 | A wrapper built on **TA-Lib** that provides technical indicators for financial time series, such as MACD.
423 |
424 | > An easy to use Python 3 Pandas Extension with 130+ Technical Analysis Indicators
425 |
426 |
427 | ### installation
428 |
429 | pip install pandas_ta
430 |
431 | Or the latest version:
432 |
433 | pip install -U git+https://github.com/twopirllc/pandas-ta
434 |
435 | ### document
436 |
437 | [https://github.com/twopirllc/pandas-ta](https://github.com/twopirllc/pandas-ta)
438 |
439 | Indicator reference:
440 |
441 | [https://github.com/twopirllc/pandas-ta#indicators-by-category](https://github.com/twopirllc/pandas-ta#indicators-by-category)
442 |
443 |
444 | ## **Flow_Forecast**
445 |
446 | ### Description
447 |
448 | Mostly deep learning models, built on PyTorch.
449 | 
450 | > Flow Forecast is a deep learning for time series **forecasting**, **classification**, and **anomaly detection** framework built in **PyTorch**
451 |
452 |
453 |
454 | > **Models currently supported**
455 |
456 | 1. Vanilla **LSTM**: A basic LSTM that is suitable for multivariate time series forecasting and transfer learning.
457 | 2. Full **transformer**: The full original transformer with all 8 encoder and decoder blocks. Requires passing the target in at inference.
458 | 3. Simple **Multi-Head Attention**: A simple multi-head attention block and linear embedding layers. Suitable for transfer learning.
459 | 4. Transformer with a linear decoder: A transformer with n encoder blocks (this is tunable) and a linear decoder.
460 | 5. DA-RNN: A well-rounded model that uses an LSTM + attention.
461 | 6. [Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting](https://arxiv.org/abs/1907.00235)
462 | 7. [Transformer XL](https://arxiv.org/abs/1901.02860)
463 | 8. [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436)
464 | 9. [DeepAR](https://arxiv.org/abs/1704.04110)
465 |
466 |
467 | ### installation
468 |
469 | pip install flood-forecast
470 |
471 | ### document
472 |
473 | [https://github.com/AIStream-Peelout/flow-forecast](https://github.com/AIStream-Peelout/flow-forecast)
474 |
475 | [https://flow-forecast.atlassian.net/wiki/spaces/FF/overview](https://flow-forecast.atlassian.net/wiki/spaces/FF/overview)
476 |
477 | [https://flow-forecast.atlassian.net/wiki/spaces/FF/pages/364019713/Training+Models](https://flow-forecast.atlassian.net/wiki/spaces/FF/pages/364019713/Training+Models)
478 |
479 |
480 | ## **AutoGluon**
481 |
482 | ### Description
483 |
484 | An AutoML package for automated machine learning on text, image, and tabular data.
485 | 
486 | ### Example
487 |
488 | ```Python
489 | from autogluon.tabular import TabularDataset, TabularPredictor
490 | train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
491 | test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
492 | predictor = TabularPredictor(label='class').fit(train_data, time_limit=120) # Fit models for 120s
493 | leaderboard = predictor.leaderboard(test_data)
494 | ```
495 |
496 |
497 | ### installation
498 |
499 | ```Bash
500 | # First install package from terminal:
501 | python3 -m pip install -U pip
502 | python3 -m pip install -U setuptools wheel
503 | python3 -m pip install -U "mxnet<2.0.0"
504 | python3 -m pip install autogluon # autogluon==0.3.1
505 | ```
506 |
507 |
508 | ### document
509 |
510 | [https://github.com/awslabs/autogluon](https://github.com/awslabs/autogluon)
511 |
512 | [https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html)
513 |
514 |
515 | ## **GENDIS**
516 |
517 | ### Description
518 |
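519 | Automatically discovers shapelets (subsequences of a time series), uses the distance from each shapelet to each series as features, and classifies the series based on those features.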
520 |
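The "distance to a shapelet as a feature" step can be sketched in pure Python as follows. This is only the feature-construction part under simple assumptions: the shapelets are hand-picked here, whereas GENDIS discovers them automatically, and the function names are hypothetical rather than GENDIS's API.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    same-length subsequence of the series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + k] - shapelet[k]) ** 2 for k in range(m)))
        for i in range(len(series) - m + 1)
    )


def shapelet_features(dataset, shapelets):
    """One row per series, one column per shapelet."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]


# Hypothetical toy data and hand-picked shapelets.
dataset = [
    [0, 0, 1, 2, 1, 0, 0],  # contains a bump
    [0, 0, 0, 0, 0, 0, 0],  # flat
]
shapelets = [[1, 2, 1], [0, 0, 0]]
features = shapelet_features(dataset, shapelets)
print(features)
```

The resulting feature matrix can be passed to any ordinary classifier, which is what makes shapelet-based features easy to combine with standard tooling.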
521 | ### installation
522 |
523 | [https://github.com/IBCNServices/GENDIS#installation](https://github.com/IBCNServices/GENDIS#installation)
524 |
525 | ### document
526 |
527 | [https://github.com/IBCNServices/GENDIS](https://github.com/IBCNServices/GENDIS)
528 |
529 |
530 | ## **gluon-ts**
531 |
532 | ### Description
533 |
534 | An Amazon package built on MXNet that ships **many deep learning models**.
535 | 
536 | Model reference: [https://github.com/awslabs/gluon-ts/tree/master/src/gluonts/model](https://github.com/awslabs/gluon-ts/tree/master/src/gluonts/model)
537 |
538 | ### installation
539 |
540 | ```Bash
541 | pip install --upgrade mxnet~=1.7 gluonts
542 | ```
543 |
544 |
545 | ### document
546 |
547 | [https://github.com/awslabs/gluon-ts](https://github.com/awslabs/gluon-ts)
548 |
549 | [https://github.com/awslabs/gluon-ts/tree/master/src/gluonts](https://github.com/awslabs/gluon-ts/tree/master/src/gluonts)
550 |
551 |
552 | ## **kats**
553 |
554 | ### Description
555 |
556 | A lightweight, production-ready framework from Facebook; supported models include SARIMA, Prophet, and Holt-Winters.
557 | 
558 | Supported models: [https://github.com/facebookresearch/Kats/tree/main/kats/models](https://github.com/facebookresearch/Kats/tree/main/kats/models)
559 | 
560 | > Kats is a toolkit to analyze time series data, a **lightweight**, easy-to-use, and generalizable framework to perform time series analysis. Time series analysis is an essential component of Data Science and Engineering **work at industry**, from understanding the key statistics and characteristics, detecting regressions and anomalies, to forecasting future trends. Kats aims to provide the **one-stop shop for time series analysis**, including **detection, forecasting, feature extraction/embedding**, multivariate analysis, etc.
561 |
562 |
563 |
564 | > ## Important links
565 |
566 | * Homepage: [https://facebookresearch.github.io/Kats/](https://facebookresearch.github.io/Kats/)
567 | * Kats Python package: [https://pypi.org/project/kats/0.1.0/](https://pypi.org/project/kats/0.1.0/)
568 | * Facebook Engineering Blog Post: [https://engineering.fb.com/2021/06/21/open-source/kats/](https://engineering.fb.com/2021/06/21/open-source/kats/)
569 | * Source code repository: [https://github.com/facebookresearch/kats](https://github.com/facebookresearch/kats)
570 | * Contributing: [https://github.com/facebookresearch/Kats/blob/master/CONTRIBUTING.md](https://github.com/facebookresearch/Kats/blob/master/CONTRIBUTING.md)
571 | * Tutorials: [https://github.com/facebookresearch/Kats/tree/master/tutorials](https://github.com/facebookresearch/Kats/tree/master/tutorials)
572 |
573 |
574 | ### installation
575 |
576 | ```Bash
577 | pip install --upgrade pip
578 | pip install kats
579 | ```
580 |
581 |
582 | If you need only a small subset of Kats, you can install a minimal version of Kats with
583 |
584 | ```Bash
585 | MINIMAL_KATS=1 pip install kats
586 | ```
587 |
588 |
589 | ### document
590 |
591 |
592 |
593 | ## **AtsPy**
594 |
595 | ### Description
596 |
597 | Automates fitting and comparing multiple models.
598 | 
599 | > Easily **develop state of the art time series models** to forecast univariate data series. Simply load your data and select which models you want to test. This is the largest repository of **automated structural and machine learning time series models**. Please get in contact if you want to contribute a model. This is a fledgling project, all advice appreciated.
600 |
601 |
602 | #### Automated Models
603 |
604 | 1. `ARIMA` - Automated ARIMA Modelling
605 | 2. `Prophet` - Modeling Multiple Seasonality With Linear or Non-linear Growth
606 | 3. `HWAAS` - Exponential Smoothing With Additive Trend and Additive Seasonality
607 | 4. `HWAMS` - Exponential Smoothing with Additive Trend and Multiplicative Seasonality
608 | 5. `NBEATS` - Neural basis expansion analysis (now fixed at 20 Epochs)
609 | 6. `Gluonts` - RNN-based Model (now fixed at 20 Epochs)
610 | 7. `TATS` - Seasonal and Trend no Box Cox
611 | 8. `TBAT` - Trend and Box Cox
612 | 9. `TBATS1` - Trend, Seasonal (one), and Box Cox
613 | 10. `TBATP1` - TBATS1 but Seasonal Inference is Hardcoded by Periodicity
614 | 11. `TBATS2` - TBATS1 With Two Seasonal Periods
615 |
616 | ### installation
617 |
618 | ```Bash
619 | pip install atspy
620 | ```
621 |
622 |
623 | ### document
624 |
625 | [https://github.com/firmai/atspy](https://github.com/firmai/atspy)
626 |
627 |
628 | ## **Prophet**
629 |
630 | ### Description
631 |
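632 | Facebook's open-source time series forecasting library, suited to data with **seasonality**.
633 | 
634 | > It works best with time series that have strong seasonal effects and several seasons of historical data. (from the GitHub documentation)
635 | 
636 | 
637 | > Prophet is a well-known time series package developed by Facebook's research team and first released in 2017. It is suited to data with **strong seasonal effects** and **several seasons** of historical data. It is highly user-friendly and customizable, and needs only minimal setup.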
638 |
639 |
640 | ```Python
641 | # Load the libraries
642 | import pandas as pd
643 | import matplotlib.pyplot as plt
644 | from prophet import Prophet  # installed via `pip install prophet` (formerly `fbprophet`)
645 | 
646 | 
647 | # Load the example data from the Prophet repo
648 | df = pd.read_csv("https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv")
649 | 
650 | # Fit the model
651 | model = Prophet()
652 | model.fit(df)
653 | 
654 | # Predict
655 | future = model.make_future_dataframe(periods=730)  # extend the frame by ~2 years of daily rows
656 | forecast = model.predict(future)  # forecast over history plus the future rows
657 | 
658 | # Plot results
659 | fig1 = model.plot(forecast)  # fit to past data and the future forecast
660 | fig2 = model.plot_components(forecast)  # breakdown into components
661 | plt.show()
662 | forecast  # in a notebook, displays the results as a table
663 | ```
664 |
665 |
666 | ### installation
667 |
668 | pip install prophet
669 |
670 | ### document
671 |
672 | [https://github.com/facebook/prophet](https://github.com/facebook/prophet)
673 |
674 | [https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/](https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/)
675 |
676 | [https://facebook.github.io/prophet/](https://facebook.github.io/prophet/)
677 |
678 | [https://pypi.org/project/prophet/](https://pypi.org/project/prophet/)
679 |
680 | ## **AutoTS**
681 |
682 | ### Description
683 |
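684 | AutoTS is an automated time series forecasting library that trains multiple time series models with only a little code. Some of its best features include:
685 | 
686 | - Uses **genetic programming** optimization to search for the best time series forecasting model.
687 | 
688 | - Provides lower and upper **confidence interval** bounds for the forecasts.
689 | 
690 | - **Trains a wide variety of models**: statistical, machine learning, and deep learning.
691 | 
692 | - Can **automatically ensemble the best models**.
693 | 
694 | - Handles messy data by learning the **optimal NaN imputation and outlier removal**.
695 | 
696 | - Supports both **univariate and multivariate** time series.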
697 |
698 |
699 | ```Python
700 | # also: _hourly, _daily, _weekly, or _yearly
701 | from autots.datasets import load_monthly
702 |
703 | df_long = load_monthly(long=True)
704 |
705 | from autots import AutoTS
706 |
707 | model = AutoTS(
708 |     forecast_length=3,
709 |     frequency='infer',
710 |     ensemble='simple',
711 |     max_generations=5,
712 |     num_validations=2,
713 | )
714 | model = model.fit(df_long, date_col='datetime', value_col='value', id_col='series_id')
715 |
716 | # Print the name of the best model
717 | print(model)
718 | ```
719 |
720 |
721 | ### installation
722 |
723 | ```Bash
724 | pip install autots
725 | ```
726 |
727 |
728 | ### document
729 |
730 | [https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html)
731 |
732 | [https://github.com/winedarksea/AutoTS](https://github.com/winedarksea/AutoTS)
733 |
734 |
735 | ## **Orbit**
736 |
737 | ### Description
738 |
739 | A time series forecasting package developed by Uber; it stands out for its Bayesian approach.
740 | 
741 | #### Currently supported models
742 | 
743 | > Currently, it supports concrete implementations for the **following models**:
744 | - Exponential Smoothing (ETS)
745 | - Local Global Trend (LGT)
746 | - Damped Local Trend (DLT)
747 | - Kernel Time-based Regression (KTR)
748 |
749 |
750 | #### Supported optimization methods
751 | 
752 | > It also supports the following **sampling/optimization methods** for model estimation/inferences:
753 | 
754 | - Markov-Chain Monte Carlo (MCMC) as a full sampling method
755 | - Maximum a Posteriori (MAP) as a point estimate method
756 | - Variational Inference (VI) as a hybrid-sampling method on approximate distribution
757 |
758 |
759 | ### installation
760 |
761 | ```Bash
762 | pip install orbit-ml
763 | ```
764 |
765 |
766 | ### document
767 |
768 | For details, check out our documentation and tutorials:
769 |
770 | * HTML (stable): [https://orbit-ml.readthedocs.io/en/stable/](https://orbit-ml.readthedocs.io/en/stable/)
771 | * HTML (latest): [https://orbit-ml.readthedocs.io/en/latest/](https://orbit-ml.readthedocs.io/en/latest/)
772 |
773 |
774 | ## **Darts**
775 |
776 | ### Description
777 |
778 | Covers everything from classical models to deep learning models. Supports multivariate time series.
779 | 
780 | ### Supported model types:
781 |
782 | |Model|Univariate|Multivariate|Probabilistic|Multiple-series training|Past-observed covariates support|Future-known covariates support|Reference|
783 | |---|---|---|---|---|---|---|---|
784 | |`ARIMA`|x| |x| | | | |
785 | |`VARIMA`|x|x| | | | | |
786 | |`AutoARIMA`|x| | | | | | |
787 | |`ExponentialSmoothing`|x| |x| | | | |
788 | |`Theta` and `FourTheta`|x| | | | | |[Theta](https://robjhyndman.com/papers/Theta.pdf) & [4 Theta](https://github.com/Mcompetitions/M4-methods/blob/master/4Theta%20method.R)|
789 | |`Prophet`|x| |x| | |x|[Prophet repo](https://github.com/facebook/prophet)|
790 | |`FFT` (Fast Fourier Transform)|x| | | | | | |
791 | |`RegressionModel` (incl `RandomForest`, `LinearRegressionModel` and `LightGBMModel`)|x|x| |x|x|x| |
792 | |`RNNModel` (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version|x|x|x|x| |x|[DeepAR paper](https://arxiv.org/abs/1704.04110)|
793 | |`BlockRNNModel` (incl. LSTM and GRU)|x|x| |x|x| | |
794 | |`NBEATSModel`|x|x| |x|x| |[N-BEATS paper](https://arxiv.org/abs/1905.10437)|
795 | |`TCNModel`|x|x|x|x|x| |[TCN paper](https://arxiv.org/abs/1803.01271), [DeepTCN paper](https://arxiv.org/abs/1906.04397), [blog post](https://medium.com/unit8-machine-learning-publication/temporal-convolutional-networks-and-forecasting-5ce1b6e97ce4)|
796 | |`TransformerModel`|x|x| |x|x| | |
797 | |Naive Baselines|x| | | | | | |
798 |
799 |
800 |
801 |
802 | ### Project features
803 |
804 | > **Forecasting Models:** A large collection of forecasting models; from statistical models (such as ARIMA) to deep learning models (such as N-BEATS). See the table of models above.
805 |
806 | **Data processing:** Tools to easily apply (and revert) common transformations on time series data (scaling, boxcox, …)
807 |
808 | **Metrics:** A variety of metrics for evaluating time series' goodness of fit; from R2-scores to Mean Absolute Scaled Error.
809 |
810 | **Backtesting:** Utilities for simulating historical forecasts, using moving time windows.
811 |
812 | **Regression Models:** Possibility to predict a time series from lagged versions of itself and of some external covariate series, using arbitrary regression models (e.g. scikit-learn models).
813 |
814 | **Multiple series training:** All neural networks, as well as `RegressionModel`s (incl. `LinearRegressionModel` and `RandomForest`) support being trained on multiple series.
815 |
816 | **Past and Future Covariates support:** Some models support past-observed and/or future-known covariate time series as inputs for producing forecasts.
817 |
818 | **Multivariate Support:** Tools to create, manipulate and forecast multivariate time series.
819 |
820 | **Probabilistic Support:** `TimeSeries` objects can (optionally) represent stochastic time series; this can for instance be used to get confidence intervals.
821 |
822 | **Filtering Models:** Darts offers three filtering models: `KalmanFilter`, `GaussianProcessFilter`, and `MovingAverage`, which can filter time series and, in some cases, provide probabilistic inferences of the underlying states/values.
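
The backtesting and metrics features above can be illustrated without Darts itself. The following is a minimal pure-Python sketch, not the Darts API: `naive_forecast`, `historical_forecasts`, and `mase` are hypothetical helper names, the forecaster is a simple last-value baseline, and the sample series is made up for illustration.

```python
from statistics import mean


def naive_forecast(history):
    """Last-value baseline: predict the most recent observation."""
    return history[-1]


def historical_forecasts(series, start, forecaster=naive_forecast):
    """Expanding-window backtest: forecast series[t] from series[:t] for each t >= start."""
    return [forecaster(series[:t]) for t in range(start, len(series))]


def mase(actual, predicted, train):
    """Mean Absolute Scaled Error: forecast MAE divided by the
    in-sample MAE of the one-step naive forecast on the training data."""
    forecast_mae = mean(abs(a - p) for a, p in zip(actual, predicted))
    naive_mae = mean(abs(train[i] - train[i - 1]) for i in range(1, len(train)))
    return forecast_mae / naive_mae


# Illustrative data (not from any real dataset).
series = [10, 12, 11, 13, 15, 14, 16, 18]
start = 5  # the first 5 points are treated as training data

preds = historical_forecasts(series, start)
score = mase(series[start:], preds, series[:start])
print(preds, round(score, 4))
```

A MASE below 1 means the forecasts beat the in-sample naive baseline on average. In Darts, the corresponding functionality is provided by the library's own backtesting utilities and metrics module.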
823 |
824 |
825 | ### installation
826 |
827 | pip install darts
828 |
829 | ### document
830 |
831 | [https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/](https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/)
832 |
833 | [https://github.com/unit8co/darts](https://github.com/unit8co/darts)
834 |
835 |
836 | ## **Sktime**
837 |
838 | ### Description
839 |
840 | A scikit-learn-style package for time series processing.
841 |
842 | > **About:** [Sktime](https://analyticsindiamag.com/sktime-library/) is a unified python framework that provides API for machine learning with time series data. The framework also provides scikit-learn compatible tools to build, tune and validate time series models for multiple learning problems, including time series classification, time series regression and forecasting.
843 |
844 |
845 | ```Python
846 | from sktime.datasets import load_airline
847 | from sktime.forecasting.base import ForecastingHorizon
848 | from sktime.forecasting.model_selection import temporal_train_test_split
849 | from sktime.forecasting.theta import ThetaForecaster
850 | from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
851 |
852 | y = load_airline()  # monthly airline passenger series
853 | y_train, y_test = temporal_train_test_split(y)  # chronological split, no shuffling
854 | fh = ForecastingHorizon(y_test.index, is_relative=False)  # absolute horizon: the test timestamps
855 | forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
856 | forecaster.fit(y_train)
857 | y_pred = forecaster.predict(fh)
858 | mean_absolute_percentage_error(y_test, y_pred)
859 | # 0.08661467738190656
860 |
861 | ```
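
To make the two non-model steps in the example above concrete, here is a minimal pure-Python sketch of a chronological train/test split and a plain (non-symmetric) mean absolute percentage error. It is not sktime code: `temporal_split` and `mape` are hypothetical helpers, the 25% test fraction is an assumed default, and the data is made up.

```python
def temporal_split(y, test_size=0.25):
    """Split a sequence chronologically: the last `test_size` fraction is the test set."""
    n_test = max(1, round(len(y) * test_size))
    return y[:-n_test], y[-n_test:]


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.1 means 10%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data (not the airline dataset).
y = [100, 110, 120, 130, 140, 150, 160, 170]
y_train, y_test = temporal_split(y)

# Last-value baseline forecast for every step of the horizon.
y_pred = [y_train[-1]] * len(y_test)
error = mape(y_test, y_pred)
print(y_train, y_test, round(error, 4))
```

The key point is that the split never shuffles: every test observation comes after every training observation, which avoids leaking future values into the fit.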
862 |
863 |
864 | ### installation
865 |
866 | Install with pip
867 |
868 | ```Bash
869 | pip install sktime
870 |
871 | ```
872 |
873 |
874 | Install with extra dependencies
875 |
876 | ```Bash
877 | pip install sktime[all_extras]
878 | ```
879 |
880 |
881 |
882 | Install with conda
883 |
884 | ```Bash
885 | conda install -c conda-forge sktime
886 | ```
887 |
888 |
889 | Install with extra dependencies (conda)
890 |
891 | ```Bash
892 | conda install -c conda-forge sktime-all-extras
893 | ```
894 |
895 |
896 | ### document
897 |
898 | [https://www.sktime.org/en/latest/api_reference.html](https://www.sktime.org/en/latest/api_reference.html)
899 |
900 | [https://github.com/alan-turing-institute/sktime](https://github.com/alan-turing-institute/sktime)
901 |
902 | ## **TimeSynth**
903 |
904 | ### Description
905 |
906 | Can be used to generate synthetic time series data.
907 |
908 | #### Signal Types
909 |
910 | - Harmonic functions (**sin, cos or custom functions**)
911 |
912 | - Gaussian processes with different kernels
913 |
914 |     - Constant
915 |
916 |     - Squared exponential
917 |
918 |     - Exponential
919 |
920 |     - Rational quadratic
921 |
922 |     - Linear
923 |
924 |     - Matern
925 |
926 |     - Periodic
927 |
928 | - Pseudoperiodic signals
929 |
930 | - Autoregressive(p) process
931 |
932 | - Continuous autoregressive process (CAR)
933 |
934 | - Nonlinear Autoregressive Moving Average model (NARMA)
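
Two of the signal types listed above, a harmonic function and an autoregressive(p) process, can be sketched in plain Python to show what such synthetic data looks like. This is an illustrative sketch only and does not use the TimeSynth API: `harmonic_signal` and `ar_process` are hypothetical helpers, and the coefficients and noise levels are arbitrary choices.

```python
import math
import random


def harmonic_signal(times, amplitude=1.0, frequency=1.0):
    """Sinusoid sampled at the given time points."""
    return [amplitude * math.sin(2 * math.pi * frequency * t) for t in times]


def ar_process(n, coefficients, sigma=1.0, rng=None):
    """Simulate x_t = sum_i phi_i * x_{t-i} + eps_t with Gaussian noise,
    starting from p zero values that are dropped from the output."""
    rng = rng or random.Random()
    p = len(coefficients)
    x = [0.0] * p
    for _ in range(n):
        lagged = x[-1:-p - 1:-1]  # the p most recent values, newest first
        value = sum(phi * past for phi, past in zip(coefficients, lagged))
        x.append(value + rng.gauss(0.0, sigma))
    return x[p:]


rng = random.Random(0)  # seeded for reproducibility

times = [i / 4 for i in range(8)]
clean = harmonic_signal(times)
noisy = [v + rng.gauss(0.0, 0.1) for v in clean]  # sinusoid plus white noise

ar2 = ar_process(100, [0.6, -0.3], sigma=0.5, rng=rng)
print(len(noisy), len(ar2))
```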
935 |
936 | ### installation
937 |
938 | ```Bash
939 | git clone https://github.com/TimeSynth/TimeSynth.git
940 | cd TimeSynth
941 | python setup.py install
942 | ```
943 |
944 |
945 | ### document
946 |
947 | [https://github.com/TimeSynth/TimeSynth](https://github.com/TimeSynth/TimeSynth)
948 |
949 |
950 | ## **PyFlux**
951 |
952 | ### Description
953 |
954 | Provides classical time series methods.
955 |
956 | > **About:** [PyFlux](https://analyticsindiamag.com/pyflux-guide-python-library-for-time-series-analysis-and-prediction/) is an open source library for time series analysis and prediction. In this library, users can choose from **a flexible range of modelling and inference options**, and use the output for forecasting and retrospection. The library allows for a probabilistic approach to time series modelling. The latest release version of PyFlux is available on PyPI. Python 2.7 and Python 3.5 are supported, but development occurs primarily on 3.5.
957 |
958 |
959 | * [ARIMA models](http://pyflux.readthedocs.io/en/latest/arima.html)
960 | * [ARIMAX models](http://pyflux.readthedocs.io/en/latest/arimax.html)
961 | * [Dynamic Autoregression models](http://pyflux.readthedocs.io/en/latest/docs/dar.html)
962 | * [Dynamic Paired Comparison models](http://pyflux.readthedocs.io/en/latest/gas_rank.html)
963 | * [GARCH models](http://pyflux.readthedocs.io/en/latest/garch.html)
964 | * [Beta-t-EGARCH models](http://pyflux.readthedocs.io/en/latest/egarch.html)
965 | * [EGARCH-in-mean models](http://pyflux.readthedocs.io/en/latest/egarchm.html)
966 | * [EGARCH-in-mean regression models](http://pyflux.readthedocs.io/en/latest/egarchmreg.html)
967 | * [Long Memory EGARCH models](http://pyflux.readthedocs.io/en/latest/lmegarch.html)
968 | * [Skew-t-EGARCH models](http://pyflux.readthedocs.io/en/latest/segarch.html)
969 | * [Skew-t-EGARCH-in-mean models](http://pyflux.readthedocs.io/en/latest/segarchm.html)
970 | * [GAS models](http://pyflux.readthedocs.io/en/latest/gas.html)
971 | * [GASX models](http://pyflux.readthedocs.io/en/latest/gasx.html)
972 | * [GAS State Space models](http://pyflux.readthedocs.io/en/latest/gas_llm.html)
973 | * [Gaussian State Space models](http://pyflux.readthedocs.io/en/latest/llm.html)
974 | * [Non-Gaussian State Space models](http://pyflux.readthedocs.io/en/latest/nllm.html)
975 | * [VAR models](http://pyflux.readthedocs.io/en/latest/var.html)
976 |
977 | ### installation
978 |
979 | ```Bash
980 | pip install pyflux
981 | ```
982 |
983 |
984 | ### document
985 |
986 | [https://github.com/RJT1990/pyflux](https://github.com/RJT1990/pyflux)
987 |
988 |
989 | ## **TSFRESH**
990 |
991 | ### Description
992 |
993 | > **About:** TSFRESH or Time Series Feature extraction based on scalable hypothesis tests is a Python package with various feature extraction methods and a robust feature selection algorithm. The package automatically calculates a large number of time series characteristics and contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks. Advantages include:
994 | * It is compatible with sklearn, pandas and numpy
995 | * It allows anyone to easily add their favorite features
996 | * It runs on a local machine or on a cluster
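
To give a sense of what "time series characteristics" means here, the sketch below computes a handful of simple per-series features in plain Python. It is not the tsfresh API and covers only a tiny fraction of what the package calculates: `autocorrelation` and `extract_basic_features` are hypothetical helpers chosen for illustration.

```python
from statistics import mean, pvariance


def autocorrelation(x, lag):
    """Autocorrelation at the given lag (population variance in the denominator)."""
    mu = mean(x)
    var = pvariance(x, mu)
    if var == 0:
        return 0.0  # constant series: define the autocorrelation as 0
    n = len(x)
    cov = sum((x[i] - mu) * (x[i + lag] - mu) for i in range(n - lag)) / (n - lag)
    return cov / var


def extract_basic_features(x):
    """Map one time series to a flat dictionary of scalar features."""
    return {
        "mean": mean(x),
        "variance": pvariance(x),
        "abs_energy": sum(v * v for v in x),
        "autocorrelation_lag1": autocorrelation(x, 1),
    }


features = extract_basic_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

Each series becomes one row of scalar features, which is the shape a standard regression or classification model expects; feature selection then decides which of those columns are worth keeping.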
997 |
998 |
999 | ### installation
1000 |
1001 | ```Bash
1002 | pip install tsfresh
1003 | ```
1004 |
1005 |
1006 | ```Bash
1007 | docker pull nbraun/tsfresh
1008 | ```
1009 |
1010 |
1011 | ### document
1012 |
1013 | [https://github.com/blue-yonder/tsfresh](https://github.com/blue-yonder/tsfresh)
1014 |
1015 |
1016 | ## **Featuretools**
1017 |
1018 | ### Description
1019 |
1020 | Automated feature engineering for time series and related data.
1021 |
1022 | > **About:** Featuretools is an open source Python library for automated feature engineering. The framework excels at transforming temporal and relational datasets into feature matrices for machine learning. Featuretools references generated features through the feature name. In order to make features easier to understand, Featuretools offers two additional tools, featuretools.graph_feature() and featuretools.describe_feature(), to help explain what a feature is and the steps Featuretools took to generate it.
1023 |
1024 |
1025 | ### installation
1026 |
1027 | pip install featuretools
1028 |
1029 | [https://featuretools.alteryx.com/en/stable/install.html](https://featuretools.alteryx.com/en/stable/install.html)
1030 |
1031 | ### document
1032 |
1033 | [https://featuretools.alteryx.com/en/stable/getting_started/getting_started_index.html](https://featuretools.alteryx.com/en/stable/getting_started/getting_started_index.html)
1034 |
1035 | [https://github.com/alteryx/featuretools](https://github.com/alteryx/featuretools)
1036 |
1037 |
1038 | ## **Arrow**
1039 |
1040 | ### Description
1041 |
1042 | > **Arrow** is a Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps
1043 |
1044 |
1045 | Used for converting date and time formats, including into human-readable forms.
1046 |
1047 | ### installation
1048 |
1049 | pip install -U arrow
1050 |
1051 | ### document
1052 |
1053 | [https://github.com/arrow-py/arrow](https://github.com/arrow-py/arrow)
1054 |
1055 | ```Python
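1056 | >>> import arrow
1057 | >>> utc = arrow.utcnow().shift(hours=-1)
1058 | >>> utc
1059 | <Arrow [2013-05-11T20:23:58.970460+00:00]>
1060 | >>> local = utc.to('US/Pacific')
1061 | >>> local
1062 | <Arrow [2013-05-11T13:23:58.970460-07:00]>
1063 |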
1064 | >>> local.timestamp()
1065 | 1368303838.970460
1066 |
1067 | >>> local.format()
1068 | '2013-05-11 13:23:58 -07:00'
1069 |
1070 | >>> local.format('YYYY-MM-DD HH:mm:ss ZZ')
1071 | '2013-05-11 13:23:58 -07:00'
1072 |
1073 | >>> local.humanize()
1074 | 'an hour ago'
1075 |
1076 | >>> local.humanize(locale='ko-kr')
1077 | '한시간 전'
1078 | ```
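
For comparison, the timezone conversion and formatting steps in the session above can be approximated with only the standard library. This is a sketch under assumptions, not Arrow code: the instant is taken from the timestamp shown above, and US/Pacific is modelled as a fixed UTC-7 offset (daylight time on that date) rather than a real timezone database lookup.

```python
from datetime import datetime, timedelta, timezone

# The instant from the example: 2013-05-11 20:23:58.970460 UTC.
utc = datetime(2013, 5, 11, 20, 23, 58, 970460, tzinfo=timezone.utc)

# Assumption: a fixed -07:00 offset stands in for US/Pacific on this date.
pacific_dst = timezone(timedelta(hours=-7), "PDT")
local = utc.astimezone(pacific_dst)

# strftime's %z prints the offset without a colon, unlike Arrow's ZZ token.
formatted = local.strftime("%Y-%m-%d %H:%M:%S %z")

print(local.isoformat())
print(local.timestamp())
print(formatted)
```

The standard library has no counterpart to `humanize()`, and expressing shifts requires explicit `timedelta` arithmetic, which is the kind of convenience Arrow adds.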
1079 |
1080 |
1081 |
1082 | GitHub: [https://github.com/winedarksea/AutoTS](https://github.com/winedarksea/AutoTS)
1083 |
1084 | pip install --upgrade pip
1085 | pip install kats
1086 |
1087 |
--------------------------------------------------------------------------------
/zhou_zhihua_minmap/机器学习——周志华.mmap:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cjn-chen/machine_learn_reading_notes/9a15e5c02008308387d6c6ee9f3e6ee6c98be746/zhou_zhihua_minmap/机器学习——周志华.mmap
--------------------------------------------------------------------------------